drm/amdgpu: Disable GPU reset on SRIOV before remove pci.
author Gavin Wan <Gavin.Wan@amd.com>
Wed, 26 Oct 2022 17:45:25 +0000 (13:45 -0400)
committer Alex Deucher <alexander.deucher@amd.com>
Tue, 1 Nov 2022 15:44:49 +0000 (11:44 -0400)
The recent change introduced a bug in SRIOV environments: unloading
the amdgpu driver fails on a guest VM. The cause is that a VF FLR
is requested while the amdgpu driver unloads, but the VF FLR
sequence under SRIOV is wrong while the PCI device is being removed.

For SRIOV, the guest driver should not trigger a reset of the whole
XGMI hive. The host driver controls how the device is reset.
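
The resulting guard can be modelled outside the kernel as a minimal
sketch. The struct, macro, and function names below are hypothetical
stand-ins for adev->ip_versions[MP1_HWIP][0], IP_VERSION() and
amdgpu_sriov_vf(adev); they are not kernel APIs and only illustrate
which devices may still enter the reset-before-remove block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's IP_VERSION() packing. */
#define MODEL_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* Hypothetical model of the two device properties the check reads. */
struct model_dev {
	uint32_t mp1_version;	/* models adev->ip_versions[MP1_HWIP][0] */
	bool is_sriov_vf;	/* models amdgpu_sriov_vf(adev) */
};

/*
 * Returns true only for an MP1 13.0.2 device that is not an SRIOV
 * virtual function. A VF leaves reset decisions to the host driver.
 */
static bool model_may_reset_on_remove(const struct model_dev *d)
{
	return d->mp1_version == MODEL_IP_VERSION(13, 0, 2) &&
	       !d->is_sriov_vf;
}
```

With this model, a bare-metal 13.0.2 device passes the check, while
the same device exposed as a VF in a guest does not.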

Fixes: f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

index 3c9fecd..bf2d50c 100644
@@ -2201,7 +2201,8 @@ amdgpu_pci_remove(struct pci_dev *pdev)
                pm_runtime_forbid(dev->dev);
        }
 
-       if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2)) {
+       if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2) &&
+           !amdgpu_sriov_vf(adev)) {
                bool need_to_reset_gpu = false;
 
                if (adev->gmc.xgmi.num_physical_nodes > 1) {