drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
author shaoyunl <shaoyun.liu@amd.com>
Sun, 14 Nov 2021 17:38:18 +0000 (12:38 -0500)
committer Alex Deucher <alexander.deucher@amd.com>
Thu, 18 Nov 2021 04:04:57 +0000 (23:04 -0500)
In an SRIOV configuration, a reset may fail to bring the ASIC back to normal
after stop_cpsch has already been called. start_cpsch will not be called in
this case because there is no resume. When a reset is triggered again, the
driver should avoid doing the uninitialization a second time.
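
The guard can be illustrated outside the kernel with a minimal user-space
sketch. Everything named toy_* is hypothetical and exists only for this
illustration. The sketch assumes that the stop path clears sched_running
and that the start path sets it; it is a model of the idea, not the driver
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_queue_manager. Only the flag
 * the guard depends on is modelled, plus two counters for observation. */
struct toy_dqm {
	bool sched_running;	/* assumed: set by start, cleared by stop */
	int teardown_count;	/* how many times teardown really ran */
	int lock_depth;		/* models dqm_lock()/dqm_unlock() balance */
};

static void toy_lock(struct toy_dqm *dqm)   { dqm->lock_depth++; }
static void toy_unlock(struct toy_dqm *dqm) { dqm->lock_depth--; }

static int toy_start(struct toy_dqm *dqm)
{
	assert(dqm != NULL);
	toy_lock(dqm);
	dqm->sched_running = true;
	toy_unlock(dqm);
	return 0;
}

static int toy_stop(struct toy_dqm *dqm)
{
	assert(dqm != NULL);
	toy_lock(dqm);
	if (!dqm->sched_running) {
		/* Already stopped: drop the lock and skip the teardown. */
		toy_unlock(dqm);
		return 0;
	}
	dqm->sched_running = false;
	dqm->teardown_count++;	/* placeholder for the real teardown work */
	toy_unlock(dqm);
	return 0;
}
```

Calling toy_start() once and toy_stop() twice leaves teardown_count at 1 and
lock_depth at 0, which is the behaviour wanted when a second reset follows a
failed one with no resume in between.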

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 003ba6a..93e33dd 100644
@@ -1226,6 +1226,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
        bool hanging;
 
        dqm_lock(dqm);
+       if (!dqm->sched_running) {
+               dqm_unlock(dqm);
+               return 0;
+       }
+
        if (!dqm->is_hws_hang)
                unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
        hanging = dqm->is_hws_hang || dqm->is_resetting;