accel/ivpu: Trigger device recovery on engine reset/resume failure
author Karol Wachowski <karol.wachowski@intel.com>
Wed, 28 May 2025 15:42:53 +0000 (17:42 +0200)
committer Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Thu, 5 Jun 2025 12:36:56 +0000 (14:36 +0200)
commit a47e36dc5d90dc664cac87304c17d50f1595d634
tree 26c6eefe19a20edac9f36b30d5d21dd0cebeffd3
parent 98d3f772ca7d6822bdfc8c960f5f909574db97c9

Trigger full device recovery when the driver fails to restore device state
via engine reset and resume operations. This is necessary because, even if
submissions from a faulty context are blocked, the NPU may still process
previously submitted faulty jobs if the engine reset fails to abort them.
Such jobs can continue to generate faults and occupy device resources.
When engine reset is ineffective, the only way to recover is to perform
a full device recovery.
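
The escalation described above can be sketched in plain C as follows. This is
a minimal illustration, not the code from this patch: struct fake_npu and the
fake_* helpers are hypothetical stand-ins for the driver's device structure
and its engine reset, engine resume, and recovery entry points, and the error
value is arbitrary.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; not the driver's real structure. */
struct fake_npu {
	bool reset_fails;   /* simulate an engine reset that fails */
	bool resume_fails;  /* simulate an engine resume that fails */
	int recoveries;     /* number of full device recoveries requested */
};

/* Hypothetical helper: returns 0 on success, non-zero on failure. */
static int fake_engine_reset(struct fake_npu *npu)
{
	return npu->reset_fails ? -1 : 0;
}

/* Hypothetical helper: returns 0 on success, non-zero on failure. */
static int fake_engine_resume(struct fake_npu *npu)
{
	return npu->resume_fails ? -1 : 0;
}

/* Hypothetical helper: records that a full device recovery was requested. */
static void fake_trigger_recovery(struct fake_npu *npu, const char *reason)
{
	npu->recoveries++;
	fprintf(stderr, "full device recovery requested: %s\n", reason);
}

/*
 * Abort jobs from faulty contexts. If either the engine reset or the
 * engine resume fails, escalate to full device recovery instead of
 * leaving previously submitted faulty jobs running on the device.
 */
static int abort_faulty_contexts(struct fake_npu *npu)
{
	int ret;

	ret = fake_engine_reset(npu);
	if (ret) {
		fake_trigger_recovery(npu, "engine reset failed");
		return ret;
	}

	ret = fake_engine_resume(npu);
	if (ret) {
		fake_trigger_recovery(npu, "engine resume failed");
		return ret;
	}

	return 0;
}
```

In this sketch a failed reset returns early, so resume is never attempted on
an engine that could not be reset, and each failure path requests exactly one
full recovery.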

Fixes: dad945c27a42 ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250528154253.500556-1-jacek.lawrynowicz@linux.intel.com
drivers/accel/ivpu/ivpu_job.c
drivers/accel/ivpu/ivpu_jsm_msg.c