drm/xe/guc: Don't treat GuC generic CAT error as protocol error
author Michal Wajdeczko <michal.wajdeczko@intel.com>
Tue, 5 Nov 2024 20:45:57 +0000 (21:45 +0100)
committer Michal Wajdeczko <michal.wajdeczko@intel.com>
Thu, 7 Nov 2024 16:38:13 +0000 (17:38 +0100)
GuC uses GUC_ID_UNKNOWN if it cannot map the CAT fault to any
context. We shouldn't treat that as a G2H protocol error that would
justify a GT reset, as it may happen due to some VF activity.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105204557.1991-1-michal.wajdeczko@intel.com
drivers/gpu/drm/xe/xe_guc_fwif.h
drivers/gpu/drm/xe/xe_guc_submit.c

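diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
index 08ffe59..057153f 100644
--- a/drivers/gpu/drm/xe/xe_guc_fwif.h
+++ b/drivers/gpu/drm/xe/xe_guc_fwif.h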
@@ -17,6 +17,7 @@
 #define G2H_LEN_DW_TLB_INVALIDATE              3
 
 #define GUC_ID_MAX                     65535
+#define GUC_ID_UNKNOWN                 0xffffffff
 
 #define GUC_CONTEXT_DISABLE            0
 #define GUC_CONTEXT_ENABLE             1
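diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 37d4ad8..9e0f86f 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c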
@@ -2021,6 +2021,15 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
 
        guc_id = msg[0];
 
+       if (guc_id == GUC_ID_UNKNOWN) {
+               /*
+                * GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any PF/VF
+                * context. In such case only PF will be notified about that fault.
+                */
+               xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
+               return 0;
+       }
+
        q = g2h_exec_queue_lookup(guc, guc_id);
        if (unlikely(!q))
                return -EPROTO;
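
The control flow introduced by the hunk above can be exercised outside the kernel with a minimal user-space sketch. This is not the driver code: `demo_queue`, `demo_queue_lookup()` and `demo_cat_error_handler()` are hypothetical stand-ins, the `len < 1` check and the `faulted` flag are assumptions made for illustration, and logging is omitted. It only shows the ordering that matters here: the GUC_ID_UNKNOWN sentinel is checked before the queue lookup, so it returns 0 instead of falling through to -EPROTO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define GUC_ID_MAX      65535
#define GUC_ID_UNKNOWN  0xffffffffu

/* Hypothetical stand-in for an exec queue; not the driver's structure. */
struct demo_queue {
	uint32_t guc_id;
	int faulted;
};

static struct demo_queue demo_queues[] = {
	{ .guc_id = 7 },
	{ .guc_id = 42 },
};

/* Hypothetical lookup: returns NULL when no queue owns the given id. */
static struct demo_queue *demo_queue_lookup(uint32_t guc_id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_queues) / sizeof(demo_queues[0]); i++)
		if (demo_queues[i].guc_id == guc_id)
			return &demo_queues[i];
	return NULL;
}

/*
 * Sketch of the handler's decision order. Returns 0 when the message is
 * acceptable and -EPROTO when it looks like a protocol violation.
 */
int demo_cat_error_handler(const uint32_t *msg, uint32_t len)
{
	struct demo_queue *q;
	uint32_t guc_id;

	assert(msg != NULL);

	/* Assumption: a message without a guc_id dword is malformed. */
	if (len < 1)
		return -EPROTO;

	guc_id = msg[0];

	/* Fault not attributable to any context: report only, no error. */
	if (guc_id == GUC_ID_UNKNOWN)
		return 0;

	/* Any other id that cannot be resolved is still a protocol error. */
	q = demo_queue_lookup(guc_id);
	if (!q)
		return -EPROTO;

	q->faulted = 1;
	return 0;
}
```

Note that 0xffffffff lies far above GUC_ID_MAX, so without the early return the sentinel could never resolve to a queue and would always take the -EPROTO path.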