Matthew Brost [Thu, 31 Oct 2024 18:22:57 +0000 (11:22 -0700)]
drm/xe: Restore system memory GGTT mappings
GGTT mappings reside on the device and this state is lost during suspend
/ d3cold thus this state must be restored resume regardless if the BO is
in system memory or VRAM.
v2:
- Unnecessary parentheses around bo->placements[0] (Checkpatch)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031182257.2949579-1-matthew.brost@intel.com
Thomas Hellström [Thu, 31 Oct 2024 15:37:32 +0000 (16:37 +0100)]
drm/xe: Don't unnecessarily invoke the OOM killer on multiple binds
Multiple single-ioctl binds can be split up into multiple bind
ioctls, reducing the memory required to hold the bind array.
So rather than allowing the OOM killer to be invoked, return
-ENOMEM or -ENOBUFS to user-space to take corrective action.
v2:
- Add __GFP_NOWARN to __GFP_RETRY_MAYFAIL allocations to avoid
spamming the kernel log if a recoverable allocation fails.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2701
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031153732.164995-3-thomas.hellstrom@linux.intel.com
Thomas Hellström [Thu, 31 Oct 2024 15:37:31 +0000 (16:37 +0100)]
drm/xe: Avoid the OOM killer on buffer object memory allocation
Rather than invoking the OOM killer on buffer object memory
allocations and validations, have the allocations fail and
pass the error to user-space if applicable.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2701
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031153732.164995-2-thomas.hellstrom@linux.intel.com
Nirmoy Das [Tue, 29 Oct 2024 12:01:17 +0000 (13:01 +0100)]
drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout
Flush the g2h worker explicitly if TLB timeout happens which is
observed on LNL and that points to the recent scheduling issue with
E-cores on LNL.
This is similar to the recent fix:
commit
e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is E core
scheduling fix.
v2: Add platform check(Himal)
v3: Remove gfx platform check as the issue related to cpu
platform(John)
Use the common WA macro(John) and print when the flush
resolves timeout(Matt B)
v4: Remove the resolves log and do the flush before taking
pending_lock(Matt A)
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-3-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Nirmoy Das [Tue, 29 Oct 2024 12:01:16 +0000 (13:01 +0100)]
drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout
Flush xe ordered_wq in case of ufence timeout which is observed
on LNL and that points to recent scheduling issue with E-cores.
This is similar to the recent fix:
commit
e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is a E-core
scheduling fix for LNL.
v2: Add platform check(Himal)
s/__flush_workqueue/flush_workqueue(Jani)
v3: Remove gfx platform check as the issue related to cpu
platform(John)
v4: Use the Common macro(John) and print when the flush resolves
timeout(Matt B)
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-2-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Nirmoy Das [Tue, 29 Oct 2024 12:01:15 +0000 (13:01 +0100)]
drm/xe: Move LNL scheduling WA to xe_device.h
Move LNL scheduling WA to xe_device.h so this can be used in other
places without needing keep the same comment about removal of this WA
in the future. The WA, which flushes work or workqueues, is now wrapped
in macros and can be reused wherever needed.
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
cc: stable@vger.kernel.org # v6.11+
Suggested-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-1-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Balasubramani Vivekanandan [Tue, 8 Oct 2024 07:36:28 +0000 (13:06 +0530)]
drm/xe: Use the filelist from drm for ccs_mode change
Drop the exclusive client count tracking and use the filelist from the
drm to track the active clients. This also ensures the clients created
internally by the driver won't block changing the ccs mode.
Fixes:
ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-3-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Balasubramani Vivekanandan [Tue, 8 Oct 2024 07:36:27 +0000 (13:06 +0530)]
drm/xe: Set mask bits for CCS_MODE register
CCS_MODE register requires setting mask bits from Xe2+ platforms. Set
the mask bits unconditionally, as those bits are unused for older
platforms.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Tue, 29 Oct 2024 19:32:58 +0000 (12:32 -0700)]
drm/xe: Move Wa
1607983814 to oob
needs_wa_1607983814() predates wa_oob, so it was not being printed
in /sys/kernel/debug/dri/0/*/workarounds. Port it to OOB rules.
This makes the WA show up in debugfs. For TGL:
OOB Workarounds
1607983814
22012773006
1409600907
Eventually the RTP infra may add support for writing registers in a
loop, which would allow to keep track of the registers as well. But for
now, just listing it as OOB workaround is already an improvement.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029193258.749882-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Everest K.C. [Wed, 23 Oct 2024 23:33:55 +0000 (17:33 -0600)]
drm/xe/guc: Fix dereference before NULL check
The pointer list->list is dereferenced before the NULL check.
Fix this by moving the NULL check outside the for loop, so that
the check is performed before the dereferencing.
The list->list pointer cannot be NULL so this has no effect on runtime.
It's just a correctness issue.
This issue was reported by Coverity Scan.
https://scan7.scan.coverity.com/#/project-view/51525/11354?selectedIssue=
1600335
Fixes:
0f1fdf559225 ("drm/xe/guc: Save manual engine capture into capture list")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023233356.5479-1-everestkc@everestkc.com.np
Matthew Brost [Fri, 25 Oct 2024 21:43:29 +0000 (14:43 -0700)]
drm/xe: Don't short circuit TDR on jobs not started
Short circuiting TDR on jobs not started is an optimization which is not
required. On LNL we are facing an issue where jobs do not get scheduled
by the GuC if it misses a GGTT page update. When this occurs let the TDR
fire, toggle the scheduling which may get the job unstuck, and print a
warning message. If the TDR fires twice on job that hasn't started,
timeout the job.
v2:
- Add warning message (Paulo)
- Add fixes tag (Paulo)
- Timeout job which hasn't started after TDR firing twice
v3:
- Include local change
v4:
- Short circuit check_timeout on job not started
- use warn level rather than notice (Paulo)
Fixes:
7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Cc: stable@vger.kernel.org
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Matthew Brost [Wed, 23 Oct 2024 22:12:00 +0000 (15:12 -0700)]
drm/xe: Add mmio read before GGTT invalidate
On LNL without a mmio read before a GGTT invalidate the GuC can
incorrectly read the GGTT scratch page upon next access leading to jobs
not getting scheduled. A mmio read before a GGTT invalidate seems to fix
this. Since a GGTT invalidate is not a hot code path, blindly do a mmio
read before each GGTT invalidate.
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org
Fixes:
dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3164
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023221200.1797832-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Ashutosh Dixit [Tue, 29 Oct 2024 20:01:47 +0000 (13:01 -0700)]
Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs"
This reverts commit
55858fa7eb2f163f7aa34339fd3399ba4ff564c6.
'
55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist
regs")' was not properly reviewed and also causes dmesg asserts in
CI. Revert it.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295
Fixes:
55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com
John Harrison [Thu, 24 Oct 2024 00:25:54 +0000 (17:25 -0700)]
drm/xe/guc: Separate full CTB content from guc_info debugfs
The guc_info debugfs file is meant to be a quick view of the current
software state of the GuC interface. Including the full CTB contents
makes the file as a whole much less human readable and is not
partiular useful in the general case. So don't pollute the info dump
with the full buffers. Instead, move those into a separate debugfs
entry that can be read when that information is actually required.
Also, improve the human readability by adding a few extra blank lines
to delimt the sections.
v2: Hide the internal capture/print params from external callers that
don't need to know (review feedback from Matthew Brost).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-3-John.C.Harrison@Intel.com
John Harrison [Thu, 24 Oct 2024 00:25:53 +0000 (17:25 -0700)]
drm/xe/guc: Capture all available bits of GuC timestamp
The extra bits are not hugely useful because the GuC log only uses
32bit time stamps. But they exist so might as well provide them.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-2-John.C.Harrison@Intel.com
Jonathan Cavitt [Wed, 23 Oct 2024 20:07:15 +0000 (20:07 +0000)]
drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs
Several OA registers and allowlist registers were missing from the
save/restore list for GuC and could be lost during an engine reset. Add
them to the list.
v2:
- Fix commit message (Umesh)
- Add missing closes (Ashutosh)
v3:
- Add missing fixes (Ashutosh)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
Fixes:
dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: stable@vger.kernel.org # v6.11+
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:52 +0000 (13:03 -0700)]
drm/xe/oa: Allow only certain property changes from config
Whereas all properties can be specified during OA stream open, when the OA
stream is reconfigured only the config_id and syncs can be specified.
v2: Use separate function table for reconfig case (Jonathan)
Change bool function args to enum (Matt B)
v3: s/xe_oa_set_property_funcs/xe_oa_set_property_funcs_open/ (Jonathan)
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-8-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:51 +0000 (13:03 -0700)]
drm/xe/oa: Add syncs support to OA config ioctl
In addition to stream open, add xe_sync support to the OA config ioctl,
where it is even more useful. This allows e.g. Mesa to replay a workload
repeatedly on the GPU, each time with a different OA configuration, while
precisely controlling (at batch buffer granularity) the workload segment
for which a particular OA configuration is active, without introducing
stalls in the userspace pipeline.
v2: Emit OA config even when config id is same as previous, to ensure
consistent sync behavior (Jose)
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-7-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:50 +0000 (13:03 -0700)]
drm/xe/oa: Move functions up so they can be reused for config ioctl
No code changes, only code movement so that functions used during stream
open can be reused for the stream reconfiguration
ioctl (DRM_XE_OBSERVATION_IOCTL_CONFIG).
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-6-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:49 +0000 (13:03 -0700)]
drm/xe/oa: Signal output fences
Introduce 'struct xe_oa_fence' which includes the dma_fence used to signal
output fences in the xe_sync array. The fences are signaled
asynchronously. When there are no output fences to signal, the OA
configuration wait is synchronously re-introduced into the ioctl.
v2: Don't wait in the work, use callback + delayed work (Matt B)
Use a single, not a per-fence spinlock (Matt Brost)
v3: Move ofence alloc before job submission (Matt)
Assert, don't fail, from dma_fence_add_callback (Matt)
Additional dma_fence_get for dma_fence_wait (Matt)
Change dma_fence_wait to non-interruptible (Matt)
v4: Introduce last_fence to prevent uaf if stream is closed with
pending OA config jobs
v5: Remove oa_fence_lock, move spinlock back into xe_oa_fence to
prevent uaf
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-5-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:48 +0000 (13:03 -0700)]
drm/xe/oa: Add input fence dependencies
Add input fence dependencies which will make OA configuration wait till
these dependencies are met (till input fences signal).
v2: Change add_deps arg to xe_oa_submit_bb from bool to enum (Matt Brost)
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-4-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:47 +0000 (13:03 -0700)]
drm/xe/oa/uapi: Define and parse OA sync properties
Now that we have laid the groundwork, introduce OA sync properties in the
uapi and parse the input xe_sync array as is done elsewhere in the
driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace.
v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B)
Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose)
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-3-ashutosh.dixit@intel.com
Ashutosh Dixit [Tue, 22 Oct 2024 20:03:46 +0000 (13:03 -0700)]
drm/xe/oa: Separate batch submission from waiting for completion
When we introduce xe_syncs, we don't wait for internal OA programming
batches to complete. That is, xe_syncs are signaled asynchronously. In
anticipation for this, separate out batch submission from waiting for
completion of those batches.
v2: Change return type of xe_oa_submit_bb to "struct dma_fence *" (Matt B)
v3: Retain init "int err = 0;" in xe_oa_submit_bb (Jose)
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-2-ashutosh.dixit@intel.com
Matthew Brost [Mon, 21 Oct 2024 17:57:05 +0000 (10:57 -0700)]
drm/xe: Mark GT work queue with WQ_MEM_RECLAIM
GT ordered work queue can be used to free memory via resets and fence
signaling thus we should allow this work queue to run during reclaim.
Mark with GT ordered work queue with WQ_MEM_RECLAIM appropriately.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021175705.1584521-5-matthew.brost@intel.com
Matthew Brost [Mon, 21 Oct 2024 17:57:04 +0000 (10:57 -0700)]
drm/xe: Mark G2H work queue with WQ_MEM_RECLAIM
G2H work queue can be used to free memory thus we should allow this work
queue to run during reclaim. Mark with G2H work queue with
WQ_MEM_RECLAIM appropriately.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021175705.1584521-4-matthew.brost@intel.com
Matthew Brost [Mon, 21 Oct 2024 17:57:03 +0000 (10:57 -0700)]
drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM
GGTT work queue is used to free memory thus we should allow this work
queue to run during reclaim. Mark with GGTT work queue with
WQ_MEM_RECLAIM appropriately.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021175705.1584521-3-matthew.brost@intel.com
Matthew Brost [Mon, 21 Oct 2024 17:35:12 +0000 (10:35 -0700)]
drm/xe: Take ref to job's fence in arm
Take ref to job's fence in arm rather than run job. This ref is owned by
the drm scheduler so it makes sense to take the ref before handing over
the job to the scheduler. Also removes an atomic from the run job path.
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021173512.1584248-1-matthew.brost@intel.com
Nirmoy Das [Tue, 22 Oct 2024 10:35:55 +0000 (12:35 +0200)]
drm/xe: Don't restart parallel queues multiple times on GT reset
In case of parallel submissions multiple GuC id will point to the
same exec queue and on GT reset such exec queues will get restarted
multiple times which is not desirable.
v2: don't use exec_queue_enabled() which could race,
do the same for xe_guc_submit_stop (Matt B)
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2295
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022103555.731557-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Michal Wajdeczko [Mon, 21 Oct 2024 20:15:06 +0000 (22:15 +0200)]
drm/xe/pf: Show VFs LMEM provisioning summary over debugfs
While we can already view individual VF LMEM provisioning using
lmem_quota debugfs attribute, we want to have unified way to show
summary across all VFs, like we do for GGTT or GuC doorbells/IDs.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021201506.1771-1-michal.wajdeczko@intel.com
Zhanjun Dong [Tue, 22 Oct 2024 01:01:16 +0000 (18:01 -0700)]
drm/xe/guc: Prevent GuC register capture running on VF
GuC based register capture is not supported by VF, thus prevent it
running on VF.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022010116.342240-2-zhanjun.dong@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fei Yang [Thu, 17 Oct 2024 16:27:10 +0000 (09:27 -0700)]
drm/xe: enable lite restore
The lite restore is a performance improvement feature which avoids
unnecessary context switch (flush, save and restore) if the incoming
context has a ContextID matching that of the outgoing context. The
scheduling is done by the GuC firmware, so on the driver side it's
just a matter of setting corresponding GUC_CTL_FEATURE flag.
This is supposed to be enabled by default, thus the flag is set
unconditionally.
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017162710.942553-2-fei.yang@intel.com
Matthew Brost [Fri, 18 Oct 2024 03:00:39 +0000 (20:00 -0700)]
drm/xe: Use __counted_by for flexible arrays
Good practice to use __counted_by in kernel coding for flexible arrays.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241018030039.1077842-1-matthew.brost@intel.com
Nirmoy Das [Wed, 16 Oct 2024 08:23:04 +0000 (10:23 +0200)]
drm/xe/ufence: Warn if mmget_not_zero() fails
This shouldn't happen but seen this while debugging ufence timeout
issue time to time so log it to isolate this particular case.
v2: s/XE_WARN_ON/drm_dbg(Maarten)
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016082304.66009-3-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Nirmoy Das [Wed, 16 Oct 2024 08:23:03 +0000 (10:23 +0200)]
drm/xe/ufence: Prefetch ufence addr to catch bogus address
access_ok() only checks for addr overflow so also try to read the addr
to catch invalid addr sent from userspace.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016082304.66009-2-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Shuicheng Lin [Thu, 17 Oct 2024 22:15:47 +0000 (22:15 +0000)]
drm/xe: Handle unreliable MMIO reads during forcewake
In some cases, when the driver attempts to read an MMIO register,
the hardware may return 0xFFFFFFFF. The current force wake path
code treats this as a valid response, as it only checks the BIT.
However, 0xFFFFFFFF should be considered an invalid value, indicating
a potential issue. To address this, we should add a log entry to
highlight this condition and return failure.
The force wake failure log level is changed from notice to err
to match the failure return value.
v2 (Matt Brost):
- set ret value (-EIO) to kick the error to upper layers
v3 (Rodrigo):
- add commit message for the log level promotion from notice to err
v4:
- update reviewed info
Suggested-by: Alex Zuo <alex.zuo@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017221547.1564029-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Vinay Belgaumkar [Tue, 15 Oct 2024 23:44:28 +0000 (16:44 -0700)]
drm/xe/ptl: Apply Wa_14022866841
As part of this WA, GuC will hold a forcewake for certain
MMIO accesses outside the GT/media domains.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com
Badal Nilawar [Thu, 17 Oct 2024 11:14:10 +0000 (16:44 +0530)]
drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout
In case if g2h worker doesn't get opportunity to within specified
timeout delay then flush the g2h worker explicitly.
v2:
- Describe change in the comment and add TODO (Matt B/John H)
- Add xe_gt_warn on fence done after G2H flush (John H)
v3:
- Updated the comment with root cause
- Clean up xe_gt_warn message (John H)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017111410.2553784-2-badal.nilawar@intel.com
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:56:01 +0000 (13:26 +0530)]
drm/xe: Change return type to void for xe_force_wake_put
There is no need to return an error from xe_force_wake_put(), as a
failure implicitly indicates that the domain failed to sleep.
v3
- Move kernel-doc to this patch (Badal)
v5
- change parameter to unsigned int in xe_force_wake_put()
v6
- Remove unneccsary wrapping (Michal)
- Remove non required header (Michal)
- Mention timeout(Michal)
v8
- Fix kernel-doc
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-27-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:56:00 +0000 (13:26 +0530)]
drm/xe: Ensure __must_check for xe_force_wake_get() return
Add __must_check attribute for xe_force_wake_get().
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-26-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:59 +0000 (13:25 +0530)]
drm/xe: forcewake debugfs open fails on xe_forcewake_get failure
A failure in xe_force_wake_get() no longer increments the domain's
refcount. Therefore, if xe_force_wake_get() fails during forcewake
debugfs open, return an error. This ensures there are no valid file
descriptors to close via forcewake debugfs, preventing refcount
mismanagement.
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int from xe_force_wake_get()
v6
- Use helper xe_force_wake_ref_has_domain()
to determine the status of the call.
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-25-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:58 +0000 (13:25 +0530)]
drm/xe/vram: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be escalated/considered as
probing error. It internally WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v7
- Fix commit message
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-24-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:57 +0000 (13:25 +0530)]
drm/xe/query: Update handling of xe_force_wake_get return
With xe_force_wake_get() now returning the refcount-incremented
domain mask, a non-zero return value in the case of XE_FORCEWAKE_ALL
does not necessarily indicate success. Use xe_force_wake_ref_has_domain()
to determine the status of the call.
Modify the return handling of xe_force_wake_get() accordingly and
pass the return value to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v6
- Use helper Use xe_force_wake_ref_has_domain()
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-23-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:56 +0000 (13:25 +0530)]
drm/xe/xe_reg_sr: Update handling of xe_force_wake_get return
With xe_force_wake_get() now returning the refcount-incremented
domain mask, a non-zero return value in the case of XE_FORCEWAKE_ALL does
not necessarily indicate success. Use xe_force_wake_ref_has_domain()
to determine the status of the call.
Modify the return handling of xe_force_wake_get() accordingly and
pass the return value to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v6
- use helper xe_force_wake_ref_has_domain()
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-22-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:55 +0000 (13:25 +0530)]
drm/xe/gt_tlb_invalidation_ggtt: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int from xe_force_wake_get()
- remove redundant warns
v7
- Fix commit message
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-21-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:54 +0000 (13:25 +0530)]
drm/xe/pat: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
- don't use xe_assert() to report HW errors (Michal)
v5
- return unsigned int from xe_force_wake_get()
- remove redundant warns
v7
- Fix commit message
- Remove redundant header
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-20-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:53 +0000 (13:25 +0530)]
drm/xe/oa: Handle force_wake_get failure in xe_oa_stream_init()
With xe_force_wake_get() now returning the refcount-incremented domain
mask, a non-zero return value in the case of XE_FORCEWAKE_ALL does not
necessarily indicate success. use xe_force_wake_ref_has_domain ()
to determine the status of the call.
Modify the return handling of xe_force_wake_get() accordingly and pass
the return value to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v6
- Use helper xe_force_wake_ref_has_domain()
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-19-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:52 +0000 (13:25 +0530)]
drm/xe/huc: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int from xe_force_wake_get()
v7
- Fix commit message
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-18-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:51 +0000 (13:25 +0530)]
drm/xe/guc: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Use helper xe_force_wake_ref_has_domain to
verify all domains are initialized or not. Update the return handling of
xe_force_wake_get() to reflect this behavior, and ensure that the return
value is passed as input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
- Remove redundant xe_gt_WARN_ON
v6
- use helper xe_force_wake_ref_has_domain()
v7
- Fix commit message
v9
- Rebase
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-17-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:50 +0000 (13:25 +0530)]
drm/xe/xe_gt_debugfs: Update handling of xe_force_wake_get return
With xe_force_wake_get() now returning the refcount-incremented
domain mask, a non-zero return value in the case of XE_FORCEWAKE_ALL does
not necessarily indicate success. Use xe_force_wake_ref_has_domain()
determine the status of the call.
Modify the return handling of xe_force_wake_get() accordingly and
pass the return value to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int for xe_force_wake_get()
v6
- use helper xe_force_wake_ref_has_domain()
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-16-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:49 +0000 (13:25 +0530)]
drm/xe/xe_drm_client: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Use helper xe_force_wake_ref_has_domain to
verify all domains are initialized or not. Update the return handling of
xe_force_wake_get() to reflect this behavior, and ensure that the return
value is passed as input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v6
- use xe_force_wake_ref_has_domain()
v7
- Fix commit message
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-15-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:48 +0000 (13:25 +0530)]
drm/xe/mocs: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- don't use xe_assert() to report HW errors (Michal)
v5
- return unsigned int from xe_force_wake_get()
- Remove redundant warn
v7
- Fix commit message
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <Nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-14-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:47 +0000 (13:25 +0530)]
drm/xe/tests/mocs: Update xe_force_wake_get() return handling
With xe_force_wake_get() now returning the refcount-incremented domain
mask, a return value of 0 indicates failure for single domains.
Change assert condition to incorporate this change in return and
pass the return value to xe_force_wake_put()
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int for xe_force_wake_get()
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-13-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:46 +0000 (13:25 +0530)]
drm/xe/devcoredump: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Use helper xe_force_wake_ref_has_domain to
verify all domains are initialized or not. Update the return handling of
xe_force_wake_get() to reflect this behavior, and ensure that the return
value is passed as input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int for xe_force_wake_get()
v6
- use helper xe_force_wake_ref_has_domain()
v7
- Fix commit message
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-12-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:45 +0000 (13:25 +0530)]
drm/xe/xe_gt_idle: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v4
- Rebase fix
v5
- return unsigned int for xe_force_wake_get()
- Remove reudandant WARN calls.
v7
- Fix commit message
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-11-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:44 +0000 (13:25 +0530)]
drm/xe/gt: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Use helper xe_force_wake_ref_has_domain to verify
all domains are initialized or not. Update the return handling of
xe_force_wake_get() to reflect this behavior, and ensure that the return
value is passed as input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v4
- Rebase fix
v5
- return unsigned int for xe_force_wake_get()
- remove redundant XE_WARN_ON()
v6
- use helper for checking all initialized domains are awake or not.
v7
- Fix commit message
v9
- Remove redundant WARN_ON (Badal)
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-10-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:43 +0000 (13:25 +0530)]
drm/xe/gsc: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int for xe_force_wake_get()
- No need to WARN from caller in case of forcewake get failure.
v7
- Fix commit message
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-9-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:42 +0000 (13:25 +0530)]
drm/xe/hdcp: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
v5
- return unsigned int for xe_force_wake_get()
v7
- Fix commit message
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-8-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:41 +0000 (13:25 +0530)]
drm/xe/device: Update handling of xe_force_wake_get return
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be escalated/considered as
probing error. It internally WARNS on domain ack failure.
v5
- return unsigned int xe_force_wake_get()
v7
- Fix commit message(Badal)
v9
- s/uint/unsigned int (Nikula)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:40 +0000 (13:25 +0530)]
drm/xe: Modify xe_force_wake_put to handle _get returned mask
Instead of calling xe_force_wake_put on all domains that were input to
xe_force_wake_get, call _put only on the domains whose reference counts
were successfully incremented by the _get call. Since the return value
of _get can be a mask that does not match any specific value in the enum
xe_force_wake_domains, change the input parameter of _put to unsigned int.
v3
- Move WARN to this patch (Badal)
- use xe_gt_WARN instead of XE_WARN (Michal)
- Stop using xe_force_wake_domains for non enum values.
- Remove kernel-doc from this patch (Badal)
-v5
- Fix global awake_domain
-v6
- put all initialized domains in case of FORCEWAKE_ALL.
- Modify ret variable name (Michal)
- Modify input var name (Michal)
- Modify commit message and warn (Badal)
-v9
- Add assert condition.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:39 +0000 (13:25 +0530)]
drm/xe: Error handling in xe_force_wake_get()
If an acknowledgment timeout occurs for a forcewake domain awake
request, do not increment the reference count for the domain. This
ensures that subsequent _get calls do not incorrectly assume the domain
is awake. The return value is a mask of domains that got refcounted,
and these domains need to be provided for subsequent xe_force_wake_put
call.
While at it, add simple kernel-doc for xe_force_wake_get()
v3
- Use explicit type for mask (Michal/Badal)
- Improve kernel-doc (Michal)
- Use unsigned int instead of abusing enum (Michal)
v5
- Use unsigned int for return (MattB/Badal/Rodrigo)
- use xe_gt_WARN for domain awake ack failure (Badal/Rodrigo)
v6
- Change XE_FORCEWAKE_ALL to single bit, this helps accommodate
actually refcounted domains in return. (Michal)
- Modify commit message and warn message (Badal)
- Remove unnecessary information in kernel-doc (Michal)
v7
- Add assert condition for valid input domains (Badal)
v9
- Update kernel-doc and simplify conditions (Michal)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:38 +0000 (13:25 +0530)]
drm/xe/forcewake: Add a helper xe_force_wake_ref_has_domain()
The helper xe_force_wake_ref_has_domain() checks if the input domain
has been successfully reference-counted and awakened in the reference.
v2
- Fix commit message and kernel-doc (Michal)
- Remove unnecessary paranthesis (Michal)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:37 +0000 (13:25 +0530)]
drm/xe/forcewake: Change awake_domain datatype
Change the datatype of awake_domains to unsigned int to accommodate
values that differ from the enum xe_force_wake_domains.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Himal Prasad Ghimiray [Mon, 14 Oct 2024 07:55:36 +0000 (13:25 +0530)]
drm/xe: Add member initialized_domains to xe_force_wake()
This field serves as a bitmask representing all initialized forcewake
domains on the GT.
v2
- Move awake_domains datatype change out of this patch (Michal)
- Rename domain_init to init_domain (Michal)
- optimize alignment (Michal)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Nirmoy Das [Wed, 16 Oct 2024 14:17:17 +0000 (16:17 +0200)]
drm/xe: Add caller info to xe_gt_reset_async
Add caller info to the xe_gt_reset_async() to help debug issues.
v2: s/%pS/%ps(Matt)
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2874
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016141717.881143-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Shuicheng Lin [Tue, 15 Oct 2024 16:12:07 +0000 (16:12 +0000)]
drm/xe: Enlarge the invalidation timeout from 150 to 500
There are error messages like below that are occurring during stress
testing: "[ 31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global
invalidation timeout". Previously it was hitting this 3 out of 1000
executions of warm reboot. After raising it to 500, 1000 warm reboot
executions passed and it didn't fail.
Due to the way xe_mmio_wait32() is implemented, the timeout is able to
expire early when the register matches the expected value due to the
wait increments starting small. So, the larger timeout value should have
no effect during normal use cases.
v2 (Jonathan):
- rework the commit message
v3 (Lucas):
- add conclusive message for the fail rate and test case
v4:
- add suggested-by
Suggested-by: Jia Yao <jia.yao@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Tested-by: Zongyao Bai <zongyao.bai@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015161207.1373401-1-shuicheng.lin@intel.com
Shekhar Chauhan [Tue, 15 Oct 2024 16:19:38 +0000 (21:49 +0530)]
drm/xe/xe3lpg: Extend Wa_18034896535 to Xe3_LPG.
Extend Wa_18034896535 to Xe3_LPG (IP 30.00), steppings A0 to B0.
BSpec: 56852
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015161938.845996-1-shekhar.chauhan@intel.com
Juha-Pekka Heikkila [Mon, 7 Oct 2024 18:28:41 +0000 (21:28 +0300)]
drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
On display ver 20 onwards tile4 is not supported with horizontal flip
Bspec: 69853
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241007182841.2104740-1-juhapekka.heikkila@gmail.com
Juha-Pekka Heikkila [Wed, 9 Oct 2024 15:19:47 +0000 (18:19 +0300)]
drm/xe/display: align framebuffers according to hw requirements
Align framebuffers in memory according to hw requirements instead of
default page size alignment.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009151947.2240099-3-juhapekka.heikkila@gmail.com
Juha-Pekka Heikkila [Wed, 9 Oct 2024 15:19:46 +0000 (18:19 +0300)]
drm/xe: add interface to request physical alignment for buffer objects
Add xe_bo_create_pin_map_at_aligned() which augment
xe_bo_create_pin_map_at() with alignment parameter allowing to pass
required alignemnt if it differ from default.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009151947.2240099-2-juhapekka.heikkila@gmail.com
Matthew Auld [Fri, 11 Oct 2024 13:36:34 +0000 (14:36 +0100)]
drm/xe/xe_sync: initialise ufence.signalled
We can incorrectly think that the fence has signalled, if we get a
non-zero value here from the kmalloc, which is quite plausible. Just use
kzalloc to prevent stuff like this.
Fixes:
977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241011133633.388008-2-matthew.auld@intel.com
Nirmoy Das [Fri, 11 Oct 2024 15:10:29 +0000 (17:10 +0200)]
drm/xe/ufence: ufence can be signaled right after wait_woken
do_comapre() can return success after a timedout wait_woken() which was
treated as -ETIME. The loop calling wait_woken() sets correct err so
there is no need to re-evaluate err.
v2: Remove entire check that reevaluate err at the end(Matt)
Fixes:
e670f0b4ef24 ("drm/xe/uapi: Return correct error code for xe_wait_user_fence_ioctl")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
Cc: stable@vger.kernel.org # v6.8+
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241011151029.4160630-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Lucas De Marchi [Fri, 11 Oct 2024 03:56:18 +0000 (20:56 -0700)]
drm/xe/query: Tidy up error EFAULT returns
Move the error handling together in a single branch since all of them
are doing similar thing and return the same error.
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241011035618.1057602-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Fri, 11 Oct 2024 03:56:17 +0000 (20:56 -0700)]
drm/xe/query: Move timestamp reg to hwe_read_timestamp()
__read_timestamps() is actually reading the timestamp from a certain
hwe. Use it as parameter, move register declarations to be inside that
function and rename it.
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241011035618.1057602-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Fri, 11 Oct 2024 03:56:16 +0000 (20:56 -0700)]
drm/xe/query: Increase timestamp width
Starting with Xe2 the timestamp is a full 64 bit counter, contrary to
the 36 bit that was available before. Although 36 should be sufficient
for any reasonable delta calculation (for Xe2, of about 30min), it's
surprising to userspace to get something truncated. Also if the
timestamp being compared to is coming from the GPU and the application
is not careful enough to apply the width there, a delta calculation
would be wrong.
Extend it to full 64-bits starting with Xe2.
v2: Expand width=64 to media gt, as it's just a wrong tagging in the
spec - empirical tests show it goes beyond 36 bits and match the engines
for the main gt
Bspec: 60411
Cc: Szymon Morek <szymon.morek@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241011035618.1057602-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Matthew Brost [Wed, 11 Sep 2024 15:26:22 +0000 (08:26 -0700)]
drm/xe: Use bookkeep slots for external BO's in exec IOCTL
Fix external BO's dma-resv usage in exec IOCTL using bookkeep slots
rather than write slots. This leaves syncing to user space rather than
the KMD blindly enforcing write semantics on every external BO.
Fixes:
dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Kenneth Graunke <kenneth.w.graunke@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reported-by: Simona Vetter <simona.vetter@ffwll.ch>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2673
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911152622.903058-1-matthew.brost@intel.com
Matthew Brost [Thu, 3 Oct 2024 00:16:57 +0000 (17:16 -0700)]
drm/xe: Don't free job in TDR
Freeing job in TDR is not safe as TDR can pass the run_job thread
resulting in UAF. It is only safe for free job to naturally be called by
the scheduler. Rather free job in TDR, add to pending list.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2811
Cc: Matthew Auld <matthew.auld@intel.com>
Fixes:
e275d61c5f3f ("drm/xe/guc: Handle timing out of signaled jobs gracefully")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003001657.3517883-3-matthew.brost@intel.com
Matthew Brost [Thu, 3 Oct 2024 00:16:56 +0000 (17:16 -0700)]
drm/xe: Take job list lock in xe_sched_add_pending_job
A fragile micro optimization in xe_sched_add_pending_job relied on both
the GPU scheduler being stopped and fence signaling stopped to safely
add a job to the pending list without the job list lock in
xe_sched_add_pending_job. Remove this optimization and just take the job
list lock.
Fixes:
7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003001657.3517883-2-matthew.brost@intel.com
Imre Deak [Wed, 9 Oct 2024 19:43:58 +0000 (22:43 +0300)]
drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
Atm the display HPD interrupts that got disabled during runtime
suspend, are re-enabled only if d3cold is enabled. Fix things by
also re-enabling the interrupts if d3cold is disabled.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009194358.1321200-5-imre.deak@intel.com
Imre Deak [Wed, 9 Oct 2024 19:43:57 +0000 (22:43 +0300)]
drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
For clarity separate the d3cold and non-d3cold runtime PM handling. The
only change in behavior is disabling polling later during runtime
resume. This shouldn't make a difference, since the poll disabling is
handled from a work, which could run at any point wrt. the runtime
resume handler. The work will also require a runtime PM reference,
syncing it with the resume handler.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009194358.1321200-4-imre.deak@intel.com
Colin Ian King [Wed, 9 Oct 2024 16:05:10 +0000 (17:05 +0100)]
drm/xe/guc: Fix inverted logic on snapshot->copy check
Currently the check to see if snapshot->copy has been allocated is
inverted and ends up dereferencing snapshot->copy when free'ing
objects in the array when it is null or not free'ing the objects
when snapshot->copy is allocated. Fix this by using the correct
non-null pointer check logic.
Fixes:
d8ce1a977226 ("drm/xe/guc: Use a two stage dump for GuC logs and add more info")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009160510.372195-1-colin.i.king@gmail.com
Matthew Auld [Wed, 9 Oct 2024 08:48:10 +0000 (09:48 +0100)]
drm/xe: fix unbalanced rpm put() with declare_wedged()
Technically the or_reset() means we call the action on failure, however
that would lead to unbalanced rpm put(). Move the get() earlier to fix
this. It should be extremely unlikely to ever trigger this in practice.
Fixes:
452bca0edbd0 ("drm/xe: Don't suspend device upon wedge")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009084808.204432-4-matthew.auld@intel.com
Matthew Auld [Wed, 9 Oct 2024 08:48:09 +0000 (09:48 +0100)]
drm/xe: fix unbalanced rpm put() with fence_fini()
Currently we can call fence_fini() twice if something goes wrong when
sending the GuC CT for the tlb request, since we signal the fence and
return an error, leading to the caller also calling fini() on the error
path in the case of stack version of the flow, which leads to an extra
rpm put() which might later cause device to enter suspend when it
shouldn't. It looks like we can just drop the fini() call since the
fence signaller side will already call this for us.
There are known mysterious splats with device going to sleep even with
an rpm ref, and this could be one candidate.
v2 (Matt B):
- Prefer warning if we detect double fini()
Fixes:
0a382f9bc5dc ("drm/xe: Hold a PM ref when GT TLB invalidations are inflight")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009084808.204432-3-matthew.auld@intel.com
Aradhya Bhatia [Wed, 9 Oct 2024 06:55:42 +0000 (12:25 +0530)]
drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
Add workaround (wa)
15016589081 which applies to Xe2_v3_LPG_MD.
Xe2_v3_LPG_MD is a Lunar Lake platform with GFX version: 20.04.
This wa is type: permanent, and hence is applicable on all steppings.
Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241009065542.283151-1-aradhya.bhatia@intel.com
Gustavo Sousa [Tue, 8 Oct 2024 20:46:26 +0000 (13:46 -0700)]
drm/xe/xe3: Add initial set of workarounds
Implement the initial set of workarounds for Xe3 IPs.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008204626.55802-2-matthew.s.atwood@intel.com
Thomas Hellström [Fri, 4 Oct 2024 14:11:20 +0000 (16:11 +0200)]
drm/xe/tests: Fix the shrinker test compiler warnings.
The xe_bo_shrink_kunit test has an uninitialized value and illegal
integer size conversions on 32-bit. Fix.
v2:
- Use div64_u64 to ensure the u64 division compiles everywhere. (Matt Auld)
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/
20240913195649.GA61514@thelio-3990X/
Fixes:
5a90b60db5e6 ("drm/xe: Add a xe_bo subtest for shrinking / swapping")
Cc: dri-devel@lists.freedesktop.org
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v1
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004141121.186177-1-thomas.hellstrom@linux.intel.com
Matthew Auld [Mon, 7 Oct 2024 07:45:42 +0000 (08:45 +0100)]
drm/xe/bmg: improve cache flushing behaviour
The BSpec says that EN_L3_RW_CCS_CACHE_FLUSH must be toggled
on for manual global invalidation to take effect and actually flush
device cache, however this also turns on flushing for things like
pipecontrol, which occurs between submissions for compute/render. This
sounds like massive overkill for our needs, where we already have the
manual flushing on the display side with the global invalidation. Some
observations on BMG:
1. Disabling l2 caching for host writes and stubbing out the driver
global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH enabled, has
no impact on wb-transient-vs-display IGT, which makes sense since the
pipecontrol is now flushing the device cache after the render copy.
Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
expected since device cache is now dirty and display engine can't see
the writes.
2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
invalidation also has no impact on wb-transient-vs-display. This
suggests that the global invalidation still works as expected and is
flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH turned on.
With that drop EN_L3_RW_CCS_CACHE_FLUSH. This helps some workloads since
we no longer flush the device cache between submissions as part of
pipecontrol.
Edit: We now also have clarification from HW side that BSpec was indeed
wrong here.
v2:
- Rebase and update commit message.
BSpec: 71718
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Vitasta Wattal <vitasta.wattal@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241007074541.33937-2-matthew.auld@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:28 +0000 (12:34 -0700)]
drm/xe/guc: Save manual engine capture into capture list
Save manual engine capture into capture list.
This removes duplicate register definitions across manual-capture vs
guc-err-capture.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-7-zhanjun.dong@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:27 +0000 (12:34 -0700)]
drm/xe/guc: Plumb GuC-capture into dev coredump
When we decide to kill a job, (from guc_exec_queue_timedout_job), we could
end up with 4 possible scenarios at this starting point of this decision:
1. the guc-captured register-dump is already there.
2. the driver is wedged.mode > 1, so GuC-engine-reset / GuC-err-capture
will not happen.
3. the user has started the driver in execlist-submission mode.
4. the guc-captured register-dump is not ready yet so we force GuC to kill
that context now, but:
A. we don't know yet if GuC will be successful on the engine-reset
and get the guc-err-capture, else kmd will do a manual reset later
OR B. guc will be successful and we will get a guc-err-capture
shortly.
So to accomdate the scenarios of 2 and 4A, we will need to do a manual KMD
capture first(which is not be reliable in guc-submission mode) and decide
later if we need to use that for the cases of 2 or 4A. So this flow is
part of the implementation for this patch.
Provide xe_guc_capture_get_reg_desc_list to get the register dscriptor
list.
Add manual capture by read from hw engine if GuC capture is not ready.
If it becomes ready at later time, GuC sourced data will be used.
Although there may only be a small delay between (1) the check for whether
guc-err-capture is available at the start of guc_exec_queue_timedout_job
and (2) the decision on using a valid guc-err-capture or manual-capture,
lets not take any chances and lock the matching node down so it doesn't
get re-claimed if GuC-Err-Capture subsystem is running out of pre-cached
nodes.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-6-zhanjun.dong@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:26 +0000 (12:34 -0700)]
drm/xe/guc: Extract GuC error capture lists
Upon the G2H Notify-Err-Capture event, parse through the
GuC Log Buffer (error-capture-subregion) and generate one or
more capture-nodes. A single node represents a single "engine-
instance-capture-dump" and contains at least 3 register lists:
global, engine-class and engine-instance. An internal link
list is maintained to store one or more nodes.
Because the link-list node generation happen before the call
to devcoredump, duplicate global and engine-class register
lists for each engine-instance register dump if we find
dependent-engine resets in a engine-capture-group.
To avoid dynamically allocate the output nodes during gt reset,
pre-allocate a fixed number of empty nodes up front (at the
time of ADS registration) that we can consume from or return to
an internal cached list of nodes.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-5-zhanjun.dong@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:25 +0000 (12:34 -0700)]
drm/xe/guc: Add capture size check in GuC log buffer
Capture-nodes generated by GuC are placed in the GuC capture ring
buffer which is a sub-region of the larger Guc-Log-buffer.
Add capture output size check before allocating the shared buffer.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-4-zhanjun.dong@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:24 +0000 (12:34 -0700)]
drm/xe/guc: Add XE_LP steered register lists
Add the ability for runtime allocation and freeing of
steered register list extentions that depend on the
detected HW config fuses.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-3-zhanjun.dong@intel.com
Zhanjun Dong [Fri, 4 Oct 2024 19:34:23 +0000 (12:34 -0700)]
drm/xe/guc: Prepare GuC register list and update ADS size for error capture
Add referenced registers defines and list of registers.
Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.
Then, populate GuC ADS with the lists of registers we want
GuC to report back to host on engine reset events. This list
should include global, engine-class and engine-instance
registers for every engine-class type on the current hardware.
Ensure we allocate a persistent storage for the register lists
that are populated into ADS so that we don't need to allocate
memory during GT resets when GuC is reloaded and ADS population
happens again.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
Matt Roper [Tue, 8 Oct 2024 01:35:09 +0000 (18:35 -0700)]
drm/xe/xe3lpm: Add new "instance0" steering table
MCR steering on Xe3 media IP is almost the same as it was on Xe2, except
for one new range (0x38D0D0 - 0x38D0FF) which has changed to an MCR
"MEDIAINF" range on Xe3. Since we can always steer to grpid /
instanceid 0 for MEDIAINF ranges, define a new "INSTANCE0" steering
table for Xe3 media. Xe3 can continue to use the same OADDRM/GPMXMT
table as Xe2.
v2: Merge continuous entries 38D0D0 - 38F0FF
Bspec: 74298
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-7-matthew.s.atwood@intel.com
Haridhar Kalvala [Tue, 8 Oct 2024 01:35:08 +0000 (18:35 -0700)]
drm/xe/ptl: Add PTL platform definition
PTL is an integrated GPU based on the Xe3 architecture.
v2: explicitly turn off display until display patches land.
Bspec: 72574
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-6-matthew.s.atwood@intel.com
Haridhar Kalvala [Tue, 8 Oct 2024 01:35:07 +0000 (18:35 -0700)]
drm/xe/ptl: PTL re-uses Xe2 MOCS table
PTL is Xe3 architecture but there is no difference between LNL and PTL
in MOCS table. So, PTL uses the same MOCS table as LNL.
Bspec: 71582
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-5-matthew.s.atwood@intel.com
Haridhar Kalvala [Tue, 8 Oct 2024 01:35:05 +0000 (18:35 -0700)]
drm/xe/xe3: Define Xe3 feature flags
Define a common set of Xe3 feature flags and definitions that will be
used for all platforms in this family.
The feature flags are inherited unchanged from the Xe2 (XE2_FEATURES)
platform.
Following B-spec details inherited from Xe2 feature flag definition
commit.
v2: reuse graphics_xe2 definition
Bspec: 58695
- dma_mask_size remains 46 (not documented in bspec)
- supports_usm=1 (Bspec 59651)
- has_flatccs=1 (Bspec 58797)
- has_4tile=1 (Bspec 58788)
- has_asid=1 (Bspec 59654, 59265, 60288)
- has_range_tlb_invalidate=1 (Bspec 71126)
- five-level page table (Bspec 59505)
- 1 VD + 1 VE + 1 SFC (Bspec 67103, 70819)
- platform engine mask (Bspec 60149)
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-3-matthew.s.atwood@intel.com
Matt Roper [Tue, 8 Oct 2024 01:35:04 +0000 (18:35 -0700)]
drm/xe/xe3: Xe3 uses the same PAT settings as Xe2
Xe3 platforms use the same PAT tables as Xe2.
Bspec: 71582
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-2-matthew.s.atwood@intel.com
Shekhar Chauhan [Mon, 7 Oct 2024 15:41:44 +0000 (08:41 -0700)]
drm/xe/ptl: L3bank mask is not available on the media GT
On PTL platforms with media version 30.00, the fuse registers for
reporting L3 bank availability to the GT just read out as ~0 and do not
provide proper values. Xe does not use the L3 bank mask for anything
internally; it only passes the mask through to userspace via the GT
topology query.
Since we don't have any way to get the real L3 bank mask, we don't want
to pass garbage to userspace. Passing a zeroed mask or a copy of the
primary GT's L3 bank mask would also be inaccurate and likely to cause
confusion for userspace. The best approach is to simply not include L3
in the list of masks returned by the topology query in cases where we
aren't able to provide a meaningful value. This won't change the
behavior for any existing platforms (where we can always obtain L3 masks
successfully for all GTs), it will only prevent us from mis-reporting
bad information on upcoming platform(s).
There's a good chance this will become a formal workaround in the
future, but for now we don't have a lineage number so "no_media_l3" is
used in place of a lineage as the OOB workaround descriptor.
v2:
- Re-calculate query size to properly match data returned. (Gustavo)
- Update kerneldoc to clarify that the L3bank mask may not be included
in the query results if the hardware doesn't make it available.
(Gustavo)
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Acked-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241007154143.2021124-2-matthew.d.roper@intel.com
John Harrison [Thu, 3 Oct 2024 00:46:11 +0000 (17:46 -0700)]
drm/xe/guc: Add a helper function for dumping GuC log to dmesg
Create a helper function that can be used to dump the GuC log to dmesg
in a manner that is reliable for extraction and decode. The intention
is that calls to this can be added by developers when debugging
specific issues that require a GuC log but do not allow easy capture
of the log - e.g. failures in selftests and failues that lead to
kernel hangs.
Also note that this is really a temporary stop-gap. The aim is to
allow on demand creation and dumping of devcoredump captures (which
includes the GuC log and much more). Currently this is not possible as
much of the devcoredump code requires a 'struct xe_sched_job' and
those are not available at many places that might want to do the dump.
v2: Add kerneldoc - review feedback from Michal W.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-12-John.C.Harrison@Intel.com
John Harrison [Thu, 3 Oct 2024 00:46:10 +0000 (17:46 -0700)]
drm/xe/guc: Add GuC log to devcoredump captures
Include the GuC log in devcoredump captures because they can be useful
with debugging certain types of bug.
v2: Fix kerneldoc
v3: Drop module parameter as now using more compact ascii85 encoding
rather than hexdump (although still not compressed) (review feedback
from Matthew B). Rebase onto recent refactoring of devcoredump code.
v4: Don't move the submission snapshot inside the GuC internals
structure 'cos it really doesn't belong there.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-11-John.C.Harrison@Intel.com