Alex Deucher [Mon, 4 Aug 2025 15:40:20 +0000 (11:40 -0400)]
drm/amdgpu: add missing vram lost check for LEGACY RESET
Legacy resets reset the memory controllers so VRAM contents
may be unreliable after reset.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Jul 2025 15:16:05 +0000 (11:16 -0400)]
drm/amdgpu/discovery: fix fw based ip discovery
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally, parsing some of the board
specific tables may result in incorrect data on some boards.
Just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 6 Aug 2025 12:45:22 +0000 (18:15 +0530)]
drm/amd/display: Add NULL check for stream before dereference in 'dm_vupdate_high_irq'
Add a NULL check for acrtc->dm_irq_params.stream before
accessing its members.
Fixes the warning below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:623
dm_vupdate_high_irq() warn: variable dereferenced before check
'acrtc->dm_irq_params.stream' (see line 615)
614 if (vrr_active) {
615 bool replay_en = acrtc->dm_irq_params.stream->link->replay_settings.replay_feature_enabled;
^^^^^^^^^^^^^^^^^^^^^^^^^^^
616 bool psr_en = acrtc->dm_irq_params.stream->link->psr_settings.psr_feature_enabled;
^^^^^^^^^^^^^^^^^^^^^^^^^^^ New dereferences
617 bool fs_active_var_en = acrtc->dm_irq_params.freesync_config.state
618 == VRR_STATE_ACTIVE_VARIABLE;
619
620 amdgpu_dm_crtc_handle_vblank(acrtc);
621
622 /* BTR processing for pre-DCE12 ASICs */
623 if (acrtc->dm_irq_params.stream &&
^^^^^^^^^^^^^^^^^^^^^^^^^^^ But the existing code assumed it could be NULL. Someone is wrong.
624 adev->family < AMDGPU_FAMILY_AI) {
625 spin_lock_irqsave(&adev_to_drm(adev)->event_lock, flags);
Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
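The NULL check described in the dm_vupdate_high_irq commit above can be illustrated with a minimal sketch. It is not the actual patch: the struct and function names are hypothetical stand-ins for the driver types, and the only point is that the pointer is tested once, before any member access.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct sketch_link {
    bool replay_enabled;
    bool psr_enabled;
};

struct sketch_stream {
    struct sketch_link *link;
};

struct sketch_irq_params {
    struct sketch_stream *stream;   /* may be NULL */
};

/*
 * Returns true when replay or PSR is enabled on the stream's link.
 * Returns false (a safe default) when the stream or its link is missing,
 * so no dereference happens before the NULL check.
 */
bool sketch_self_refresh_enabled(const struct sketch_irq_params *params)
{
    const struct sketch_stream *stream = params->stream;

    if (!stream || !stream->link)
        return false;

    return stream->link->replay_enabled || stream->link->psr_enabled;
}
```

Reading the pointer into a local and checking it first keeps the later uses consistent with the existing check the warning points at.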
Lijo Lazar [Wed, 6 Aug 2025 08:59:41 +0000 (14:29 +0530)]
drm/amd/pm: Add caching to SMUv13.0.12 temp metric
Add table caching logic to temperature metrics tables in SMUv13.0.12
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Wed, 6 Aug 2025 07:22:47 +0000 (12:52 +0530)]
drm/amd/pm: Add cache logic for temperature metric
Add caching logic for baseboard and gpuboard temperature metrics tables.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Wed, 6 Aug 2025 06:19:59 +0000 (11:49 +0530)]
drm/amd/pm: Remove cache logic from SMUv13.0.12
Remove caching logic of temperature metrics from SMUv13.0.12. The
caching logic needs to be moved to a higher level.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 4 Aug 2025 07:59:05 +0000 (13:29 +0530)]
drm/amd/pm: Add unique ids for SMUv13.0.6 SOCs
Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.6 SOCs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 4 Aug 2025 07:43:06 +0000 (13:13 +0530)]
drm/amdgpu: Add helpers to set/get unique ids
Add a struct to store unique id information for each type. Add helper
to fetch the unique id.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 24 Jul 2025 07:35:12 +0000 (13:05 +0530)]
drm/amdgpu: Prevent hardware access in dpc state
Don't allow hardware access while in dpc state.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 5 Aug 2025 12:10:09 +0000 (17:40 +0530)]
drm/amdgpu/vcn: Fix double-free of vcn dump buffer
The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The
issue was introduced by the patch below.
Fixes: de55cbff5ce9 ("drm/amdgpu/vcn: Add regdump helper functions")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
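For context on the double-free above, here is a small sketch of a single-owner teardown helper. The commit says the buffer is already freed by amdgpu_vcn_reg_dump_fini(), so the fix presumably drops the second free; the code below only illustrates the general idiom of freeing in one place and clearing the pointer. All names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical container for a register dump buffer. */
struct sketch_vcn {
    unsigned int *dump_buf;
};

/* Allocate the dump buffer; returns 0 on success, -1 on allocation failure. */
int sketch_reg_dump_init(struct sketch_vcn *vcn, size_t reg_count)
{
    vcn->dump_buf = calloc(reg_count, sizeof(*vcn->dump_buf));
    return vcn->dump_buf ? 0 : -1;
}

/*
 * The only place that frees the buffer. Clearing the pointer makes a
 * repeated call harmless, because free(NULL) is a no-op.
 */
void sketch_reg_dump_fini(struct sketch_vcn *vcn)
{
    free(vcn->dump_buf);
    vcn->dump_buf = NULL;
}
```

Callers that used to free the buffer themselves would simply call the fini helper, or nothing at all if the helper already runs on their teardown path.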
Lijo Lazar [Wed, 23 Jul 2025 05:13:00 +0000 (10:43 +0530)]
drm/amdgpu: Log reset source during recovery
To get more context, add reset source to identify the source of gpu
recovery - job timeout, RAS, HWS hang etc.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 4 Aug 2025 14:46:30 +0000 (22:46 +0800)]
drm/amdgpu: Generate BP threshold exceed CPER once threshold exceeded
The bad pages threshold exceeded CPER should be generated once the threshold
is exceeded, regardless of whether bad_page_threshold is set.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 20:32:29 +0000 (04:32 +0800)]
drm/amd/pm: Enable temperature metrics caps
Enable temperature metrics caps for smu_v13_0_12
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 20:29:06 +0000 (04:29 +0800)]
drm/amd/pm: Add temperature metrics sysfs entry
Add temperature metrics sysfs entry to expose gpuboard/baseboard
temperature metrics
v2: Removed unused function, rename functions(Lijo)
v3: Remove unnecessary initialization
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 20:26:13 +0000 (04:26 +0800)]
drm/amd/pm: Fetch and fill temperature metrics
Fetch system metrics table to fill gpuboard/baseboard temperature
metrics data for smu_v13_0_12
v2: Remove unnecessary checks, used separate metrics time for
temperature metrics table(Lijo)
v3: Use cached values for back to back system metrics query(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 18:41:50 +0000 (02:41 +0800)]
drm/amd/pm: Update pmfw header for smu_v13_0_12
Update pmfw header for smu_v13_0_12 with system temperature metrics
table
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 18:10:12 +0000 (02:10 +0800)]
drm/amd/pm: Add smu interface for temp metrics
Add smu interface to get baseboard/gpuboard temperature metrics
v2: Rename is_support to is_supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Fri, 1 Aug 2025 17:48:09 +0000 (01:48 +0800)]
drm/amd/pm: Add dpm interface for temp metrics
Add dpm interface to get gpuboard/baseboard temperature metrics
v2: Add temperature metrics support check(Lijo)
v3: Return error code in case of operation not supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Tue, 5 Aug 2025 14:02:07 +0000 (10:02 -0400)]
drm/amd/display: Fix vupdate_offload_work doc
Fix the following warning in struct documentation:
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:168: warning: expecting prototype for struct dm_vupdate_work. Prototype was for struct vupdate_offload_work instead
Fixes: c210b757b400 ("drm/amd/display: fix dmub access race condition")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Wed, 28 May 2025 16:38:58 +0000 (12:38 -0400)]
drm/amdkfd: return migration pages from copy function
The dst MIGRATE_PFN_VALID bit and the src MIGRATE_PFN_MIGRATE bit
should always be set when migration succeeds. cpage includes pages
with the src MIGRATE_PFN_MIGRATE bit set and the MIGRATE_PFN_VALID bit
unset, for both ram and vram, when memory is only allocated
without being populated before migration. Those ram pages should
be counted as migrate pages, and those vram pages should not be
counted as migrate pages. Here, migration pages refers to how many
vram pages are involved.
-v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip)
-v3 add warning when vram pages is less than migration pages
return migration pages directly from copy function
-v4 correct comments and copy function return mpage (suggested-by Felix)
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Wed, 28 May 2025 15:51:18 +0000 (11:51 -0400)]
drm/amdkfd: remove unused code
upages is assigned under cpages = 0, so it isn't really used in this function.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Philip.Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 24 Jul 2025 05:43:27 +0000 (11:13 +0530)]
drm/amd/pm: Add priority messages for SMU v13.0.6
Certain messages will be processed with high priority by PMFW even if it
hasn't responded to a previous message. Send the priority message
regardless of the success/fail status of the previous message. Add
support on SMUv13.0.6 and SMUv13.0.12
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 24 Jul 2025 07:28:10 +0000 (12:58 +0530)]
drm/amdgpu: Set dpc status appropriately
Set the dpc status based on hardware state. Also, clear the status before
reinitialization after a successful reset.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Amber Lin [Fri, 1 Aug 2025 00:45:00 +0000 (20:45 -0400)]
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. The resulting kernel NULL pointer
dereference hangs the kernel.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
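The teardown ordering problem in the KFD debugfs commit above can be modeled with a toy sketch. Nothing here is real KFD code: the struct, the counters and the function names are hypothetical, and a "fault" simply counts an attempt to remove a per-process entry after its parent directory is gone.

```c
#include <stdbool.h>

/* Toy model of a debugfs root with per-process child entries. */
struct sketch_kfd_debugfs {
    bool root_alive;    /* parent directory still exists */
    int proc_entries;   /* per-process entries still registered */
    int faults;         /* removals attempted after the root was destroyed */
};

/* Remove one per-process entry; faults if the parent is already gone. */
void sketch_debugfs_remove_process(struct sketch_kfd_debugfs *d)
{
    if (!d->root_alive) {
        d->faults++;
        return;
    }
    d->proc_entries--;
}

/* Draining the workqueue removes every remaining per-process entry. */
void sketch_process_destroy_wq(struct sketch_kfd_debugfs *d)
{
    int pending = d->proc_entries;

    for (int i = 0; i < pending; i++)
        sketch_debugfs_remove_process(d);
}

/* Destroy the parent directory. */
void sketch_debugfs_fini(struct sketch_kfd_debugfs *d)
{
    d->root_alive = false;
}

/* Order after the fix: children first, then the parent. */
int sketch_exit_fixed_order(struct sketch_kfd_debugfs *d)
{
    sketch_process_destroy_wq(d);
    sketch_debugfs_fini(d);
    return d->faults;
}

/* Order before the fix: the parent disappears under its children. */
int sketch_exit_old_order(struct sketch_kfd_debugfs *d)
{
    sketch_debugfs_fini(d);
    sketch_process_destroy_wq(d);
    return d->faults;
}
```

With two registered processes, the fixed order reports no faults while the old order reports one per process.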
Lijo Lazar [Fri, 18 Jul 2025 13:20:58 +0000 (18:50 +0530)]
drm/amdgpu: Wait for bootloader after PSPv11 reset
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Waiting for the bootloader to return to steady state is already part of
the generic mode-1 reset flow. Increase the retry count for the bootloader
wait and also fix the mask to prevent a false pass.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
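A bounded polling loop of the kind the PSPv11 commit above describes might look like the sketch below. This is an illustration under stated assumptions, not the driver code: the reader callback, the fake register, the mask values and the all-ones guard are hypothetical. In particular, treating an all-ones read as "not ready" is only one plausible way a poll could pass spuriously; the commit does not say what the original mask problem was.

```c
#include <stdint.h>

/* Hypothetical register reader; a real driver would read an MMIO register. */
typedef uint32_t (*sketch_read_fn)(void *ctx);

/*
 * Poll until (value & mask) == expected, giving up after max_retries reads.
 * An all-ones value is rejected explicitly so that a device which is not
 * responding cannot satisfy the mask by accident (assumption, see lead-in).
 * Returns 0 on success, -1 on timeout.
 */
int sketch_wait_for_bootloader(sketch_read_fn read_reg, void *ctx,
                               uint32_t mask, uint32_t expected,
                               unsigned int max_retries)
{
    for (unsigned int i = 0; i < max_retries; i++) {
        uint32_t val = read_reg(ctx);

        if (val != 0xFFFFFFFFu && (val & mask) == expected)
            return 0;
        /* A real driver would sleep or delay between reads here. */
    }
    return -1;
}

/* Fake register for experimentation: returns all-ones until it is "ready". */
struct sketch_fake_reg {
    unsigned int reads_until_ready;
};

uint32_t sketch_fake_read(void *ctx)
{
    struct sketch_fake_reg *reg = ctx;

    if (reg->reads_until_ready > 0) {
        reg->reads_until_ready--;
        return 0xFFFFFFFFu;
    }
    return 0x80000000u;
}
```

Raising max_retries corresponds to the larger retry count mentioned in the commit; the delay between reads is left out of the sketch.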
Ethan Carter Edwards [Sat, 2 Aug 2025 01:38:16 +0000 (21:38 -0400)]
drm/amdgpu/gfx9.4.3: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Sat, 2 Aug 2025 01:45:41 +0000 (21:45 -0400)]
drm/amdgpu/gfx9: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Sat, 2 Aug 2025 01:41:42 +0000 (21:41 -0400)]
drm/amdgpu/gfx10: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xaver Hugl [Thu, 31 Jul 2025 22:49:51 +0000 (00:49 +0200)]
amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alexandre Demers [Sun, 3 Aug 2025 02:27:31 +0000 (22:27 -0400)]
Documentation: Remove VCE support from OLAND's features
OLAND doesn't support VCE at all, but it does support UVD (3 or 4,
depending on the source).
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 4 Aug 2025 05:04:06 +0000 (10:34 +0530)]
drm/amd/pm: Make static table support conditional
Add PMFW version check for static table support on SMU v13.0.6 VFs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Thu, 31 Jul 2025 06:54:50 +0000 (14:54 +0800)]
drm/amdgpu: Fix vcn v4.0.3 poison irq call trace on sriov guest
Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.
[25209.468816] Call Trace:
[25209.468817] <TASK>
[25209.468818] ? srso_alias_return_thunk+0x5/0x7f
[25209.468820] ? show_trace_log_lvl+0x28e/0x2ea
[25209.468822] ? show_trace_log_lvl+0x28e/0x2ea
[25209.468825] ? vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.468936] ? show_regs.part.0+0x23/0x29
[25209.468939] ? show_regs.cold+0x8/0xd
[25209.468940] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469038] ? __warn+0x8c/0x100
[25209.469040] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469135] ? report_bug+0xa4/0xd0
[25209.469138] ? handle_bug+0x39/0x90
[25209.469140] ? exc_invalid_op+0x19/0x70
[25209.469142] ? asm_exc_invalid_op+0x1b/0x20
[25209.469146] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469241] vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.469343] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.469511] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]
Fixes: 4c4a89149608 ("drm/amdgpu: Register aqua vanjaram vcn poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Thu, 31 Jul 2025 06:28:26 +0000 (14:28 +0800)]
drm/amdgpu: Fix jpeg v4.0.3 poison irq call trace on sriov guest
Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.
[25209.467154] Call Trace:
[25209.467156] <TASK>
[25209.467158] ? srso_alias_return_thunk+0x5/0x7f
[25209.467162] ? show_trace_log_lvl+0x28e/0x2ea
[25209.467166] ? show_trace_log_lvl+0x28e/0x2ea
[25209.467171] ? jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467300] ? show_regs.part.0+0x23/0x29
[25209.467303] ? show_regs.cold+0x8/0xd
[25209.467304] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467403] ? __warn+0x8c/0x100
[25209.467407] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467503] ? report_bug+0xa4/0xd0
[25209.467508] ? handle_bug+0x39/0x90
[25209.467511] ? exc_invalid_op+0x19/0x70
[25209.467513] ? asm_exc_invalid_op+0x1b/0x20
[25209.467518] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467613] ? amdgpu_irq_put+0x5f/0xc0 [amdgpu]
[25209.467709] jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467805] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.467971] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]
Fixes: 1b2231de4163 ("drm/amdgpu: Register aqua vanjaram jpeg poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
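The two poison irq commits above (VCN and JPEG v4.0.3) come down to keeping interrupt get/put calls balanced. The sketch below models that with a plain reference count. The types and the is_vf flag are hypothetical, and the real patches may key the check on RAS support rather than on the guest flag directly.

```c
#include <stdbool.h>

/* Hypothetical refcounted interrupt source; not the real amdgpu_irq_src. */
struct sketch_irq_src {
    int enabled_refs;   /* outstanding get calls */
    int warnings;       /* unbalanced put attempts */
};

void sketch_irq_get(struct sketch_irq_src *src)
{
    src->enabled_refs++;
}

/* Returns -1 and records a warning when there is nothing to put. */
int sketch_irq_put(struct sketch_irq_src *src)
{
    if (src->enabled_refs == 0) {
        src->warnings++;
        return -1;
    }
    src->enabled_refs--;
    return 0;
}

/* The guest (VF) never initializes RAS, so it never takes the reference. */
void sketch_hw_init(struct sketch_irq_src *poison_irq, bool is_vf)
{
    if (!is_vf)
        sketch_irq_get(poison_irq);
}

/* Mirror the init condition so the put is skipped when no get happened. */
void sketch_hw_fini(struct sketch_irq_src *poison_irq, bool is_vf)
{
    if (!is_vf)
        sketch_irq_put(poison_irq);
}
```

Using the same condition on both paths is what keeps the count from going negative, which is the situation the call traces above report.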
Lijo Lazar [Thu, 24 Jul 2025 07:22:56 +0000 (12:52 +0530)]
drm/amdgpu: Add wrapper function for dpc state
Use wrapper functions to set/indicate dpc status.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 1 Aug 2025 18:12:01 +0000 (23:42 +0530)]
drm/amd/pm: Allow static metrics table query in VF
Allow static metrics table to be queried on SMUv13.0.6 SOCs in VF mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Mon, 4 Aug 2025 00:43:15 +0000 (08:43 +0800)]
drm/amdgpu: Update SDMA firmware version check for user queue support
This commit fixes a firmware version check for enabling user queue
support in SDMA v7.0. The previous version check (7836028) was
incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL
commands causing register conflicts between MCU_DBG0 and MCU_DBG1.
Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 11 Jul 2025 06:48:04 +0000 (12:18 +0530)]
drm/amd/pm: Use cached metrics data on arcturus
Cached metrics data validity is 1ms on arcturus. It's not reasonable for
any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 11 Jul 2025 06:45:45 +0000 (12:15 +0530)]
drm/amd/pm: Use cached metrics data on aldebaran
Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
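The 1ms cache validity used in the arcturus and aldebaran commits above can be sketched as a simple time-based cache. Everything here is hypothetical (struct layout, function names, the fake firmware fetch); it only shows how queries inside the validity window are answered without contacting the firmware again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache for a single metric value. */
struct sketch_metrics_cache {
    uint64_t last_refresh_ms;   /* time of the last firmware fetch */
    bool valid;                 /* false until the first fetch */
    int value;                  /* cached metric */
    unsigned int fetch_count;   /* how often the firmware was queried */
};

/* Stand-in for an expensive firmware query; returns a changing value. */
int sketch_fetch_from_firmware(struct sketch_metrics_cache *cache)
{
    cache->fetch_count++;
    return 40 + (int)cache->fetch_count;
}

/*
 * Return the cached value while it is younger than validity_ms;
 * otherwise refresh it from the firmware first.
 */
int sketch_get_metric(struct sketch_metrics_cache *cache,
                      uint64_t now_ms, uint64_t validity_ms)
{
    if (!cache->valid || now_ms - cache->last_refresh_ms >= validity_ms) {
        cache->value = sketch_fetch_from_firmware(cache);
        cache->last_refresh_ms = now_ms;
        cache->valid = true;
    }
    return cache->value;
}
```

With validity_ms set to 1, two queries in the same millisecond cost one firmware fetch, and a query one millisecond later triggers a second one.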
Lijo Lazar [Fri, 18 Jul 2025 03:55:21 +0000 (09:25 +0530)]
drm/amdgpu: Add NULL check for asic_funcs
If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 25 Jul 2025 22:14:58 +0000 (17:14 -0500)]
drm/amd/display: Promote DC to 3.2.344
Summary:
* Add interface to log hw state when underflow happens
* Fix hubp programming of 3dlut fast load
* Avoid Read Remote DPCD Many Times
* More liberal vmin/vmax update for freesync
* Fix dmub access race condition
Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Muhammad Ahmed [Fri, 25 Jul 2025 01:50:25 +0000 (21:50 -0400)]
drm/amd/display: Adding interface to log hw state when underflow happens
[why]
Will help us better debug underflow issues.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Muhammad Ahmed <Muhammad.Ahmed@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ryan Seto [Thu, 24 Jul 2025 18:57:52 +0000 (14:57 -0400)]
drm/amd/display: Toggle for Disable Force Pstate Allow on Disable
[Why & How]
In theory, the driver should be able to support disabling force pstate allow
after hardware release; however, this behavior is not tested yet.
Introduce a new toggle to disable the force on the fly.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reza Amini [Mon, 14 Jul 2025 20:22:38 +0000 (16:22 -0400)]
drm/amd/display: Fixing hubp programming of 3dlut fast load
[why]
HUBP needs to know the size of the lut's destination in MPC.
This currently defaults to 17 and needs to be set for the specific
lut size.
[how]
Define and apply the missing hubp field. Take this opportunity
to consolidate the programming of 3dlut into a hubp and mpc function.
Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Signed-off-by: Reza Amini <reza.amini@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jingwen Zhu [Mon, 14 Jul 2025 08:18:19 +0000 (16:18 +0800)]
drm/amd/display: limited pll vco w/a v2
[Why/How]
The w/a will cause a black screen issue on reboot.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Jingwen Zhu <Jingwen.Zhu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Thu, 10 Jul 2025 01:42:54 +0000 (21:42 -0400)]
drm/amd/display: Avoid Read Remote DPCD Many Times
Reading remote dpcd is time consuming. Instead of reading each byte
one by one, read 16 bytes together.
Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
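The batching idea in the remote DPCD commit above can be sketched as a chunked read. The transport below is a fake that copies from a local array and counts transactions; the names and the struct are hypothetical, and only the 16-byte chunk size comes from the commit text.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHUNK_SIZE 16   /* bytes per remote read, as in the commit text */

/* Fake remote device: a byte array plus a transaction counter. */
struct sketch_remote {
    const uint8_t *mem;
    size_t mem_len;
    unsigned int transactions;
};

/* One remote transaction: copies len bytes starting at off. */
int sketch_remote_read(struct sketch_remote *r, size_t off,
                       uint8_t *dst, size_t len)
{
    if (off > r->mem_len || len > r->mem_len - off)
        return -1;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->mem[off + i];
    r->transactions++;
    return 0;
}

/* Read len bytes using as few transactions as the chunk size allows. */
int sketch_read_block(struct sketch_remote *r, size_t off,
                      uint8_t *dst, size_t len)
{
    while (len > 0) {
        size_t n = len < SKETCH_CHUNK_SIZE ? len : SKETCH_CHUNK_SIZE;

        if (sketch_remote_read(r, off, dst, n) != 0)
            return -1;
        off += n;
        dst += n;
        len -= n;
    }
    return 0;
}
```

Reading 40 bytes this way takes 3 transactions instead of 40, which is where the time saving comes from when each transaction is slow.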
Mario Limonciello [Mon, 21 Jul 2025 04:39:41 +0000 (23:39 -0500)]
drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
This reverts commit 66abb996999de0d440a02583a6e70c2c24deab45.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel.org/T/#m69f875a7e69aa22df3370b3e3a9e69f4a61fdaf2
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Paul Hsieh [Wed, 23 Jul 2025 03:51:42 +0000 (11:51 +0800)]
drm/amd/display: update dpp/disp clock from smu clock table
[Why]
Some high-resolution monitors fail to display properly because
this platform does not support sufficiently high DPP and DISP
clock frequencies.
[How]
Update the DISP and DPP clocks from the smu clock table so that DML can
filter out these modes if they are not supported.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Wed, 16 Apr 2025 15:26:54 +0000 (11:26 -0400)]
drm/amd/display: more liberal vmin/vmax update for freesync
[Why]
FAMS2 expects vmin/vmax to be updated in the case when freesync is
off, but supported. But we only update it when freesync is enabled.
[How]
Change the vsync handler such that dc_stream_adjust_vmin_vmax() is called
irrespective of whether freesync is enabled. If freesync is supported,
then there is no harm in updating vmin/vmax registers.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Mon, 21 Jul 2025 15:03:39 +0000 (11:03 -0400)]
drm/amd/display: fix dmub access race condition
Accessing DC from amdgpu_dm is usually preceded by acquisition of
dc_lock mutex. Most of the DC API that DM calls are under a DC lock.
However, there are a few that are not. Some DC API called from interrupt
context end up sending DMUB commands via a DC API, while other threads were
using DMUB. This was apparent from a race between calls for setting idle
optimization enable/disable and the DC API to set vmin/vmax.
Offload the call to dc_stream_adjust_vmin_vmax() to a thread instead
of directly calling them from the interrupt handler such that it waits
for dc_lock.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Duncan Ma [Tue, 22 Jul 2025 16:22:15 +0000 (12:22 -0400)]
drm/amd/display: Adjust AUX-less ALPM setting
[Why & How]
Change ACDS period to support LTTPR.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Duncan Ma <Duncan.Ma@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Siyang Liu [Fri, 4 Jul 2025 03:16:22 +0000 (11:16 +0800)]
drm/amd/display: fix a Null pointer dereference vulnerability
[Why]
A null pointer dereference vulnerability exists in the AMD display driver's
(DC module) cleanup function dc_destruct().
When display control context (dc->ctx) construction fails
(due to memory allocation failure), this pointer remains NULL.
During subsequent error handling when dc_destruct() is called,
there's no NULL check before dereferencing the perf_trace member
(dc->ctx->perf_trace), causing a kernel null pointer dereference crash.
[How]
Check if dc->ctx is non-NULL before dereferencing.
Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
(Updated commit text and removed unnecessary error message)
Signed-off-by: Siyang Liu <Security@tencent.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mangesh Gadre [Mon, 21 Jul 2025 15:32:34 +0000 (23:32 +0800)]
drm/amdgpu: Initialize vcn v5_0_1 ras function
Initialize vcn v5_0_1 ras function
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Wed, 30 Jul 2025 08:09:02 +0000 (10:09 +0200)]
drm/amd/display: Add primary plane to commits for correct VRR handling
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for
the primary plane. If a commit affects a CRTC but not its primary plane,
it would previously not trigger a refresh cycle or affect LFC, violating
current UAPI semantics.
Fixes e.g. atomic commits affecting only the cursor plane being limited
to the minimum refresh rate.
Don't do this for the legacy cursor ioctls though, it would break the
UAPI semantics for those.
Suggested-by: Xaver Hugl <xaver.hugl@kde.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yunxiang Li [Fri, 25 Jul 2025 16:56:35 +0000 (12:56 -0400)]
drm/amdgpu: skip mgpu fan boost for multi-vf
On a multi-vf setup, if the VM has two vfs assigned, perhaps from two
different gpus, mgpu fan boost will fail.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mangesh Gadre [Mon, 21 Jul 2025 16:27:52 +0000 (00:27 +0800)]
drm/amdgpu: Initialize jpeg v5_0_1 ras function
Initialize jpeg v5_0_1 ras function
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 30 Jul 2025 03:07:43 +0000 (11:07 +0800)]
drm/amdgpu: Skip poison aca bank from UE channel
Avoid GFX poison consumption errors logged when fatal error occurs.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arnd Bergmann [Mon, 14 Jul 2025 08:16:25 +0000 (10:16 +0200)]
drm/amdgpu: fix link error for !PM_SLEEP
When power management is not enabled in the kernel build, the newly
added hibernation changes cause a link failure:
arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o: in function `amdgpu_pmops_thaw':
amdgpu_drv.c:(.text+0x1514): undefined reference to `pm_hibernate_is_recovering'
Make the power management code in this driver conditional on
CONFIG_PM and CONFIG_PM_SLEEP
Fixes: 530694f54dd5 ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250714081635.4071570-1-arnd@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
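The !PM_SLEEP link error above is a case of code referencing a symbol that only exists under a config option. A minimal sketch of the usual remedy follows. The SKETCH_CONFIG_PM_SLEEP macro and all function names are hypothetical stand-ins, and the actual patch may use different kernel helpers to achieve the same effect.

```c
#include <stddef.h>

typedef int (*sketch_pm_cb)(void);

#ifdef SKETCH_CONFIG_PM_SLEEP
/* Only declared and referenced when sleep support is built in. */
extern int sketch_hibernate_is_recovering(void);

static int sketch_thaw(void)
{
    return sketch_hibernate_is_recovering() ? 1 : 0;
}

const sketch_pm_cb sketch_thaw_cb = sketch_thaw;
#else
/* Without sleep support there is no callback and no dangling reference. */
const sketch_pm_cb sketch_thaw_cb = NULL;
#endif

/* Callers test the pointer instead of assuming the callback exists. */
int sketch_run_thaw(void)
{
    return sketch_thaw_cb ? sketch_thaw_cb() : 0;
}
```

Built without SKETCH_CONFIG_PM_SLEEP, the file never mentions the hibernation helper, so the linker has nothing unresolved to complain about.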
Alex Deucher [Fri, 27 Jun 2025 14:10:31 +0000 (10:10 -0400)]
drm/amd/display: add more cyan skillfish devices
Add PCI IDs to support display probe for cyan skillfish
family of SOCs.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 18 Jul 2025 19:53:21 +0000 (15:53 -0400)]
drm/amdgpu: update mmhub 3.3 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 18 Jul 2025 19:52:04 +0000 (15:52 -0400)]
drm/amdgpu: update mmhub 3.0.1 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 07:53:53 +0000 (13:23 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN2_5
Use generic vcn devcoredump helper functions for VCN2_5 and VCN2_6
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 07:45:00 +0000 (13:15 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN2_0_0
Use generic vcn devcoredump helper functions for VCN2_0_0
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 07:38:00 +0000 (13:08 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN3_0
Use generic vcn devcoredump helper functions for VCN3_0
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 07:27:04 +0000 (12:57 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN4_0_3
Use generic vcn devcoredump helper functions for VCN4_0_3
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 07:08:49 +0000 (12:38 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN4_0_5
Use generic vcn devcoredump helper functions for VCN4_0_5
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 18 Jul 2025 06:48:30 +0000 (12:18 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN4_0_0
Use generic vcn devcoredump helper functions for VCN4_0_0
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Thu, 17 Jul 2025 18:57:50 +0000 (00:27 +0530)]
drm/amdgpu/vcn: Register dump cleanup in VCN5
Use generic vcn devcoredump helper functions for VCN5
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Mon, 28 Jul 2025 11:49:24 +0000 (19:49 +0800)]
drm/amdgpu: Add new error code for VCN/JPEG new chain
Add VIDS and JPEG8/9 S|D chain error code for VCN/JPEG v5.0.1.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Mon, 28 Jul 2025 11:33:50 +0000 (19:33 +0800)]
drm/amdgpu: Fix vcn v5.0.1 poison irq call trace
Why:
[13014.890792] Call Trace:
[13014.890793] <TASK>
[13014.890795] ? show_trace_log_lvl+0x1d6/0x2ea
[13014.890799] ? show_trace_log_lvl+0x1d6/0x2ea
[13014.890800] ? vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
[13014.890872] ? show_regs.part.0+0x23/0x29
[13014.890873] ? show_regs.cold+0x8/0xd
[13014.890874] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
[13014.890934] ? __warn+0x8c/0x100
[13014.890936] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
[13014.890995] ? report_bug+0xa4/0xd0
[13014.890999] ? handle_bug+0x39/0x90
[13014.891001] ? exc_invalid_op+0x19/0x70
[13014.891003] ? asm_exc_invalid_op+0x1b/0x20
[13014.891005] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
[13014.891065] ? amdgpu_irq_put+0x63/0xe0 [amdgpu]
[13014.891124] vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
[13014.891189] amdgpu_ip_block_hw_fini+0x3b/0x78 [amdgpu]
[13014.891309] amdgpu_device_fini_hw+0x3c1/0x479 [amdgpu]
How:
Add omitted vcn poison irq get call.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Thu, 17 Jul 2025 06:00:52 +0000 (11:30 +0530)]
drm/amdgpu/vcn: Add regdump helper functions
Add generic helper functions for vcn devcoredump support
which can be re-used for all vcn versions.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Meng Li [Fri, 9 May 2025 05:44:24 +0000 (13:44 +0800)]
drm/amd/amdgpu: Release xcp drm memory after unplug
Add a new API amdgpu_xcp_drm_dev_free().
After an xcp device is unplugged, its xcp drm memory and related resources need to be released.
Co-developed-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YuanShang [Wed, 23 Jul 2025 08:44:49 +0000 (16:44 +0800)]
drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
The field job->vm is used in function amdgpu_job_run to get the page
table re-generation counter and decide whether the job should be skipped.
Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use.
For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed,
then the gfx job should be skipped.
Fixes:
26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Tue, 15 Jul 2025 21:24:20 +0000 (16:24 -0500)]
drm/amd: Use drm_*() macros instead of DRM_*() for amdgpu_cs
Some of the IOCTL messages can be triggered for different GPUs, and it might
not be obvious from a problem report which GPU triggered them. With the
drm_*() macros, the correct device is shown in the messages.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250715212420.2254925-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yunshui Jiang [Thu, 24 Jul 2025 01:37:53 +0000 (09:37 +0800)]
drm/amdgpu: use kmalloc_array() instead of kmalloc()
Use kmalloc_array() instead of kmalloc() with multiplication.
kmalloc_array() is a safer way because of its multiply overflow check.
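The overflow check can be illustrated with a small userspace sketch. The helper name alloc_array() is hypothetical and malloc() stands in for the kernel allocator; this is not the kernel implementation, only the same idea of refusing a request whose element count times element size would wrap around.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for kmalloc_array(): return NULL when
 * n * size would overflow size_t instead of allocating a short buffer.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* the multiplication would wrap around */
	return malloc(n * size);
}
```

With a plain malloc(n * size), an overflowing product silently yields a buffer smaller than the caller expects; the checked variant fails the allocation instead.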
Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Mon, 28 Jul 2025 12:57:06 +0000 (18:27 +0530)]
drm/amdgpu: Fix unintended error log in VCN5_0_0
The error log is supposed to be guarded by the failure condition.
Fixes:
faab5ea08367 ("drm/amdgpu: Check vcn sram load return value")
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Tue, 22 Jul 2025 15:58:30 +0000 (17:58 +0200)]
drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
Apparently, both DCE 6.0 and 6.4 have 3 PLLs, but PLL0 can only
be used for DP. Make sure to initialize the correct number of PLLs
in DC for these DCE versions and use PLL0 only for DP.
Also, on DCE 6.0 and 6.4, the PLL0 needs to be powered on at
initialization as opposed to DCE 6.1 and 7.x which use a different
clock source for DFS.
The following functions were used as reference from the old
radeon driver implementation of DCE 6.x:
- radeon_atom_pick_pll
- atombios_crtc_set_disp_eng_pll
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Tue, 22 Jul 2025 15:58:29 +0000 (17:58 +0200)]
drm/amd/display: Don't overwrite dce60_clk_mgr
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.
Fixes:
62eab49faae7 ("drm/amd/display: hide VGH asic specific structs")
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Sat, 26 Jul 2025 12:16:24 +0000 (20:16 +0800)]
drm/amdgpu: Effective health check before reset
Move amdgpu_device_health_check into amdgpu_device_gpu_recover so that
device presence can be checked before reset.
The reasons are:
1. During a dpc event, the device where the dpc event occurs is not
present on the bus.
2. When a dpc event and an ATHUB event occur simultaneously, the dpc thread
holds the reset domain lock when detecting the error, and the gpu recover
thread acquires the hive lock. The device is simultaneously in the
amdgpu_ras_in_recovery and occurs_dpc states, so the gpu recover thread does
not reach amdgpu_device_health_check. It waits for the reset domain lock,
which the dpc thread has not released. Meanwhile, the dpc callback
slot_reset tries to obtain the hive lock, which is held by the gpu recover
thread at that time. The result is a deadlock.
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Sun, 27 Jul 2025 04:06:55 +0000 (12:06 +0800)]
drm/amdgpu: Avoid rma causes GPU duplicate reset
Try to ensure that poison creation handling completes in time
to set the device rma value.
Signed-off-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 23 Jul 2025 06:28:35 +0000 (14:28 +0800)]
drm/amdgpu: Update IPID value for bad page threshold CPER
Update the IPID register value for bad page threshold CPER according to
the latest definition.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 21 Jul 2025 13:34:34 +0000 (19:04 +0530)]
drm/amdgpu: Fix kdoc style in amdgpu_fence.c
The initial comment block before
amdgpu_fence_driver_guilty_force_completion() incorrectly used '/**' but
is not a kernel-doc comment, causing build warnings.
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:742: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Kernel queue reset handling
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Wed, 16 Jul 2025 22:04:28 +0000 (22:04 +0000)]
drm/amdkfd: Fix checkpoint-restore on multi-xcc
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 28 Jul 2025 15:21:19 +0000 (11:21 -0400)]
Alex Deucher [Mon, 28 Jul 2025 15:15:40 +0000 (11:15 -0400)]
Documentation: update APU and dGPU tables with MP0/1 info
Add MP1 for APUs and MP0 and MP1 details for dGPUs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3905
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 25 Jul 2025 03:12:22 +0000 (22:12 -0500)]
drm/amd: Restore cached manual clock settings during resume
If the SCLK limits have been set before S3 they will not
be restored. The limits are however cached in the driver and so
they can be restored by running a commit sequence during resume.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-3-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 25 Jul 2025 03:12:21 +0000 (22:12 -0500)]
drm/amd: Restore cached power limit during resume
The power limit will be cached in smu->current_power_limit but
if the ASIC goes into S3 this value won't be restored.
Restore the value during SMU resume.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Perry Yuan [Mon, 7 Jul 2025 02:45:28 +0000 (10:45 +0800)]
drm/amdgpu: Fix build error when CONFIG_SUSPEND is disabled
The variable `pm_suspend_target_state` is conditionally defined only when
`CONFIG_SUSPEND` is enabled (see `include/linux/suspend.h`). Directly
referencing it without guarding by `#ifdef CONFIG_SUSPEND` causes build
failures when suspend functionality is disabled (e.g., `CONFIG_SUSPEND=n`).
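A minimal sketch of the guard pattern, assuming hypothetical names (example_suspend_target_state, EXAMPLE_SUSPEND_MEM, example_is_suspending_to_mem) in place of the real kernel symbols: the variable only exists when the option is defined, so every reference to it sits inside the same #ifdef and a fallback is provided otherwise.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_SUSPEND_MEM 3	/* hypothetical state value */

#ifdef CONFIG_SUSPEND
/* Stand-in for a variable that is only defined when the option is enabled. */
static int example_suspend_target_state = EXAMPLE_SUSPEND_MEM;
#endif

/* Hypothetical helper: safe to call whether or not CONFIG_SUSPEND is set. */
static bool example_is_suspending_to_mem(void)
{
#ifdef CONFIG_SUSPEND
	return example_suspend_target_state == EXAMPLE_SUSPEND_MEM;
#else
	return false;	/* no suspend support, so never suspending */
#endif
}
```

Built without -DCONFIG_SUSPEND, the helper compiles and returns false; an unguarded reference to the variable would fail to build in that configuration.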
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 6 Jun 2025 12:13:37 +0000 (14:13 +0200)]
drm/amdgpu: rework how PTE flags are generated v3
Previously we tried to keep the HW specific PTE flags in each mapping,
but for CRIU that isn't sufficient any more since the original value is
needed for the checkpoint procedure.
So rework the whole handling, nuke the early mapping function, keep the
UAPI flags in each mapping instead of the HW flags and translate them to
the HW flags while filling in the PTEs.
Only tested on Navi 23 for now, so it probably needs quite a bit more
work.
v2: fix KFD and SVM handling
v3: one more SVM fix pointed out by Felix
v4: squash in gfx12 fix from David
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yann Dirson [Sun, 20 Jul 2025 14:13:17 +0000 (16:13 +0200)]
drm/amdgpu: fix module parameter description
Fix dcdebugmask description.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yann Dirson [Sun, 20 Jul 2025 14:13:16 +0000 (16:13 +0200)]
Documentation/amdgpu: fix 'in the amdgfx' formulation
Clarify the mailing list.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Amber Lin [Thu, 17 Jul 2025 15:39:58 +0000 (11:39 -0400)]
drm/amdgpu: Add chain runlists support to GC9.4.2
Starting from MEC v97, GC 9.4.2 supports chain runlists of XNACK+/XNACK-
processes.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sat, 19 Jul 2025 00:26:12 +0000 (19:26 -0500)]
drm/amd/display: Promote DAL to 3.2.343
Summary:
* Fix caching streams for LT automation
* Fix DMUB command alignment
* Disabling DSC power gating on DCN314
* Add debugfs for Replay
* Add debug option for BW allocation mode
* Removal of unnecessary includes for faster compilation
* Refactor of code, including adding SPDX license to amdgpu_dm
Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Thu, 17 Jul 2025 20:18:58 +0000 (16:18 -0400)]
drm/amd/display: Cache streams targeting link when performing LT automation
[WHY]
Last LT automation update can cause crash by referencing current_state and
calling into dc_update_planes_and_stream which may clobber current_state.
[HOW]
Cache relevant stream pointers and iterate through them instead of relying
on the current_state.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ovidiu Bunea [Tue, 15 Jul 2025 21:26:39 +0000 (17:26 -0400)]
drm/amd/display: Fix dmub_cmd header alignment
[why & how]
Header misalignment in struct dmub_cmd_replay_copy_settings_data and
struct dmub_alpm_auxless_data causes incorrect data read between driver
and dmub.
Fix the misalignment and ensure that everything is aligned to 4-byte
boundaries.
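The alignment requirement can be sketched with compile-time checks. The structs below are hypothetical and are not the real dmub_cmd definitions; they only show explicit padding combined with static assertions so that both sides of a shared interface see fields on 4-byte boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layout, not the actual dmub structures. */
struct example_cmd_header {
	uint8_t type;
	uint8_t sub_type;
	uint8_t pad[2];		/* explicit padding keeps the header at 4 bytes */
};

struct example_cmd_data {
	struct example_cmd_header header;
	uint8_t flags;
	uint8_t pad[3];		/* explicit padding so 'value' starts on a 4-byte boundary */
	uint32_t value;
};

/* Fail the build if a later edit breaks the 4-byte layout. */
static_assert(sizeof(struct example_cmd_header) % 4 == 0,
	      "header size must be a multiple of 4");
static_assert(offsetof(struct example_cmd_data, value) % 4 == 0,
	      "value must start on a 4-byte boundary");
static_assert(sizeof(struct example_cmd_data) % 4 == 0,
	      "command size must be a multiple of 4");
```

Spelling the padding out, rather than relying on the compiler to insert it, keeps the layout identical for the driver and the firmware even if they are built with different toolchains.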
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ray Wu [Wed, 11 Jun 2025 06:02:25 +0000 (14:02 +0800)]
drm/amd/display: Add Replay residency in debugfs
[Why]
Users can access the replay residency to get PHY off percentage
[How]
Start capture residency:
echo 1 | sudo tee /sys/kernel/debug/dri/0/eDP-1/replay_residency
Stop and Get replay residency:
sudo cat /sys/kernel/debug/dri/0/eDP-1/replay_residency
Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Wed, 19 Mar 2025 22:04:01 +0000 (18:04 -0400)]
drm/amd/display: Increase AUX Intra-Hop Done Max Wait Duration
[WHY]
In the worst case, AUX intra-hop done can take hundreds of milliseconds as
each retimer in a link might have to wait a full AUX_RD_INTERVAL to send
LT abort downstream.
[HOW]
Wait 300ms for each retimer in a link to allow time to propagate a LT abort
without infinitely waiting on intra-hop done.
For no-retimer case, keep the max duration at 10ms.
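As a rough sketch of the wait budget described above, assuming the per-retimer waits simply add up: the function and macro names are hypothetical and do not come from the driver.

```c
#include <assert.h>

/* Hypothetical constants taken from the values quoted in the message. */
#define EXAMPLE_INTRA_HOP_WAIT_NO_RETIMER_MS	10
#define EXAMPLE_INTRA_HOP_WAIT_PER_RETIMER_MS	300

/*
 * Hypothetical helper: upper bound, in milliseconds, on how long to wait
 * for AUX intra-hop done, given the number of retimers in the link.
 */
static unsigned int example_intra_hop_max_wait_ms(unsigned int retimer_count)
{
	if (retimer_count == 0)
		return EXAMPLE_INTRA_HOP_WAIT_NO_RETIMER_MS;
	return retimer_count * EXAMPLE_INTRA_HOP_WAIT_PER_RETIMER_MS;
}
```

The bound grows with the number of retimers but stays finite, which matches the stated goal of allowing the LT abort to propagate without waiting indefinitely.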
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cruise Hung [Tue, 15 Jul 2025 08:36:44 +0000 (16:36 +0800)]
drm/amd/display: Add debug option to control BW Allocation mode
[Why & How]
Add debug option to control BW Allocation mode.
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Reviewed-by: PeiChen (Pei-Chen) Huang <peichen.huang@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Relja Vojvodic [Mon, 14 Jul 2025 15:56:50 +0000 (11:56 -0400)]
drm/amd/display: Allow for sharing of some link and audio link functions
[Why&How]
Allow for sharing of some link and audio link functions by removing static
keyword from function definitions.
Expose those functions in the HWSEQ header.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chiang, Richard [Tue, 15 Jul 2025 13:59:54 +0000 (21:59 +0800)]
drm/amd/display: Remove update_planes_and_stream_v1 sequence
[Why/How]
Remove the update_planes_and_stream_v1 sequence to make the logic the same.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Chiang, Richard <Richard.Chiang@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 16 Jul 2025 20:53:43 +0000 (15:53 -0500)]
drm/amd/display: Rename dcn31 string shown to user
[Why]
DCN31 isn't a product, but DCN312 is. Users matching the string against
documentation might not understand the code.
[How]
Change DCN 3.1 string to be DCN 3.1.2.
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>