linux-2.6-microblaze.git
3 years agodrm/amd/display: 3.2.135
Aric Cyr [Mon, 3 May 2021 02:04:15 +0000 (22:04 -0400)]
drm/amd/display: 3.2.135

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: fix use_max_lb flag for 420 pixel formats
Dmytro Laktyushkin [Mon, 19 Apr 2021 21:50:53 +0000 (17:50 -0400)]
drm/amd/display: fix use_max_lb flag for 420 pixel formats

Right now the flag simply selects memory config 0 when flag is true
however 420 modes benefit more from memory config 3.

Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Handle potential dpp_inst mismatch with pipe_idx
Anthony Wang [Fri, 30 Apr 2021 13:09:02 +0000 (09:09 -0400)]
drm/amd/display: Handle potential dpp_inst mismatch with pipe_idx

[Why]
In some pipe harvesting configs, we will select the incorrect
dpp_inst when programming DTO. This is because when any intermediate
pipe is fused, resource instances are no longer in 1:1
correspondence with pipe index.

[How]
When looping through pipes to program DTO, get the dpp_inst
associated with each pipe from res_pool.

Signed-off-by: Anthony Wang <anthony1.wang@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Handle pixel format test request
Ilya Bakoulin [Thu, 15 Apr 2021 20:02:25 +0000 (16:02 -0400)]
drm/amd/display: Handle pixel format test request

[Why]
Some DSC tests fail because stream pixel encoding does not change
its value according to the type requested in the DPCD test params.

[How]
Set stream pixel encoding before updating DSC config and configuring
the test pattern.

Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Reviewed-by: Hanghong Ma <Hanghong.Ma@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix clock table filling logic
Ilya Bakoulin [Mon, 26 Apr 2021 18:27:38 +0000 (14:27 -0400)]
drm/amd/display: Fix clock table filling logic

[Why]
Currently, the code that fills the clock table can miss filling
information about some of the higher voltage states advertised
by the SMU. This, in turn, may cause some of the higher pixel clock
modes (e.g. 8k60) to fail validation.

[How]
Fill the table with one entry per DCFCLK level instead of one entry
per FCLK level. This is needed because the maximum FCLK does not
necessarily need maximum voltage, whereas DCFCLK values from SMU
cover the full voltage range.

Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: minor dp link training refactor
Wenjing Liu [Tue, 13 Apr 2021 22:44:40 +0000 (18:44 -0400)]
drm/amd/display: minor dp link training refactor

[how]
The change includes some dp link training refactors:
1. break down is_ch_eq_done to checking individual conditions in
its own function.
2. update dpcd_set_training_pattern to take in dc_dp_training_pattern
as input.
3. moving lttpr mode struct definition into link_service_types.h

Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: George Shen <George.Shen@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: DETBufferSizeInKbyte variable type modifications
Chaitanya Dhere [Tue, 20 Apr 2021 21:21:17 +0000 (17:21 -0400)]
drm/amd/display: DETBufferSizeInKbyte variable type modifications

[Why]
DETBufferSizeInKByte is not expected to be sub-dividable, hence
unsigned int is a better suited data-type. Change it to an array
as well to satisfy current requirements.

[How]
Change the data-type of DETBufferSizeInKByte to an unsigned int
array. Modify the all the variables like DETBufferSizeY,
DETBufferSizeC that are involved in DETBufferSizeInKByte calculations
to unsigned int in all the display_mode_vba_xx files.

Signed-off-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Add dc log for DP SST DSC enable/disable
Fangzhi Zuo [Fri, 23 Apr 2021 21:00:04 +0000 (17:00 -0400)]
drm/amd/display: Add dc log for DP SST DSC enable/disable

Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Expand DP module training API.
Jimmy Kizito [Sat, 10 Apr 2021 02:23:53 +0000 (22:23 -0400)]
drm/amd/display: Expand DP module training API.

[Why & How]
Add functionality useful for DP link training to public interface.

Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Add fallback and abort paths for DP link training.
Jimmy Kizito [Wed, 7 Apr 2021 22:56:19 +0000 (18:56 -0400)]
drm/amd/display: Add fallback and abort paths for DP link training.

[Why]
When enabling a DisplayPort stream:
- Optionally reducing link bandwidth between failed link training
attempts should progressively relax training requirements.
- Abandoning link training altogether if a sink is unplugged should
avoid unnecessary training attempts.

[How]
- Add fallback parameter to DP link training function and reduce link
bandwidth between failed training attempts as long as stream bandwidth
requirements are met.
- Add training status for sink unplug and abort training when this
status is reported.

Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Update setting of DP training parameters.
Jimmy Kizito [Mon, 5 Apr 2021 22:05:09 +0000 (18:05 -0400)]
drm/amd/display: Update setting of DP training parameters.

[Why]
Some links are dynamically assigned link encoders on stream enablement.

[How]
Update DisplayPort training parameter determination stage that assumes
link encoder statically assigned to link.

Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Update DPRX detection.
Jimmy Kizito [Thu, 1 Apr 2021 16:59:54 +0000 (12:59 -0400)]
drm/amd/display: Update DPRX detection.

[Why]
Some extra provisions are required during DPRX detection for links which
lack physical HPD and AUX/DDC pins.

[How]
Avoid attempting to access nonexistent physical pins during DPRX
detection.

Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: covert ras status to kernel errno
Dennis Li [Mon, 10 May 2021 03:04:59 +0000 (11:04 +0800)]
drm/amdgpu: covert ras status to kernel errno

The original codes use ras status and kernl errno together in the same
function, which is a wrong code style.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Avoid HPD IRQ in GPU reset state
Zhan Liu [Sun, 9 May 2021 23:30:36 +0000 (19:30 -0400)]
drm/amd/display: Avoid HPD IRQ in GPU reset state

[Why]
If GPU is in reset state, force enabling link will cause
unexpected behaviour.

[How]
Avoid handling HPD IRQ when GPU is in reset state.

Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nikola Cornij <nikola.cornij@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Quit RAS initialization earlier if RAS is disabled
Oak Zeng [Thu, 6 May 2021 16:24:17 +0000 (11:24 -0500)]
drm/amdgpu: Quit RAS initialization earlier if RAS is disabled

If RAS is disabled through amdgpu_ras_enable kernel parameter,
we should quit the RAS initialization eariler to avoid initialization
of some RAS data structure such as sysfs etc.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: handle errors returned by svm_migrate_copy_to_vram/ram
Philip Yang [Wed, 28 Apr 2021 22:57:57 +0000 (18:57 -0400)]
drm/amdkfd: handle errors returned by svm_migrate_copy_to_vram/ram

If migration copy failed because process is killed, or out of VRAM or
system memory, pass error code back to caller to handle error
gracefully.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Export ras_*_enabled to debugfs
Luben Tuikov [Tue, 4 May 2021 06:32:20 +0000 (02:32 -0400)]
drm/amdgpu: Export ras_*_enabled to debugfs

Export the runtime-set "ras_hw_enabled" and
"ras_enabled" to debugfs, for debugging.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Rename to ras_*_enabled
Luben Tuikov [Tue, 4 May 2021 06:25:29 +0000 (02:25 -0400)]
drm/amdgpu: Rename to ras_*_enabled

Rename,
  ras_hw_supported --> ras_hw_enabled, and
  ras_features     --> ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Move up ras_hw_supported
Luben Tuikov [Tue, 4 May 2021 00:43:00 +0000 (20:43 -0400)]
drm/amdgpu: Move up ras_hw_supported

Move ras_hw_supported into struct amdgpu_dev.
The dependency is:
struct amdgpu_ras <== struct amdgpu_dev <== ASIC,
read as "struct amdgpu_ras depends on struct
amdgpu_dev, which depends on the hardware."

This can be loosely understood as, "if RAS is
supported, which is property of the ASIC (struct
amdgpu_dev), then we can access struct
amdgpu_ras."

v2: Fix a typo: must binary AND in ternary cond
    in amdgpu_ras.c

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Remove redundant ras->supported
Luben Tuikov [Tue, 4 May 2021 00:02:22 +0000 (20:02 -0400)]
drm/amdgpu: Remove redundant ras->supported

Remove redundant ras->supported, as this value
is also stored in adev->ras_features.

Use adev->ras_features, as that supercedes "ras",
since the latter is its member.

The dependency goes like this:
ras <== adev->ras_features <== hw_supported,
and is read as "ras depends on ras_features, which
depends on hw_supported." The arrows show the flow
of information, i.e. the dependency update.

"hw_supported" should also live in "adev".

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update vcn1.0 Non-DPG suspend sequence
Sathishkumar S [Mon, 3 May 2021 18:27:31 +0000 (23:57 +0530)]
drm/amdgpu: update vcn1.0 Non-DPG suspend sequence

update suspend register settings in Non-DPG mode.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update the shader to clear specific SGPRs
Dennis Li [Thu, 6 May 2021 05:26:28 +0000 (13:26 +0800)]
drm/amdgpu: update the shader to clear specific SGPRs

Add shader codes to explicitly clear specific SGPRs, such as
flat_scratch_lo, flat_scratch_hi and so on. And also correct the
allocation size of SGPRs in PGM_RSRC1.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Enable TCP channel hashing for Aldebaran
Mukul Joshi [Tue, 9 Mar 2021 18:42:33 +0000 (13:42 -0500)]
drm/amdgpu: Enable TCP channel hashing for Aldebaran

Enable TCP channel hashing to match DF hash settings for Aldebaran.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: add ACPI SRAT parsing for topology
Eric Huang [Fri, 23 Apr 2021 19:05:17 +0000 (15:05 -0400)]
drm/amdkfd: add ACPI SRAT parsing for topology

In NPS4 BIOS we need to find the closest numa node when creating
topology io link between cpu and gpu, if PCI driver doesn't set
it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/pm: add documentation for pp_od_clock_voltage for vangogh
Alex Deucher [Fri, 30 Apr 2021 16:40:19 +0000 (12:40 -0400)]
drm/amdgpu/pm: add documentation for pp_od_clock_voltage for vangogh

Vangogh follows other APUs, but also allows core clock adjustments.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/pm: add documentation for pp_od_clock_voltage for APUs
Alex Deucher [Fri, 30 Apr 2021 16:21:46 +0000 (12:21 -0400)]
drm/amdgpu/pm: add documentation for pp_od_clock_voltage for APUs

APUs only support adjusting the SCLK domain.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: expose pmfw attached timestamp on Aldebaran
Evan Quan [Mon, 26 Apr 2021 07:20:04 +0000 (15:20 +0800)]
drm/amd/pm: expose pmfw attached timestamp on Aldebaran

Available with 68.18.0 and later PMFWs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: new gpu_metrics structure for pmfw attached timestamp
Evan Quan [Mon, 26 Apr 2021 07:14:22 +0000 (15:14 +0800)]
drm/amd/pm: new gpu_metrics structure for pmfw attached timestamp

Supported by some latest ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Use device specific BO size & stride check.
Bas Nieuwenhuizen [Tue, 4 May 2021 09:43:34 +0000 (11:43 +0200)]
drm/amdgpu: Use device specific BO size & stride check.

The builtin size check isn't really the right thing for AMD
modifiers due to a couple of reasons:

1) In the format structs we don't do set any of the tilesize / blocks
etc. to avoid having format arrays per modifier/GPU
2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even
for tiled ...
3) The pitch for the DCC planes is really the pixel pitch of the main
surface that would be covered by it ...

Note that we only handle GFX9+ case but we do this after converting
the implicit modifier to an explicit modifier, so on GFX9+ all
framebuffers should be checked here.

There is a TODO about DCC alignment, but it isn't worse than before
and I'd need to dig a bunch into the specifics. Getting this out in
a reasonable timeframe to make sure it gets the appropriate testing
seemed more important.

Finally as I've found that debugging addfb2 failures is a pita I was
generous adding explicit error messages to every failure case.

Fixes: f258907fdd83 ("drm/amdgpu: Verify bo size can fit framebuffer size on init.")
Tested-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
Bas Nieuwenhuizen [Wed, 5 May 2021 01:27:49 +0000 (03:27 +0200)]
drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.

Otherwise tiling modes that require the values form this field
(In particular _*_X) would be corrupted upon video decode.

Copied from the VCN v2 code.

Fixes: 99541f392b4d ("drm/amdgpu: add mc resume DPG mode for VCN3.0")
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: change the default timeout for kernel compute queues
Alex Deucher [Tue, 4 May 2021 15:00:42 +0000 (11:00 -0400)]
drm/amdgpu: change the default timeout for kernel compute queues

Change to 60s.  This matches what we already do in virtualization.
Infinite timeout can lead to deadlocks in the kernel.

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: set vcn mgcg flag for picasso
Sathishkumar S [Mon, 3 May 2021 07:04:10 +0000 (12:34 +0530)]
drm/amdgpu: set vcn mgcg flag for picasso

enable vcn mgcg flag for picasso.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Update L1 and add L2/3 cache information
Mike Li [Fri, 26 Mar 2021 19:14:12 +0000 (15:14 -0400)]
drm/amdkfd: Update L1 and add L2/3 cache information

The L1 cache information has been updated and the L2/L3
information has been added. The changes have been made
for Vega10 and newer ASICs. There are no changes
for the older ASICs before Vega10.

Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/amdgpu/amdgpu_drv.c: Replace drm_modeset_lock_all with drm_modeset_lock
Fabio M. De Francesco [Tue, 27 Apr 2021 09:44:49 +0000 (11:44 +0200)]
drm/amd/amdgpu/amdgpu_drv.c: Replace drm_modeset_lock_all with drm_modeset_lock

drm_modeset_lock_all() is not needed here, so it is replaced with
drm_modeset_lock(). The crtc list around which we are looping never
changes, therefore the only lock we need is to protect access to
crtc->state.

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: initialize variable
Tom Rix [Fri, 30 Apr 2021 17:16:54 +0000 (10:16 -0700)]
drm/amd/pm: initialize variable

Static analysis reports this problem

amdgpu_pm.c:478:16: warning: The right operand of '<' is a garbage value
  for (i = 0; i < data.nums; i++) {
                ^ ~~~~~~~~~

In some cases data is not set.  Initialize to 0 and flag not setting
data as an error with the existing check.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/radeon: Avoid power table parsing memory leaks
Kees Cook [Mon, 3 May 2021 05:06:08 +0000 (22:06 -0700)]
drm/radeon: Avoid power table parsing memory leaks

Avoid leaving a hanging pre-allocated clock_info if last mode is
invalid, and avoid heap corruption if no valid modes are found.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537
Fixes: 6991b8f2a319 ("drm/radeon/kms: fix segfault in pm rework")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/radeon: Fix off-by-one power_state index heap overwrite
Kees Cook [Mon, 3 May 2021 05:06:07 +0000 (22:06 -0700)]
drm/radeon: Fix off-by-one power_state index heap overwrite

An out of bounds write happens when setting the default power state.
KASAN sees this as:

[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
==================================================================
BUG: KASAN: slab-out-of-bounds in
radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157

CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50
Hardware name: eMachines        eMachines E620  /Nile       , BIOS V1.03 09/30/2008
Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x170/0x1a8
 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
 radeon_atombios_get_power_modes+0x144/0x1888 [radeon]
 radeon_pm_init+0x1019/0x1904 [radeon]
 rs690_init+0x76e/0x84a [radeon]
 radeon_device_init+0x1c1a/0x21e5 [radeon]
 radeon_driver_load_kms+0xf5/0x30b [radeon]
 drm_dev_register+0x255/0x4a0 [drm]
 radeon_pci_probe+0x246/0x2f6 [radeon]
 pci_device_probe+0x1aa/0x294
 really_probe+0x30e/0x850
 driver_probe_device+0xe6/0x135
 device_driver_attach+0xc1/0xf8
 __driver_attach+0x13f/0x146
 bus_for_each_dev+0xfa/0x146
 bus_add_driver+0x2b3/0x447
 driver_register+0x242/0x2c1
 do_one_initcall+0x149/0x2fd
 do_init_module+0x1ae/0x573
 load_module+0x4dee/0x5cca
 __do_sys_finit_module+0xf1/0x140
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Without KASAN, this will manifest later when the kernel attempts to
allocate memory that was stomped, since it collides with the inline slab
freelist pointer:

invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G        W 5.10.12-gentoo-E620 #2
Hardware name: eMachines        eMachines E620  /Nile , BIOS V1.03       09/30/2008
RIP: 0010:kfree+0x115/0x230
Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7
RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246
RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d
RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80
RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000
R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100
FS:  00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0
Call Trace:
 __free_fdtable+0x16/0x1f
 put_files_struct+0x81/0x9b
 do_exit+0x433/0x94d
 do_group_exit+0xa6/0xa6
 __x64_sys_exit_group+0xf/0xf
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe80ef64bea
Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0.
RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea
RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0
R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0
Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ...

Use a valid power_state index when initializing the "flags" and "misc"
and "misc2" fields.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537
Reported-by: Erhard F. <erhard_f@mailbox.org>
Fixes: a48b9b4edb8b ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)")
Fixes: 79daedc94281 ("drm/radeon/kms: minor pm cleanups")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: drop the GCR packet from the emit_ib frame for sdma5.0
Alex Deucher [Tue, 4 May 2021 13:58:26 +0000 (09:58 -0400)]
drm/amdgpu: drop the GCR packet from the emit_ib frame for sdma5.0

It's not needed here and has been added to the proper place
in the previous patch.  This aligns with what we do for sdma 5.2.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add graphics cache rinse packet for sdma 5.0
Alex Deucher [Tue, 20 Apr 2021 19:26:46 +0000 (15:26 -0400)]
drm/amdgpu: Add graphics cache rinse packet for sdma 5.0

Add emit mem sync callback for sdma_v5_0

In amdgpu sync object test, three threads created jobs
to send GFX IB and SDMA IB in sequence. After the first
GFX thread joined, sometimes the third thread will reuse
the same physical page to store the SDMA IB. There will
be a risk that SDMA will read GFX IB in the previous physical
page. So it's better to flush the cache before commit sdma IB.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoMAINTAINERS: fix a few more amdgpu tree links
Alex Deucher [Mon, 3 May 2021 13:43:21 +0000 (09:43 -0400)]
MAINTAINERS: fix a few more amdgpu tree links

Switch to gitlab.

Fixes: 101c2fae5108d7 ("MAINTAINERS: update radeon/amdgpu/amdkfd git trees")
Cc: David Ward <david.ward@gatech.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Add debugfs node to read private buffer
Lijo Lazar [Thu, 15 Apr 2021 07:29:22 +0000 (15:29 +0800)]
drm/amd/pm: Add debugfs node to read private buffer

Add debugfs interface to read region allocated for FW private buffer

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Add interface to get FW private buffer
Lijo Lazar [Thu, 15 Apr 2021 07:22:28 +0000 (15:22 +0800)]
drm/amd/pm: Add interface to get FW private buffer

v1: Add new interface to get FW private buffer details
v2: Drop domain check
v3: Use amdgpu_bo_kmap to get cpu address

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: fix no atomics settings in the kfd topology
Jonathan Kim [Fri, 30 Apr 2021 19:31:42 +0000 (15:31 -0400)]
drm/amdkfd: fix no atomics settings in the kfd topology

To account for various PCIe and xGMI setups, check the no atomics settings
for a device in relation to every direct peer.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: 3.2.134
Aric Cyr [Sun, 25 Apr 2021 08:22:23 +0000 (04:22 -0400)]
drm/amd/display: 3.2.134

This version brings improvements across DP, eDP, DMUB, DSC, etc

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: [FW Promotion] Release 0.0.64
Anthony Koo [Sat, 24 Apr 2021 14:42:47 +0000 (10:42 -0400)]
drm/amd/display: [FW Promotion] Release 0.0.64

Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Set stream_count to 0 when dc_resource_state_destruct.
Paul Wu [Fri, 16 Apr 2021 05:33:26 +0000 (13:33 +0800)]
drm/amd/display: Set stream_count to 0 when dc_resource_state_destruct.

[Why]
When hardware need to be reset, driver need to reset stream objects but
dc_resource_state_destruct function omit resetting stream_count. It will
lead page fault if some logic will touch stream object.

[How]
Set stream_count to 0 when dc_resource_state_destruct.

Signed-off-by: Paul Wu <paul.wu@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Filter out YCbCr420 timing if VSC SDP not supported
George Shen [Fri, 16 Apr 2021 21:35:07 +0000 (17:35 -0400)]
drm/amd/display: Filter out YCbCr420 timing if VSC SDP not supported

[Why]
Per DP specification, YCbCr420 shall use VSC SDP.

[How]
For YCbCr420 timings, fail DP mode timing validation
if DPCD caps do not indicate VSC SDP colorimetry
support.

Signed-off-by: George Shen <george.shen@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: remove checking sink in is_timing_changed
Calvin Hou [Wed, 7 Apr 2021 19:07:45 +0000 (15:07 -0400)]
drm/amd/display: remove checking sink in is_timing_changed

[Why]
Sometimes, such as sleep wake, the link->local sink pointer changed,
but the dc_stream_state->sink pointer is not changed. The checking
of timing_changed reports wrong result, lead to link tear down
unexpected wrongly.

[How]
SST compare local sink, MST compare proper remote link.

Signed-off-by: Calvin Hou <Calvin.Hou@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Add audio support for DFP type of active branch is DP case
Dale Zhao [Mon, 19 Apr 2021 08:38:55 +0000 (16:38 +0800)]
drm/amd/display: Add audio support for DFP type of active branch is DP case

[Why]
Per DP spec, for active protocol convertor adaptor, DP source should enable audio
for DFP type is DP, HDMI or DP++. Current is_dp_active_dongle() checking is not
precise, which treat branch device default as active dongle. As a result, we will
mistakenly disable audio for DFP type is DP case.

[How]
Make is_dp_active_dongle() checking more precise for active dongle types.
Rename active diongle type as SST branch device in case confusion.

Signed-off-by: Dale Zhao <dale.zhao@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Support for DMUB AUX
Jude Shih [Tue, 20 Apr 2021 02:19:37 +0000 (10:19 +0800)]
drm/amd/display: Support for DMUB AUX

[WHY]
To process AUX transactions with DMUB using inbox1 and outbox1 mail boxes.

[How]
1) Added inbox1 command DMUB_CMD__DP_AUX_ACCESS to issue AUX commands
   to DMUB in dc_process_dmub_aux_transfer_async(). DMUB processes AUX cmd
   with DCN and sends reply back in an outbox1 message triggering an
   outbox1 interrupt to driver.
2) In existing driver implementation, AUX commands are processed
   synchronously by configuring DCN reg. But in DMUB AUX, driver sends an
   inbox1 message and waits for a conditional variable (CV) which will be
   signaled by outbox1 ISR.
3) DM will retrieve Outbox1 message and send back reply to upper layer
   and complete the AUX command

Signed-off-by: Jude Shih <shenshih@amd.com>
Reviewed-by: Hanghong Ma <Hanghong.Ma@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: update DCN to use new surface programming
Paul Hsieh [Wed, 21 Apr 2021 03:01:25 +0000 (11:01 +0800)]
drm/amd/display: update DCN to use new surface programming

[Why]
The split pipe config is updated due to antoher stream bandwidth
validataion. Driver doesn't reprogram the split pipe config
to signle pipe cause SW use signel pipe but HW still use pipe split.

[How]
track global updates and update any hw that isn't
related to current stream being updated.

Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Extend DMUB HW params to allow DM to specify boot options
Eric Yang [Fri, 16 Apr 2021 19:30:04 +0000 (15:30 -0400)]
drm/amd/display: Extend DMUB HW params to allow DM to specify boot options

[Why & How]
Add the field to HW params to allow DM dynamically pass down debug and
boot options as needed.

Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/dc: Revert commit "treat memory as a single-channel"
Aric Cyr [Tue, 20 Apr 2021 16:28:20 +0000 (12:28 -0400)]
drm/amdgpu/dc: Revert commit "treat memory as a single-channel"

This reverts commit "dc: treat memory as a single-channel for
asymmetric memory".

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: multi-eDP backlight support
Mikita Lipski [Wed, 14 Apr 2021 18:51:02 +0000 (14:51 -0400)]
drm/amd/display: multi-eDP backlight support

[why]
Currently the assumption is that we are using a single eDP
connector so there will only be one backlight object. Need changes
to allow brightness update and reading for multiple eDP connectors.

[how]
- register a single device
- turn backlight link from a pointer to an array of pointers
- update brightness of all eDP links at the same time when request
is registered
- read brightness level only of the primary eDP panel
- turn current_backlight_pwm and targer_backlight_pwm debugfs enteries
into per connector enteries.

Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <waynelin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: force enable gfx ras for vega20 ws
Stanley.Yang [Thu, 29 Apr 2021 01:32:12 +0000 (09:32 +0800)]
drm/amdgpu: force enable gfx ras for vega20 ws

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Avoid gpio conflict on MST branch
Nikola Cornij [Fri, 30 Apr 2021 23:34:29 +0000 (19:34 -0400)]
drm/amd/display: Avoid gpio conflict on MST branch

[Why]
Similar to SST branch, gpio conflict also needs to be avoided on
MST. Without doing so, there is a chance that gpio conflict will
occur if multiple gpio interrupts arrive simultaneously.

[How]
By mutex locking/unlocking &aconnector->hpd_lock,
we won't get gpio conflict when handling hpd.

Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com>
Acked-by: Zhan Liu <zhan.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: add dsc stream overhead for dp only
Wenjing Liu [Thu, 22 Apr 2021 18:01:25 +0000 (14:01 -0400)]
drm/amd/display: add dsc stream overhead for dp only

[why]
Based on hardware team recommendation this additional dsc overhead
is only required for DP DSC.

[how]
Add a check for is_dp and only apply the overhead if this flag is set.

Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: fix potential gpu reset deadlock
Roman Li [Mon, 19 Apr 2021 15:47:00 +0000 (11:47 -0400)]
drm/amd/display: fix potential gpu reset deadlock

[Why]
In gpu reset dc_lock acquired in dm_suspend().
Asynchronously handle_hpd_rx_irq can also be called
through amdgpu_dm_irq_suspend->flush_work, which also
tries to acquire dc_lock. That causes a deadlock.

[How]
Check if amdgpu executing reset before acquiring dc_lock.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Make svm_migrate_put_sys_page static
Felix Kuehling [Fri, 30 Apr 2021 09:07:18 +0000 (05:07 -0400)]
drm/amdkfd: Make svm_migrate_put_sys_page static

This function is only used in this source file.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: remove excess function parameter
Nirmoy Das [Fri, 30 Apr 2021 11:35:26 +0000 (13:35 +0200)]
drm/amdgpu: remove excess function parameter

Fix below htmldocs build warning:

"warning: Excess function parameter 'vm_context' description in 'amdgpu_vm_init'"

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: flush TLB after updating GPU page table
Philip Yang [Thu, 15 Apr 2021 14:08:58 +0000 (10:08 -0400)]
drm/amdkfd: flush TLB after updating GPU page table

To workaround the situation that vm retry fault keep coming after page
table update. We are investigating the root cause, but once this issue
happens, application will stuck and sometimes have to reboot to recover.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Rename the flags to eliminate ambiguity v2
Peng Ju Zhou [Thu, 29 Apr 2021 06:16:52 +0000 (14:16 +0800)]
drm/amdgpu: Rename the flags to eliminate ambiguity v2

The flags vf_reg_access_* may cause confusion,
rename the flags to make it more clear.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
Evan Quan [Wed, 28 Apr 2021 04:00:20 +0000 (12:00 +0800)]
drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC

Polaris12 32bit ASIC needs a special MC firmware.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add Aldebaran virtualization support
Zhigang Luo [Thu, 29 Apr 2021 19:37:31 +0000 (15:37 -0400)]
drm/amdgpu: Add Aldebaran virtualization support

1. add Aldebaran in virtualization detection list.
2. disable Aldebaran virtual display support as there is no GFX
   engine in Aldebaran.
3. skip TMR loading if Aldebaran is in virtualizatin mode as it
   shares the one host loaded.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Add Aldebaran virtualization support
Zhigang Luo [Thu, 29 Apr 2021 19:10:15 +0000 (15:10 -0400)]
drm/amdkfd: Add Aldebaran virtualization support

update kfd_supported_devices to enable Aldebaran virtualization support

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add a new device ID for Aldebaran
Zhigang Luo [Thu, 29 Apr 2021 18:15:48 +0000 (14:15 -0400)]
drm/amdgpu: Add a new device ID for Aldebaran

It is Aldebaran VF device ID, for virtualization support.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: report the numa weight between host and device over xgmi
Jonathan Kim [Wed, 21 Apr 2021 19:08:18 +0000 (15:08 -0400)]
drm/amdkfd: report the numa weight between host and device over xgmi

GPUs connected to CPUs over xGMI are bidirectional so set weight by a
single hop both ways.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: report atomics support in io_links over xgmi
Jonathan Kim [Wed, 21 Apr 2021 19:05:49 +0000 (15:05 -0400)]
drm/amdkfd: report atomics support in io_links over xgmi

Link atomics support over xGMI should be reported independently of PCIe.
Do not set NO_ATOMICS flags on devices that support xGMI but that do not
have atomics support over PCIe.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Remove duplicate declaration of dc_state
Wan Jiabing [Thu, 29 Apr 2021 12:38:36 +0000 (20:38 +0800)]
drm/amd/display: Remove duplicate declaration of dc_state

There are two declarations of struct dc_state here.
Remove the later duplicate more secure.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Remove duplicate include of hubp.h
Wan Jiabing [Thu, 29 Apr 2021 03:04:01 +0000 (11:04 +0800)]
drm/amd/display: Remove duplicate include of hubp.h

In commit 482812d56698e ("drm/amd/display: Set max TTU on
DPG enable"), "hubp.h" was added which caused the duplicate include.
To be on the safe side, remove the later duplicate include.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix two cursor duplication when using overlay
Rodrigo Siqueira [Wed, 14 Apr 2021 00:06:04 +0000 (20:06 -0400)]
drm/amd/display: Fix two cursor duplication when using overlay

Our driver supports overlay planes, and as expected, some userspace
compositor takes advantage of these features. If the userspace is not
enabling the cursor, they can use multiple planes as they please.
Nevertheless, we start to have constraints when userspace tries to
enable hardware cursor with various planes. Basically, we cannot draw
the cursor at the same size and position on two separated pipes since it
uses extra bandwidth and DML only run with one cursor.

For those reasons, when we enable hardware cursor and multiple planes,
our driver should accept variations like the ones described below:

  +-------------+   +--------------+
  | +---------+ |   |              |
  | |Primary  | |   | Primary      |
  | |         | |   | Overlay      |
  | +---------+ |   |              |
  |Overlay      |   |              |
  +-------------+   +--------------+

In this scenario, we can have the desktop UI in the overlay and some
other framebuffer attached to the primary plane (e.g., video). However,
userspace needs to obey some rules and avoid scenarios like the ones
described below (when enabling hw cursor):

                                      +--------+
                                      |Overlay |
 +-------------+    +-----+-------+ +-|        |--+
 | +--------+  | +--------+       | | +--------+  |
 | |Overlay |  | |Overlay |       | |             |
 | |        |  | |        |       | |             |
 | +--------+  | +--------+       | |             |
 | Primary     |    | Primary     | | Primary     |
 +-------------+    +-------------+ +-------------+

 +-------------+   +-------------+
 |     +--------+  |  Primary    |
 |     |Overlay |  |             |
 |     |        |  |             |
 |     +--------+  | +--------+  |
 | Primary     |   | |Overlay |  |
 +-------------+   +-|        |--+
                     +--------+

If the userspace violates some of the above scenarios, our driver needs
to reject the commit; otherwise, we can have unexpected behavior. Since
we don't have a proper driver validation for the above case, we can see
some problems like a duplicate cursor in applications that use multiple
planes. This commit fixes the cursor issue and others by adding adequate
verification for multiple planes.

Change since V1 (Harry and Sean):
- Remove cursor verification from the equation.

Cc: Louis Li <Ching-shih.Li@amd.com>
Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: enable gfx ras in aldebran by default
Hawking Zhang [Thu, 29 Apr 2021 06:28:13 +0000 (14:28 +0800)]
drm/amdgpu: enable gfx ras in aldebran by default

gfx ras now can be enabled by default in aldebaran

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: switch to mmhub ras callback for ras fini
Hawking Zhang [Wed, 28 Apr 2021 15:10:36 +0000 (23:10 +0800)]
drm/amdgpu: switch to mmhub ras callback for ras fini

invoke callback function for mmhub ras fini

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: retired reset_ras_error_count from hdp callbacks
Hawking Zhang [Wed, 28 Apr 2021 08:45:35 +0000 (16:45 +0800)]
drm/amdgpu: retired reset_ras_error_count from hdp callbacks

It was moved to hdp ras callbacks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: enable ras error count query and reset for HDP
Hawking Zhang [Wed, 28 Apr 2021 14:51:22 +0000 (22:51 +0800)]
drm/amdgpu: enable ras error count query and reset for HDP

add hdp block ras error query and reset support in
amdgpu ras error count query and reset interface

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: init/fini hdp v4_0 ras
Hawking Zhang [Wed, 28 Apr 2021 14:20:00 +0000 (22:20 +0800)]
drm/amdgpu: init/fini hdp v4_0 ras

invoke hdp v4_0 ras init in gmc late_init phase
while ras fini in gmc sw_fini phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: initialize hdp v4_0 ras functions
Hawking Zhang [Wed, 28 Apr 2021 13:21:08 +0000 (21:21 +0800)]
drm/amdgpu: initialize hdp v4_0 ras functions

hdp v4_0 support ras features

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: implement hdp v4_0 ras functions
Hawking Zhang [Wed, 28 Apr 2021 14:59:14 +0000 (22:59 +0800)]
drm/amdgpu: implement hdp v4_0 ras functions

implement hdp v4_0 ras functions, including
ras init/fini, query/reset_error_counter

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add helpers for hdp ras init/fini
Hawking Zhang [Wed, 28 Apr 2021 13:07:41 +0000 (21:07 +0800)]
drm/amdgpu: add helpers for hdp ras init/fini

hdp ras init/fini are common functions that
can be shared among hdp generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add hdp ras structures
Hawking Zhang [Wed, 28 Apr 2021 13:22:46 +0000 (21:22 +0800)]
drm/amdgpu: add hdp ras structures

centralize all hdp ras operation to ras_funcs

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoamdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
Simon Ser [Wed, 21 Apr 2021 09:16:35 +0000 (11:16 +0200)]
amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create

This error code-path is missing a drm_gem_object_put call. Other
error code-paths are fine.

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 1769152ac64b ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix build warnings
Guenter Roeck [Wed, 21 Apr 2021 16:18:02 +0000 (09:18 -0700)]
drm/amd/display: Fix build warnings

Fix the following build warnings.

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:
In function ‘dm_update_mst_vcpi_slots_for_dsc’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6242:46:
warning: variable ‘old_con_state’ set but not used

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:
In function ‘amdgpu_dm_commit_cursors’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7709:44:
warning: variable ‘new_plane_state’ set but not used

The variables were introduced to be used in iterators, but not used.
Use other iterators which don't require the unused variables.

Fixes: 8ad278062de4e ("drm/amd/display: Disable cursors before disabling planes")
Fixes: 29b9ba74f6384 ("drm/amd/display: Recalculate VCPI slots for new DSC connectors")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/amdgpu: Fix errors in documentation of function parameters
Fabio M. De Francesco [Fri, 23 Apr 2021 11:38:46 +0000 (13:38 +0200)]
drm/amd/amdgpu: Fix errors in documentation of function parameters

In the function documentation, I removed the excess parameters,
described the undocumented ones, and fixed the syntax errors.

Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/display: add documentation for dmcub_trace_event_en
Alex Deucher [Fri, 23 Apr 2021 20:46:19 +0000 (16:46 -0400)]
drm/amdgpu/display: add documentation for dmcub_trace_event_en

Was missing when this structure was updated.

Fixes: 46a83eba276cd3 ("drm/amd/display: Add debugfs to control DMUB trace buffer events")
Reviewed-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm/powerplay/hwmgr: Fix kernel-doc syntax in documentation
Fabio M. De Francesco [Sat, 24 Apr 2021 22:19:00 +0000 (00:19 +0200)]
drm/amd/pm/powerplay/hwmgr: Fix kernel-doc syntax in documentation

Fixed kernel-doc syntax errors in documentation of functions.

Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Register VGA clients after init can no longer fail
Kai-Heng Feng [Mon, 26 Apr 2021 10:50:00 +0000 (18:50 +0800)]
drm/amdgpu: Register VGA clients after init can no longer fail

When an amdgpu device fails to init, it makes another VGA device cause
kernel splat:
kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of 0000:08:00.0 failed with error -110
...
kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G        W         5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000
kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38
kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000
kernel: FS:  00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7
kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f
kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0
kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808
kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0000000000000018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(),
so the VGA clients remain registered. So when
vga_arbiter_notify_clients() iterates over registered clients, it causes
NULL pointer dereference.

Since there's no reason to register VGA clients that early, so solve
the issue by putting them after all the goto cleanups.

v2:
 - Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
Pavan Kumar Ramayanam [Tue, 27 Apr 2021 04:51:18 +0000 (10:21 +0530)]
drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown

The runtime resume PM op disregards the return value from
amdgpu_device_resume(), masking errors for failed resumes at the PM
layer.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: fix r initial values
Victor Zhao [Tue, 27 Apr 2021 09:52:56 +0000 (17:52 +0800)]
drm/amdgpu: fix r initial values

Sriov gets suspend of IP block <dce_virtual> failed as return
value was not initialized.

v2: return 0 directly to align original code semantic before this
was broken out into a separate helper function instead of setting
initial values

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: fix no full coverage issue for gprs initialization
Dennis Li [Tue, 27 Apr 2021 14:21:03 +0000 (22:21 +0800)]
drm/amdgpu: fix no full coverage issue for gprs initialization

The wave's number per simd in aldebaran is changed to 8, so it is
impossible to use old algorithm to initiate all sgprs with one
threadgroup. The new algorithm firstly use three threadgroups to
initiate most sgprs simultaneously and then use another threadgroup with
4 waves to cover other uninitiated sgprs.

v2:
Add more description about the new algorithm to clear sgprs and add some
comment for shader binaries

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Add Aldebaran gws support
Harish Kasiviswanathan [Wed, 21 Apr 2021 01:09:10 +0000 (21:09 -0400)]
drm/amdkfd: Add Aldebaran gws support

v2: updated MEC FW version after validating gws with debugger

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: enable subsequent retry fault
Philip Yang [Tue, 20 Apr 2021 19:13:59 +0000 (15:13 -0400)]
drm/amdkfd: enable subsequent retry fault

After draining the stale retry fault, or failed to validate the range
to recover, have to remove the fault address from fault filter ring, to
be able to handle subsequent retry interrupt on same address. Otherwise
the retry fault will not be processed to recover until timeout passed.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: address remove from fault filter
Philip Yang [Tue, 20 Apr 2021 14:05:44 +0000 (10:05 -0400)]
drm/amdgpu: address remove from fault filter

Add interface to remove address from fault filter ring by resetting
fault ring entry key, then future vm fault on the address will be
processed to recover.

Define fault key as atomic64_t type to use atomic read/set/cmpxchg key
to protect fault ring access by interrupt handler and interrupt deferred
work for vg20. Change fault->timestamp to 48-bit to share same uint64_t
with 8-bit fault->next, it is enough for 48bit IH timestamp.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: handle stale retry fault
Philip Yang [Tue, 20 Apr 2021 01:19:57 +0000 (21:19 -0400)]
drm/amdkfd: handle stale retry fault

Retry fault interrupt maybe pending in IH ring after GPU page table
is updated to recover the vm fault, because each page of the range
generate retry fault interrupt. There is race if application unmap
range to remove and free the range first and then retry fault work
restore_pages handle the retry fault interrupt, because range can not be
found, this vm fault can not be recovered and report incorrect GPU vm
fault to application.

Before unmap to remove and free range, drain retry fault interrupt
from IH ring1 to ensure no retry fault comes after the range is removed.

Drain retry fault interrupt skip the range which is on deferred list
to remove, or the range is child range, which is split by unmap, does
not add to svms and have interval notifier.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: return IH ring drain finished if ring is empty
Philip Yang [Sun, 18 Apr 2021 02:37:21 +0000 (22:37 -0400)]
drm/amdgpu: return IH ring drain finished if ring is empty

Sometimes IH do not setup ring wptr overflow flag after wptr exceed
rptr. As a workaround, if IH rptr equals to wptr, ring is empty,
return true to indicate IH ring checkpoint is processed, IH ring drain
is finished.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: retry validation to recover range
Philip Yang [Tue, 20 Apr 2021 01:51:27 +0000 (21:51 -0400)]
drm/amdkfd: retry validation to recover range

GPU vm retry fault recover range need retry validation if

1. range is split in parallel by unmap while recover
2. range migrate to system memory and range is updated in system
memory while recover

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: fix spelling mistake in packet manager
Jonathan Kim [Mon, 26 Apr 2021 19:38:18 +0000 (15:38 -0400)]
drm/amdkfd: fix spelling mistake in packet manager

The plural of 'process' should be 'processes'.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/amdgpu/sriov disable all ip hw status by default
Jack Zhang [Tue, 27 Apr 2021 09:08:47 +0000 (17:08 +0800)]
drm/amd/amdgpu/sriov disable all ip hw status by default

Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Review-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: restructure amdgpu_vram_mgr_new
Christian König [Mon, 26 Apr 2021 08:06:13 +0000 (10:06 +0200)]
drm/amdgpu: restructure amdgpu_vram_mgr_new

Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-and-Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: fix double free device pgmap resource
Philip Yang [Mon, 26 Apr 2021 18:25:37 +0000 (14:25 -0400)]
drm/amdkfd: fix double free device pgmap resource

Use devm_memunmap_pages instead of memunmap_pages to release pgmap
and remove pgmap from device action, to avoid double free pgmap when
unloading driver module.

Release device memory region if failed to create device memory pages
structure.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"
Colin Ian King [Mon, 26 Apr 2021 12:13:04 +0000 (13:13 +0100)]
drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>