git.monstr.eu Git - linux-2.6-microblaze.git/log

drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET

This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature
comes from the value returned by this ioctl which is the offset into the
device fd which userpace uses with mmap(2).

mmap_gtt was our initial mmap_offset implementation, this extends
our CPU mmap support to allow additional fault handlers that depends on
the object's backing pages.

Note that we multiplex mmap_gtt and mmap_offset through the same ioctl,
and use the zero extending behaviour of drm to differentiate between
them, when we inspect the flags.

To support multiple mmap types on an object we need to support multiple
mmap_offsets for an object (each offset in the global device address
space corresponding to a unique instance of the object for a file + mmap
type). As we drop the simplified drm core idea of a single mmap_offset,
we need to provide replacement hooks for the dumb mmap interface as
well.

Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675
Testcase: igt/gem_mmap_offset
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk

drm/i915/perf: drop pointless static qualifier in i915_perf_add_config_ioctl()

There is no need to have the 'T *v' variable static
since new value always be assigned before use it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204010154.152396-1-maowenan@huawei.com

drm/i915: Make intel_crtc_arm_fifo_underrun() functional on gen2

Assuming intel_crtc_arm_fifo_underrun() only gets called when
there's no pending plane updates we can utilize it on gen2 by
checking the active_planes bitmask so that we only re-enable
underrun reporting if some planes are active.
i915_fifo_underrun_reset_write() seems to have the necessary
hw_done/flip_done waits in place.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-8-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: Nuke intel_pre_disable_primary_noatomic()

Let's just inline intel_pre_disable_primary_noatomic() into
intel_plane_disable_noatomic(). The CxSR disable we can do
regardless of which plane we're disabling, and while at it we can
make the gen2 underrun w/a accurate by consulting the active_planes
bitmask.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-7-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: Clean up the gen2 "no planes -> underrun" workaround

We have the active_planes bitmask now so use it to properly
determine when some planes are visible for the gen2 underrun
workaround.

This let's us almost eliminate intel_post_enable_primary().
The manual underrun checks we can simply move into
intel_atomic_commit_tail() since they loop over all the pipes
already. No point in repeating the checks multiple times when
there are multiple pipes in the commit.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-6-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: Clean up intel_{pre,post}_plane_update()

Change the calling convention to just pass the state+crtc and
switch to intel_ types throughout.

We'll also do a quick s/if (old_primary_state)/if (new_primary_state)/
so that we'll be able to eliminate old_primary_state later. This
is fine since we always have either both old and new state or neither.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-5-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: s/pipe_config/new_crtc_state/ intel_{pre,post}_plane_update()

Replace the old world 'pipe_config' variable name with the new thing.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-4-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: Pass dev_priv to ilk_disable_lp_wm()

Get rid of another 'dev' usage by passing dev_priv instead.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-3-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915: Clean up arguments to nv12/scaler w/a funcs

Don't pass the redundant dev_priv to needs_nv12_wa() and
needs_scalerclk_wa().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127190556.1574-2-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>

drm/i915/gt: Set the PD again for Haswell

And Haswell still occasionally forgets it is meant to be using a new
page directory, so repeat ourselves a little louder.

<7> [509.919864] heartbeat rcs0 heartbeat {prio:-2147483645} not ticking
<7> [509.919895] heartbeat Awake? 8
<7> [509.919903] heartbeat Barriers?: no
<7> [509.919912] heartbeat Heartbeat: 3008 ms ago
<7> [509.919930] heartbeat Reset count: 0 (global 0)
<7> [509.919937] heartbeat Requests:
<7> [509.921008] heartbeat active  a7eb:56e1*  @ 5847ms:
<7> [509.921157] heartbeat ring->start:  0x00001000
<7> [509.921164] heartbeat ring->head:   0x00001610
<7> [509.921170] heartbeat ring->tail:   0x000023d8
<7> [509.921176] heartbeat ring->emit:   0x000023d8
<7> [509.921182] heartbeat ring->space:  0x00002570
<7> [509.921189] heartbeat ring->hwsp:   0x7fffe100
<7> [509.921197] heartbeat [head 1628, postfix 1738, tail 1750, batch 0xffffffff_ffffffff]:
<7> [509.921289] heartbeat [0000] 7a000002 00100002 00000000 00000000 7a000002 01154c1e 7ffff080 00000000
<7> [509.921299] heartbeat [0020] 11000001 00002220 ffffffff 12400001 00002220 7ffff000 00000000 11000001
<7> [509.921308] heartbeat [0040] 00002228 6e900000 7a000002 00100002 00000000 00000000 7a000002 01154c1e
<7> [509.921317] heartbeat [0060] 7ffff080 00000000 12400001 00002228 7ffff000 00000000 7a000002 00100002
<7> [509.921326] heartbeat [0080] 00000000 00000000 7a000002 01154c1e 7ffff080 00000000 7a000002 001010a1
<7> [509.921335] heartbeat [00a0] 7ffff080 00000000 04000000 11000005 00022050 00010001 00012050 00010001
<7> [509.921345] heartbeat [00c0] 0001a050 00010001 00000000 0c000000 459a110c 00000000 11000005 00022050
<7> [509.921354] heartbeat [00e0] 00010000 00012050 00010000 0001a050 00010000 12400001 0001a050 7ffff000
<7> [509.921363] heartbeat [0100] 00000000 04000001 18802100 00000000 7a000002 011050a1 7fffe100 000056e1
<7> [509.921370] heartbeat [0120] 01000000 00000000
<7> [509.921538] heartbeat MMIO base:  0x00002000
<7> [509.921682] heartbeat CCID: 0x3fa0110d
<7> [509.922342] heartbeat RING_START: 0x00001000
<7> [509.922353] heartbeat RING_HEAD:  0x00001628
<7> [509.922366] heartbeat RING_TAIL:  0x000023d8
<7> [509.922381] heartbeat RING_CTL:   0x00003001
<7> [509.922396] heartbeat RING_MODE:  0x00004000
<7> [509.922408] heartbeat RING_IMR: ffffffde
<7> [509.922421] heartbeat ACTHD:  0x00000000_30e01628
<7> [509.922434] heartbeat BBADDR: 0x00000000_00004004
<7> [509.922446] heartbeat DMA_FADDR: 0x00000000_00002800
<7> [509.922458] heartbeat IPEIR: 0x00000000
<7> [509.922470] heartbeat IPEHR: 0x780c0000
<7> [509.922642] heartbeat PP_DIR_BASE: 0x6e700000
<7> [509.922652] heartbeat PP_DIR_BASE_READ: 0x00000000
<7> [509.922662] heartbeat PP_DIR_DCLV: 0xffffffff
<7> [509.922678] heartbeat E  a7eb:56e1*  @ 5849ms:
<7> [509.922689] heartbeat E  a7eb:56e2-  @ 5849ms:
<7> [509.922698] heartbeat E  a7eb:56e3  @ 5848ms:
<7> [509.922707] heartbeat E  a7eb:56e4  @ 5848ms:
<7> [509.922715] heartbeat E  a7eb:56e5  @ 5847ms:
<7> [509.922724] heartbeat E  a7eb:56e6  @ 5846ms:
<7> [509.922735] heartbeat E  a7eb:56e7  @ 5846ms:
<7> [509.922744] heartbeat ...skipping 4 executing requests...
<7> [509.922754] heartbeat E  a7eb:56ec  @ 3010ms:
<7> [509.922796] heartbeat HWSP:
<7> [509.922807] heartbeat [0000] 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [509.922817] heartbeat [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [509.922826] heartbeat *
<7> [509.922836] heartbeat [0100] 000056e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [509.922845] heartbeat [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [509.922851] heartbeat *
<7> [509.922870] heartbeat Idle? no
<7> [509.922878] heartbeat Signals:
<7> [509.923000] heartbeat [a7eb:56e2] @ 5850ms

Here, we have a failed context restore after the PD switch, but note
that the PP_DIR_BASE register does not match the LRI in the ring.

Bump it to 8^W 4 loops, and with that Baytrail starts passing the sanity
checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203211631.3167430-1-chris@chris-wilson.co.uk

drm/i915/gem: Avoid parking the vma as we unbind

In order to avoid keeping a reference on the i915_vma (which is long
overdue!) we have to coordinate all the possible lifetimes and only use
the vma while we know it is alive. In this episode, we are reminded that
while idle, the closed vma are destroyed. So if the GT idles while we are
working with the vma, the vma itself becomes invalid.

First class i915_vma here we come, but in the meantime keep piling on
the straw.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203155032.3137263-1-chris@chris-wilson.co.uk

drm/i915/display/mst: Move DPMS_OFF call to post_disable

Moving just to simplify handling as there is no change in behavior.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202222513.337777-3-jose.souza@intel.com

drm/i915/dp: Power down sink before disable pipe/transcoder clock

Disabling pipe/transcoder clock before power down sink could cause
sink lost signal, causing it to trigger a hotplug to notify source
that link signal was lost.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202222513.337777-2-jose.souza@intel.com

drm/i915/display: Check the old state to find port sync slave

If the CRTC is going from enabled to disabled and it is a port sync
slave, it needs to check to the old state to be disabled before the
port sync master.

Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202222513.337777-1-jose.souza@intel.com

drm/i915/irq: Refactor gen11 display interrupt handling

Let's move handling and reset for gen11 display IRQs to their own
functions, similar to how we deal with GT interrupts. This will make
the top-level functions a bit easier to read and potentially make things
easier to deal with in the future if new platforms wind up needing
different display handling logic.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202171608.3361125-1-matthew.d.roper@intel.com
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>

drm/i915/gt: Track the context validity explicitly

Rather than assume if and only if the engine->default_state is not set
that the context is invalid, instead track when we know the context has
valid state -- either because we have copied the default_state or we
have completed a context switch to save the HW state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203124155.3019926-1-chris@chris-wilson.co.uk

drm/i915/execlists: Skip nested spinlock for validating pending

Only along the submission path can we guarantee that the locked request
is indeed from a foreign engine, and so the nesting of engine/rq is
permissible. On the submission tasklet (process_csb()), we may find
ourselves competing with the normal nesting of rq/engine, invalidating
our nesting. As we only use the spinlock for debug purposes, skip the
debug if we cannot acquire the spinlock for safe validation - catching
99% of the bugs is better than causing a hard lockup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: c95d31c3df1b ("drm/i915/execlists: Lock the request while validating it during promotion")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-2-chris@chris-wilson.co.uk

drm/i915/execlists: Add a couple more validity checks to assert_pending()

Check the pending request submission is valid: that it at least has a
reference for the submission and that the request is on the active list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-1-chris@chris-wilson.co.uk

drm/i915: Lift i915_vma_pin() out of intel_renderstate_emit()

Once inside a request, inside the timeline->mutex, pinning is verboten.

<4> [896.032829] ======================================================
<4> [896.032831] WARNING: possible circular locking dependency detected
<4> [896.032835] 5.4.0-rc8-CI-Patchwork_15533+ #1 Tainted: G     U
<4> [896.032838] ------------------------------------------------------
<4> [896.032841] gem_exec_parall/3720 is trying to acquire lock:
<4> [896.032844] ffff888401863270 (&kernel#2){+.+.}, at: i915_request_create+0x16/0x1c0 [i915]
<4> [896.032915]
but task is already holding lock:
<4> [896.032917] ffff8883ec1c93c0 (&vm->mutex){+.+.}, at: i915_vma_pin+0xf3/0x11c0 [i915]
<4> [896.032952]
which lock already depends on the new lock.

<4> [896.032954]
the existing dependency chain (in reverse order) is:
<4> [896.032956]
-> #1 (&vm->mutex){+.+.}:
<4> [896.032961]        __mutex_lock+0x9a/0x9d0
<4> [896.032995]        i915_vma_pin+0xf3/0x11c0 [i915]
<4> [896.033033]        intel_renderstate_emit+0xb9/0x9e0 [i915]
<4> [896.033081]        i915_gem_init+0x5a9/0xa50 [i915]
<4> [896.033112]        i915_driver_probe+0xb00/0x15f0 [i915]
<4> [896.033144]        i915_pci_probe+0x43/0x1c0 [i915]
<4> [896.033149]        pci_device_probe+0x9e/0x120
<4> [896.033154]        really_probe+0xea/0x420
<4> [896.033158]        driver_probe_device+0x10b/0x120
<4> [896.033161]        device_driver_attach+0x4a/0x50
<4> [896.033164]        __driver_attach+0x97/0x130
<4> [896.033168]        bus_for_each_dev+0x74/0xc0
<4> [896.033171]        bus_add_driver+0x142/0x220
<4> [896.033174]        driver_register+0x56/0xf0
<4> [896.033178]        do_one_initcall+0x58/0x2ff
<4> [896.033183]        do_init_module+0x56/0x1f8
<4> [896.033187]        load_module+0x243e/0x29f0
<4> [896.033190]        __do_sys_finit_module+0xe9/0x110
<4> [896.033194]        do_syscall_64+0x4f/0x210
<4> [896.033197]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [896.033200]
-> #0 (&kernel#2){+.+.}:
<4> [896.033206]        __lock_acquire+0x1328/0x15d0
<4> [896.033209]        lock_acquire+0xa7/0x1c0
<4> [896.033213]        __mutex_lock+0x9a/0x9d0
<4> [896.033255]        i915_request_create+0x16/0x1c0 [i915]
<4> [896.033287]        intel_engine_flush_barriers+0x4c/0x100 [i915]
<4> [896.033327]        ggtt_flush+0x37/0x60 [i915]
<4> [896.033366]        i915_gem_evict_something+0x46b/0x5a0 [i915]
<4> [896.033407]        i915_gem_gtt_insert+0x21d/0x6a0 [i915]
<4> [896.033449]        i915_vma_pin+0xb36/0x11c0 [i915]
<4> [896.033488]        gen6_ppgtt_pin+0xd5/0x170 [i915]
<4> [896.033523]        ring_context_pin+0x2e/0xc0 [i915]
<4> [896.033554]        __intel_context_do_pin+0x6b/0x190 [i915]
<4> [896.033591]        i915_gem_do_execbuffer+0x1814/0x26c0 [i915]
<4> [896.033627]        i915_gem_execbuffer2_ioctl+0x11b/0x460 [i915]
<4> [896.033632]        drm_ioctl_kernel+0xa7/0xf0
<4> [896.033635]        drm_ioctl+0x2e1/0x390
<4> [896.033638]        do_vfs_ioctl+0xa0/0x6f0
<4> [896.033641]        ksys_ioctl+0x35/0x60
<4> [896.033644]        __x64_sys_ioctl+0x11/0x20
<4> [896.033647]        do_syscall_64+0x4f/0x210
<4> [896.033650]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

Lift the object allocation and pin prior to the request construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202204316.2665847-1-chris@chris-wilson.co.uk

drm/i915/gem: Take runtime-pm wakeref prior to unbinding

Some machines require ACPI for runtime resume, and ACPI is quite kmalloc
happy. We cannot handle kmalloc from inside the vm->mutex, as they are
used by the shrinker, and so we must ensure the global runtime-pm is
awake prior to unbinding to avoid the potential inversion.

<4> [57.121748] ======================================================
<4> [57.121750] WARNING: possible circular locking dependency detected
<4> [57.121753] 5.4.0-rc8-CI-CI_DRM_7466+ #1 Tainted: G     U
<4> [57.121754] ------------------------------------------------------
<4> [57.121756] i915_pm_rpm/1105 is trying to acquire lock:
<4> [57.121758] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [57.121766]
but task is already holding lock:
<4> [57.121768] ffff888475a593c0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0x21/0x50 [i915]
<4> [57.121868]
which lock already depends on the new lock.

<4> [57.121869]
the existing dependency chain (in reverse order) is:
<4> [57.121871]
-> #1 (&vm->mutex){+.+.}:
<4> [57.121951]        i915_gem_shrinker_taints_mutex+0xa2/0xd0 [i915]
<4> [57.122028]        i915_address_space_init+0xa9/0x170 [i915]
<4> [57.122104]        i915_ggtt_init_hw+0x47/0x130 [i915]
<4> [57.122150]        i915_driver_probe+0xbb4/0x15f0 [i915]
<4> [57.122197]        i915_pci_probe+0x43/0x1c0 [i915]
<4> [57.122202]        pci_device_probe+0x9e/0x120
<4> [57.122206]        really_probe+0xea/0x420
<4> [57.122209]        driver_probe_device+0x10b/0x120
<4> [57.122212]        device_driver_attach+0x4a/0x50
<4> [57.122214]        __driver_attach+0x97/0x130
<4> [57.122217]        bus_for_each_dev+0x74/0xc0
<4> [57.122220]        bus_add_driver+0x142/0x220
<4> [57.122222]        driver_register+0x56/0xf0
<4> [57.122226]        do_one_initcall+0x58/0x2ff
<4> [57.122230]        do_init_module+0x56/0x1f8
<4> [57.122233]        load_module+0x243e/0x29f0
<4> [57.122236]        __do_sys_finit_module+0xe9/0x110
<4> [57.122239]        do_syscall_64+0x4f/0x210
<4> [57.122242]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [57.122244]
-> #0 (fs_reclaim){+.+.}:
<4> [57.122249]        __lock_acquire+0x1328/0x15d0
<4> [57.122251]        lock_acquire+0xa7/0x1c0
<4> [57.122254]        fs_reclaim_acquire.part.117+0x24/0x30
<4> [57.122257]        __kmalloc+0x48/0x320
<4> [57.122261]        acpi_ns_internalize_name+0x44/0x9b
<4> [57.122264]        acpi_ns_get_node_unlocked+0x6b/0xd3
<4> [57.122267]        acpi_ns_get_node+0x3b/0x50
<4> [57.122271]        acpi_get_handle+0x8a/0xb4
<4> [57.122274]        acpi_has_method+0x1c/0x40
<4> [57.122278]        acpi_pci_set_power_state+0x40/0xe0
<4> [57.122281]        pci_platform_power_transition+0x3e/0x90
<4> [57.122284]        pci_set_power_state+0x83/0xf0
<4> [57.122287]        pci_restore_standard_config+0x22/0x40
<4> [57.122289]        pci_pm_runtime_resume+0x23/0xc0
<4> [57.122293]        __rpm_callback+0xb1/0x110
<4> [57.122296]        rpm_callback+0x1a/0x70
<4> [57.122299]        rpm_resume+0x50e/0x790
<4> [57.122302]        __pm_runtime_resume+0x42/0x80
<4> [57.122357]        __intel_runtime_pm_get+0x15/0x60 [i915]
<4> [57.122435]        ggtt_unbind_vma+0x24/0x60 [i915]
<4> [57.122514]        __i915_vma_unbind.part.39+0xb5/0x500 [i915]
<4> [57.122593]        i915_vma_unbind+0x2d/0x50 [i915]
<4> [57.122668]        i915_gem_object_unbind+0x11c/0x260 [i915]
<4> [57.122740]        i915_gem_object_set_cache_level+0x32/0x90 [i915]
<4> [57.122810]        i915_gem_set_caching_ioctl+0x1f7/0x2f0 [i915]
<4> [57.122815]        drm_ioctl_kernel+0xa7/0xf0
<4> [57.122818]        drm_ioctl+0x2e1/0x390
<4> [57.122822]        do_vfs_ioctl+0xa0/0x6f0
<4> [57.122825]        ksys_ioctl+0x35/0x60
<4> [57.122828]        __x64_sys_ioctl+0x11/0x20
<4> [57.122830]        do_syscall_64+0x4f/0x210
<4> [57.122833]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

Closes: https://gitlab.freedesktop.org/drm/intel/issues/711
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203101347.2836057-1-chris@chris-wilson.co.uk

drm/i915: Serialise i915_active_wait() with its retirement

As the i915_active.retire() may be running on another CPU as we detect
that the i915_active is idle, we may not wait for the retirement itself.
Wait for the remote callback by waiting for the retirement worker.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112424
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202140133.2444217-2-chris@chris-wilson.co.uk

drm/i915: Specialise i915_active.work lock classes

Similar to for i915_active.mutex, we require each class of i915_active
to have distinct lockdep chains as some, but by no means all,
i915_active are used within the shrinker and so have much more severe
usage constraints. By using a lockclass local to i915_active_init() all
i915_active workers have the same lock class, and we may generate false
positives when waiting for the i915_active. If we push the lockclass
into the caller, each class of i915_active will have distinct lockdep
chains.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202140133.2444217-1-chris@chris-wilson.co.uk

drm/i915/gem: Unbind all current vma on changing cache-level

Avoid dangerous race handling of destroyed vma by unbinding all vma
instead. Unfortunately, this stops us from trying to be clever and only
doing the minimal change required, so on first use of scanout we may
encounter an annoying stall as it transitions to a new cache level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112413
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202174310.2630302-1-chris@chris-wilson.co.uk

drm/i915/gt: Simplify rc6 w/a application

Quite simply we only need to check for prior corruption on enabling rc6
on module load and resume, so by hooking into the common entry points.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202110836.2342685-2-chris@chris-wilson.co.uk

drm/i915/gt: Use soft-rc6 for w/a protection

Now that we have soft-rc6 in place, we can use that instead of the
forcewake to disable rc6 while active; preferred by a few
microbenchmarks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202110836.2342685-1-chris@chris-wilson.co.uk

drm/i915/bios: assume vbt is 4-byte aligned into oprom

The unaligned ioread32() will make us read byte by byte looking for the
vbt. We could just as well have done a ioread8() + a shift and avoid the
extra confusion on how we are looking for "$VBT".

However when using ACPI it's guaranteed the VBT is 4-byte aligned
per spec, so we can probably assume it here as well.

v2: do not try to simplify the loop by eliminating the auxiliary counter
(Jani and Ville)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191126225110.8127-4-lucas.demarchi@intel.com

drm/i915/bios: fold pci rom map/unmap into copy function

We don't need to keep the pci rom mapped during the entire
intel_bios_init() anymore. Move it to the previous copy_vbt() function
and rename it to oprom_get_vbt() since now it's responsible to to all
operations related to get the vbt from the oprom.

v2: fix double __iomem attribute detected by sparse
v3: fix missing unmap on success (Ville)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191126225110.8127-3-lucas.demarchi@intel.com

drm/i915/bios: do not discard address space

When we map the VBT through pci_map_rom() we may not be allowed
to simply discard the address space and go on reading the memory.
That doesn't work on my test system, but by dumping the rom via
sysfs I can can get the correct vbt. So change our find_vbt() to do
the same as done by pci_read_rom(), i.e. use memcpy_fromio().

v2: the just the minimal changes by not bothering with the unaligned io
reads: this can be done on top (from Ville and Jani)

v3: drop const in function return since now we are copying the vbt,
rather than just finding it

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191126225110.8127-2-lucas.demarchi@intel.com

drm/i915/display: Suspend MST topology manager before destroy fbdev

MST topology needs to be suspended so we don't have any calls to
fbdev after it's finalized. MST will be destroyed later as part of
drm_mode_config_cleanup().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109964
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127021609.162700-1-jose.souza@intel.com

drm/i915/vbt: Parse power conservation features block

From VBT 228+ this is block that PSR and other power saving
features configuration should be read from.

v3:
Using DRRS from this new block

v4:
Using BIT()
Fixing DRRS comment in parse_power_conservation_features()

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128014852.214135-5-jose.souza@intel.com

drm/i915/psr: Check if sink PSR capability changed

eDP specification states that sink can have its PSR capability
changed, I have never found any panel doing that but lets add that
for completeness.
For now it is not reading back the PSR capabilities and if possible
re-enabling PSR, this will be added if a panel is found using this
feature.

v4:
Cleaning DP_PSR_CAPS_CHANGE

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128014852.214135-4-jose.souza@intel.com

drm/i915/psr: Enable ALPM lock timeout error interruption

When this error happens sink link is not stable after the required
FW_EXIT_LATENCY period so it will miss the selective update.
As the other PSR errors, for now we are not trying to recover from
it.

Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128014852.214135-3-jose.souza@intel.com

drm/i915/psr: Refactor psr short pulse handler

eDP spec states that when sink enconters a problem that prevents it
to keep PSR running it should set PSR status to internal error and
set the reason why it happen to PSR_ERROR_STATUS but it is not how it
was implemented.
But also I don't want to change this behavior, who knows if there is
a panel out there that only set the PSR_ERROR_STATUS.

So here refactoring the code a bit to make more easy to read what was
state above as more checks will be added to this function.

v2:
returning a int instead of a bool in psr_get_status_and_error_status()

Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128014852.214135-2-jose.souza@intel.com

drm/i915/psr: Add bits per pixel limitation

PSR2 HW only support a limited number of bits per pixel, if mode has
more than supported PSR2 should not be enabled.

BSpec: 50422
BSpec: 7713
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128014852.214135-1-jose.souza@intel.com

drm/i915/dsb: fix cmd_buf being wrongly set

The "err" label is not really "err", but rather "out" since the return
path is shared between error condition and normal path. This broke when
commit 03cea61076f0 ("drm/i915/dsb: fix extra warning on error path
handling") added a "dsb->cmd_buf = NULL;" there, making DSB to stop
working since now all writes would pass-through via mmio.

Remove the set to NULL since it's actually not needed: we only set it if
all steps are successful. While at it, rename the label so this confusion
doesn't happen again.

Fixes: 03cea61076f0 ("drm/i915/dsb: fix extra warning on error path handling")
Resolves: https://gitlab.freedesktop.org/drm/intel/issues/8
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127221119.384754-1-lucas.demarchi@intel.com

drm/i915: Stop using connector->encoder and encoder->crtc links in i915_display_info

Migrate away from the legacy encoder->crtc and connector->encoder links
in the debugfs display_info code. Other users still remain so can't kill
these off yet.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-10-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Dump both the uapi and hw states for crtcs and planes

Let's make the display info more useful by dumping both
the uapi and hw states for each crtc/plane.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-9-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Use the canonical [CRTC:%d:%s]/etc. format in i915_display_info

Use the canonical "[CRTC:%d:%s]" format for the obj id/name
in the debugfs display_info dump. Everyone should already be
familiar with the format since it's used in the debug logs
extensively.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-8-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Use drm_modeset_lock_all() in debugfs display info

Make out life easier by just grabbing all modeset locks around
the display_info dump.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-7-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Dump the mode for the crtc just the once

No point in repeating the crtc mode for each cloned encoder.
Just print it once, and avoid using multiple lines for it.
And while at let's polish the fixed mode print to fit on
one line as well.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-6-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Refactor debugfs display info code

Pull the crtc dumping stuff into a nice function so the
loop over the crtcs doesn't look like crap.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-5-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Reorganize plane/fb dump in debugfs

Eliminate the special cases for the primary and cursor planes and just
dump all the information consistently for all the planes.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-4-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Switch to intel_ types in debugfs display_info

Switch to using intel_ types in the debugfs display_info code.
Should make it easier to handle bigjoiner etc. in the future.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-3-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Use drm_rect to simplify plane {crtc,src}_{x,y,w,h} printing

Use DRM_RECT_FMT & co. to simpify the code.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129185434.25549-2-ville.syrjala@linux.intel.com
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

drm/i915: Switch intel_crtc_disable_noatomic() to intel_ types

It's hard to see what is going on when the function mixes drm_
and intel_ types. Switch to intel_ types.

v2: Deal with another use of 'intel_crtc' being introduced

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191105171447.22111-2-ville.syrjala@linux.intel.com
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Program SHPD_FILTER_CNT on CNP+

The bspec tells us 'Program SHPD_FILTER_CNT with the "500 microseconds
adjusted" value before enabling hotplug detection' on CNP+. We haven't
been touching this register at all thus far, but we should probably
follow the bspec's guidance.

The register also exists on LPT and SPT, but there isn't any specific
guidance I can find on how we should be programming it there so let's
leave it be for now.

Bspec: 4342
Bspec: 31297
Bspec: 8407
Bspec: 49305
Bspec: 50473

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127221314.575575-3-matthew.d.roper@intel.com

drm/i915/ehl: Make icp_digital_port_connected() use phy instead of port

When looking at SDEISR to determine the connection status of combo
outputs, we should use the phy index rather than the port index.
Although they're usually the same thing, EHL's DDI-D (port D) is
attached to PHY-A and SDEISR doesn't even have bits for a "D" output.
It's also possible that future platforms may map DDIs (the internal
display engine programming units) to PHYs (the output handling on the IO
side) in ways where port!=phy, so let's look at the PHY index by
default.

v2: Rename to intel_combo_phy_connected. (Lucas)

Fixes: 719d24002602 ("drm/i915/ehl: Enable DDI-D")
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127221314.575575-2-matthew.d.roper@intel.com

drm/i915: Handle SDEISR according to PCH rather than platform

The South Display is part of the PCH so we should technically be basing
our port detection logic off the PCH in use rather than the platform
generation.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127221314.575575-1-matthew.d.roper@intel.com

drm/i915: Use the correct PCH transcoder for LPT/WPT in intel_sanitize_frame_start_delay()

LPT/WPT only have PCH transcoder A. Make sure we poke at its
chicken register instead of some non-existent register when
FDI is being driven by pipe B or C.

Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128182358.14477-1-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>

drm/i915: Refactor gen6_flush_pd()

As the gen6 page directory is written on binding and after every update,
the code ended up duplicated. Refactor the code into a single routine to
share the locking and serialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191201140916.2128905-1-chris@chris-wilson.co.uk

drm/i915: Serialise access to GFX_FLSH_CNTL

Now that many threads may try to use the same mmio to flush the global
buffers after updating the PTE, serialise access to the mmio to prevent
concurrent access on gen7.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191130162320.1683424-1-chris@chris-wilson.co.uk

drm/i915/gt: Push the flush_pd before the set-context

Move our "wait for the PD load to complete" paranoia before the
MI_SET_CONTEXT just in case the context restore tries to access local
addresses.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191130120503.1609483-1-chris@chris-wilson.co.uk

drm/i915/gen7: Re-enable full-ppgtt for ivb & hsw

After much hair pulling, resort to preallocating the ppGTT entries on
init to circumvent the apparent lack of PD invalidate following the
write to PP_DCLV upon switching mm between contexts (and here the same
context after binding new objects). However, the details of that PP_DCLV
invalidate are still unknown, and it appears we need to reload the mm
twice to cover over a timing issue. Worrying.

Fixes: 3dc007fe9b2b ("drm/i915/gtt: Downgrade gen7 (ivb, byt, hsw) back to aliasing-ppgtt")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129201328.1398583-1-chris@chris-wilson.co.uk

drm/i915/selftests: Keep engine awake during live_coherency

Keep the engine awake and so avoid frequent cycling in and out of
powersaving mode to eliminate the unnecessary overhead and speed up the
testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129222702.1456292-1-chris@chris-wilson.co.uk

drm/i915/execlists: Ensure the tasklet is decoupled upon shutdown

As we only cancel the timers asynchronously, they may
still be running on another CPU as we shutdown, raising one last
softirq. So be safe and make sure the tasklet is flushed before
destroying the engine's memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129172542.1222810-1-chris@chris-wilson.co.uk

drm/i915/gem: Take timeline->mutex to walk list-of-requests

Though the context is closed and so no more requests can be added to the
timeline, retirement can still be removing requests. It can even be
removing the very request we are inspecting and so cause us to wander
into dead links.

Serialise with the retirement by taking the timeline->mutex used for
guarding the timeline->requests list.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112404
Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129151845.1092933-1-chris@chris-wilson.co.uk

drm/i915: Don't set undefined bits in dirty_pipes

skl_commit_modeset_enables() straight up compares dirty_pipes
with a bitmask of already committed pipes. If we set bits in
dirty_pipes for non-existent pipes that comparison will never
work right. So let's limit ourselves to bits that exist.

And we'll do the same for the active_pipes_changed bitmask.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011200949.7839-5-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

Revert "drm/i915: use a separate context for gpu relocs"

Since commit c45e788d95b4 ("drm/i915/tgl: Suspend pre-parser across GTT
invalidations"), we now disable the advanced preparser on Tigerlake for the
invalidation phase at the start of the batch, we no longer need to emit
the GPU relocations from a second context as they are now flushed inlined.

References: 8a9a982767b7 ("drm/i915: use a separate context for gpu relocs")
References: c45e788d95b4 ("drm/i915/tgl: Suspend pre-parser across GTT invalidations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129124846.949100-1-chris@chris-wilson.co.uk

drm/i915/selftests: Wait only on the expected barrier

Wait on only the last request on the kernel_context after emitting a
barrier so that we do not wait for everything in general and by doing so
cause an accidental emission of the barrier!

Bugzilla; https://bugs.freedesktop.org/show_bug.cgi?id=112405
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129103455.744389-1-chris@chris-wilson.co.uk

drm/i915/selftests: Always lock the drm_mm around insert/remove

Be paranoid and make sure the drm_mm is locked whenever we insert/remove
our own nodes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129095659.665381-1-chris@chris-wilson.co.uk

drm/i915/selftests: Use sgt_iter for huge_pages_free

Use the normal sgt_iter to walk the pages scatterlist on free so that we
handle the error path correctly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112225
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128232946.546831-1-chris@chris-wilson.co.uk

drm/i915/tgl: Implement Wa_1604555607

Implement Wa_1604555607 (set the DS pairing timer to 128 cycles).
FF_MODE2 is part of the register state context, that's why it is
implemented here.

At TGL A0 stepping, FF_MODE2 register read back is broken, hence
disabling the WA verification.

v2: Rebased on top of the WA refactoring (Oscar)
v3: Correctly add to ctx_workarounds_init (Michel)
v4:
  uncore read is used [Tvrtko]
  Macros as used for MASK definition [Chris]
v5:
  Skip the Wa_1604555607 verification [Ram]
  i915 ptr retrieved from engine. [Tvrtko]
v6:
  Added wa_add as a wrapper for __wa_add [Chris]
  wa_add is directly called instead of new wrapper [tvrtko]

BSpec: 19363
HSDES: 1604555607
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Ramalingam C <ramlingam.c@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v5]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128021005.3350-1-ramalingam.c@intel.com

drm/i915/selftests: Drop local vm reference!

After obtaining a local reference to the vm from the context, remember
to drop it before it goes out of scope!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128185402.110678-1-chris@chris-wilson.co.uk

drm/i915/selftests: Count the number of engines used

Don't rely on the RUNTIME_INFO() when we loop over a particular context
and only run on a filtered set of engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127223252.3777141-1-chris@chris-wilson.co.uk

drm/i915/selftests: Try to show where the pulse went

We have a case of a mysteriously absent pulse, so dump the engine
details to see if we can find out what happened to it.

References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128102546.3857140-1-chris@chris-wilson.co.uk

drm/i915/gem: Excise the per-batch whitelist from the context

One does not lightly add a new hidden struct_mutex dependency deep within
the execbuf bowels! The immediate suspicion in seeing the whitelist
cached on the context, is that it is intended to be preserved between
batches, as the kernel is quite adept at caching small allocations
itself. But no, it's sole purpose is to serialise command submission in
order to save a kmalloc on a slow, slow path!

By removing the whitelist dependency from the context, our freedom to
chop the big struct_mutex is greatly augmented.

v2: s/set_bit/__set_bit/ as the whitelist shall never be accessed
concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128113424.3885958-1-chris@chris-wilson.co.uk

drm/i915/gt: Defer breadcrumb processing to after the irq handler

The design of our interrupt handlers is that we ack the receipt of the
interrupt first, inside the critical section where the master interrupt
control is off and other cpus cannot start processing the next
interrupt; and then process the interrupt events afterwards. However,
Icelake introduced a whole new set of banked GT_IIR that are inherently
serialised and slow to retrieve the IIR and must be processed within the
critical section. We can still push our breadcrumbs out of this critical
section by using our irq_worker. On bdw+, this should not make too much
of a difference as we only slightly defer the breadcrumbs, but on icl+
this should make a big difference to our throughput of interrupts from
concurrently executing engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127115813.3345823-1-chris@chris-wilson.co.uk

drm/i915: Serialise i915_active_fence_set() with itself

The expected downside to commit 58b4c1a07ada ("drm/i915: Reduce nested
prepare_remote_context() to a trylock") was that it would need to return
-EAGAIN to userspace in order to resolve potential mutex inversion. Such
an unsightly round trip is unnecessary if we could atomically insert a
barrier into the i915_active_fence, so make it happen.

Currently, we use the timeline->mutex (or some other named outer lock)
to order insertion into the i915_active_fence (and so individual nodes
of i915_active). Inside __i915_active_fence_set, we only need then
serialise with the interrupt handler in order to claim the timeline for
ourselves.

However, if we remove the outer lock, we need to ensure the order is
intact between not only multiple threads trying to insert themselves
into the timeline, but also with the interrupt handler completing the
previous occupant. We use xchg() on insert so that we have an ordered
sequence of insertions (and each caller knows the previous fence on
which to wait, preserving the chain of all fences in the timeline), but
we then have to cmpxchg() in the interrupt handler to avoid overwriting
the new occupant. The only nasty side-effect is having to temporarily
strip off the RCU-annotations to apply the atomic operations, otherwise
the rules are much more conventional!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
Fixes: 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk

drm/i915/gt: Manual rc6 entry upon parking

Now that we rapidly park the GT when the GPU idles, we often find
ourselves idling faster than the RC6 promotion timer. Thus if we tell
the GPU to enter RC6 manually as we park, we can do so quicker (by
around 50ms, half an EI on average) and marginally increase our
powersaving across all execlists platforms.

v2: Now with a selftest to check we can enter RC6 manually

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127095657.3209854-1-chris@chris-wilson.co.uk

drm/i915: Disable display interrupts during display IRQ handler

During the Display Interrupt Service routine the Display Interrupt
Enable bit must be disabled, The interrupts handled, then the
Display Interrupt Enable bit must be set to prevent possible missed
interrupts.

Bspec: 49212
V2: Change Title to remove SDE reference.
V3: Fix TAB spacing.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121201455.2558-1-clinton.a.taylor@intel.com

drm/i915/dp: fix DP audio for PORT_A on gen12+

Starting with gen12, PORT_A can be connected to a transcoder
with audio support. Modify the existing logic that disabled
audio on PORT_A unconditionally.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125125313.17584-1-kai.vehmanen@linux.intel.com

drm/i915: Support more QGV points

According to BSpec 53998, there is a mask of
max 8 SAGV/QGV points we need to support.

Bumping this up to keep the CI happy(currently
preventing tests to run), until all SAGV
changes land.

v2: Fix second plane where QGV points were
    hardcoded as well.

v3: Change the naming of I915_NUM_SAGV_POINTS
    to be I915_NUM_QGV_POINTS, as more meaningful
    (Ville Syrjälä)

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112189
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125160800.14740-1-stanislav.lisovskiy@intel.com
[vsyrjala: Add missing braces around else (checkpatch), fix Bugzilla tag]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Reduce nested prepare_remote_context() to a trylock

On context retiring, we may invoke the kernel_context to unpin this
context. Elsewhere, we may use the kernel_context to modify this
context. This currently leads to an AB-BA lock inversion, so we need to
back-off from the contended lock, and repeat.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111732
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: a9877da2d629 ("drm/i915/oa: Reconfigure contexts on the fly")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191126065521.2331017-1-chris@chris-wilson.co.uk

drm/i915: Default to a more lenient forced preemption timeout

Based on a sampling of a number of benchmarks across platforms, by
default opt for a much more lenient timeout so that we should not
adversely affect existing "good" clients.

640ms ought to be enough for anyone.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112169
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125162737.2161069-1-chris@chris-wilson.co.uk

drm/i915/selftests: Move mock_vma to the heap to reduce stack_frame

An i915_vma struct on the stack may push the frame over the limit, if
set conservatively, so move it to the heap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125124856.1761176-1-chris@chris-wilson.co.uk

drm/i915/gt: Schedule request retirement when timeline idles

The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.

As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!

v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk

drm/i915/gt: Adapt engine_park synchronisation rules for engine_retire

In the next patch, we will introduce a new asynchronous retirement
worker, fed by execlists CS events. Here we may queue a retirement as
soon as a request is submitted to HW (and completes instantly), and we
also want to process that retirement as early as possible and cannot
afford to postpone (as there may not be another opportunity to retire it
for a few seconds). To allow the new async retirer to run in parallel
with our submission, pull the __i915_request_queue (that passes the
request to HW) inside the timelines spinlock so that the retirement
cannot release the timeline before we have completed the submission.

v2: Actually to play nicely with engine_retire, we have to raise the
timeline.active_lock before releasing the HW. intel_gt_retire_requsts()
is still serialised by the outer lock so they cannot see this
intermediate state, and engine_retire is serialised by HW submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-2-chris@chris-wilson.co.uk

drm/i915: Serialise with engine-pm around requests on the kernel_context

As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk

drm/i915/execlists: Fixup cancel_port_requests()

I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387
Fixes: 331bf9059157 ("drm/i915/gt: Mark the execlists->active as the primary volatile access")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk

drm/i915/gt: Mark the execlists->active as the primary volatile access

Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.

Otherwise, we have observed execlists_active() to report 0x40.

[ 2400.760381] igt/para-4098    1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
[ 2400.760826] igt/para-4098    1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
[ 2400.761271] igt/para-4098    1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
[ 2400.761726] igt/para-4097    0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940

which is impossible!

The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.

Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk

drm/i915: Switch kunmap() to take the page not vaddr

On converting from kunmap_atomic() to kunamp() one must remember the
latter takes the struct page, the former the vaddr.

Fixes: 48715f700174 ("drm/i915: Avoid atomic context for error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125091409.1630385-1-chris@chris-wilson.co.uk

drm/i915/selftests: Include the subsubtest name for live_parallel_engines

Include the name of the failing subsubtest, should it fails.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191123191547.925360-1-chris@chris-wilson.co.uk

drm/i915/query: Align flavour of engine data lookup

Commit 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for
engines onto the GT") changed the engine query to iterate over uabi
engines but left the buffer size calculation look at the physical engine
count. Difference has no practical consequence but it is nicer to align
both queries.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122104115.29610-1-tvrtko.ursulin@linux.intel.com

drm/i915: coffeelake supports hdcp2.2

This includes other platforms that utilize the same gen graphics as
CFL: AML, WHL and CML.

Signed-off-by: Juston Li <juston.li@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011181918.29618-1-juston.li@intel.com

drm/i915/selftests: Flush the active callbacks

Before checking the current i915_active state for the asynchronous work
we submitted, flush any ongoing callback. This ensures that our sampling
is robust and does not sporadically fail due to bad timing as the work
is running on another cpu.

v2: Drop the fence callback sync, retiring under the lock should be good
enough to synchronize with engine_retire() and the
intel_gt_retire_requests() background worker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122132404.690440-1-chris@chris-wilson.co.uk

drm/i915/selftests: Force bonded submission to overlap

Bonded request submission is designed to allow requests to execute in
parallel as laid out by the user. If the master request is already
finished before its bonded pair is submitted, the pair were not destined
to run in parallel and we lose the information about the master engine
to dictate selection of the secondary. If the second request was
required to be run on a particular engine in a virtual set, that should
have been specified, rather than left to the whims of a random
unconnected requests!

In the selftest, I made the mistake of not ensuring the master would
overlap with its bonded pairs, meaning that it could indeed complete
before we submitted the bonds. Those bonds were then free to select any
available engine in their virtual set, and not the one expected by the
test.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122112152.660743-1-chris@chris-wilson.co.uk

drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request

As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.

Quoting Linus Torvalds:

> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?

  .. because the object can be accessed (by RCU) after the refcount has
  gone down to zero, and the thing has been released.

  That's the whole and only point of SLAB_TYPESAFE_BY_RCU.

  That flag basically says:

  "I may end up accessing this object *after* it has been free'd,
  because there may be RCU lookups in flight"

  This has nothing to do with constructors. It's ok if the object gets
  reused as an object of the same type and does *not* get
  re-initialized, because we're perfectly fine seeing old stale data.

  What it guarantees is that the slab isn't shared with any other kind
  of object, _and_ that the underlying pages are free'd after an RCU
  quiescent period (so the pages aren't shared with another kind of
  object either during an RCU walk).

  And it doesn't necessarily have to have a constructor, because the
  thing that a RCU walk will care about is

    (a) guaranteed to be an object that *has* been on some RCU list (so
    it's not a "new" object)

    (b) the RCU walk needs to have logic to verify that it's still the
    *same* object and hasn't been re-used as something else.

  In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
  immediately, but because it gets reused as the same kind of object,
  the RCU walker can "know" what parts have meaning for re-use, in a way
  it couidn't if the re-use was random.

  That said, it *is* subtle, and people should be careful.

> So the re-use might initialize the fields lazily, not necessarily using a ctor.

  If you have a well-defined refcount, and use "atomic_inc_not_zero()"
  to guard the speculative RCU access section, and use
  "atomic_dec_and_test()" in the freeing section, then you should be
  safe wrt new allocations.

  If you have a completely new allocation that has "random stale
  content", you know that it cannot be on the RCU list, so there is no
  speculative access that can ever see that random content.

  So the only case you need to worry about is a re-use allocation, and
  you know that the refcount will start out as zero even if you don't
  have a constructor.

  So you can think of the refcount itself as always having a zero
  constructor, *BUT* you need to be careful with ordering.

  In particular, whoever does the allocation needs to then set the
  refcount to a non-zero value *after* it has initialized all the other
  fields. And in particular, it needs to make sure that it uses the
  proper memory ordering to do so.

  NOTE! One thing to be very worried about is that re-initializing
  whatever RCU lists means that now the RCU walker may be walking on the
  wrong list so the walker may do the right thing for this particular
  entry, but it may miss walking *other* entries. So then you can get
  spurious lookup failures, because the RCU walker never walked all the
  way to the end of the right list. That ends up being a much more
  subtle bug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk

drm/i915/selftests: Shorten infinite wait for sseu

Use our more regular igt_flush_test() to bind the wait-for-idle and
error out instead of waiting around forever on critical failure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121233021.507400-1-chris@chris-wilson.co.uk

drm/i915/selftests: Always hold a reference on a waited upon request

Whenever we wait on a request, make sure we actually hold a reference to
it and that it cannot be retired/freed on another CPU!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-4-chris@chris-wilson.co.uk

drm/i915: Mark intel_wakeref_get() as a sleeper

Assume that intel_wakeref_get() may take the mutex, and perform other
sleeping actions in the course of its callbacks and so use might_sleep()
to ensure that all callers abide. Anything that cannot sleep has to use
e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the
non-atomic paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk

drm/i915/execlists: Lock the request while validating it during promotion

Since the request is already on the HW as we perform its validation, it
and even its subsequent barrier may be concurrently retired before we
process the assertions. If it is retired already and so off the HW, our
assertions become void and we need to ignore them.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112363
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121103546.146487-1-chris@chris-wilson.co.uk

drm/i915/gt: Hold request reference while waiting for w/a verification

As we wait upon a request, we must be holding a reference to it, and be
wary that i915_request_add() consumes the passed in reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121093326.134774-1-chris@chris-wilson.co.uk

drm/i915: Serialise with remote retirement

Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.

Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk

Revert "drm/i915/gt: Wait for new requests in intel_gt_retire_requests()"

From inside an active timeline in the execbuf ioctl, we may try to
reclaim some space in the GGTT. We need GGTT space for all objects on
!full-ppgtt platforms, and for context images everywhere. However, to
free up space in the GGTT we may need to remove some pinned objects
(e.g. context images) that require flushing the idle barriers to remove.
For this we use the big hammer of intel_gt_wait_for_idle()

However, commit 7936a22dd466 ("drm/i915/gt: Wait for new requests in
intel_gt_retire_requests()") will continue spinning on the wait if a
timeline is active but lacks requests, as is the case during execbuf
reservation. Spinning forever is quite time consuming, so revert that
commit and start again.

In practice, the effect commit 7936a22dd466 was trying to achieve is
accomplished by commit 1683d24c1470 ("drm/i915/gt: Move new timelines
to the end of active_list"), so there is no immediate rush to replace
the looping.

Testcase: igt/gem_exec_reloc/basic-range
Fixes: 7936a22dd466 ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()")
References: 1683d24c1470 ("drm/i915/gt: Move new timelines to the end of active_list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk

drm/i915: Use intel_gt_pm_put_async in GuC submission path

GuC submission path can be called from an interrupt context
and so should use a worker to avoid holding a mutex.

References: 07779a76ee1f ("drm/i915: Mark up the calling context for intel_wakeref_put()")
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120211321.88021-1-stuart.summers@intel.com

drm/i915/gt: Fixup config ifdeffery for pm_suspend_target_state

pm_suspend_target_state is declared under CONFIG_PM_SLEEP but only
defined under CONFIG_SUSPEND. Play safe and only use the symbol if it is
both declared and defined.

Reported-by: kbuild-all@lists.01.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: a70a9e998e8e ("drm/i915: Defer rc6 shutdown to suspend_late")
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120182209.3967833-1-chris@chris-wilson.co.uk

drm/i915/gt: Declare timeline.lock to be irq-free

Now that we never allow the intel_wakeref callbacks to be invoked from
interrupt context, we do not need the irqsafe spinlock for the timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120170858.3965380-1-chris@chris-wilson.co.uk

drm/i915/gt: Unlock engine-pm after queuing the kernel context switch

In commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to
the backend"), I erroneously concluded that we last modify the engine
inside __i915_request_commit() meaning that we could enable concurrent
submission for userspace as we enqueued this request. However, this
falls into a trap with other users of the engine->kernel_context waking
up and submitting their request before the idle-switch is queued, with
the result that the kernel_context is executed out-of-sequence most
likely upsetting the GPU and certainly ourselves when we try to retire
the out-of-sequence requests.

As such we need to hold onto the effective engine->kernel_context mutex
lock (via the engine pm mutex proxy) until we have finish queuing the
request to the engine.

v2: Serialise against concurrent intel_gt_retire_requests()
v3: Describe the hairy locking scheme with intel_gt_retire_requests()
for future reference.
v4: Combine timeline->lock and engine pm release; it's hairy.

Fixes: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-2-chris@chris-wilson.co.uk

drm/i915/gt: Close race between engine_park and intel_gt_retire_requests

The general concept was that intel_timeline.active_count was locked by
the intel_timeline.mutex. The exception was for power management, where
the engine->kernel_context->timeline could be manipulated under the
global wakeref.mutex.

This was quite solid, as we always manipulated the timeline only while
we held an engine wakeref.

And then we started retiring requests outside of struct_mutex, only
using the timelines.active_list and the timeline->mutex. There we
started manipulating intel_timeline.active_count outside of an engine
wakeref, and so introduced a race between __engine_park() and
intel_gt_retire_requests(), a race that could result in the
engine->kernel_context not being added to the active timelines and so
losing requests, which caused us to keep the system permanently powered
up [and unloadable].

The race would be easy to close if we could take the engine wakeref for
the timeline before we retire -- except timelines are not bound to any
engine and so we would need to keep all active engines awake. The
alternative is to guard intel_timeline_enter/intel_timeline_exit for use
outside of the timeline->mutex.

Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk

drm/i915: Mark up the calling context for intel_wakeref_put()

Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.

Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk