drm/i915/gt: Allow temporary suspension of inflight requests
authorChris Wilson <chris@chris-wilson.co.uk>
Thu, 16 Jan 2020 18:47:53 +0000 (18:47 +0000)
committerChris Wilson <chris@chris-wilson.co.uk>
Thu, 16 Jan 2020 19:56:16 +0000 (19:56 +0000)
commit32ff621fd74496f0c33644125fb69ff175859b1f
treebdba6cd088a09dc15a60a1f588ddc3ecefbe20a2
parent672c368f9398042b629740cc9816e8e939eff2db
drm/i915/gt: Allow temporary suspension of inflight requests

In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_engine_cs.c
drivers/gpu/drm/i915/gt/intel_engine_types.h
drivers/gpu/drm/i915/gt/intel_lrc.c
drivers/gpu/drm/i915/gt/selftest_lrc.c
drivers/gpu/drm/i915/i915_request.h