drm/i915/execlists: Offline error capture
author Chris Wilson <chris@chris-wilson.co.uk>
Thu, 16 Jan 2020 18:47:54 +0000 (18:47 +0000)
committer Chris Wilson <chris@chris-wilson.co.uk>
Thu, 16 Jan 2020 19:56:17 +0000 (19:56 +0000)
commit 748317386afb235e11616098d2c7772e49776b58
tree 64e9a71ab3007872de71095060ad09f242bed2bc
parent 32ff621fd74496f0c33644125fb69ff175859b1f

Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user-controlled
capture, which would extend the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past it.
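
The hold-then-capture idea can be sketched in plain userspace C. This is
only an illustration under stated assumptions, not the code in
intel_lrc.c: struct request, struct queue, hold_request(),
next_runnable() and capture_worker() are hypothetical names, each request
is assumed to depend on at most one other request, and the deferred
worker is modelled as an ordinary function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_REQUESTS 8

/* Hypothetical, simplified request; not the driver's struct i915_request. */
struct request {
	int id;        /* unique id */
	int context;   /* owning context */
	int waits_on;  /* id of the request this one depends on, or -1 */
	bool on_hold;  /* true once removed from the execution queue */
};

struct queue {
	struct request reqs[MAX_REQUESTS]; /* kept in submission order */
	size_t count;
};

static struct request *find_request(struct queue *q, int id)
{
	for (size_t i = 0; i < q->count; i++)
		if (q->reqs[i].id == id)
			return &q->reqs[i];
	return NULL;
}

/*
 * Put the guilty request on hold, together with every request in the same
 * context and every request that (transitively) waits on a held request.
 * Returns the number of requests newly placed on hold.
 */
static size_t hold_request(struct queue *q, int guilty_id)
{
	struct request *guilty = find_request(q, guilty_id);
	size_t held = 0;
	bool changed;

	assert(q->count <= MAX_REQUESTS);
	if (!guilty || guilty->on_hold)
		return 0;

	guilty->on_hold = true;
	held++;

	/* Repeat until a full pass pulls in no further request. */
	do {
		changed = false;
		for (size_t i = 0; i < q->count; i++) {
			struct request *rq = &q->reqs[i];
			struct request *dep = NULL;

			if (rq->on_hold)
				continue;
			if (rq->waits_on >= 0)
				dep = find_request(q, rq->waits_on);
			if (rq->context == guilty->context ||
			    (dep && dep->on_hold)) {
				rq->on_hold = true;
				held++;
				changed = true;
			}
		}
	} while (changed);

	return held;
}

/* Next request the engine may run: the first one not on hold, or -1. */
static int next_runnable(const struct queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		if (!q->reqs[i].on_hold)
			return q->reqs[i].id;
	return -1;
}

/*
 * Stand-in for the scheduled worker: dump the guilty request into buf, then
 * release every hold. Assumes only one capture is in flight at a time.
 * Returns the snprintf() result, or -1 if the request is unknown.
 */
static int capture_worker(struct queue *q, int guilty_id, char *buf, size_t len)
{
	const struct request *guilty = find_request(q, guilty_id);
	int n;

	if (!guilty)
		return -1;

	n = snprintf(buf, len, "hang: request %d, context %d",
		     guilty->id, guilty->context);
	for (size_t i = 0; i < q->count; i++)
		q->reqs[i].on_hold = false;
	return n;
}
```

In this model, calling hold_request() on the guilty request returns
immediately, so next_runnable() can hand the engine an unrelated request
while the dump is still pending; only once capture_worker() has run do the
held requests become runnable again, in their original order.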

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_lrc.c