drm/xe: fix UAF around queue destruction
authorMatthew Auld <matthew.auld@intel.com>
Mon, 23 Sep 2024 14:56:48 +0000 (15:56 +0100)
committerMatthew Auld <matthew.auld@intel.com>
Thu, 26 Sep 2024 13:30:30 +0000 (14:30 +0100)
commit861108666cc0e999cffeab6aff17b662e68774e3
treecffe863328eccc703ed4d78bae2fb53cf44aa6ed
parentd28af0b6b9580b9f90c265a7da0315b0ad20bbfd
drm/xe: fix UAF around queue destruction

We currently do stuff like queuing the final destruction step on a
random system wq, which will outlive the driver instance. With bad
timing we can teardown the driver with one or more work workqueue still
being alive leading to various UAF splats. Add a fini step to ensure
user queues are properly torn down. At this point GuC should already be
nuked so queue itself should no longer be referenced from hw pov.

v2 (Matt B)
 - Looks much safer to use a waitqueue and then just wait for the
   xa_array to become empty before triggering the drain.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2317
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923145647.77707-2-matthew.auld@intel.com
drivers/gpu/drm/xe/xe_device.c
drivers/gpu/drm/xe/xe_device_types.h
drivers/gpu/drm/xe/xe_guc_submit.c
drivers/gpu/drm/xe/xe_guc_types.h