habanalabs: Fix reset upon device release bug
author farah kassabri <fkassabri@habana.ai>
Thu, 17 Feb 2022 14:15:26 +0000 (16:15 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Mon, 28 Feb 2022 12:22:06 +0000 (14:22 +0200)
commit a78b07dcae2f9d6fafadb05540d8152f575d7e59
tree 109786d8612ad31b375addcfa85382b5f38035b1
parent e8458e20e0a3c426ed5ed3ce590c05718c8b8e8e

If a user application is interrupted while some command submissions
(CS) are still in flight, or in the middle of completion handling in the
driver, the last reference to the kernel private data of the user
process is not put in the fd close flow but in the CS completion
workqueue context.

This means that the reset-upon-device-release is called from that
context. During the reset flow, the driver flushes all the CS
workqueues to ensure that any scheduled work has run to completion.
Since we are running from the completion context, the flush waits on
the very work item that called it, and we deadlock.

Therefore, we need to skip flushing the workqueues in those cases.
Skipping is safe because the user cannot release the device unless the
workqueues are already empty.
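
The scenario can be sketched as a small userspace model. This is not
the driver code: every name below (model_dev, model_put, model_reset,
skip_wq_flush) is a hypothetical stand-in, and the "deadlock" is only
recorded in a flag rather than actually blocking. It shows the last
reference being dropped from inside the completion work, and how
skipping the flush on that path lets the reset finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-process state; not the driver's types. */
struct model_dev {
	int refcount;           /* references to the private data */
	bool in_completion_ctx; /* true while the completion work runs */
	bool deadlocked;        /* flush was asked to wait on its caller */
	bool reset_done;        /* reset-upon-release ran to the end */
};

/* A flush waits for all queued work. Called from a work item on the
 * same queue, it would wait on itself; the model records that. */
static bool model_flush_cs_wq(struct model_dev *d)
{
	if (d->in_completion_ctx) {
		d->deadlocked = true;
		return false; /* a real flush would never return here */
	}
	return true;
}

static void model_reset(struct model_dev *d, bool skip_wq_flush)
{
	if (!skip_wq_flush && !model_flush_cs_wq(d))
		return; /* stuck: the reset never completes */
	d->reset_done = true;
}

/* Dropping the last reference triggers reset-upon-release. */
static void model_put(struct model_dev *d, bool skip_wq_flush)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		model_reset(d, skip_wq_flush);
}

/* fd close drops its reference first; an in-flight CS holds the last. */
static struct model_dev model_interrupted_close(bool skip_wq_flush)
{
	struct model_dev d = { .refcount = 2 };

	model_put(&d, skip_wq_flush); /* fd close flow: not the last put */
	d.in_completion_ctx = true;   /* completion work starts running */
	model_put(&d, skip_wq_flush); /* last put, inside the work item */
	d.in_completion_ctx = false;
	return d;
}
```

Calling model_interrupted_close(false) leaves deadlocked set and
reset_done clear, while model_interrupted_close(true) completes the
reset without touching the flush path.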

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/misc/habanalabs/common/command_submission.c
drivers/misc/habanalabs/common/device.c
drivers/misc/habanalabs/common/habanalabs.h