habanalabs: soft-reset device if context-switch fails
author Oded Gabbay <oded.gabbay@gmail.com>
Thu, 28 Feb 2019 08:46:21 +0000 (10:46 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 28 Feb 2019 12:07:52 +0000 (13:07 +0100)
commit af5f7eea45e1b177db961c4706625f4cf545c063
tree a59b3f71bb2aa0dbd1f7e39129f5e30b75a11a4e
parent efaa281219fd37cb1ee5cdef483aa67a16b0a087

This patch fixes a bug in the driver: if the TPC or MME remains non-IDLE
even after all command submissions are done (due to a user bug or a
malicious user), future command submissions fail in the context-switch
stage and the driver remains in a "stuck" state.

The fix is to soft-reset the device if the context-switch fails, because
the device should be IDLE during a context-switch. If it is not IDLE, then
something is wrong and we should reset the compute engines.
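
The control flow described above can be sketched in user-space C. This is
only an illustration of the idea, not the code from this patch: the struct
and every sketch_* function are hypothetical names invented for the example,
and the use of -EBUSY as the failure code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not the driver's real struct. */
struct sketch_device {
	bool engines_idle;        /* false while TPC/MME are still busy */
	unsigned int soft_resets; /* number of soft resets performed */
};

/* A context switch needs IDLE engines; assume it fails with -EBUSY otherwise. */
static int sketch_context_switch(struct sketch_device *dev)
{
	assert(dev != NULL);
	return dev->engines_idle ? 0 : -EBUSY;
}

/* Model a soft reset as bringing the compute engines back to IDLE. */
static void sketch_soft_reset(struct sketch_device *dev)
{
	assert(dev != NULL);
	dev->engines_idle = true;
	dev->soft_resets++;
}

/*
 * Submission path: if the context switch fails, soft-reset the device so
 * that later submissions are not stuck, and still report the failure for
 * the current submission.
 */
static int sketch_submit(struct sketch_device *dev)
{
	int rc = sketch_context_switch(dev);

	if (rc) {
		sketch_soft_reset(dev);
		return rc;
	}
	return 0;
}
```

In this sketch the submission that hits the non-IDLE device still fails, but
the reset it triggers lets the next submission go through instead of leaving
the device stuck.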

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/misc/habanalabs/command_submission.c
drivers/misc/habanalabs/goya/goya.c