habanalabs: reset device if still in use when released
author Tomer Tayar <ttayar@habana.ai>
Wed, 9 Nov 2022 16:08:38 +0000 (18:08 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Wed, 23 Nov 2022 14:13:48 +0000 (16:13 +0200)
If the device file is released while a context is still held, the device
cannot be reopened until that context is eventually released.
If that never happens, only a device reset returns the device to an
operational state. That means waiting for a CS (command submission)
timeout or an error, or for external intervention that injects a reset
via sysfs.

At this stage, after the user has released the device, the context is
still held either because CSs that are no longer relevant were left
running on the device, or because the user side skipped cleanup steps.

The device reset flow handles all of these cases anyway, so initiate
the reset at this point instead of waiting for it.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
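
The behavior described above can be sketched as a small userspace model. This is only an illustration, not driver code: struct model_dev, model_ctx_put(), model_hard_reset() and model_release() are hypothetical names, and the assumption that a reset drops every leftover context reference is a simplification of the real reset flow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real struct hl_device. */
struct model_dev {
	int ctx_refcount;	/* references still held on the compute context */
	bool in_use;		/* true while the device cannot be reopened */
	int hard_resets;	/* how many resets the model has triggered */
};

/* Drop one reference; return true only if it was the last one. */
static bool model_ctx_put(struct model_dev *dev)
{
	if (dev->ctx_refcount > 0)
		dev->ctx_refcount--;
	return dev->ctx_refcount == 0;
}

/* Assumed reset behavior: leftover references go away, device is usable again. */
static void model_hard_reset(struct model_dev *dev)
{
	dev->ctx_refcount = 0;
	dev->in_use = false;
	dev->hard_resets++;
}

/* Release path: if the context is still held after the put, reset right away. */
static void model_release(struct model_dev *dev)
{
	if (!model_ctx_put(dev)) {
		fprintf(stderr, "User process closed FD but device still in use\n");
		model_hard_reset(dev);
		return;
	}
	/* Last reference dropped normally; no reset is needed. */
	dev->in_use = false;
}
```

Calling model_release() on a device whose ctx_refcount is 2 leaves it reopenable after exactly one reset, while a device with a single reference is released without any reset.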
drivers/misc/habanalabs/common/device.c

index 708db0f..49640c8 100644
@@ -504,9 +504,10 @@ static int hl_device_release(struct inode *inode, struct file *filp)
 
        hdev->compute_ctx_in_release = 1;
 
-       if (!hl_hpriv_put(hpriv))
-               dev_notice(hdev->dev,
-                       "User process closed FD but device still in use\n");
+       if (!hl_hpriv_put(hpriv)) {
+               dev_notice(hdev->dev, "User process closed FD but device still in use\n");
+               hl_device_reset(hdev, HL_DRV_RESET_HARD);
+       }
 
        hdev->last_open_session_duration_jif =
                jiffies - hdev->last_successful_open_jif;