habanalabs: expose state dump
authorYuri Nudelman <ynudelman@habana.ai>
Sun, 6 Jun 2021 07:28:51 +0000 (10:28 +0300)
committerOded Gabbay <ogabbay@kernel.org>
Sun, 29 Aug 2021 06:47:46 +0000 (09:47 +0300)
commit938b793fdede518e10c67dc1dcfba1804faf98a6
tree640575a63c1b0c204cdbacab5f2d0bb1c4cce3d7
parente79e745b208b3c709cc5e689e587d04a4c0fe1bb
habanalabs: expose state dump

To improve the user's ability to debug the case where a workload that
is part of executing training/inference of a topology is getting stuck,
we need to add a 'core dump' each time a CS times-out. The 'core dump'
shall contain all relevant Sync Manager information and corresponding
fence values.

The most recent dumps shall be accessible via debugfs, under
'state_dump' node. Reading from the node will provide the oldest dump
available. Writing an integer value X will discard X dumps, starting
with the oldest one, i.e. subsequent read will now return newer
dumps.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Documentation/ABI/testing/debugfs-driver-habanalabs
drivers/misc/habanalabs/common/Makefile
drivers/misc/habanalabs/common/command_submission.c
drivers/misc/habanalabs/common/debugfs.c
drivers/misc/habanalabs/common/device.c
drivers/misc/habanalabs/common/habanalabs.h
drivers/misc/habanalabs/common/state_dump.c [new file with mode: 0644]
drivers/misc/habanalabs/gaudi/gaudi.c
drivers/misc/habanalabs/goya/goya.c