summaryrefslogtreecommitdiffstats
path: root/Documentation/ABI/testing
diff options
context:
space:
mode:
authorYuri Nudelman <ynudelman@habana.ai>2021-06-06 10:28:51 +0300
committerOded Gabbay <ogabbay@kernel.org>2021-08-29 09:47:46 +0300
commit938b793fdede518e10c67dc1dcfba1804faf98a6 (patch)
tree640575a63c1b0c204cdbacab5f2d0bb1c4cce3d7 /Documentation/ABI/testing
parente79e745b208b3c709cc5e689e587d04a4c0fe1bb (diff)
downloadlinux-stable-938b793fdede518e10c67dc1dcfba1804faf98a6.tar.gz
linux-stable-938b793fdede518e10c67dc1dcfba1804faf98a6.tar.bz2
linux-stable-938b793fdede518e10c67dc1dcfba1804faf98a6.zip
habanalabs: expose state dump
To improve the user's ability to debug the case where a workload that is part of executing training/inference of a topology is getting stuck, we need to add a 'core dump' each time a CS times-out. The 'core dump' shall contain all relevant Sync Manager information and corresponding fence values. The most recent dumps shall be accessible via debugfs, under 'state_dump' node. Reading from the node will provide the oldest dump available. Writing an integer value X will discard X dumps, starting with the oldest one, i.e. subsequent read will now return newer dumps. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Diffstat (limited to 'Documentation/ABI/testing')
-rw-r--r--Documentation/ABI/testing/debugfs-driver-habanalabs11
1 files changed, 11 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs
index a5c28f606865..e29156511388 100644
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of
"0" means device will be reset in case some CS has timed out,
otherwise it will not be reset.
+What: /sys/kernel/debug/habanalabs/hl<n>/state_dump
+Date: Oct 2021
+KernelVersion: 5.15
+Contact: ynudelman@habana.ai
+Description: Gets the state dump occurring on a CS timeout or failure.
+ State dump is used for debug and is created each time in case of
+ a problem in a CS execution, before reset.
+ Reading from the node returns the newest state dump available.
+ Writing an integer X discards X state dumps, so that the
+ next read would return X+1-st newest state dump.
+
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6