author | Steven Rostedt <rostedt@goodmis.org> | 2024-08-22 21:39:07 -0400 |
---|---|---|
committer | Steven Rostedt (Google) <rostedt@goodmis.org> | 2024-08-26 13:54:08 -0400 |
commit | 2fcd5aff92aab479a9a89cfce2dbc9c6a9455b4f (patch) | |
tree | 492e95a2adc00b821118544feccb6e9366a6baad /Documentation/trace | |
parent | ef2bd81d0c95616fab718738be48d7cc9b23e33d (diff) | |
tracing/Documentation: Start a document on how to debug with tracing
Add a new document Documentation/trace/debugging.rst that will hold
various ways to debug with tracing.
This initial version mentions trace_printk and how to create persistent
buffers that can last across bootups.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.702433486@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/debugging.rst | 159 |
1 files changed, 159 insertions, 0 deletions
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
new file mode 100644
index 000000000000..54fb16239d70
--- /dev/null
+++ b/Documentation/trace/debugging.rst
@@ -0,0 +1,159 @@
+==============================
+Using the tracer for debugging
+==============================
+
+Copyright 2024 Google LLC.
+
+:Author:   Steven Rostedt <rostedt@goodmis.org>
+:License:  The GNU Free Documentation License, Version 1.2
+          (dual licensed under the GPL v2)
+
+- Written for: 6.12
+
+Introduction
+------------
+The tracing infrastructure can be very useful for debugging the Linux
+kernel. This document is a place to add various methods of using the tracer
+for debugging.
+
+First, make sure that the tracefs file system is mounted::
+
+  $ sudo mount -t tracefs tracefs /sys/kernel/tracing
+
+
+Using trace_printk()
+--------------------
+
+trace_printk() is a very lightweight utility that can be used in any context
+inside the kernel, with the exception of "noinstr" sections. It can be used
+in normal, softirq, interrupt and even NMI context. The trace data is
+written to the tracing ring buffer in a lockless way. To make it even
+lighter weight, when possible, it will only record the pointer to the format
+string, and save the raw arguments into the buffer. The format and the
+arguments will be post processed when the ring buffer is read. This way the
+trace_printk() format conversions are not done during the hot path, where
+the trace is being recorded.
+
+trace_printk() is meant only for debugging, and should never be added into
+a subsystem of the kernel. If you need debugging traces, add trace events
+instead. If a trace_printk() is found in the kernel, the following will
+appear in the dmesg::
+
+  **********************************************************
+  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
+  **                                                      **
+  ** trace_printk() being used. Allocating extra memory.  **
+  **                                                      **
+  ** This means that this is a DEBUG kernel and it is     **
+  ** unsafe for production use.                           **
+  **                                                      **
+  ** If you see this message and you are not debugging    **
+  ** the kernel, report this immediately to your vendor!  **
+  **                                                      **
+  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
+  **********************************************************
+
+Debugging kernel crashes
+------------------------
+There are various methods of acquiring the state of the system when a kernel
+crash occurs. This could be from the oops message in printk, or one could
+use kexec/kdump. But these just show what happened at the time of the crash.
+It can be very useful to know what happened up to the point of the crash.
+The tracing ring buffer, by default, is a circular buffer that will
+overwrite older events with newer ones. When a crash happens, the content of
+the ring buffer will be all the events that led up to the crash.
+
+There are several kernel command line parameters that can be used to help in
+this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
+buffer to the console when an oops occurs. This can be useful if the console
+is being logged somewhere. If a serial console is used, it may be prudent to
+make sure the ring buffer is relatively small, otherwise the dumping of the
+ring buffer may take several minutes to hours to finish. Here's an example
+of the kernel command line::
+
+  ftrace_dump_on_oops trace_buf_size=50K
+
+Note, the tracing buffer is made up of per CPU buffers where each of these
+buffers is broken up into sub-buffers that are by default PAGE_SIZE. The
+trace_buf_size option above sets each of the per CPU buffers to 50K,
+so, on a machine with 8 CPUs, that's actually 400K total.
+
+Persistent buffers across boots
+-------------------------------
+If the system memory allows it, the tracing ring buffer can be placed at
+a specific location in memory. If the location is the same across boots and
+the memory is not modified, the tracing buffer can be retrieved from the
+following boot. There are two ways to reserve memory for the use of the ring
+buffer.
+
+The more reliable way (on x86) is to reserve memory with the "memmap" kernel
+command line option and then use that memory for the trace_instance. This
+requires a bit of knowledge of the physical memory layout of the system. The
+advantage of using this method is that the memory for the ring buffer will
+always be the same::
+
+  memmap=12M$0x284500000 trace_instance=boot_map@0x284500000:12M
+
+The memmap above reserves 12 megabytes of memory at the physical memory
+location 0x284500000. Then the trace_instance option will create a trace
+instance "boot_map" at that same location with the same amount of memory
+reserved. As the ring buffer is broken up into per CPU buffers, the 12
+megabytes will be broken up evenly between those CPUs. If you have 8 CPUs,
+each per CPU ring buffer will be 1.5 megabytes in size. Note, that also
+includes meta data, so the amount of memory actually used by the ring buffer
+will be slightly smaller.
+
+Another more generic but less robust way to allocate a ring buffer mapping
+at boot is with the "reserve_mem" option::
+
+  reserve_mem=12M:4096:trace trace_instance=boot_map@trace
+
+The reserve_mem option above will find 12 megabytes that are available at
+boot up, and align it by 4096 bytes. It will label this memory as "trace"
+that can be used by later command line options.
+
+The trace_instance option creates a "boot_map" instance and will use the
+memory reserved by reserve_mem that was labeled as "trace". This method is
+more generic but may not be as reliable. Due to KASLR, the memory reserved
+by reserve_mem may not be located at the same location. If this happens,
+then the ring buffer will not be from the previous boot and will be reset.
+
+Sometimes, using a larger alignment can keep KASLR from moving things
+around in such a way that it will move the location of the reserve_mem. By
+using a larger alignment, you may find that the buffer is placed more
+consistently::
+
+  reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
+
+On boot up, the memory reserved for the ring buffer is validated. It will go
+through a series of tests to make sure that the ring buffer contains valid
+data. If it does, it will then be set up to be available to read from the
+instance. If it fails any of the tests, it will clear the entire ring buffer
+and initialize it as new.
+
+The layout of this mapped memory may not be consistent from kernel to
+kernel, so only the same kernel is guaranteed to work if the mapping is
+preserved. Switching to a different kernel version may find a different
+layout and mark the buffer as invalid.
+
+Using trace_printk() in the boot instance
+-----------------------------------------
+By default, the content of trace_printk() goes into the top level tracing
+instance. But this instance is never preserved across boots. To have the
+trace_printk() content, and some other internal tracing (like dump stacks),
+go to the preserved buffer, either set the instance to be the trace_printk()
+destination from the kernel command line, or set it after boot up via the
+trace_printk_dest option.
+
+After boot up::
+
+  echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest
+
+From the kernel command line::
+
+  reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace
+
+If setting it from the kernel command line, it is recommended to also
+disable tracing with the "traceoff" flag, and enable tracing after boot up.
+Otherwise the trace from the most recent boot will be mixed with the trace
+from the previous boot, and may make it confusing to read.
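
The following shell sketch is not part of the patch. It only illustrates the buffer-size arithmetic and the command line composition that the new document describes. The function names (total_kib, per_cpu_kib, boot_cmdline, save_then_enable) are hypothetical helpers made up for this illustration. The sketch assumes the instance directory exposes the usual tracefs "trace" and "tracing_on" files.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical.

# trace_buf_size= applies to each per CPU buffer, so the total grows
# with the number of CPUs.
total_kib() {  # usage: total_kib PER_CPU_KIB NCPUS
    echo $(( $1 * $2 ))
}

# A boot mapped instance splits its reservation evenly between CPUs
# (meta data makes the usable part slightly smaller).
per_cpu_kib() {  # usage: per_cpu_kib TOTAL_KIB NCPUS
    echo $(( $1 / $2 ))
}

# Build the reserve_mem/trace_instance pair with trace_printk() routed
# to the instance and tracing left off at boot.
boot_cmdline() {  # usage: boot_cmdline SIZE ALIGN LABEL INSTANCE
    printf 'reserve_mem=%s:%s:%s trace_instance=%s^traceprintk^traceoff@%s\n' \
        "$1" "$2" "$3" "$4" "$3"
}

# Save the previous boot's trace to a file, then turn tracing on.
save_then_enable() {  # usage: save_then_enable INSTANCE_DIR OUTFILE
    cat "$1/trace" > "$2" && echo 1 > "$1/tracing_on"
}

total_kib 50 8        # prints 400  (50K per CPU on 8 CPUs)
per_cpu_kib 12288 8   # prints 1536 (12M split over 8 CPUs, 1.5M each)
boot_cmdline 12M 4096 trace boot_map
```

On a booted system, save_then_enable would typically be pointed at /sys/kernel/tracing/instances/boot_map so that the events preserved from the previous boot are copied out before new events are recorded.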