diff options
-rw-r--r-- | Documentation/trace/ftrace.txt | 508 | ||||
-rw-r--r-- | kernel/trace/ftrace.c | 6 | ||||
-rw-r--r-- | kernel/trace/trace.c | 142 | ||||
-rw-r--r-- | kernel/trace/trace_kprobe.c | 9 | ||||
-rw-r--r-- | kernel/trace/trace_stack.c | 6 | ||||
-rw-r--r-- | tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 36 | ||||
-rw-r--r-- | tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc | 28 | ||||
-rw-r--r-- | tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc | 4 |
8 files changed, 605 insertions, 134 deletions
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index fff8ff6d4893..d4601df6e72e 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -5,10 +5,11 @@ Copyright 2008 Red Hat Inc. Author: Steven Rostedt <srostedt@redhat.com> License: The GNU Free Documentation License, Version 1.2 (dual licensed under the GPL v2) -Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, - John Kacur, and David Teigland. +Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, + John Kacur, and David Teigland. Written for: 2.6.28-rc2 Updated for: 3.10 +Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt Introduction ------------ @@ -26,9 +27,11 @@ a task is woken to the task is actually scheduled in. One of the most common uses of ftrace is the event tracing. Through out the kernel is hundreds of static event points that -can be enabled via the debugfs file system to see what is +can be enabled via the tracefs file system to see what is going on in certain parts of the kernel. +See events.txt for more information. + Implementation Details ---------------------- @@ -39,34 +42,47 @@ See ftrace-design.txt for details for arch porters and such. The File System --------------- -Ftrace uses the debugfs file system to hold the control files as +Ftrace uses the tracefs file system to hold the control files as well as the files to display output. -When debugfs is configured into the kernel (which selecting any ftrace -option will do) the directory /sys/kernel/debug will be created. To mount +When tracefs is configured into the kernel (which selecting any ftrace +option will do) the directory /sys/kernel/tracing will be created. To mount this directory, you can add to your /etc/fstab file: - debugfs /sys/kernel/debug debugfs defaults 0 0 + tracefs /sys/kernel/tracing tracefs defaults 0 0 Or you can mount it at run time with: - mount -t debugfs nodev /sys/kernel/debug + mount -t tracefs nodev /sys/kernel/tracing For quicker access to that directory you may want to make a soft link to it: - ln -s /sys/kernel/debug /debug + ln -s /sys/kernel/tracing /tracing + + *** NOTICE *** + +Before 4.1, all ftrace tracing control files were within the debugfs +file system, which is typically located at /sys/kernel/debug/tracing. +For backward compatibility, when mounting the debugfs file system, +the tracefs file system will be automatically mounted at: + + /sys/kernel/debug/tracing -Any selected ftrace option will also create a directory called tracing -within the debugfs. The rest of the document will assume that you are in -the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate -on the files within that directory and not distract from the content with -the extended "/sys/kernel/debug/tracing" path name. +All files located in the tracefs file system will be located in that +debugfs file system directory as well. + + *** NOTICE *** + +Any selected ftrace option will also create the tracefs file system. +The rest of the document will assume that you are in the ftrace directory +(cd /sys/kernel/tracing) and will only concentrate on the files within that +directory and not distract from the content with the extended +"/sys/kernel/tracing" path name. That's it! (assuming that you have ftrace configured into your kernel) -After mounting debugfs, you can see a directory called -"tracing". This directory contains the control and output files +After mounting tracefs you will have access to the control and output files of ftrace. Here is a list of some of the key files: @@ -92,10 +108,20 @@ of ftrace. Here is a list of some of the key files: writing to the ring buffer, the tracing overhead may still be occurring. + The kernel function tracing_off() can be used within the + kernel to disable writing to the ring buffer, which will + set this file to "0". User space can re-enable tracing by + echoing "1" into the file. + + Note, the function and event trigger "traceoff" will also + set this file to zero and stop tracing. Which can also + be re-enabled by user space using this file. + trace: This file holds the output of the trace in a human - readable format (described below). + readable format (described below). Note, tracing is temporarily + disabled while this file is being read (opened). trace_pipe: @@ -109,7 +135,8 @@ of ftrace. Here is a list of some of the key files: will not be read again with a sequential read. The "trace" file is static, and if the tracer is not adding more data, it will display the same - information every time it is read. + information every time it is read. This file will not + disable tracing while being read. trace_options: @@ -128,12 +155,14 @@ of ftrace. Here is a list of some of the key files: tracing_max_latency: Some of the tracers record the max latency. - For example, the time interrupts are disabled. - This time is saved in this file. The max trace - will also be stored, and displayed by "trace". - A new max trace will only be recorded if the - latency is greater than the value in this - file. (in microseconds) + For example, the maximum time that interrupts are disabled. + The maximum time is saved in this file. The max trace will also be + stored, and displayed by "trace". A new max trace will only be + recorded if the latency is greater than the value in this file + (in microseconds). + + By echoing in a time into this file, no latency will be recorded + unless it is greater than the time in this file. tracing_thresh: @@ -152,32 +181,34 @@ of ftrace. Here is a list of some of the key files: that the kernel uses for allocation, usually 4 KB in size). If the last page allocated has room for more bytes than requested, the rest of the page will be used, - making the actual allocation bigger than requested. + making the actual allocation bigger than requested or shown. ( Note, the size may not be a multiple of the page size due to buffer management meta-data. ) + Buffer sizes for individual CPUs may vary + (see "per_cpu/cpu0/buffer_size_kb" below), and if they do + this file will show "X". + buffer_total_size_kb: This displays the total combined size of all the trace buffers. free_buffer: - If a process is performing the tracing, and the ring buffer - should be shrunk "freed" when the process is finished, even - if it were to be killed by a signal, this file can be used - for that purpose. On close of this file, the ring buffer will - be resized to its minimum size. Having a process that is tracing - also open this file, when the process exits its file descriptor - for this file will be closed, and in doing so, the ring buffer - will be "freed". + If a process is performing tracing, and the ring buffer should be + shrunk "freed" when the process is finished, even if it were to be + killed by a signal, this file can be used for that purpose. On close + of this file, the ring buffer will be resized to its minimum size. + Having a process that is tracing also open this file, when the process + exits its file descriptor for this file will be closed, and in doing so, + the ring buffer will be "freed". It may also stop tracing if disable_on_free option is set. tracing_cpumask: - This is a mask that lets the user only trace - on specified CPUs. The format is a hex string - representing the CPUs. + This is a mask that lets the user only trace on specified CPUs. + The format is a hex string representing the CPUs. set_ftrace_filter: @@ -190,6 +221,9 @@ of ftrace. Here is a list of some of the key files: to be traced. Echoing names of functions into this file will limit the trace to only those functions. + The functions listed in "available_filter_functions" are what + can be written into this file. + This interface also allows for commands to be used. See the "Filter commands" section for more details. @@ -202,7 +236,14 @@ of ftrace. Here is a list of some of the key files: set_ftrace_pid: - Have the function tracer only trace a single thread. + Have the function tracer only trace the threads whose PID are + listed in this file. + + If the "function-fork" option is set, then when a task whose + PID is listed in this file forks, the child's PID will + automatically be added to this file, and the child will be + traced by the function tracer as well. This option will also + cause PIDs of tasks that exit to be removed from the file. set_event_pid: @@ -217,17 +258,28 @@ of ftrace. Here is a list of some of the key files: set_graph_function: - Set a "trigger" function where tracing should start - with the function graph tracer (See the section - "dynamic ftrace" for more details). + Functions listed in this file will cause the function graph + tracer to only trace these functions and the functions that + they call. (See the section "dynamic ftrace" for more details). + + set_graph_notrace: + + Similar to set_graph_function, but will disable function graph + tracing when the function is hit until it exits the function. + This makes it possible to ignore tracing functions that are called + by a specific function. available_filter_functions: - This lists the functions that ftrace - has processed and can trace. These are the function - names that you can pass to "set_ftrace_filter" or - "set_ftrace_notrace". (See the section "dynamic ftrace" - below for more details.) + This lists the functions that ftrace has processed and can trace. + These are the function names that you can pass to + "set_ftrace_filter" or "set_ftrace_notrace". + (See the section "dynamic ftrace" below for more details.) + + dyn_ftrace_total_info: + + This file is for debugging purposes. The number of functions that + have been converted to nops and are available to be traced. enabled_functions: @@ -250,12 +302,21 @@ of ftrace. Here is a list of some of the key files: an 'I' will be displayed on the same line as the function that can be overridden. + If the architecture supports it, it will also show what callback + is being directly called by the function. If the count is greater + than 1 it most likely will be ftrace_ops_list_func(). + + If the callback of the function jumps to a trampoline that is + specific to a the callback and not the standard trampoline, + its address will be printed as well as the function that the + trampoline calls. + function_profile_enabled: When set it will enable all functions with either the function - tracer, or if enabled, the function graph tracer. It will + tracer, or if configured, the function graph tracer. It will keep a histogram of the number of functions that were called - and if run with the function graph tracer, it will also keep + and if the function graph tracer was configured, it will also keep track of the time spent in those functions. The histogram content can be displayed in the files: @@ -283,12 +344,11 @@ of ftrace. Here is a list of some of the key files: printk_formats: This is for tools that read the raw format files. If an event in - the ring buffer references a string (currently only trace_printk() - does this), only a pointer to the string is recorded into the buffer - and not the string itself. This prevents tools from knowing what - that string was. This file displays the string and address for - the string allowing tools to map the pointers to what the - strings were. + the ring buffer references a string, only a pointer to the string + is recorded into the buffer and not the string itself. This prevents + tools from knowing what that string was. This file displays the string + and address for the string allowing tools to map the pointers to what + the strings were. saved_cmdlines: @@ -298,6 +358,22 @@ of ftrace. Here is a list of some of the key files: comms for events. If a pid for a comm is not listed, then "<...>" is displayed in the output. + If the option "record-cmd" is set to "0", then comms of tasks + will not be saved during recording. By default, it is enabled. + + saved_cmdlines_size: + + By default, 128 comms are saved (see "saved_cmdlines" above). To + increase or decrease the amount of comms that are cached, echo + in a the number of comms to cache, into this file. + + saved_tgids: + + If the option "record-tgid" is set, on each scheduling context switch + the Task Group ID of a task is saved in a table mapping the PID of + the thread to its TGID. By default, the "record-tgid" option is + disabled. + snapshot: This displays the "snapshot" buffer and also lets the user @@ -336,6 +412,9 @@ of ftrace. Here is a list of some of the key files: # cat trace_clock [local] global counter x86-tsc + The clock with the square brackets around it is the one + in effect. + local: Default clock, but may not be in sync across CPUs global: This clock is in sync with all CPUs but may @@ -448,6 +527,23 @@ of ftrace. Here is a list of some of the key files: See events.txt for more information. + set_event: + + By echoing in the event into this file, will enable that event. + + See events.txt for more information. + + available_events: + + A list of events that can be enabled in tracing. + + See events.txt for more information. + + hwlat_detector: + + Directory for the Hardware Latency Detector. + See "Hardware Latency Detector" section below. + per_cpu: This is a directory that contains the trace per_cpu information. @@ -539,13 +635,25 @@ Here is the list of current tracers that may be configured. to draw a graph of function calls similar to C code source. + "blk" + + The block tracer. The tracer used by the blktrace user + application. + + "hwlat" + + The Hardware Latency tracer is used to detect if the hardware + produces any latency. See "Hardware Latency Detector" section + below. + "irqsoff" Traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency. When a new max is recorded, it replaces the old trace. It is best to view this - trace with the latency-format option enabled. + trace with the latency-format option enabled, which + happens automatically when the tracer is selected. "preemptoff" @@ -571,6 +679,26 @@ Here is the list of current tracers that may be configured. RT tasks (as the current "wakeup" does). This is useful for those interested in wake up timings of RT tasks. + "wakeup_dl" + + Traces and records the max latency that it takes for + a SCHED_DEADLINE task to be woken (as the "wakeup" and + "wakeup_rt" does). + + "mmiotrace" + + A special tracer that is used to trace binary module. + It will trace all the calls that a module makes to the + hardware. Everything it writes and reads from the I/O + as well. + + "branch" + + This tracer can be configured when tracing likely/unlikely + calls within the kernel. It will trace when a likely and + unlikely branch is hit and if it was correct in its prediction + of being correct. + "nop" This is the "trace nothing" tracer. To remove all @@ -582,7 +710,7 @@ Examples of using the tracer ---------------------------- Here are typical examples of using the tracers when controlling -them only with the debugfs interface (without using any +them only with the tracefs interface (without using any user-land utilities). Output format: @@ -674,7 +802,7 @@ why a latency happened. Here is a typical trace. This shows that the current tracer is "irqsoff" tracing the time for which interrupts were disabled. It gives the trace version (which never changes) and the version of the kernel upon which this was executed on -(3.10). Then it displays the max latency in microseconds (259 us). The number +(3.8). Then it displays the max latency in microseconds (259 us). The number of trace entries displayed and the total number (both are four: #4/4). VP, KP, SP, and HP are always zero and are reserved for later use. #P is the number of online CPUs (#P:4). @@ -709,6 +837,8 @@ explains which is which. '.' otherwise. hardirq/softirq: + 'Z' - NMI occurred inside a hardirq + 'z' - NMI is running 'H' - hard irq occurred inside a softirq. 'h' - hard irq is running 's' - soft irq is running @@ -757,24 +887,24 @@ nohex nobin noblock trace_printk -nobranch annotate nouserstacktrace nosym-userobj noprintk-msg-only context-info nolatency-format -sleep-time -graph-time record-cmd +norecord-tgid overwrite nodisable_on_free irq-info markers noevent-fork function-trace +nofunction-fork nodisplay-graph nostacktrace +nobranch To disable one of the options, echo in the option prepended with "no". @@ -830,8 +960,6 @@ Here are the available options: trace_printk - Can disable trace_printk() from writing into the buffer. - branch - Enable branch tracing with the tracer. - annotate - It is sometimes confusing when the CPU buffers are full and one CPU buffer had a lot of events recently, thus a shorter time frame, were another CPU may have only had @@ -850,7 +978,8 @@ Here are the available options: <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock userstacktrace - This option changes the trace. It records a - stacktrace of the current userspace thread. + stacktrace of the current user space thread after + each trace event. sym-userobj - when user stacktrace are enabled, look up which object the address belongs to, and print a @@ -873,29 +1002,21 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] context-info - Show only the event data. Hides the comm, PID, timestamp, CPU, and other useful data. - latency-format - This option changes the trace. When - it is enabled, the trace displays - additional information about the - latencies, as described in "Latency - trace format". - - sleep-time - When running function graph tracer, to include - the time a task schedules out in its function. - When enabled, it will account time the task has been - scheduled out as part of the function call. - - graph-time - When running function profiler with function graph tracer, - to include the time to call nested functions. When this is - not set, the time reported for the function will only - include the time the function itself executed for, not the - time for functions that it called. + latency-format - This option changes the trace output. When it is enabled, + the trace displays additional information about the + latency, as described in "Latency trace format". record-cmd - When any event or tracer is enabled, a hook is enabled - in the sched_switch trace point to fill comm cache + in the sched_switch trace point to fill comm cache with mapped pids and comms. But this may cause some overhead, and if you only care about pids, and not the name of the task, disabling this option can lower the - impact of tracing. + impact of tracing. See "saved_cmdlines". + + record-tgid - When any event or tracer is enabled, a hook is enabled + in the sched_switch trace point to fill the cache of + mapped Thread Group IDs (TGID) mapping to pids. See + "saved_tgids". overwrite - This controls what happens when the trace buffer is full. If "1" (default), the oldest events are @@ -935,19 +1056,98 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] functions. This keeps the overhead of the tracer down when performing latency tests. + function-fork - When set, tasks with PIDs listed in set_ftrace_pid will + have the PIDs of their children added to set_ftrace_pid + when those tasks fork. Also, when tasks with PIDs in + set_ftrace_pid exit, their PIDs will be removed from the + file. + display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will use function graph tracing instead of function tracing. - stacktrace - This is one of the options that changes the trace - itself. When a trace is recorded, so is the stack - of functions. This allows for back traces of - trace sites. + stacktrace - When set, a stack trace is recorded after any trace event + is recorded. + + branch - Enable branch tracing with the tracer. This enables branch + tracer along with the currently set tracer. Enabling this + with the "nop" tracer is the same as just enabling the + "branch" tracer. Note: Some tracers have their own options. They only appear in this file when the tracer is active. They always appear in the options directory. +Here are the per tracer options: + +Options for function tracer: + + func_stack_trace - When set, a stack trace is recorded after every + function that is recorded. NOTE! Limit the functions + that are recorded before enabling this, with + "set_ftrace_filter" otherwise the system performance + will be critically degraded. Remember to disable + this option before clearing the function filter. + +Options for function_graph tracer: + + Since the function_graph tracer has a slightly different output + it has its own options to control what is displayed. + + funcgraph-overrun - When set, the "overrun" of the graph stack is + displayed after each function traced. The + overrun, is when the stack depth of the calls + is greater than what is reserved for each task. + Each task has a fixed array of functions to + trace in the call graph. If the depth of the + calls exceeds that, the function is not traced. + The overrun is the number of functions missed + due to exceeding this array. + + funcgraph-cpu - When set, the CPU number of the CPU where the trace + occurred is displayed. + + funcgraph-overhead - When set, if the function takes longer than + A certain amount, then a delay marker is + displayed. See "delay" above, under the + header description. + + funcgraph-proc - Unlike other tracers, the process' command line + is not displayed by default, but instead only + when a task is traced in and out during a context + switch. Enabling this options has the command + of each process displayed at every line. + + funcgraph-duration - At the end of each function (the return) + the duration of the amount of time in the + function is displayed in microseconds. + + funcgraph-abstime - When set, the timestamp is displayed at each + line. + + funcgraph-irqs - When disabled, functions that happen inside an + interrupt will not be traced. + + funcgraph-tail - When set, the return event will include the function + that it represents. By default this is off, and + only a closing curly bracket "}" is displayed for + the return of a function. + + sleep-time - When running function graph tracer, to include + the time a task schedules out in its function. + When enabled, it will account time the task has been + scheduled out as part of the function call. + + graph-time - When running function profiler with function graph tracer, + to include the time to call nested functions. When this is + not set, the time reported for the function will only + include the time the function itself executed for, not the + time for functions that it called. + +Options for blk tracer: + + blk_classic - Shows a more minimalistic output. + irqsoff ------- @@ -1711,6 +1911,85 @@ events. <idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep +Hardware Latency Detector +------------------------- + +The hardware latency detector is executed by enabling the "hwlat" tracer. + +NOTE, this tracer will affect the performance of the system as it will +periodically make a CPU constantly busy with interrupts disabled. + + # echo hwlat > current_tracer + # sleep 100 + # cat trace +# tracer: hwlat +# +# _-----=> irqs-off +# / _----=> need-resched +# | / _---=> hardirq/softirq +# || / _--=> preempt-depth +# ||| / delay +# TASK-PID CPU# |||| TIMESTAMP FUNCTION +# | | | |||| | | + <...>-3638 [001] d... 19452.055471: #1 inner/outer(us): 12/14 ts:1499801089.066141940 + <...>-3638 [003] d... 19454.071354: #2 inner/outer(us): 11/9 ts:1499801091.082164365 + <...>-3638 [002] dn.. 19461.126852: #3 inner/outer(us): 12/9 ts:1499801098.138150062 + <...>-3638 [001] d... 19488.340960: #4 inner/outer(us): 8/12 ts:1499801125.354139633 + <...>-3638 [003] d... 19494.388553: #5 inner/outer(us): 8/12 ts:1499801131.402150961 + <...>-3638 [003] d... 19501.283419: #6 inner/outer(us): 0/12 ts:1499801138.297435289 nmi-total:4 nmi-count:1 + + +The above output is somewhat the same in the header. All events will have +interrupts disabled 'd'. Under the FUNCTION title there is: + + #1 - This is the count of events recorded that were greater than the + tracing_threshold (See below). + + inner/outer(us): 12/14 + + This shows two numbers as "inner latency" and "outer latency". The test + runs in a loop checking a timestamp twice. The latency detected within + the two timestamps is the "inner latency" and the latency detected + after the previous timestamp and the next timestamp in the loop is + the "outer latency". + + ts:1499801089.066141940 + + The absolute timestamp that the event happened. + + nmi-total:4 nmi-count:1 + + On architectures that support it, if an NMI comes in during the + test, the time spent in NMI is reported in "nmi-total" (in + microseconds). + + All architectures that have NMIs will show the "nmi-count" if an + NMI comes in during the test. + +hwlat files: + + tracing_threshold - This gets automatically set to "10" to represent 10 + microseconds. This is the threshold of latency that + needs to be detected before the trace will be recorded. + + Note, when hwlat tracer is finished (another tracer is + written into "current_tracer"), the original value for + tracing_threshold is placed back into this file. + + hwlat_detector/width - The length of time the test runs with interrupts + disabled. + + hwlat_detector/window - The length of time of the window which the test + runs. That is, the test will run for "width" + microseconds per "window" microseconds + + tracing_cpumask - When the test is started. A kernel thread is created that + runs the test. This thread will alternate between CPUs + listed in the tracing_cpumask between each period + (one "window"). To limit the test to specific CPUs + set the mask in this file to only the CPUs that the test + should run on. + function -------- @@ -1821,15 +2100,15 @@ something like this simple program: #define STR(x) _STR(x) #define MAX_PATH 256 -const char *find_debugfs(void) +const char *find_tracefs(void) { - static char debugfs[MAX_PATH+1]; - static int debugfs_found; + static char tracefs[MAX_PATH+1]; + static int tracefs_found; char type[100]; FILE *fp; - if (debugfs_found) - return debugfs; + if (tracefs_found) + return tracefs; if ((fp = fopen("/proc/mounts","r")) == NULL) { perror("/proc/mounts"); @@ -1839,27 +2118,27 @@ const char *find_debugfs(void) while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", - debugfs, type) == 2) { - if (strcmp(type, "debugfs") == 0) + tracefs, type) == 2) { + if (strcmp(type, "tracefs") == 0) break; } fclose(fp); - if (strcmp(type, "debugfs") != 0) { - fprintf(stderr, "debugfs not mounted"); + if (strcmp(type, "tracefs") != 0) { + fprintf(stderr, "tracefs not mounted"); return NULL; } - strcat(debugfs, "/tracing/"); - debugfs_found = 1; + strcat(tracefs, "/tracing/"); + tracefs_found = 1; - return debugfs; + return tracefs; } const char *tracing_file(const char *file_name) { static char trace_file[MAX_PATH+1]; - snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name); + snprintf(trace_file, MAX_PATH, "%s/%s", find_tracefs(), file_name); return trace_file; } @@ -1898,12 +2177,12 @@ Or this simple script! ------ #!/bin/bash -debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts` -echo nop > $debugfs/tracing/current_tracer -echo 0 > $debugfs/tracing/tracing_on -echo $$ > $debugfs/tracing/set_ftrace_pid -echo function > $debugfs/tracing/current_tracer -echo 1 > $debugfs/tracing/tracing_on +tracefs=`sed -ne 's/^tracefs \(.*\) tracefs.*/\1/p' /proc/mounts` +echo nop > $tracefs/tracing/current_tracer +echo 0 > $tracefs/tracing/tracing_on +echo $$ > $tracefs/tracing/set_ftrace_pid +echo function > $tracefs/tracing/current_tracer +echo 1 > $tracefs/tracing/tracing_on exec "$@" ------ @@ -2145,13 +2424,18 @@ include the -pg switch in the compiling of the kernel.) At compile time every C file object is run through the recordmcount program (located in the scripts directory). This program will parse the ELF headers in the C object to find all -the locations in the .text section that call mcount. (Note, only -white listed .text sections are processed, since processing other -sections like .init.text may cause races due to those sections -being freed unexpectedly). - -A new section called "__mcount_loc" is created that holds -references to all the mcount call sites in the .text section. +the locations in the .text section that call mcount. Starting +with gcc verson 4.6, the -mfentry has been added for x86, which +calls "__fentry__" instead of "mcount". Which is called before +the creation of the stack frame. + +Note, not all sections are traced. They may be prevented by either +a notrace, or blocked another way and all inline functions are not +traced. Check the "available_filter_functions" file to see what functions +can be traced. + +A section called "__mcount_loc" is created that holds +references to all the mcount/fentry call sites in the .text section. The recordmcount program re-links this section back into the original object. The final linking stage of the kernel will add all these references into a single table. @@ -2679,7 +2963,7 @@ in time without stopping tracing. Ftrace swaps the current buffer with a spare buffer, and tracing continues in the new current (=previous spare) buffer. -The following debugfs files in "tracing" are related to this +The following tracefs files in "tracing" are related to this feature: snapshot: @@ -2752,7 +3036,7 @@ cat: snapshot: Device or resource busy Instances --------- -In the debugfs tracing directory is a directory called "instances". +In the tracefs tracing directory is a directory called "instances". This directory can have new directories created inside of it using mkdir, and removing directories with rmdir. The directory created with mkdir in this directory will already contain files and other diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 2953d558bbee..53f6b6401cf0 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -3816,7 +3816,7 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod) int exclude_mod = 0; int found = 0; int ret; - int clear_filter; + int clear_filter = 0; if (func) { func_g.type = filter_parse_regex(func, len, &func_g.search, @@ -3950,7 +3950,7 @@ static int cache_mod(struct trace_array *tr, continue; /* no func matches all */ - if (!func || strcmp(func, "*") == 0 || + if (strcmp(func, "*") == 0 || (ftrace_mod->func && strcmp(ftrace_mod->func, func) == 0)) { ret = 0; @@ -3978,6 +3978,7 @@ static int ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len, int reset, int enable); +#ifdef CONFIG_MODULES static void process_mod_list(struct list_head *head, struct ftrace_ops *ops, char *mod, bool enable) { @@ -4068,6 +4069,7 @@ static void process_cached_mods(const char *mod_name) kfree(mod); } +#endif /* * We register the module command as a template to show others how diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 948ec32e0c27..2d0ffcc49dba 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1916,7 +1916,11 @@ static int trace_save_cmdline(struct task_struct *tsk) { unsigned pid, idx; - if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT)) + /* treat recording of idle task as a success */ + if (!tsk->pid) + return 1; + + if (unlikely(tsk->pid > PID_MAX_DEFAULT)) return 0; /* @@ -2002,7 +2006,11 @@ int trace_find_tgid(int pid) static int trace_save_tgid(struct task_struct *tsk) { - if (unlikely(!tgid_map || !tsk->pid || tsk->pid > PID_MAX_DEFAULT)) + /* treat recording of idle task as a success */ + if (!tsk->pid) + return 1; + + if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT)) return 0; tgid_map[tsk->pid] = tsk->tgid; @@ -2029,11 +2037,20 @@ static bool tracing_record_taskinfo_skip(int flags) */ void tracing_record_taskinfo(struct task_struct *task, int flags) { + bool done; + if (tracing_record_taskinfo_skip(flags)) return; - if ((flags & TRACE_RECORD_CMDLINE) && !trace_save_cmdline(task)) - return; - if ((flags & TRACE_RECORD_TGID) && !trace_save_tgid(task)) + + /* + * Record as much task information as possible. If some fail, continue + * to try to record the others. + */ + done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(task); + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(task); + + /* If recording any information failed, retry again soon. */ + if (!done) return; __this_cpu_write(trace_taskinfo_save, false); @@ -2050,15 +2067,22 @@ void tracing_record_taskinfo(struct task_struct *task, int flags) void tracing_record_taskinfo_sched_switch(struct task_struct *prev, struct task_struct *next, int flags) { + bool done; + if (tracing_record_taskinfo_skip(flags)) return; - if ((flags & TRACE_RECORD_CMDLINE) && - (!trace_save_cmdline(prev) || !trace_save_cmdline(next))) - return; + /* + * Record as much task information as possible. If some fail, continue + * to try to record the others. + */ + done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(prev); + done &= !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(next); + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(prev); + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(next); - if ((flags & TRACE_RECORD_TGID) && - (!trace_save_tgid(prev) || !trace_save_tgid(next))) + /* If recording any information failed, retry again soon. */ + if (!done) return; __this_cpu_write(trace_taskinfo_save, false); @@ -3334,14 +3358,23 @@ static void print_func_help_header_irq(struct trace_buffer *buf, struct seq_file unsigned int flags) { bool tgid = flags & TRACE_ITER_RECORD_TGID; - - seq_printf(m, "# %s _-----=> irqs-off\n", tgid ? " " : ""); - seq_printf(m, "# %s / _----=> need-resched\n", tgid ? " " : ""); - seq_printf(m, "# %s| / _---=> hardirq/softirq\n", tgid ? " " : ""); - seq_printf(m, "# %s|| / _--=> preempt-depth\n", tgid ? " " : ""); - seq_printf(m, "# %s||| / delay\n", tgid ? " " : ""); - seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n", tgid ? " TGID " : ""); - seq_printf(m, "# | | | %s|||| | |\n", tgid ? " | " : ""); + const char tgid_space[] = " "; + const char space[] = " "; + + seq_printf(m, "# %s _-----=> irqs-off\n", + tgid ? tgid_space : space); + seq_printf(m, "# %s / _----=> need-resched\n", + tgid ? tgid_space : space); + seq_printf(m, "# %s| / _---=> hardirq/softirq\n", + tgid ? tgid_space : space); + seq_printf(m, "# %s|| / _--=> preempt-depth\n", + tgid ? tgid_space : space); + seq_printf(m, "# %s||| / delay\n", + tgid ? tgid_space : space); + seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n", + tgid ? " TGID " : space); + seq_printf(m, "# | | | %s|||| | |\n", + tgid ? " | " : space); } void @@ -4689,6 +4722,76 @@ static const struct file_operations tracing_readme_fops = { .llseek = generic_file_llseek, }; +static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos) +{ + int *ptr = v; + + if (*pos || m->count) + ptr++; + + (*pos)++; + + for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) { + if (trace_find_tgid(*ptr)) + return ptr; + } + + return NULL; +} + +static void *saved_tgids_start(struct seq_file *m, loff_t *pos) +{ + void *v; + loff_t l = 0; + + if (!tgid_map) + return NULL; + + v = &tgid_map[0]; + while (l <= *pos) { + v = saved_tgids_next(m, v, &l); + if (!v) + return NULL; + } + + return v; +} + +static void saved_tgids_stop(struct seq_file *m, void *v) +{ +} + +static int saved_tgids_show(struct seq_file *m, void *v) +{ + int pid = (int *)v - tgid_map; + + seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid)); + return 0; +} + +static const struct seq_operations tracing_saved_tgids_seq_ops = { + .start = saved_tgids_start, + .stop = saved_tgids_stop, + .next = saved_tgids_next, + .show = saved_tgids_show, +}; + +static int tracing_saved_tgids_open(struct inode *inode, struct file *filp) +{ + if (tracing_disabled) + return -ENODEV; + + return seq_open(filp, &tracing_saved_tgids_seq_ops); +} + + +static const struct file_operations tracing_saved_tgids_fops = { + .open = tracing_saved_tgids_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos) { unsigned int *ptr = v; @@ -7921,6 +8024,9 @@ static __init int tracer_init_tracefs(void) trace_create_file("saved_cmdlines_size", 0644, d_tracer, NULL, &tracing_saved_cmdlines_size_fops); + trace_create_file("saved_tgids", 0444, d_tracer, + NULL, &tracing_saved_tgids_fops); + trace_eval_init(); trace_create_eval_file(d_tracer); diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 2c5221819be5..c9b5aa10fbf9 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -598,6 +598,14 @@ static struct notifier_block trace_kprobe_module_nb = { .priority = 1 /* Invoked after kprobe module callback */ }; +/* Convert certain expected symbols into '_' when generating event names */ +static inline void sanitize_event_name(char *name) +{ + while (*name++ != '\0') + if (*name == ':' || *name == '.') + *name = '_'; +} + static int create_trace_kprobe(int argc, char **argv) { /* @@ -736,6 +744,7 @@ static int create_trace_kprobe(int argc, char **argv) else snprintf(buf, MAX_EVENT_NAME_LEN, "%c_0x%p", is_return ? 'r' : 'p', addr); + sanitize_event_name(buf); event = buf; } tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive, diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index b4a751e8f9d6..a4df67cbc711 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -406,6 +406,8 @@ static const struct file_operations stack_trace_fops = { .release = seq_release, }; +#ifdef CONFIG_DYNAMIC_FTRACE + static int stack_trace_filter_open(struct inode *inode, struct file *file) { @@ -423,6 +425,8 @@ static const struct file_operations stack_trace_filter_fops = { .release = ftrace_regex_release, }; +#endif /* CONFIG_DYNAMIC_FTRACE */ + int stack_trace_sysctl(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, @@ -477,8 +481,10 @@ static __init int stack_trace_init(void) trace_create_file("stack_trace", 0444, d_tracer, NULL, &stack_trace_fops); +#ifdef CONFIG_DYNAMIC_FTRACE trace_create_file("stack_trace_filter", 0444, d_tracer, &trace_ops, &stack_trace_filter_fops); +#endif if (stack_trace_filter_buf[0]) ftrace_set_early_filter(&trace_ops, stack_trace_filter_buf, 1); diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc new file mode 100644 index 000000000000..b9302cc82c12 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc @@ -0,0 +1,36 @@ +#!/bin/sh +# description: Kprobe event auto/manual naming + +[ -f kprobe_events ] || exit_unsupported # this is configurable + +disable_events +echo > kprobe_events + +:;: "Add an event on function without name" ;: + +FUNC=`grep " [tT] .*vfs_read$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "` +[ "x" != "x$FUNC" ] || exit_unresolved +echo "p $FUNC" > kprobe_events +PROBE_NAME=`echo $FUNC | tr ".:" "_"` +test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure + +:;: "Add an event on function with new name" ;: + +echo "p:event1 $FUNC" > kprobe_events +test -d events/kprobes/event1 || exit_failure + +:;: "Add an event on function with new name and group" ;: + +echo "p:kprobes2/event2 $FUNC" > kprobe_events +test -d events/kprobes2/event2 || exit_failure + +:;: "Add an event on dot function without name" ;: + +FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "` +[ "x" != "x$FUNC" ] || exit_unresolved +echo "p $FUNC" > kprobe_events +EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:` +[ "x" != "x$EVENT" ] || exit_failure +test -d events/$EVENT || exit_failure + +echo > kprobe_events diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc new file mode 100644 index 000000000000..6d634e4b7680 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc @@ -0,0 +1,28 @@ +#!/bin/sh +# description: Kprobe dynamic event - probing module + +[ -f kprobe_events ] || exit_unsupported # this is configurable + +disable_events +echo > kprobe_events + +:;: "Add an event on a module function without specifying event name" ;: + +MOD=`lsmod | head -n 2 | tail -n 1 | cut -f1 -d" "` +FUNC=`grep -m 1 ".* t .*\\[$MOD\\]" /proc/kallsyms | xargs | cut -f3 -d" "` +[ "x" != "x$MOD" -a "y" != "y$FUNC" ] || exit_unresolved +echo "p $MOD:$FUNC" > kprobe_events +PROBE_NAME=`echo $MOD:$FUNC | tr ".:" "_"` +test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure + +:;: "Add an event on a module function with new event name" ;: + +echo "p:event1 $MOD:$FUNC" > kprobe_events +test -d events/kprobes/event1 || exit_failure + +:;: "Add an event on a module function with new event and group name" ;: + +echo "p:kprobes1/event1 $MOD:$FUNC" > kprobe_events +test -d events/kprobes1/event1 || exit_failure + +echo > kprobe_events diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc index f4d1ff785d67..2a1cb9908746 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc @@ -2,10 +2,10 @@ # description: Register/unregister many kprobe events # ftrace fentry skip size depends on the machine architecture. -# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc +# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc64le case `uname -m` in x86_64|i[3456]86) OFFS=5;; - ppc*) OFFS=4;; + ppc64le) OFFS=8;; *) OFFS=0;; esac |