diff options
22 files changed, 6041 insertions, 412 deletions
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index c010be8c85d7..08d74d75150d 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -512,3 +512,1558 @@ The following commands are supported: Note that there can be only one traceon or traceoff trigger per triggering event. + +- hist + + This command aggregates event hits into a hash table keyed on one or + more trace event format fields (or stacktrace) and a set of running + totals derived from one or more trace event format fields and/or + event counts (hitcount). + + The format of a hist trigger is as follows: + + hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>] + [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue] + [:clear][:name=histname1] [if <filter>] + + When a matching event is hit, an entry is added to a hash table + using the key(s) and value(s) named. Keys and values correspond to + fields in the event's format description. Values must correspond to + numeric fields - on an event hit, the value(s) will be added to a + sum kept for that field. The special string 'hitcount' can be used + in place of an explicit value field - this is simply a count of + event hits. If 'values' isn't specified, an implicit 'hitcount' + value will be automatically created and used as the only value. + Keys can be any field, or the special string 'stacktrace', which + will use the event's kernel stacktrace as the key. The keywords + 'keys' or 'key' can be used to specify keys, and the keywords + 'values', 'vals', or 'val' can be used to specify values. Compound + keys consisting of up to two fields can be specified by the 'keys' + keyword. Hashing a compound key produces a unique entry in the + table for each unique combination of component keys, and can be + useful for providing more fine-grained summaries of event data. + Additionally, sort keys consisting of up to two fields can be + specified by the 'sort' keyword. If more than one field is + specified, the result will be a 'sort within a sort': the first key + is taken to be the primary sort key and the second the secondary + key. If a hist trigger is given a name using the 'name' parameter, + its histogram data will be shared with other triggers of the same + name, and trigger hits will update this common data. Only triggers + with 'compatible' fields can be combined in this way; triggers are + 'compatible' if the fields named in the trigger share the same + number and type of fields and those fields also have the same names. + Note that any two events always share the compatible 'hitcount' and + 'stacktrace' fields and can therefore be combined using those + fields, however pointless that may be. + + 'hist' triggers add a 'hist' file to each event's subdirectory. + Reading the 'hist' file for the event will dump the hash table in + its entirety to stdout. If there are multiple hist triggers + attached to an event, there will be a table for each trigger in the + output. The table displayed for a named trigger will be the same as + any other instance having the same name. Each printed hash table + entry is a simple list of the keys and values comprising the entry; + keys are printed first and are delineated by curly braces, and are + followed by the set of value fields for the entry. By default, + numeric fields are displayed as base-10 integers. This can be + modified by appending any of the following modifiers to the field + name: + + .hex display a number as a hex value + .sym display an address as a symbol + .sym-offset display an address as a symbol and offset + .syscall display a syscall id as a system call name + .execname display a common_pid as a program name + + Note that in general the semantics of a given field aren't + interpreted when applying a modifier to it, but there are some + restrictions to be aware of in this regard: + + - only the 'hex' modifier can be used for values (because values + are essentially sums, and the other modifiers don't make sense + in that context). + - the 'execname' modifier can only be used on a 'common_pid'. The + reason for this is that the execname is simply the 'comm' value + saved for the 'current' process when an event was triggered, + which is the same as the common_pid value saved by the event + tracing code. Trying to apply that comm value to other pid + values wouldn't be correct, and typically events that care save + pid-specific comm fields in the event itself. + + A typical usage scenario would be the following to enable a hist + trigger, read its current contents, and then turn it off: + + # echo 'hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + + # echo '!hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + The trigger file itself can be read to show the details of the + currently attached hist trigger. This information is also displayed + at the top of the 'hist' file when read. + + By default, the size of the hash table is 2048 entries. The 'size' + parameter can be used to specify more or fewer than that. The units + are in terms of hashtable entries - if a run uses more entries than + specified, the results will show the number of 'drops', the number + of hits that were ignored. The size should be a power of 2 between + 128 and 131072 (any non- power-of-2 number specified will be rounded + up). + + The 'sort' parameter can be used to specify a value field to sort + on. The default if unspecified is 'hitcount' and the default sort + order is 'ascending'. To sort in the opposite direction, append + .descending' to the sort key. + + The 'pause' parameter can be used to pause an existing hist trigger + or to start a hist trigger but not log any events until told to do + so. 'continue' or 'cont' can be used to start or restart a paused + hist trigger. + + The 'clear' parameter will clear the contents of a running hist + trigger and leave its current paused/active state. + + Note that the 'pause', 'cont', and 'clear' parameters should be + applied using 'append' shell operator ('>>') if applied to an + existing trigger, rather than via the '>' operator, which will cause + the trigger to be removed through truncation. + +- enable_hist/disable_hist + + The enable_hist and disable_hist triggers can be used to have one + event conditionally start and stop another event's already-attached + hist trigger. Any number of enable_hist and disable_hist triggers + can be attached to a given event, allowing that event to kick off + and stop aggregations on a host of other events. + + The format is very similar to the enable/disable_event triggers: + + enable_hist:<system>:<event>[:count] + disable_hist:<system>:<event>[:count] + + Instead of enabling or disabling the tracing of the target event + into the trace buffer as the enable/disable_event triggers do, the + enable/disable_hist triggers enable or disable the aggregation of + the target event into a hash table. + + A typical usage scenario for the enable_hist/disable_hist triggers + would be to first set up a paused hist trigger on some event, + followed by an enable_hist/disable_hist pair that turns the hist + aggregation on and off when conditions of interest are hit: + + # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + The above sets up an initially paused hist trigger which is unpaused + and starts aggregating events when a given program is executed, and + which stops aggregating when the process exits and the hist trigger + is paused again. + + The examples below provide a more concrete illustration of the + concepts and typical usage patterns discussed above. + + +6.2 'hist' trigger examples +--------------------------- + + The first set of examples creates aggregations using the kmalloc + event. The fields that can be used for the hist trigger are listed + in the kmalloc event's format file: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format + name: kmalloc + ID: 374 + format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + + field:unsigned long call_site; offset:8; size:8; signed:0; + field:const void * ptr; offset:16; size:8; signed:0; + field:size_t bytes_req; offset:24; size:8; signed:0; + field:size_t bytes_alloc; offset:32; size:8; signed:0; + field:gfp_t gfp_flags; offset:40; size:4; signed:0; + + We'll start by creating a hist trigger that generates a simple table + that lists the total number of bytes requested for each function in + the kernel that made one or more calls to kmalloc: + + # echo 'hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + This tells the tracing system to create a 'hist' trigger using the + call_site field of the kmalloc event as the key for the table, which + just means that each unique call_site address will have an entry + created for it in the table. The 'val=bytes_req' parameter tells + the hist trigger that for each unique entry (call_site) in the + table, it should keep a running total of the number of bytes + requested by that call_site. + + We'll let it run for awhile and then dump the contents of the 'hist' + file in the kmalloc event's subdirectory (for readability, a number + of entries have been omitted): + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176 + { call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024 + { call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384 + { call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24 + { call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8 + { call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152 + { call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144 + { call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144 + { call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560 + { call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736 + . + . + . + { call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576 + { call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336 + { call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504 + { call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584 + { call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448 + { call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720 + { call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088 + { call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920 + { call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716 + { call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712 + { call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160 + { call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520 + + Totals: + Hits: 4610 + Entries: 45 + Dropped: 0 + + The output displays a line for each entry, beginning with the key + specified in the trigger, followed by the value(s) also specified in + the trigger. At the beginning of the output is a line that displays + the trigger info, which can also be displayed by reading the + 'trigger' file: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + At the end of the output are a few lines that display the overall + totals for the run. The 'Hits' field shows the total number of + times the event trigger was hit, the 'Entries' field shows the total + number of used entries in the hash table, and the 'Dropped' field + shows the number of hits that were dropped because the number of + used entries for the run exceeded the maximum number of entries + allowed for the table (normally 0, but if not a hint that you may + want to increase the size of the table using the 'size' parameter). + + Notice in the above output that there's an extra field, 'hitcount', + which wasn't specified in the trigger. Also notice that in the + trigger info output, there's a parameter, 'sort=hitcount', which + wasn't specified in the trigger either. The reason for that is that + every trigger implicitly keeps a count of the total number of hits + attributed to a given entry, called the 'hitcount'. That hitcount + information is explicitly displayed in the output, and in the + absence of a user-specified sort parameter, is used as the default + sort field. + + The value 'hitcount' can be used in place of an explicit value in + the 'values' parameter if you don't really need to have any + particular field summed and are mainly interested in hit + frequencies. + + To turn the hist trigger off, simply call up the trigger in the + command history and re-execute it with a '!' prepended: + + # echo '!hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + Finally, notice that the call_site as displayed in the output above + isn't really very useful. It's an address, but normally addresses + are displayed in hex. To have a numeric field displayed as a hex + value, simply append '.hex' to the field name in the trigger: + + # echo 'hist:key=call_site.hex:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433 + { call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176 + { call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384 + { call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8 + { call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511 + { call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12 + { call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152 + { call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24 + { call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144 + { call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648 + { call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144 + { call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544 + . + . + . + { call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024 + { call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680 + { call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112 + { call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232 + { call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360 + { call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640 + { call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600 + { call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584 + { call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656 + { call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456 + { call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600 + + Totals: + Hits: 4775 + Entries: 46 + Dropped: 0 + + Even that's only marginally more useful - while hex values do look + more like addresses, what users are typically more interested in + when looking at text addresses are the corresponding symbols + instead. To have an address displayed as symbolic value instead, + simply append '.sym' or '.sym-offset' to the field name in the + trigger: + + # echo 'hist:key=call_site.sym:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40 + { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528 + { call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624 + { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96 + { call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464 + { call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424 + . + . + . + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672 + { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208 + { call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576 + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248 + { call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584 + { call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265 + + Totals: + Hits: 109928 + Entries: 71 + Dropped: 0 + + Because the default sort key above is 'hitcount', the above shows a + the list of call_sites by increasing hitcount, so that at the bottom + we see the functions that made the most kmalloc calls during the + run. If instead we we wanted to see the top kmalloc callers in + terms of the number of bytes requested rather than the number of + calls, and we wanted the top caller to appear at the top, we can use + the 'sort' parameter, along with the 'descending' modifier: + + # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135 + { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784 + { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992 + { call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664 + { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632 + . + . + . + { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48 + { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48 + { call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48 + { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40 + { call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 + + Totals: + Hits: 32133 + Entries: 81 + Dropped: 0 + + To display the offset and size information in addition to the symbol + name, just use 'sym-offset' instead: + + # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720 + { call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936 + { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832 + { call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040 + { call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696 + { call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640 + { call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456 + . + . + . + { call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128 + { call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96 + { call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96 + { call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84 + { call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7 + + Totals: + Hits: 26098 + Entries: 64 + Dropped: 0 + + We can also add multiple fields to the 'values' parameter. For + example, we might want to see the total number of bytes allocated + alongside bytes requested, and display the result sorted by bytes + allocated in a descending order: + + # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016 + { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744 + { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304 + { call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640 + { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432 + . + . + . + { call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192 + { call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96 + { call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8 + + Totals: + Hits: 66598 + Entries: 65 + Dropped: 0 + + Finally, to finish off our kmalloc example, instead of simply having + the hist trigger display symbolic call_sites, we can have the hist + trigger additionally display the complete set of kernel stack traces + that led to each call_site. To do that, we simply use the special + value 'stacktrace' for the key parameter: + + # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The above trigger will use the kernel stack trace in effect when an + event is triggered as the key for the hash table. This allows the + enumeration of every kernel callpath that led up to a particular + event, along with a running total of any of the event fields for + that event. Here we tally bytes requested and bytes allocated for + every callpath in the system that led up to a kmalloc (in this case + every callpath to a kmalloc for a kernel compile): + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] + + { stacktrace: + __kmalloc_track_caller+0x10b/0x1a0 + kmemdup+0x20/0x50 + hidraw_report_event+0x8a/0x120 [hid] + hid_report_raw_event+0x3ea/0x440 [hid] + hid_input_report+0x112/0x190 [hid] + hid_irq_in+0xc2/0x260 [usbhid] + __usb_hcd_giveback_urb+0x72/0x120 + usb_giveback_urb_bh+0x9e/0xe0 + tasklet_hi_action+0xf8/0x100 + __do_softirq+0x114/0x2c0 + irq_exit+0xa5/0xb0 + do_IRQ+0x5a/0xf0 + ret_from_intr+0x0/0x30 + cpuidle_enter+0x17/0x20 + cpu_startup_entry+0x315/0x3e0 + rest_init+0x7c/0x80 + } hitcount: 3 bytes_req: 21 bytes_alloc: 24 + { stacktrace: + __kmalloc_track_caller+0x10b/0x1a0 + kmemdup+0x20/0x50 + hidraw_report_event+0x8a/0x120 [hid] + hid_report_raw_event+0x3ea/0x440 [hid] + hid_input_report+0x112/0x190 [hid] + hid_irq_in+0xc2/0x260 [usbhid] + __usb_hcd_giveback_urb+0x72/0x120 + usb_giveback_urb_bh+0x9e/0xe0 + tasklet_hi_action+0xf8/0x100 + __do_softirq+0x114/0x2c0 + irq_exit+0xa5/0xb0 + do_IRQ+0x5a/0xf0 + ret_from_intr+0x0/0x30 + } hitcount: 3 bytes_req: 21 bytes_alloc: 24 + { stacktrace: + kmem_cache_alloc_trace+0xeb/0x150 + aa_alloc_task_context+0x27/0x40 + apparmor_cred_prepare+0x1f/0x50 + security_prepare_creds+0x16/0x20 + prepare_creds+0xdf/0x1a0 + SyS_capset+0xb5/0x200 + system_call_fastpath+0x12/0x6a + } hitcount: 1 bytes_req: 32 bytes_alloc: 32 + . + . + . + { stacktrace: + __kmalloc+0x11b/0x1b0 + i915_gem_execbuffer2+0x6c/0x2c0 [i915] + drm_ioctl+0x349/0x670 [drm] + do_vfs_ioctl+0x2f0/0x4f0 + SyS_ioctl+0x81/0xa0 + system_call_fastpath+0x12/0x6a + } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808 + { stacktrace: + __kmalloc+0x11b/0x1b0 + load_elf_phdrs+0x76/0xa0 + load_elf_binary+0x102/0x1650 + search_binary_handler+0x97/0x1d0 + do_execveat_common.isra.34+0x551/0x6e0 + SyS_execve+0x3a/0x50 + return_from_execve+0x0/0x23 + } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048 + { stacktrace: + kmem_cache_alloc_trace+0xeb/0x150 + apparmor_file_alloc_security+0x27/0x40 + security_file_alloc+0x16/0x20 + get_empty_filp+0x93/0x1c0 + path_openat+0x31/0x5f0 + do_filp_open+0x3a/0x90 + do_sys_open+0x128/0x220 + SyS_open+0x1e/0x20 + system_call_fastpath+0x12/0x6a + } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376 + { stacktrace: + __kmalloc+0x11b/0x1b0 + seq_buf_alloc+0x1b/0x50 + seq_read+0x2cc/0x370 + proc_reg_read+0x3d/0x80 + __vfs_read+0x28/0xe0 + vfs_read+0x86/0x140 + SyS_read+0x46/0xb0 + system_call_fastpath+0x12/0x6a + } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768 + + Totals: + Hits: 6085872 + Entries: 253 + Dropped: 0 + + If you key a hist trigger on common_pid, in order for example to + gather and display sorted totals for each process, you can use the + special .execname modifier to display the executable names for the + processes in the table rather than raw pids. The example below + keeps a per-process sum of total bytes read: + + # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist + # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active] + + { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512 + { common_pid: Xorg [ 1309] } hitcount: 525 count: 256640 + { common_pid: compiz [ 2889] } hitcount: 59 count: 254400 + { common_pid: bash [ 8710] } hitcount: 3 count: 66369 + { common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739 + { common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648 + { common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216 + { common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396 + { common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264 + { common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424 + { common_pid: gmain [ 1315] } hitcount: 18 count: 6336 + . + . + . + { common_pid: postgres [ 1892] } hitcount: 2 count: 32 + { common_pid: postgres [ 1891] } hitcount: 2 count: 32 + { common_pid: gmain [ 8704] } hitcount: 2 count: 32 + { common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21 + { common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16 + { common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16 + { common_pid: gdbus [ 2998] } hitcount: 1 count: 16 + { common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8 + { common_pid: init [ 1] } hitcount: 2 count: 2 + + Totals: + Hits: 2116 + Entries: 51 + Dropped: 0 + + Similarly, if you key a hist trigger on syscall id, for example to + gather and display a list of systemwide syscall hits, you can use + the special .syscall modifier to display the syscall names rather + than raw ids. The example below keeps a running total of syscall + counts for the system during the run: + + # echo 'hist:key=id.syscall:val=hitcount' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active] + + { id: sys_fsync [ 74] } hitcount: 1 + { id: sys_newuname [ 63] } hitcount: 1 + { id: sys_prctl [157] } hitcount: 1 + { id: sys_statfs [137] } hitcount: 1 + { id: sys_symlink [ 88] } hitcount: 1 + { id: sys_sendmmsg [307] } hitcount: 1 + { id: sys_semctl [ 66] } hitcount: 1 + { id: sys_readlink [ 89] } hitcount: 3 + { id: sys_bind [ 49] } hitcount: 3 + { id: sys_getsockname [ 51] } hitcount: 3 + { id: sys_unlink [ 87] } hitcount: 3 + { id: sys_rename [ 82] } hitcount: 4 + { id: unknown_syscall [ 58] } hitcount: 4 + { id: sys_connect [ 42] } hitcount: 4 + { id: sys_getpid [ 39] } hitcount: 4 + . + . + . + { id: sys_rt_sigprocmask [ 14] } hitcount: 952 + { id: sys_futex [202] } hitcount: 1534 + { id: sys_write [ 1] } hitcount: 2689 + { id: sys_setitimer [ 38] } hitcount: 2797 + { id: sys_read [ 0] } hitcount: 3202 + { id: sys_select [ 23] } hitcount: 3773 + { id: sys_writev [ 20] } hitcount: 4531 + { id: sys_poll [ 7] } hitcount: 8314 + { id: sys_recvmsg [ 47] } hitcount: 13738 + { id: sys_ioctl [ 16] } hitcount: 21843 + + Totals: + Hits: 67612 + Entries: 72 + Dropped: 0 + + The syscall counts above provide a rough overall picture of system + call activity on the system; we can see for example that the most + popular system call on this system was the 'sys_ioctl' system call. + + We can use 'compound' keys to refine that number and provide some + further insight as to which processes exactly contribute to the + overall ioctl count. + + The command below keeps a hitcount for every unique combination of + system call id and pid - the end result is essentially a table + that keeps a per-pid sum of system call hits. The results are + sorted using the system call id as the primary key, and the + hitcount sum as the secondary key: + + # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active] + + { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1 + { id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1 + { id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1 + { id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1 + { id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2 + { id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2 + { id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2 + { id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2 + { id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2 + { id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2 + . + . + . + { id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12 + { id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16 + { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808 + { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580 + . + . + . + { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3 + { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6 + { id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2 + { id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4 + { id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6 + + Totals: + Hits: 31536 + Entries: 323 + Dropped: 0 + + The above list does give us a breakdown of the ioctl syscall by + pid, but it also gives us quite a bit more than that, which we + don't really care about at the moment. Since we know the syscall + id for sys_ioctl (16, displayed next to the sys_ioctl name), we + can use that to filter out all the other syscalls: + + # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active] + + { id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1 + . + . + . + { id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45 + { id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48 + { id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48 + { id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66 + { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674 + { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443 + + Totals: + Hits: 101162 + Entries: 103 + Dropped: 0 + + The above output shows that 'compiz' and 'Xorg' are far and away + the heaviest ioctl callers (which might lead to questions about + whether they really need to be making all those calls and to + possible avenues for further investigation.) + + The compound key examples used a key and a sum value (hitcount) to + sort the output, but we can just as easily use two keys instead. + Here's an example where we use a compound key composed of the the + common_pid and size event fields. Sorting with pid as the primary + key and 'size' as the secondary key allows us to display an + ordered summary of the recvfrom sizes, with counts, received by + each process: + + # echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger + + # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist + # trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active] + + { common_pid: smbd [ 784], size: 4 } hitcount: 1 + { common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672 + { common_pid: postgres [ 1796], size: 1000 } hitcount: 6 + { common_pid: postgres [ 1867], size: 1000 } hitcount: 10 + { common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2 + { common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1 + { common_pid: compiz [ 2994], size: 8 } hitcount: 1 + { common_pid: compiz [ 2994], size: 20 } hitcount: 11 + { common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2 + { common_pid: firefox [ 8817], size: 4 } hitcount: 1 + { common_pid: firefox [ 8817], size: 8 } hitcount: 5 + { common_pid: firefox [ 8817], size: 588 } hitcount: 2 + { common_pid: firefox [ 8817], size: 628 } hitcount: 1 + { common_pid: firefox [ 8817], size: 6944 } hitcount: 1 + { common_pid: firefox [ 8817], size: 408880 } hitcount: 2 + { common_pid: firefox [ 8822], size: 8 } hitcount: 2 + { common_pid: firefox [ 8822], size: 160 } hitcount: 2 + { common_pid: firefox [ 8822], size: 320 } hitcount: 2 + { common_pid: firefox [ 8822], size: 352 } hitcount: 1 + . + . + . + { common_pid: pool [ 8923], size: 1960 } hitcount: 10 + { common_pid: pool [ 8923], size: 2048 } hitcount: 10 + { common_pid: pool [ 8924], size: 1960 } hitcount: 10 + { common_pid: pool [ 8924], size: 2048 } hitcount: 10 + { common_pid: pool [ 8928], size: 1964 } hitcount: 4 + { common_pid: pool [ 8928], size: 1965 } hitcount: 2 + { common_pid: pool [ 8928], size: 2048 } hitcount: 6 + { common_pid: pool [ 8929], size: 1982 } hitcount: 1 + { common_pid: pool [ 8929], size: 2048 } hitcount: 1 + + Totals: + Hits: 2016 + Entries: 224 + Dropped: 0 + + The above example also illustrates the fact that although a compound + key is treated as a single entity for hashing purposes, the sub-keys + it's composed of can be accessed independently. + + The next example uses a string field as the hash key and + demonstrates how you can manually pause and continue a hist trigger. + In this example, we'll aggregate fork counts and don't expect a + large number of entries in the hash table, so we'll drop it to a + much smaller number, say 256: + + # echo 'hist:key=child_comm:val=hitcount:size=256' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active] + + { child_comm: dconf worker } hitcount: 1 + { child_comm: ibus-daemon } hitcount: 1 + { child_comm: whoopsie } hitcount: 1 + { child_comm: smbd } hitcount: 1 + { child_comm: gdbus } hitcount: 1 + { child_comm: kthreadd } hitcount: 1 + { child_comm: dconf worker } hitcount: 1 + { child_comm: evolution-alarm } hitcount: 2 + { child_comm: Socket Thread } hitcount: 2 + { child_comm: postgres } hitcount: 2 + { child_comm: bash } hitcount: 3 + { child_comm: compiz } hitcount: 3 + { child_comm: evolution-sourc } hitcount: 4 + { child_comm: dhclient } hitcount: 4 + { child_comm: pool } hitcount: 5 + { child_comm: nm-dispatcher.a } hitcount: 8 + { child_comm: firefox } hitcount: 8 + { child_comm: dbus-daemon } hitcount: 8 + { child_comm: glib-pacrunner } hitcount: 10 + { child_comm: evolution } hitcount: 23 + + Totals: + Hits: 89 + Entries: 20 + Dropped: 0 + + If we want to pause the hist trigger, we can simply append :pause to + the command that started the trigger. Notice that the trigger info + displays as [paused]: + + # echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused] + + { child_comm: dconf worker } hitcount: 1 + { child_comm: kthreadd } hitcount: 1 + { child_comm: dconf worker } hitcount: 1 + { child_comm: gdbus } hitcount: 1 + { child_comm: ibus-daemon } hitcount: 1 + { child_comm: Socket Thread } hitcount: 2 + { child_comm: evolution-alarm } hitcount: 2 + { child_comm: smbd } hitcount: 2 + { child_comm: bash } hitcount: 3 + { child_comm: whoopsie } hitcount: 3 + { child_comm: compiz } hitcount: 3 + { child_comm: evolution-sourc } hitcount: 4 + { child_comm: pool } hitcount: 5 + { child_comm: postgres } hitcount: 6 + { child_comm: firefox } hitcount: 8 + { child_comm: dhclient } hitcount: 10 + { child_comm: emacs } hitcount: 12 + { child_comm: dbus-daemon } hitcount: 20 + { child_comm: nm-dispatcher.a } hitcount: 20 + { child_comm: evolution } hitcount: 35 + { child_comm: glib-pacrunner } hitcount: 59 + + Totals: + Hits: 199 + Entries: 21 + Dropped: 0 + + To manually continue having the trigger aggregate events, append + :cont instead. Notice that the trigger info displays as [active] + again, and the data has changed: + + # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active] + + { child_comm: dconf worker } hitcount: 1 + { child_comm: dconf worker } hitcount: 1 + { child_comm: kthreadd } hitcount: 1 + { child_comm: gdbus } hitcount: 1 + { child_comm: ibus-daemon } hitcount: 1 + { child_comm: Socket Thread } hitcount: 2 + { child_comm: evolution-alarm } hitcount: 2 + { child_comm: smbd } hitcount: 2 + { child_comm: whoopsie } hitcount: 3 + { child_comm: compiz } hitcount: 3 + { child_comm: evolution-sourc } hitcount: 4 + { child_comm: bash } hitcount: 5 + { child_comm: pool } hitcount: 5 + { child_comm: postgres } hitcount: 6 + { child_comm: firefox } hitcount: 8 + { child_comm: dhclient } hitcount: 11 + { child_comm: emacs } hitcount: 12 + { child_comm: dbus-daemon } hitcount: 22 + { child_comm: nm-dispatcher.a } hitcount: 22 + { child_comm: evolution } hitcount: 35 + { child_comm: glib-pacrunner } hitcount: 59 + + Totals: + Hits: 206 + Entries: 21 + Dropped: 0 + + The previous example showed how to start and stop a hist trigger by + appending 'pause' and 'continue' to the hist trigger command. A + hist trigger can also be started in a paused state by initially + starting the trigger with ':pause' appended. This allows you to + start the trigger only when you're ready to start collecting data + and not before. For example, you could start the trigger in a + paused state, then unpause it and do something you want to measure, + then pause the trigger again when done. + + Of course, doing this manually can be difficult and error-prone, but + it is possible to automatically start and stop a hist trigger based + on some condition, via the enable_hist and disable_hist triggers. + + For example, suppose we wanted to take a look at the relative + weights in terms of skb length for each callpath that leads to a + netif_receieve_skb event when downloading a decent-sized file using + wget. + + First we set up an initially paused stacktrace trigger on the + netif_receive_skb event: + + # echo 'hist:key=stacktrace:vals=len:pause' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + Next, we set up an 'enable_hist' trigger on the sched_process_exec + event, with an 'if filename==/usr/bin/wget' filter. The effect of + this new trigger is that it will 'unpause' the hist trigger we just + set up on netif_receive_skb if and only if it sees a + sched_process_exec event with a filename of '/usr/bin/wget'. When + that happens, all netif_receive_skb events are aggregated into a + hash table keyed on stacktrace: + + # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + The aggregation continues until the netif_receive_skb is paused + again, which is what the following disable_hist event does by + creating a similar setup on the sched_process_exit event, using the + filter 'comm==wget': + + # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + Whenever a process exits and the comm field of the disable_hist + trigger filter matches 'comm==wget', the netif_receive_skb hist + trigger is disabled. + + The overall effect is that netif_receive_skb events are aggregated + into the hash table for only the duration of the wget. Executing a + wget command and then listing the 'hist' file will display the + output generated by the wget command: + + $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + + { stacktrace: + __netif_receive_skb_core+0x46d/0x990 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_receive+0xc8/0x100 + ieee80211_deliver_skb+0xd6/0x270 [mac80211] + ieee80211_rx_handlers+0xccf/0x22f0 [mac80211] + ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211] + ieee80211_rx+0x31d/0x900 [mac80211] + iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm] + iwl_rx_dispatch+0x8e/0xf0 [iwldvm] + iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + ret_from_fork+0x42/0x70 + } hitcount: 85 len: 28884 + { stacktrace: + __netif_receive_skb_core+0x46d/0x990 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_complete+0xa4/0xe0 + dev_gro_receive+0x23a/0x360 + napi_gro_receive+0x30/0x100 + ieee80211_deliver_skb+0xd6/0x270 [mac80211] + ieee80211_rx_handlers+0xccf/0x22f0 [mac80211] + ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211] + ieee80211_rx+0x31d/0x900 [mac80211] + iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm] + iwl_rx_dispatch+0x8e/0xf0 [iwldvm] + iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + } hitcount: 98 len: 664329 + { stacktrace: + __netif_receive_skb_core+0x46d/0x990 + __netif_receive_skb+0x18/0x60 + process_backlog+0xa8/0x150 + net_rx_action+0x15d/0x340 + __do_softirq+0x114/0x2c0 + do_softirq_own_stack+0x1c/0x30 + do_softirq+0x65/0x70 + __local_bh_enable_ip+0xb5/0xc0 + ip_finish_output+0x1f4/0x840 + ip_output+0x6b/0xc0 + ip_local_out_sk+0x31/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x173/0x2a0 + udp_sendmsg+0x2bf/0x9f0 + inet_sendmsg+0x64/0xa0 + sock_sendmsg+0x3d/0x50 + } hitcount: 115 len: 13030 + { stacktrace: + __netif_receive_skb_core+0x46d/0x990 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_complete+0xa4/0xe0 + napi_gro_flush+0x6d/0x90 + iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + ret_from_fork+0x42/0x70 + } hitcount: 934 len: 5512212 + + Totals: + Hits: 1232 + Entries: 4 + Dropped: 0 + + The above shows all the netif_receive_skb callpaths and their total + lengths for the duration of the wget command. + + The 'clear' hist trigger param can be used to clear the hash table. + Suppose we wanted to try another run of the previous example but + this time also wanted to see the complete list of events that went + into the histogram. In order to avoid having to set everything up + again, we can just clear the histogram first: + + # echo 'hist:key=stacktrace:vals=len:clear' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + Just to verify that it is in fact cleared, here's what we now see in + the hist file: + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + + Totals: + Hits: 0 + Entries: 0 + Dropped: 0 + + Since we want to see the detailed list of every netif_receive_skb + event occurring during the new run, which are in fact the same + events being aggregated into the hash table, we add some additional + 'enable_event' events to the triggering sched_process_exec and + sched_process_exit events as such: + + # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + # echo 'disable_event:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + If you read the trigger files for the sched_process_exec and + sched_process_exit triggers, you should see two triggers for each: + one enabling/disabling the hist aggregation and the other + enabling/disabling the logging of events: + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget + enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + enable_event:net:netif_receive_skb:unlimited if comm==wget + disable_hist:net:netif_receive_skb:unlimited if comm==wget + + In other words, whenever either of the sched_process_exec or + sched_process_exit events is hit and matches 'wget', it enables or + disables both the histogram and the event log, and what you end up + with is a hash table and set of events just covering the specified + duration. Run the wget command again: + + $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz + + Displaying the 'hist' file should show something similar to what you + saw in the last run, but this time you should also see the + individual events in the trace file: + + # cat /sys/kernel/debug/tracing/trace + + # tracer: nop + # + # entries-in-buffer/entries-written: 183/1426 #P:4 + # + # _-----=> irqs-off + # / _----=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / delay + # TASK-PID CPU# |||| TIMESTAMP FUNCTION + # | | | |||| | | + wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60 + wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60 + dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130 + dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138 + ##### CPU 2 buffer started #### + irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500 + irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500 + . + . + . + + The following example demonstrates how multiple hist triggers can be + attached to a given event. This capability can be useful for + creating a set of different summaries derived from the same set of + events, or for comparing the effects of different filters, among + other things. + + # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:keys=skbaddr.hex:vals=len' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:keys=len:vals=common_preempt_count' >> \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + The above set of commands create four triggers differing only in + their filters, along with a completely different though fairly + nonsensical trigger. Note that in order to append multiple hist + triggers to the same file, you should use the '>>' operator to + append them ('>' will also add the new hist trigger, but will remove + any existing hist triggers beforehand). + + Displaying the contents of the 'hist' file for the event shows the + contents of all five histograms: + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + + # event histogram + # + # trigger info: hist:keys=len:vals=hitcount,common_preempt_count:sort=hitcount:size=2048 [active] + # + + { len: 176 } hitcount: 1 common_preempt_count: 0 + { len: 223 } hitcount: 1 common_preempt_count: 0 + { len: 4854 } hitcount: 1 common_preempt_count: 0 + { len: 395 } hitcount: 1 common_preempt_count: 0 + { len: 177 } hitcount: 1 common_preempt_count: 0 + { len: 446 } hitcount: 1 common_preempt_count: 0 + { len: 1601 } hitcount: 1 common_preempt_count: 0 + . + . + . + { len: 1280 } hitcount: 66 common_preempt_count: 0 + { len: 116 } hitcount: 81 common_preempt_count: 40 + { len: 708 } hitcount: 112 common_preempt_count: 0 + { len: 46 } hitcount: 221 common_preempt_count: 0 + { len: 1264 } hitcount: 458 common_preempt_count: 0 + + Totals: + Hits: 1428 + Entries: 147 + Dropped: 0 + + + # event histogram + # + # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active] + # + + { skbaddr: ffff8800baee5e00 } hitcount: 1 len: 130 + { skbaddr: ffff88005f3d5600 } hitcount: 1 len: 1280 + { skbaddr: ffff88005f3d4900 } hitcount: 1 len: 1280 + { skbaddr: ffff88009fed6300 } hitcount: 1 len: 115 + { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 115 + { skbaddr: ffff88008cdb1900 } hitcount: 1 len: 46 + { skbaddr: ffff880064b5ef00 } hitcount: 1 len: 118 + { skbaddr: ffff880044e3c700 } hitcount: 1 len: 60 + { skbaddr: ffff880100065900 } hitcount: 1 len: 46 + { skbaddr: ffff8800d46bd500 } hitcount: 1 len: 116 + { skbaddr: ffff88005f3d5f00 } hitcount: 1 len: 1280 + { skbaddr: ffff880100064700 } hitcount: 1 len: 365 + { skbaddr: ffff8800badb6f00 } hitcount: 1 len: 60 + . + . + . + { skbaddr: ffff88009fe0be00 } hitcount: 27 len: 24677 + { skbaddr: ffff88009fe0a400 } hitcount: 27 len: 23052 + { skbaddr: ffff88009fe0b700 } hitcount: 31 len: 25589 + { skbaddr: ffff88009fe0b600 } hitcount: 32 len: 27326 + { skbaddr: ffff88006a462800 } hitcount: 68 len: 71678 + { skbaddr: ffff88006a463700 } hitcount: 70 len: 72678 + { skbaddr: ffff88006a462b00 } hitcount: 71 len: 77589 + { skbaddr: ffff88006a463600 } hitcount: 73 len: 71307 + { skbaddr: ffff88006a462200 } hitcount: 81 len: 81032 + + Totals: + Hits: 1451 + Entries: 318 + Dropped: 0 + + + # event histogram + # + # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len == 256 [active] + # + + + Totals: + Hits: 0 + Entries: 0 + Dropped: 0 + + + # event histogram + # + # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len > 4096 [active] + # + + { skbaddr: ffff88009fd2c300 } hitcount: 1 len: 7212 + { skbaddr: ffff8800d2bcce00 } hitcount: 1 len: 7212 + { skbaddr: ffff8800d2bcd700 } hitcount: 1 len: 7212 + { skbaddr: ffff8800d2bcda00 } hitcount: 1 len: 21492 + { skbaddr: ffff8800ae2e2d00 } hitcount: 1 len: 7212 + { skbaddr: ffff8800d2bcdb00 } hitcount: 1 len: 7212 + { skbaddr: ffff88006a4df500 } hitcount: 1 len: 4854 + { skbaddr: ffff88008ce47b00 } hitcount: 1 len: 18636 + { skbaddr: ffff8800ae2e2200 } hitcount: 1 len: 12924 + { skbaddr: ffff88005f3e1000 } hitcount: 1 len: 4356 + { skbaddr: ffff8800d2bcdc00 } hitcount: 2 len: 24420 + { skbaddr: ffff8800d2bcc200 } hitcount: 2 len: 12996 + + Totals: + Hits: 14 + Entries: 12 + Dropped: 0 + + + # event histogram + # + # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len < 0 [active] + # + + + Totals: + Hits: 0 + Entries: 0 + Dropped: 0 + + Named triggers can be used to have triggers share a common set of + histogram data. This capability is mostly useful for combining the + output of events generated by tracepoints contained inside inline + functions, but names can be used in a hist trigger on any event. + For example, these two triggers when hit will update the same 'len' + field in the shared 'foo' histogram data: + + # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + You can see that they're updating common histogram data by reading + each event's hist files at the same time: + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist; + cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + + # event histogram + # + # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active] + # + + { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46 + { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76 + { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468 + { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46 + { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52 + { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168 + { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260 + { skbaddr: ffff880064505000 } hitcount: 1 len: 46 + { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32 + { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46 + { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44 + { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168 + { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40 + { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40 + { skbaddr: ffff880064505f00 } hitcount: 1 len: 174 + { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160 + { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76 + { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32 + { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988 + { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46 + { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44 + { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676 + { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107 + { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92 + { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142 + { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220 + { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92 + { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92 + { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675 + { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138 + { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138 + { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184 + { skbaddr: ffff880064504400 } hitcount: 4 len: 184 + { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184 + { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230 + { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196 + { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276 + { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276 + + Totals: + Hits: 81 + Entries: 42 + Dropped: 0 + # event histogram + # + # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active] + # + + { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46 + { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76 + { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468 + { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46 + { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52 + { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168 + { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260 + { skbaddr: ffff880064505000 } hitcount: 1 len: 46 + { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32 + { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46 + { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44 + { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168 + { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40 + { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40 + { skbaddr: ffff880064505f00 } hitcount: 1 len: 174 + { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160 + { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76 + { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32 + { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46 + { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988 + { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46 + { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44 + { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676 + { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107 + { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92 + { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142 + { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220 + { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92 + { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92 + { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675 + { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138 + { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138 + { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184 + { skbaddr: ffff880064504400 } hitcount: 4 len: 184 + { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184 + { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230 + { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196 + { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276 + { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276 + + Totals: + Hits: 81 + Entries: 42 + Dropped: 0 + + And here's an example that shows how to combine histogram data from + any two events even if they don't share any 'compatible' fields + other than 'hitcount' and 'stacktrace'. These commands create a + couple of triggers named 'bar' using those fields: + + # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + And displaying the output of either shows some interesting if + somewhat confusing output: + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + + # event histogram + # + # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active] + # + + { stacktrace: + _do_fork+0x18e/0x330 + kernel_thread+0x29/0x30 + kthreadd+0x154/0x1b0 + ret_from_fork+0x3f/0x70 + } hitcount: 1 + { stacktrace: + netif_rx_internal+0xb2/0xd0 + netif_rx_ni+0x20/0x70 + dev_loopback_xmit+0xaa/0xd0 + ip_mc_output+0x126/0x240 + ip_local_out_sk+0x31/0x40 + igmp_send_report+0x1e9/0x230 + igmp_timer_expire+0xe9/0x120 + call_timer_fn+0x39/0xf0 + run_timer_softirq+0x1e1/0x290 + __do_softirq+0xfd/0x290 + irq_exit+0x98/0xb0 + smp_apic_timer_interrupt+0x4a/0x60 + apic_timer_interrupt+0x6d/0x80 + cpuidle_enter+0x17/0x20 + call_cpuidle+0x3b/0x60 + cpu_startup_entry+0x22d/0x310 + } hitcount: 1 + { stacktrace: + netif_rx_internal+0xb2/0xd0 + netif_rx_ni+0x20/0x70 + dev_loopback_xmit+0xaa/0xd0 + ip_mc_output+0x17f/0x240 + ip_local_out_sk+0x31/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x13e/0x270 + udp_sendmsg+0x2bf/0x980 + inet_sendmsg+0x67/0xa0 + sock_sendmsg+0x38/0x50 + SYSC_sendto+0xef/0x170 + SyS_sendto+0xe/0x10 + entry_SYSCALL_64_fastpath+0x12/0x6a + } hitcount: 2 + { stacktrace: + netif_rx_internal+0xb2/0xd0 + netif_rx+0x1c/0x60 + loopback_xmit+0x6c/0xb0 + dev_hard_start_xmit+0x219/0x3a0 + __dev_queue_xmit+0x415/0x4f0 + dev_queue_xmit_sk+0x13/0x20 + ip_finish_output2+0x237/0x340 + ip_finish_output+0x113/0x1d0 + ip_output+0x66/0xc0 + ip_local_out_sk+0x31/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x16d/0x270 + udp_sendmsg+0x2bf/0x980 + inet_sendmsg+0x67/0xa0 + sock_sendmsg+0x38/0x50 + ___sys_sendmsg+0x14e/0x270 + } hitcount: 76 + { stacktrace: + netif_rx_internal+0xb2/0xd0 + netif_rx+0x1c/0x60 + loopback_xmit+0x6c/0xb0 + dev_hard_start_xmit+0x219/0x3a0 + __dev_queue_xmit+0x415/0x4f0 + dev_queue_xmit_sk+0x13/0x20 + ip_finish_output2+0x237/0x340 + ip_finish_output+0x113/0x1d0 + ip_output+0x66/0xc0 + ip_local_out_sk+0x31/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x16d/0x270 + udp_sendmsg+0x2bf/0x980 + inet_sendmsg+0x67/0xa0 + sock_sendmsg+0x38/0x50 + ___sys_sendmsg+0x269/0x270 + } hitcount: 77 + { stacktrace: + netif_rx_internal+0xb2/0xd0 + netif_rx+0x1c/0x60 + loopback_xmit+0x6c/0xb0 + dev_hard_start_xmit+0x219/0x3a0 + __dev_queue_xmit+0x415/0x4f0 + dev_queue_xmit_sk+0x13/0x20 + ip_finish_output2+0x237/0x340 + ip_finish_output+0x113/0x1d0 + ip_output+0x66/0xc0 + ip_local_out_sk+0x31/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x16d/0x270 + udp_sendmsg+0x2bf/0x980 + inet_sendmsg+0x67/0xa0 + sock_sendmsg+0x38/0x50 + SYSC_sendto+0xef/0x170 + } hitcount: 88 + { stacktrace: + _do_fork+0x18e/0x330 + SyS_clone+0x19/0x20 + entry_SYSCALL_64_fastpath+0x12/0x6a + } hitcount: 244 + + Totals: + Hits: 489 + Entries: 7 + Dropped: 0 diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 9857606dd7b7..a6b3705e62a6 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -210,6 +210,11 @@ of ftrace. Here is a list of some of the key files: Note, sched_switch and sched_wake_up will also trace events listed in this file. + To have the PIDs of children of tasks with their PID in this file + added on fork, enable the "event-fork" option. That option will also + cause the PIDs of tasks to be removed from this file when the task + exits. + set_graph_function: Set a "trigger" function where tracing should start @@ -725,16 +730,14 @@ noraw nohex nobin noblock -nostacktrace trace_printk -noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj noprintk-msg-only context-info -latency-format +nolatency-format sleep-time graph-time record-cmd @@ -742,7 +745,10 @@ overwrite nodisable_on_free irq-info markers +noevent-fork function-trace +nodisplay-graph +nostacktrace To disable one of the options, echo in the option prepended with "no". @@ -796,11 +802,6 @@ Here are the available options: block - When set, reading trace_pipe will not block when polled. - stacktrace - This is one of the options that changes the trace - itself. When a trace is recorded, so is the stack - of functions. This allows for back traces of - trace sites. - trace_printk - Can disable trace_printk() from writing into the buffer. branch - Enable branch tracing with the tracer. @@ -897,6 +898,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] When disabled, the trace_marker will error with EINVAL on write. + event-fork - When set, tasks with PIDs listed in set_event_pid will have + the PIDs of their children added to set_event_pid when those + tasks fork. Also, when tasks with PIDs in set_event_pid exit, + their PIDs will be removed from the file. function-trace - The latency tracers will enable function tracing if this option is enabled (default it is). When @@ -904,8 +909,17 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] functions. This keeps the overhead of the tracer down when performing latency tests. - Note: Some tracers have their own options. They only appear - when the tracer is active. + display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will + use function graph tracing instead of function tracing. + + stacktrace - This is one of the options that changes the trace + itself. When a trace is recorded, so is the stack + of functions. This allows for back traces of + trace sites. + + Note: Some tracers have their own options. They only appear in this + file when the tracer is active. They always appear in the + options directory. diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 222f6aa0418f..be007610ceb0 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -154,21 +154,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_buffer, struct trace_event_file *trace_file, int type, unsigned long len, unsigned long flags, int pc); -struct ring_buffer_event * -trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer, - int type, unsigned long len, - unsigned long flags, int pc); -void trace_buffer_unlock_commit(struct trace_array *tr, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - unsigned long flags, int pc); -void trace_buffer_unlock_commit_regs(struct trace_array *tr, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - unsigned long flags, int pc, - struct pt_regs *regs); -void trace_current_buffer_discard_commit(struct ring_buffer *buffer, - struct ring_buffer_event *event); void tracing_record_cmdline(struct task_struct *tsk); @@ -229,7 +214,6 @@ enum { TRACE_EVENT_FL_NO_SET_FILTER_BIT, TRACE_EVENT_FL_IGNORE_ENABLE_BIT, TRACE_EVENT_FL_WAS_ENABLED_BIT, - TRACE_EVENT_FL_USE_CALL_FILTER_BIT, TRACE_EVENT_FL_TRACEPOINT_BIT, TRACE_EVENT_FL_KPROBE_BIT, TRACE_EVENT_FL_UPROBE_BIT, @@ -244,7 +228,6 @@ enum { * WAS_ENABLED - Set and stays set when an event was ever enabled * (used for module unloading, if a module event is enabled, * it is best to clear the buffers that used it). - * USE_CALL_FILTER - For trace internal events, don't use file filter * TRACEPOINT - Event is a tracepoint * KPROBE - Event is a kprobe * UPROBE - Event is a uprobe @@ -255,7 +238,6 @@ enum { TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT), TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT), TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT), - TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT), TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT), TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT), TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT), @@ -407,16 +389,12 @@ enum event_trigger_type { ETT_SNAPSHOT = (1 << 1), ETT_STACKTRACE = (1 << 2), ETT_EVENT_ENABLE = (1 << 3), + ETT_EVENT_HIST = (1 << 4), + ETT_HIST_ENABLE = (1 << 5), }; extern int filter_match_preds(struct event_filter *filter, void *rec); -extern int filter_check_discard(struct trace_event_file *file, void *rec, - struct ring_buffer *buffer, - struct ring_buffer_event *event); -extern int call_filter_check_discard(struct trace_event_call *call, void *rec, - struct ring_buffer *buffer, - struct ring_buffer_event *event); extern enum event_trigger_type event_triggers_call(struct trace_event_file *file, void *rec); extern void event_triggers_post_call(struct trace_event_file *file, @@ -450,100 +428,6 @@ trace_trigger_soft_disabled(struct trace_event_file *file) return false; } -/* - * Helper function for event_trigger_unlock_commit{_regs}(). - * If there are event triggers attached to this event that requires - * filtering against its fields, then they wil be called as the - * entry already holds the field information of the current event. - * - * It also checks if the event should be discarded or not. - * It is to be discarded if the event is soft disabled and the - * event was only recorded to process triggers, or if the event - * filter is active and this event did not match the filters. - * - * Returns true if the event is discarded, false otherwise. - */ -static inline bool -__event_trigger_test_discard(struct trace_event_file *file, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - void *entry, - enum event_trigger_type *tt) -{ - unsigned long eflags = file->flags; - - if (eflags & EVENT_FILE_FL_TRIGGER_COND) - *tt = event_triggers_call(file, entry); - - if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags)) - ring_buffer_discard_commit(buffer, event); - else if (!filter_check_discard(file, entry, buffer, event)) - return false; - - return true; -} - -/** - * event_trigger_unlock_commit - handle triggers and finish event commit - * @file: The file pointer assoctiated to the event - * @buffer: The ring buffer that the event is being written to - * @event: The event meta data in the ring buffer - * @entry: The event itself - * @irq_flags: The state of the interrupts at the start of the event - * @pc: The state of the preempt count at the start of the event. - * - * This is a helper function to handle triggers that require data - * from the event itself. It also tests the event against filters and - * if the event is soft disabled and should be discarded. - */ -static inline void -event_trigger_unlock_commit(struct trace_event_file *file, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - void *entry, unsigned long irq_flags, int pc) -{ - enum event_trigger_type tt = ETT_NONE; - - if (!__event_trigger_test_discard(file, buffer, event, entry, &tt)) - trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc); - - if (tt) - event_triggers_post_call(file, tt, entry); -} - -/** - * event_trigger_unlock_commit_regs - handle triggers and finish event commit - * @file: The file pointer assoctiated to the event - * @buffer: The ring buffer that the event is being written to - * @event: The event meta data in the ring buffer - * @entry: The event itself - * @irq_flags: The state of the interrupts at the start of the event - * @pc: The state of the preempt count at the start of the event. - * - * This is a helper function to handle triggers that require data - * from the event itself. It also tests the event against filters and - * if the event is soft disabled and should be discarded. - * - * Same as event_trigger_unlock_commit() but calls - * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit(). - */ -static inline void -event_trigger_unlock_commit_regs(struct trace_event_file *file, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - void *entry, unsigned long irq_flags, int pc, - struct pt_regs *regs) -{ - enum event_trigger_type tt = ETT_NONE; - - if (!__event_trigger_test_discard(file, buffer, event, entry, &tt)) - trace_buffer_unlock_commit_regs(file->tr, buffer, event, - irq_flags, pc, regs); - - if (tt) - event_triggers_post_call(file, tt, entry); -} - #ifdef CONFIG_BPF_EVENTS unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx); #else diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e45db6b0d878..fafeaf803bd0 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -528,6 +528,32 @@ config MMIOTRACE See Documentation/trace/mmiotrace.txt. If you are not helping to develop drivers, say N. +config TRACING_MAP + bool + depends on ARCH_HAVE_NMI_SAFE_CMPXCHG + help + tracing_map is a special-purpose lock-free map for tracing, + separated out as a stand-alone facility in order to allow it + to be shared between multiple tracers. It isn't meant to be + generally used outside of that context, and is normally + selected by tracers that use it. + +config HIST_TRIGGERS + bool "Histogram triggers" + depends on ARCH_HAVE_NMI_SAFE_CMPXCHG + select TRACING_MAP + default n + help + Hist triggers allow one or more arbitrary trace event fields + to be aggregated into hash tables and dumped to stdout by + reading a debugfs/tracefs file. They're useful for + gathering quick and dirty (though precise) summaries of + event activity as an initial guide for further investigation + using more advanced tools. + + See Documentation/trace/events.txt. + If in doubt, say N. + config MMIOTRACE_TEST tristate "Test module for mmiotrace" depends on MMIOTRACE && m diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 9b1044e936a6..979e7bfbde7a 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -31,6 +31,7 @@ obj-$(CONFIG_TRACING) += trace_output.o obj-$(CONFIG_TRACING) += trace_seq.o obj-$(CONFIG_TRACING) += trace_stat.o obj-$(CONFIG_TRACING) += trace_printk.o +obj-$(CONFIG_TRACING_MAP) += tracing_map.o obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o @@ -53,6 +54,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o endif obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o +obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o obj-$(CONFIG_TRACEPOINTS) += power-traces.o diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index a2f0b9f33e9b..8a4bd6b68a0b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -253,6 +253,9 @@ unsigned long long ns2usecs(cycle_t nsec) #define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \ TRACE_ITER_PRINTK_MSGONLY | TRACE_ITER_RECORD_CMD) +/* trace_flags that are default zero for instances */ +#define ZEROED_TRACE_FLAGS \ + TRACE_ITER_EVENT_FORK /* * The global_trace is the descriptor that holds the tracing @@ -303,33 +306,18 @@ void trace_array_put(struct trace_array *this_tr) mutex_unlock(&trace_types_lock); } -int filter_check_discard(struct trace_event_file *file, void *rec, - struct ring_buffer *buffer, - struct ring_buffer_event *event) -{ - if (unlikely(file->flags & EVENT_FILE_FL_FILTERED) && - !filter_match_preds(file->filter, rec)) { - ring_buffer_discard_commit(buffer, event); - return 1; - } - - return 0; -} -EXPORT_SYMBOL_GPL(filter_check_discard); - int call_filter_check_discard(struct trace_event_call *call, void *rec, struct ring_buffer *buffer, struct ring_buffer_event *event) { if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) && !filter_match_preds(call->filter, rec)) { - ring_buffer_discard_commit(buffer, event); + __trace_event_discard_commit(buffer, event); return 1; } return 0; } -EXPORT_SYMBOL_GPL(call_filter_check_discard); static cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu) { @@ -1672,6 +1660,16 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags, } EXPORT_SYMBOL_GPL(tracing_generic_entry_update); +static __always_inline void +trace_event_setup(struct ring_buffer_event *event, + int type, unsigned long flags, int pc) +{ + struct trace_entry *ent = ring_buffer_event_data(event); + + tracing_generic_entry_update(ent, flags, pc); + ent->type = type; +} + struct ring_buffer_event * trace_buffer_lock_reserve(struct ring_buffer *buffer, int type, @@ -1681,34 +1679,137 @@ trace_buffer_lock_reserve(struct ring_buffer *buffer, struct ring_buffer_event *event; event = ring_buffer_lock_reserve(buffer, len); - if (event != NULL) { - struct trace_entry *ent = ring_buffer_event_data(event); + if (event != NULL) + trace_event_setup(event, type, flags, pc); + + return event; +} + +DEFINE_PER_CPU(struct ring_buffer_event *, trace_buffered_event); +DEFINE_PER_CPU(int, trace_buffered_event_cnt); +static int trace_buffered_event_ref; + +/** + * trace_buffered_event_enable - enable buffering events + * + * When events are being filtered, it is quicker to use a temporary + * buffer to write the event data into if there's a likely chance + * that it will not be committed. The discard of the ring buffer + * is not as fast as committing, and is much slower than copying + * a commit. + * + * When an event is to be filtered, allocate per cpu buffers to + * write the event data into, and if the event is filtered and discarded + * it is simply dropped, otherwise, the entire data is to be committed + * in one shot. + */ +void trace_buffered_event_enable(void) +{ + struct ring_buffer_event *event; + struct page *page; + int cpu; - tracing_generic_entry_update(ent, flags, pc); - ent->type = type; + WARN_ON_ONCE(!mutex_is_locked(&event_mutex)); + + if (trace_buffered_event_ref++) + return; + + for_each_tracing_cpu(cpu) { + page = alloc_pages_node(cpu_to_node(cpu), + GFP_KERNEL | __GFP_NORETRY, 0); + if (!page) + goto failed; + + event = page_address(page); + memset(event, 0, sizeof(*event)); + + per_cpu(trace_buffered_event, cpu) = event; + + preempt_disable(); + if (cpu == smp_processor_id() && + this_cpu_read(trace_buffered_event) != + per_cpu(trace_buffered_event, cpu)) + WARN_ON_ONCE(1); + preempt_enable(); } - return event; + return; + failed: + trace_buffered_event_disable(); } -void -__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event) +static void enable_trace_buffered_event(void *data) { - __this_cpu_write(trace_cmdline_save, true); - ring_buffer_unlock_commit(buffer, event); + /* Probably not needed, but do it anyway */ + smp_rmb(); + this_cpu_dec(trace_buffered_event_cnt); } -void trace_buffer_unlock_commit(struct trace_array *tr, - struct ring_buffer *buffer, - struct ring_buffer_event *event, - unsigned long flags, int pc) +static void disable_trace_buffered_event(void *data) { - __buffer_unlock_commit(buffer, event); + this_cpu_inc(trace_buffered_event_cnt); +} - ftrace_trace_stack(tr, buffer, flags, 6, pc, NULL); - ftrace_trace_userstack(buffer, flags, pc); +/** + * trace_buffered_event_disable - disable buffering events + * + * When a filter is removed, it is faster to not use the buffered + * events, and to commit directly into the ring buffer. Free up + * the temp buffers when there are no more users. This requires + * special synchronization with current events. + */ +void trace_buffered_event_disable(void) +{ + int cpu; + + WARN_ON_ONCE(!mutex_is_locked(&event_mutex)); + + if (WARN_ON_ONCE(!trace_buffered_event_ref)) + return; + + if (--trace_buffered_event_ref) + return; + + preempt_disable(); + /* For each CPU, set the buffer as used. */ + smp_call_function_many(tracing_buffer_mask, + disable_trace_buffered_event, NULL, 1); + preempt_enable(); + + /* Wait for all current users to finish */ + synchronize_sched(); + + for_each_tracing_cpu(cpu) { + free_page((unsigned long)per_cpu(trace_buffered_event, cpu)); + per_cpu(trace_buffered_event, cpu) = NULL; + } + /* + * Make sure trace_buffered_event is NULL before clearing + * trace_buffered_event_cnt. + */ + smp_wmb(); + + preempt_disable(); + /* Do the work on each cpu */ + smp_call_function_many(tracing_buffer_mask, + enable_trace_buffered_event, NULL, 1); + preempt_enable(); +} + +void +__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event) +{ + __this_cpu_write(trace_cmdline_save, true); + + /* If this is the temp buffer, we need to commit fully */ + if (this_cpu_read(trace_buffered_event) == event) { + /* Length is in event->array[0] */ + ring_buffer_write(buffer, event->array[0], &event->array[1]); + /* Release the temp buffer */ + this_cpu_dec(trace_buffered_event_cnt); + } else + ring_buffer_unlock_commit(buffer, event); } -EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit); static struct ring_buffer *temp_buffer; @@ -1719,8 +1820,23 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb, unsigned long flags, int pc) { struct ring_buffer_event *entry; + int val; *current_rb = trace_file->tr->trace_buffer.buffer; + + if ((trace_file->flags & + (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) && + (entry = this_cpu_read(trace_buffered_event))) { + /* Try to use the per cpu buffer first */ + val = this_cpu_inc_return(trace_buffered_event_cnt); + if (val == 1) { + trace_event_setup(entry, type, flags, pc); + entry->array[0] = len; + return entry; + } + this_cpu_dec(trace_buffered_event_cnt); + } + entry = trace_buffer_lock_reserve(*current_rb, type, len, flags, pc); /* @@ -1738,17 +1854,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb, } EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve); -struct ring_buffer_event * -trace_current_buffer_lock_reserve(struct ring_buffer **current_rb, - int type, unsigned long len, - unsigned long flags, int pc) -{ - *current_rb = global_trace.trace_buffer.buffer; - return trace_buffer_lock_reserve(*current_rb, - type, len, flags, pc); -} -EXPORT_SYMBOL_GPL(trace_current_buffer_lock_reserve); - void trace_buffer_unlock_commit_regs(struct trace_array *tr, struct ring_buffer *buffer, struct ring_buffer_event *event, @@ -1760,14 +1865,6 @@ void trace_buffer_unlock_commit_regs(struct trace_array *tr, ftrace_trace_stack(tr, buffer, flags, 0, pc, regs); ftrace_trace_userstack(buffer, flags, pc); } -EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit_regs); - -void trace_current_buffer_discard_commit(struct ring_buffer *buffer, - struct ring_buffer_event *event) -{ - ring_buffer_discard_commit(buffer, event); -} -EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit); void trace_function(struct trace_array *tr, @@ -3571,6 +3668,9 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) if (mask == TRACE_ITER_RECORD_CMD) trace_event_enable_cmd_record(enabled); + if (mask == TRACE_ITER_EVENT_FORK) + trace_event_follow_fork(tr, enabled); + if (mask == TRACE_ITER_OVERWRITE) { ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled); #ifdef CONFIG_TRACER_MAX_TRACE @@ -3658,7 +3758,7 @@ tracing_trace_options_write(struct file *filp, const char __user *ubuf, if (cnt >= sizeof(buf)) return -EINVAL; - if (copy_from_user(&buf, ubuf, cnt)) + if (copy_from_user(buf, ubuf, cnt)) return -EFAULT; buf[cnt] = 0; @@ -3804,12 +3904,19 @@ static const char readme_msg[] = "\t trigger: traceon, traceoff\n" "\t enable_event:<system>:<event>\n" "\t disable_event:<system>:<event>\n" +#ifdef CONFIG_HIST_TRIGGERS + "\t enable_hist:<system>:<event>\n" + "\t disable_hist:<system>:<event>\n" +#endif #ifdef CONFIG_STACKTRACE "\t\t stacktrace\n" #endif #ifdef CONFIG_TRACER_SNAPSHOT "\t\t snapshot\n" #endif +#ifdef CONFIG_HIST_TRIGGERS + "\t\t hist (see below)\n" +#endif "\t example: echo traceoff > events/block/block_unplug/trigger\n" "\t echo traceoff:3 > events/block/block_unplug/trigger\n" "\t echo 'enable_event:kmem:kmalloc:3 if nr_rq > 1' > \\\n" @@ -3825,6 +3932,56 @@ static const char readme_msg[] = "\t To remove a trigger with a count:\n" "\t echo '!<trigger>:0 > <system>/<event>/trigger\n" "\t Filters can be ignored when removing a trigger.\n" +#ifdef CONFIG_HIST_TRIGGERS + " hist trigger\t- If set, event hits are aggregated into a hash table\n" + "\t Format: hist:keys=<field1[,field2,...]>\n" + "\t [:values=<field1[,field2,...]>]\n" + "\t [:sort=<field1[,field2,...]>]\n" + "\t [:size=#entries]\n" + "\t [:pause][:continue][:clear]\n" + "\t [:name=histname1]\n" + "\t [if <filter>]\n\n" + "\t When a matching event is hit, an entry is added to a hash\n" + "\t table using the key(s) and value(s) named, and the value of a\n" + "\t sum called 'hitcount' is incremented. Keys and values\n" + "\t correspond to fields in the event's format description. Keys\n" + "\t can be any field, or the special string 'stacktrace'.\n" + "\t Compound keys consisting of up to two fields can be specified\n" + "\t by the 'keys' keyword. Values must correspond to numeric\n" + "\t fields. Sort keys consisting of up to two fields can be\n" + "\t specified using the 'sort' keyword. The sort direction can\n" + "\t be modified by appending '.descending' or '.ascending' to a\n" + "\t sort field. The 'size' parameter can be used to specify more\n" + "\t or fewer than the default 2048 entries for the hashtable size.\n" + "\t If a hist trigger is given a name using the 'name' parameter,\n" + "\t its histogram data will be shared with other triggers of the\n" + "\t same name, and trigger hits will update this common data.\n\n" + "\t Reading the 'hist' file for the event will dump the hash\n" + "\t table in its entirety to stdout. If there are multiple hist\n" + "\t triggers attached to an event, there will be a table for each\n" + "\t trigger in the output. The table displayed for a named\n" + "\t trigger will be the same as any other instance having the\n" + "\t same name. The default format used to display a given field\n" + "\t can be modified by appending any of the following modifiers\n" + "\t to the field name, as applicable:\n\n" + "\t .hex display a number as a hex value\n" + "\t .sym display an address as a symbol\n" + "\t .sym-offset display an address as a symbol and offset\n" + "\t .execname display a common_pid as a program name\n" + "\t .syscall display a syscall id as a syscall name\n\n" + "\t .log2 display log2 value rather than raw number\n\n" + "\t The 'pause' parameter can be used to pause an existing hist\n" + "\t trigger or to start a hist trigger but not log any events\n" + "\t until told to do so. 'continue' can be used to start or\n" + "\t restart a paused hist trigger.\n\n" + "\t The 'clear' parameter will clear the contents of a running\n" + "\t hist trigger and leave its current paused/active state\n" + "\t unchanged.\n\n" + "\t The enable_hist and disable_hist triggers can be used to\n" + "\t have one event conditionally start and stop another event's\n" + "\t already-attached hist trigger. The syntax is analagous to\n" + "\t the enable_event and disable_event triggers.\n" +#endif ; static ssize_t @@ -4474,7 +4631,7 @@ tracing_set_trace_write(struct file *filp, const char __user *ubuf, if (cnt > MAX_TRACER_SIZE) cnt = MAX_TRACER_SIZE; - if (copy_from_user(&buf, ubuf, cnt)) + if (copy_from_user(buf, ubuf, cnt)) return -EFAULT; buf[cnt] = 0; @@ -5264,7 +5421,7 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf, if (cnt >= sizeof(buf)) return -EINVAL; - if (copy_from_user(&buf, ubuf, cnt)) + if (copy_from_user(buf, ubuf, cnt)) return -EFAULT; buf[cnt] = 0; @@ -6650,7 +6807,7 @@ static int instance_mkdir(const char *name) if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL)) goto out_free_tr; - tr->trace_flags = global_trace.trace_flags; + tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; cpumask_copy(tr->tracing_cpumask, cpu_all_mask); @@ -6724,6 +6881,12 @@ static int instance_rmdir(const char *name) list_del(&tr->list); + /* Disable all the flags that were enabled coming in */ + for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) { + if ((1 << i) & ZEROED_TRACE_FLAGS) + set_tracer_flag(tr, 1 << i, 0); + } + tracing_set_nop(tr); event_trace_del_tracer(tr); ftrace_destroy_function_files(tr); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 3fff4adfd431..5167c366d6b7 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -177,9 +177,8 @@ struct trace_options { }; struct trace_pid_list { - unsigned int nr_pids; - int order; - pid_t *pids; + int pid_max; + unsigned long *pids; }; /* @@ -656,6 +655,7 @@ static inline void __trace_stack(struct trace_array *tr, unsigned long flags, extern cycle_t ftrace_now(int cpu); extern void trace_find_cmdline(int pid, char comm[]); +extern void trace_event_follow_fork(struct trace_array *tr, bool enable); #ifdef CONFIG_DYNAMIC_FTRACE extern unsigned long ftrace_update_tot_cnt; @@ -967,6 +967,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(STOP_ON_FREE, "disable_on_free"), \ C(IRQ_INFO, "irq-info"), \ C(MARKERS, "markers"), \ + C(EVENT_FORK, "event-fork"), \ FUNCTION_FLAGS \ FGRAPH_FLAGS \ STACK_FLAGS \ @@ -1064,6 +1065,137 @@ struct trace_subsystem_dir { int nr_events; }; +extern int call_filter_check_discard(struct trace_event_call *call, void *rec, + struct ring_buffer *buffer, + struct ring_buffer_event *event); + +void trace_buffer_unlock_commit_regs(struct trace_array *tr, + struct ring_buffer *buffer, + struct ring_buffer_event *event, + unsigned long flags, int pc, + struct pt_regs *regs); + +static inline void trace_buffer_unlock_commit(struct trace_array *tr, + struct ring_buffer *buffer, + struct ring_buffer_event *event, + unsigned long flags, int pc) +{ + trace_buffer_unlock_commit_regs(tr, buffer, event, flags, pc, NULL); +} + +DECLARE_PER_CPU(struct ring_buffer_event *, trace_buffered_event); +DECLARE_PER_CPU(int, trace_buffered_event_cnt); +void trace_buffered_event_disable(void); +void trace_buffered_event_enable(void); + +static inline void +__trace_event_discard_commit(struct ring_buffer *buffer, + struct ring_buffer_event *event) +{ + if (this_cpu_read(trace_buffered_event) == event) { + /* Simply release the temp buffer */ + this_cpu_dec(trace_buffered_event_cnt); + return; + } + ring_buffer_discard_commit(buffer, event); +} + +/* + * Helper function for event_trigger_unlock_commit{_regs}(). + * If there are event triggers attached to this event that requires + * filtering against its fields, then they wil be called as the + * entry already holds the field information of the current event. + * + * It also checks if the event should be discarded or not. + * It is to be discarded if the event is soft disabled and the + * event was only recorded to process triggers, or if the event + * filter is active and this event did not match the filters. + * + * Returns true if the event is discarded, false otherwise. + */ +static inline bool +__event_trigger_test_discard(struct trace_event_file *file, + struct ring_buffer *buffer, + struct ring_buffer_event *event, + void *entry, + enum event_trigger_type *tt) +{ + unsigned long eflags = file->flags; + + if (eflags & EVENT_FILE_FL_TRIGGER_COND) + *tt = event_triggers_call(file, entry); + + if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) || + (unlikely(file->flags & EVENT_FILE_FL_FILTERED) && + !filter_match_preds(file->filter, entry))) { + __trace_event_discard_commit(buffer, event); + return true; + } + + return false; +} + +/** + * event_trigger_unlock_commit - handle triggers and finish event commit + * @file: The file pointer assoctiated to the event + * @buffer: The ring buffer that the event is being written to + * @event: The event meta data in the ring buffer + * @entry: The event itself + * @irq_flags: The state of the interrupts at the start of the event + * @pc: The state of the preempt count at the start of the event. + * + * This is a helper function to handle triggers that require data + * from the event itself. It also tests the event against filters and + * if the event is soft disabled and should be discarded. + */ +static inline void +event_trigger_unlock_commit(struct trace_event_file *file, + struct ring_buffer *buffer, + struct ring_buffer_event *event, + void *entry, unsigned long irq_flags, int pc) +{ + enum event_trigger_type tt = ETT_NONE; + + if (!__event_trigger_test_discard(file, buffer, event, entry, &tt)) + trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc); + + if (tt) + event_triggers_post_call(file, tt, entry); +} + +/** + * event_trigger_unlock_commit_regs - handle triggers and finish event commit + * @file: The file pointer assoctiated to the event + * @buffer: The ring buffer that the event is being written to + * @event: The event meta data in the ring buffer + * @entry: The event itself + * @irq_flags: The state of the interrupts at the start of the event + * @pc: The state of the preempt count at the start of the event. + * + * This is a helper function to handle triggers that require data + * from the event itself. It also tests the event against filters and + * if the event is soft disabled and should be discarded. + * + * Same as event_trigger_unlock_commit() but calls + * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit(). + */ +static inline void +event_trigger_unlock_commit_regs(struct trace_event_file *file, + struct ring_buffer *buffer, + struct ring_buffer_event *event, + void *entry, unsigned long irq_flags, int pc, + struct pt_regs *regs) +{ + enum event_trigger_type tt = ETT_NONE; + + if (!__event_trigger_test_discard(file, buffer, event, entry, &tt)) + trace_buffer_unlock_commit_regs(file->tr, buffer, event, + irq_flags, pc, regs); + + if (tt) + event_triggers_post_call(file, tt, entry); +} + #define FILTER_PRED_INVALID ((unsigned short)-1) #define FILTER_PRED_IS_RIGHT (1 << 15) #define FILTER_PRED_FOLD (1 << 15) @@ -1161,6 +1293,15 @@ extern struct mutex event_mutex; extern struct list_head ftrace_events; extern const struct file_operations event_trigger_fops; +extern const struct file_operations event_hist_fops; + +#ifdef CONFIG_HIST_TRIGGERS +extern int register_trigger_hist_cmd(void); +extern int register_trigger_hist_enable_disable_cmds(void); +#else +static inline int register_trigger_hist_cmd(void) { return 0; } +static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; } +#endif extern int register_trigger_cmds(void); extern void clear_event_triggers(struct trace_array *tr); @@ -1174,9 +1315,41 @@ struct event_trigger_data { char *filter_str; void *private_data; bool paused; + bool paused_tmp; struct list_head list; + char *name; + struct list_head named_list; + struct event_trigger_data *named_data; +}; + +/* Avoid typos */ +#define ENABLE_EVENT_STR "enable_event" +#define DISABLE_EVENT_STR "disable_event" +#define ENABLE_HIST_STR "enable_hist" +#define DISABLE_HIST_STR "disable_hist" + +struct enable_trigger_data { + struct trace_event_file *file; + bool enable; + bool hist; }; +extern int event_enable_trigger_print(struct seq_file *m, + struct event_trigger_ops *ops, + struct event_trigger_data *data); +extern void event_enable_trigger_free(struct event_trigger_ops *ops, + struct event_trigger_data *data); +extern int event_enable_trigger_func(struct event_command *cmd_ops, + struct trace_event_file *file, + char *glob, char *cmd, char *param); +extern int event_enable_register_trigger(char *glob, + struct event_trigger_ops *ops, + struct event_trigger_data *data, + struct trace_event_file *file); +extern void event_enable_unregister_trigger(char *glob, + struct event_trigger_ops *ops, + struct event_trigger_data *test, + struct trace_event_file *file); extern void trigger_data_free(struct event_trigger_data *data); extern int event_trigger_init(struct event_trigger_ops *ops, struct event_trigger_data *data); @@ -1189,7 +1362,18 @@ extern void unregister_trigger(char *glob, struct event_trigger_ops *ops, extern int set_trigger_filter(char *filter_str, struct event_trigger_data *trigger_data, struct trace_event_file *file); +extern struct event_trigger_data *find_named_trigger(const char *name); +extern bool is_named_trigger(struct event_trigger_data *test); +extern int save_named_trigger(const char *name, + struct event_trigger_data *data); +extern void del_named_trigger(struct event_trigger_data *data); +extern void pause_named_trigger(struct event_trigger_data *data); +extern void unpause_named_trigger(struct event_trigger_data *data); +extern void set_named_trigger_data(struct event_trigger_data *data, + struct event_trigger_data *named_data); extern int register_event_command(struct event_command *cmd); +extern int unregister_event_command(struct event_command *cmd); +extern int register_trigger_hist_enable_disable_cmds(void); /** * struct event_trigger_ops - callbacks for trace event triggers diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index b7b0760ba6ee..3d4155892a1e 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -15,7 +15,7 @@ #include <linux/kthread.h> #include <linux/tracefs.h> #include <linux/uaccess.h> -#include <linux/bsearch.h> +#include <linux/vmalloc.h> #include <linux/module.h> #include <linux/ctype.h> #include <linux/sort.h> @@ -381,6 +381,7 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file, { struct trace_event_call *call = file->event_call; struct trace_array *tr = file->tr; + unsigned long file_flags = file->flags; int ret = 0; int disable; @@ -463,6 +464,15 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file, break; } + /* Enable or disable use of trace_buffered_event */ + if ((file_flags & EVENT_FILE_FL_SOFT_DISABLED) != + (file->flags & EVENT_FILE_FL_SOFT_DISABLED)) { + if (file->flags & EVENT_FILE_FL_SOFT_DISABLED) + trace_buffered_event_enable(); + else + trace_buffered_event_disable(); + } + return ret; } @@ -489,24 +499,26 @@ static void ftrace_clear_events(struct trace_array *tr) mutex_unlock(&event_mutex); } -static int cmp_pid(const void *key, const void *elt) +/* Shouldn't this be in a header? */ +extern int pid_max; + +/* Returns true if found in filter */ +static bool +find_filtered_pid(struct trace_pid_list *filtered_pids, pid_t search_pid) { - const pid_t *search_pid = key; - const pid_t *pid = elt; + /* + * If pid_max changed after filtered_pids was created, we + * by default ignore all pids greater than the previous pid_max. + */ + if (search_pid >= filtered_pids->pid_max) + return false; - if (*search_pid == *pid) - return 0; - if (*search_pid < *pid) - return -1; - return 1; + return test_bit(search_pid, filtered_pids->pids); } static bool -check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task) +ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task) { - pid_t search_pid; - pid_t *pid; - /* * Return false, because if filtered_pids does not exist, * all pids are good to trace. @@ -514,15 +526,68 @@ check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task) if (!filtered_pids) return false; - search_pid = task->pid; + return !find_filtered_pid(filtered_pids, task->pid); +} + +static void filter_add_remove_task(struct trace_pid_list *pid_list, + struct task_struct *self, + struct task_struct *task) +{ + if (!pid_list) + return; + + /* For forks, we only add if the forking task is listed */ + if (self) { + if (!find_filtered_pid(pid_list, self->pid)) + return; + } - pid = bsearch(&search_pid, filtered_pids->pids, - filtered_pids->nr_pids, sizeof(pid_t), - cmp_pid); - if (!pid) - return true; + /* Sorry, but we don't support pid_max changing after setting */ + if (task->pid >= pid_list->pid_max) + return; - return false; + /* "self" is set for forks, and NULL for exits */ + if (self) + set_bit(task->pid, pid_list->pids); + else + clear_bit(task->pid, pid_list->pids); +} + +static void +event_filter_pid_sched_process_exit(void *data, struct task_struct *task) +{ + struct trace_pid_list *pid_list; + struct trace_array *tr = data; + + pid_list = rcu_dereference_sched(tr->filtered_pids); + filter_add_remove_task(pid_list, NULL, task); +} + +static void +event_filter_pid_sched_process_fork(void *data, + struct task_struct *self, + struct task_struct *task) +{ + struct trace_pid_list *pid_list; + struct trace_array *tr = data; + + pid_list = rcu_dereference_sched(tr->filtered_pids); + filter_add_remove_task(pid_list, self, task); +} + +void trace_event_follow_fork(struct trace_array *tr, bool enable) +{ + if (enable) { + register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork, + tr, INT_MIN); + register_trace_prio_sched_process_exit(event_filter_pid_sched_process_exit, + tr, INT_MAX); + } else { + unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork, + tr); + unregister_trace_sched_process_exit(event_filter_pid_sched_process_exit, + tr); + } } static void @@ -535,8 +600,8 @@ event_filter_pid_sched_switch_probe_pre(void *data, bool preempt, pid_list = rcu_dereference_sched(tr->filtered_pids); this_cpu_write(tr->trace_buffer.data->ignore_pid, - check_ignore_pid(pid_list, prev) && - check_ignore_pid(pid_list, next)); + ignore_this_task(pid_list, prev) && + ignore_this_task(pid_list, next)); } static void @@ -549,7 +614,7 @@ event_filter_pid_sched_switch_probe_post(void *data, bool preempt, pid_list = rcu_dereference_sched(tr->filtered_pids); this_cpu_write(tr->trace_buffer.data->ignore_pid, - check_ignore_pid(pid_list, next)); + ignore_this_task(pid_list, next)); } static void @@ -565,7 +630,7 @@ event_filter_pid_sched_wakeup_probe_pre(void *data, struct task_struct *task) pid_list = rcu_dereference_sched(tr->filtered_pids); this_cpu_write(tr->trace_buffer.data->ignore_pid, - check_ignore_pid(pid_list, task)); + ignore_this_task(pid_list, task)); } static void @@ -582,7 +647,7 @@ event_filter_pid_sched_wakeup_probe_post(void *data, struct task_struct *task) /* Set tracing if current is enabled */ this_cpu_write(tr->trace_buffer.data->ignore_pid, - check_ignore_pid(pid_list, current)); + ignore_this_task(pid_list, current)); } static void __ftrace_clear_event_pids(struct trace_array *tr) @@ -620,7 +685,7 @@ static void __ftrace_clear_event_pids(struct trace_array *tr) /* Wait till all users are no longer using pid filtering */ synchronize_sched(); - free_pages((unsigned long)pid_list->pids, pid_list->order); + vfree(pid_list->pids); kfree(pid_list); } @@ -964,11 +1029,32 @@ static void t_stop(struct seq_file *m, void *p) mutex_unlock(&event_mutex); } +static void * +p_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct trace_array *tr = m->private; + struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids); + unsigned long pid = (unsigned long)v; + + (*pos)++; + + /* pid already is +1 of the actual prevous bit */ + pid = find_next_bit(pid_list->pids, pid_list->pid_max, pid); + + /* Return pid + 1 to allow zero to be represented */ + if (pid < pid_list->pid_max) + return (void *)(pid + 1); + + return NULL; +} + static void *p_start(struct seq_file *m, loff_t *pos) __acquires(RCU) { struct trace_pid_list *pid_list; struct trace_array *tr = m->private; + unsigned long pid; + loff_t l = 0; /* * Grab the mutex, to keep calls to p_next() having the same @@ -981,10 +1067,18 @@ static void *p_start(struct seq_file *m, loff_t *pos) pid_list = rcu_dereference_sched(tr->filtered_pids); - if (!pid_list || *pos >= pid_list->nr_pids) + if (!pid_list) + return NULL; + + pid = find_first_bit(pid_list->pids, pid_list->pid_max); + if (pid >= pid_list->pid_max) return NULL; - return (void *)&pid_list->pids[*pos]; + /* Return pid + 1 so that zero can be the exit value */ + for (pid++; pid && l < *pos; + pid = (unsigned long)p_next(m, (void *)pid, &l)) + ; + return (void *)pid; } static void p_stop(struct seq_file *m, void *p) @@ -994,25 +1088,11 @@ static void p_stop(struct seq_file *m, void *p) mutex_unlock(&event_mutex); } -static void * -p_next(struct seq_file *m, void *v, loff_t *pos) -{ - struct trace_array *tr = m->private; - struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids); - - (*pos)++; - - if (*pos >= pid_list->nr_pids) - return NULL; - - return (void *)&pid_list->pids[*pos]; -} - static int p_show(struct seq_file *m, void *v) { - pid_t *pid = v; + unsigned long pid = (unsigned long)v - 1; - seq_printf(m, "%d\n", *pid); + seq_printf(m, "%lu\n", pid); return 0; } @@ -1561,11 +1641,6 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) return r; } -static int max_pids(struct trace_pid_list *pid_list) -{ - return (PAGE_SIZE << pid_list->order) / sizeof(pid_t); -} - static void ignore_task_cpu(void *data) { struct trace_array *tr = data; @@ -1579,7 +1654,7 @@ static void ignore_task_cpu(void *data) mutex_is_locked(&event_mutex)); this_cpu_write(tr->trace_buffer.data->ignore_pid, - check_ignore_pid(pid_list, current)); + ignore_this_task(pid_list, current)); } static ssize_t @@ -1589,7 +1664,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, struct seq_file *m = filp->private_data; struct trace_array *tr = m->private; struct trace_pid_list *filtered_pids = NULL; - struct trace_pid_list *pid_list = NULL; + struct trace_pid_list *pid_list; struct trace_event_file *file; struct trace_parser parser; unsigned long val; @@ -1597,7 +1672,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, ssize_t read = 0; ssize_t ret = 0; pid_t pid; - int i; + int nr_pids = 0; if (!cnt) return 0; @@ -1610,10 +1685,43 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, return -ENOMEM; mutex_lock(&event_mutex); + filtered_pids = rcu_dereference_protected(tr->filtered_pids, + lockdep_is_held(&event_mutex)); + /* - * Load as many pids into the array before doing a - * swap from the tr->filtered_pids to the new list. + * Always recreate a new array. The write is an all or nothing + * operation. Always create a new array when adding new pids by + * the user. If the operation fails, then the current list is + * not modified. */ + pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL); + if (!pid_list) { + read = -ENOMEM; + goto out; + } + pid_list->pid_max = READ_ONCE(pid_max); + /* Only truncating will shrink pid_max */ + if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max) + pid_list->pid_max = filtered_pids->pid_max; + pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3); + if (!pid_list->pids) { + kfree(pid_list); + read = -ENOMEM; + goto out; + } + if (filtered_pids) { + /* copy the current bits to the new max */ + pid = find_first_bit(filtered_pids->pids, + filtered_pids->pid_max); + while (pid < filtered_pids->pid_max) { + set_bit(pid, pid_list->pids); + pid = find_next_bit(filtered_pids->pids, + filtered_pids->pid_max, + pid + 1); + nr_pids++; + } + } + while (cnt > 0) { this_pos = 0; @@ -1631,92 +1739,35 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, ret = -EINVAL; if (kstrtoul(parser.buffer, 0, &val)) break; - if (val > INT_MAX) + if (val >= pid_list->pid_max) break; pid = (pid_t)val; - ret = -ENOMEM; - if (!pid_list) { - pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL); - if (!pid_list) - break; + set_bit(pid, pid_list->pids); + nr_pids++; - filtered_pids = rcu_dereference_protected(tr->filtered_pids, - lockdep_is_held(&event_mutex)); - if (filtered_pids) - pid_list->order = filtered_pids->order; - else - pid_list->order = 0; - - pid_list->pids = (void *)__get_free_pages(GFP_KERNEL, - pid_list->order); - if (!pid_list->pids) - break; - - if (filtered_pids) { - pid_list->nr_pids = filtered_pids->nr_pids; - memcpy(pid_list->pids, filtered_pids->pids, - pid_list->nr_pids * sizeof(pid_t)); - } else - pid_list->nr_pids = 0; - } - - if (pid_list->nr_pids >= max_pids(pid_list)) { - pid_t *pid_page; - - pid_page = (void *)__get_free_pages(GFP_KERNEL, - pid_list->order + 1); - if (!pid_page) - break; - memcpy(pid_page, pid_list->pids, - pid_list->nr_pids * sizeof(pid_t)); - free_pages((unsigned long)pid_list->pids, pid_list->order); - - pid_list->order++; - pid_list->pids = pid_page; - } - - pid_list->pids[pid_list->nr_pids++] = pid; trace_parser_clear(&parser); ret = 0; } trace_parser_put(&parser); if (ret < 0) { - if (pid_list) - free_pages((unsigned long)pid_list->pids, pid_list->order); + vfree(pid_list->pids); kfree(pid_list); - mutex_unlock(&event_mutex); - return ret; - } - - if (!pid_list) { - mutex_unlock(&event_mutex); - return ret; + read = ret; + goto out; } - sort(pid_list->pids, pid_list->nr_pids, sizeof(pid_t), cmp_pid, NULL); - - /* Remove duplicates */ - for (i = 1; i < pid_list->nr_pids; i++) { - int start = i; - - while (i < pid_list->nr_pids && - pid_list->pids[i - 1] == pid_list->pids[i]) - i++; - - if (start != i) { - if (i < pid_list->nr_pids) { - memmove(&pid_list->pids[start], &pid_list->pids[i], - (pid_list->nr_pids - i) * sizeof(pid_t)); - pid_list->nr_pids -= i - start; - i = start; - } else - pid_list->nr_pids = start; - } + if (!nr_pids) { + /* Cleared the list of pids */ + vfree(pid_list->pids); + kfree(pid_list); + read = ret; + if (!filtered_pids) + goto out; + pid_list = NULL; } - rcu_assign_pointer(tr->filtered_pids, pid_list); list_for_each_entry(file, &tr->events, list) { @@ -1726,7 +1777,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, if (filtered_pids) { synchronize_sched(); - free_pages((unsigned long)filtered_pids->pids, filtered_pids->order); + vfree(filtered_pids->pids); kfree(filtered_pids); } else { /* @@ -1763,10 +1814,12 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf, */ on_each_cpu(ignore_task_cpu, tr, 1); + out: mutex_unlock(&event_mutex); ret = read; - *ppos += read; + if (read > 0) + *ppos += read; return ret; } @@ -2121,6 +2174,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) trace_create_file("trigger", 0644, file->dir, file, &event_trigger_fops); +#ifdef CONFIG_HIST_TRIGGERS + trace_create_file("hist", 0444, file->dir, file, + &event_hist_fops); +#endif trace_create_file("format", 0444, file->dir, call, &ftrace_event_format_fops); @@ -3368,7 +3425,7 @@ static __init void event_trace_self_tests(void) static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable); -static struct trace_array *event_tr; +static struct trace_event_file event_trace_file __initdata; static void __init function_test_events_call(unsigned long ip, unsigned long parent_ip, @@ -3392,17 +3449,17 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip, local_save_flags(flags); - event = trace_current_buffer_lock_reserve(&buffer, - TRACE_FN, sizeof(*entry), - flags, pc); + event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file, + TRACE_FN, sizeof(*entry), + flags, pc); if (!event) goto out; entry = ring_buffer_event_data(event); entry->ip = ip; entry->parent_ip = parent_ip; - trace_buffer_unlock_commit(event_tr, buffer, event, flags, pc); - + event_trigger_unlock_commit(&event_trace_file, buffer, event, + entry, flags, pc); out: atomic_dec(&per_cpu(ftrace_test_event_disable, cpu)); preempt_enable_notrace(); @@ -3417,9 +3474,11 @@ static struct ftrace_ops trace_ops __initdata = static __init void event_trace_self_test_with_function(void) { int ret; - event_tr = top_trace_array(); - if (WARN_ON(!event_tr)) + + event_trace_file.tr = top_trace_array(); + if (WARN_ON(!event_trace_file.tr)) return; + ret = register_ftrace_function(&trace_ops); if (WARN_ON(ret < 0)) { pr_info("Failed to enable function tracer for event tests\n"); diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index b3f5051cd4e9..9daa9b3bc6d9 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -689,10 +689,7 @@ static void append_filter_err(struct filter_parse_state *ps, static inline struct event_filter *event_filter(struct trace_event_file *file) { - if (file->event_call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - return file->event_call->filter; - else - return file->filter; + return file->filter; } /* caller must hold event_mutex */ @@ -826,12 +823,12 @@ static void __free_preds(struct event_filter *filter) static void filter_disable(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; + unsigned long old_flags = file->flags; - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - call->flags &= ~TRACE_EVENT_FL_FILTERED; - else - file->flags &= ~EVENT_FILE_FL_FILTERED; + file->flags &= ~EVENT_FILE_FL_FILTERED; + + if (old_flags != file->flags) + trace_buffered_event_disable(); } static void __free_filter(struct event_filter *filter) @@ -883,13 +880,8 @@ static int __alloc_preds(struct event_filter *filter, int n_preds) static inline void __remove_filter(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - filter_disable(file); - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - remove_filter_string(call->filter); - else - remove_filter_string(file->filter); + remove_filter_string(file->filter); } static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir, @@ -906,15 +898,8 @@ static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir, static inline void __free_subsystem_filter(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) { - __free_filter(call->filter); - call->filter = NULL; - } else { - __free_filter(file->filter); - file->filter = NULL; - } + __free_filter(file->filter); + file->filter = NULL; } static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir, @@ -1718,69 +1703,43 @@ fail: static inline void event_set_filtered_flag(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; + unsigned long old_flags = file->flags; - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - call->flags |= TRACE_EVENT_FL_FILTERED; - else - file->flags |= EVENT_FILE_FL_FILTERED; + file->flags |= EVENT_FILE_FL_FILTERED; + + if (old_flags != file->flags) + trace_buffered_event_enable(); } static inline void event_set_filter(struct trace_event_file *file, struct event_filter *filter) { - struct trace_event_call *call = file->event_call; - - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - rcu_assign_pointer(call->filter, filter); - else - rcu_assign_pointer(file->filter, filter); + rcu_assign_pointer(file->filter, filter); } static inline void event_clear_filter(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - RCU_INIT_POINTER(call->filter, NULL); - else - RCU_INIT_POINTER(file->filter, NULL); + RCU_INIT_POINTER(file->filter, NULL); } static inline void event_set_no_set_filter_flag(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - call->flags |= TRACE_EVENT_FL_NO_SET_FILTER; - else - file->flags |= EVENT_FILE_FL_NO_SET_FILTER; + file->flags |= EVENT_FILE_FL_NO_SET_FILTER; } static inline void event_clear_no_set_filter_flag(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - - if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) - call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER; - else - file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER; + file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER; } static inline bool event_no_set_filter_flag(struct trace_event_file *file) { - struct trace_event_call *call = file->event_call; - if (file->flags & EVENT_FILE_FL_NO_SET_FILTER) return true; - if ((call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) && - (call->flags & TRACE_EVENT_FL_NO_SET_FILTER)) - return true; - return false; } diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c new file mode 100644 index 000000000000..0c05b8a99806 --- /dev/null +++ b/kernel/trace/trace_events_hist.c @@ -0,0 +1,1755 @@ +/* + * trace_events_hist - trace event hist triggers + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com> + */ + +#include <linux/module.h> +#include <linux/kallsyms.h> +#include <linux/mutex.h> +#include <linux/slab.h> +#include <linux/stacktrace.h> + +#include "tracing_map.h" +#include "trace.h" + +struct hist_field; + +typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event); + +struct hist_field { + struct ftrace_event_field *field; + unsigned long flags; + hist_field_fn_t fn; + unsigned int size; + unsigned int offset; +}; + +static u64 hist_field_none(struct hist_field *field, void *event) +{ + return 0; +} + +static u64 hist_field_counter(struct hist_field *field, void *event) +{ + return 1; +} + +static u64 hist_field_string(struct hist_field *hist_field, void *event) +{ + char *addr = (char *)(event + hist_field->field->offset); + + return (u64)(unsigned long)addr; +} + +static u64 hist_field_dynstring(struct hist_field *hist_field, void *event) +{ + u32 str_item = *(u32 *)(event + hist_field->field->offset); + int str_loc = str_item & 0xffff; + char *addr = (char *)(event + str_loc); + + return (u64)(unsigned long)addr; +} + +static u64 hist_field_pstring(struct hist_field *hist_field, void *event) +{ + char **addr = (char **)(event + hist_field->field->offset); + + return (u64)(unsigned long)*addr; +} + +static u64 hist_field_log2(struct hist_field *hist_field, void *event) +{ + u64 val = *(u64 *)(event + hist_field->field->offset); + + return (u64) ilog2(roundup_pow_of_two(val)); +} + +#define DEFINE_HIST_FIELD_FN(type) \ +static u64 hist_field_##type(struct hist_field *hist_field, void *event)\ +{ \ + type *addr = (type *)(event + hist_field->field->offset); \ + \ + return (u64)(unsigned long)*addr; \ +} + +DEFINE_HIST_FIELD_FN(s64); +DEFINE_HIST_FIELD_FN(u64); +DEFINE_HIST_FIELD_FN(s32); +DEFINE_HIST_FIELD_FN(u32); +DEFINE_HIST_FIELD_FN(s16); +DEFINE_HIST_FIELD_FN(u16); +DEFINE_HIST_FIELD_FN(s8); +DEFINE_HIST_FIELD_FN(u8); + +#define for_each_hist_field(i, hist_data) \ + for ((i) = 0; (i) < (hist_data)->n_fields; (i)++) + +#define for_each_hist_val_field(i, hist_data) \ + for ((i) = 0; (i) < (hist_data)->n_vals; (i)++) + +#define for_each_hist_key_field(i, hist_data) \ + for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++) + +#define HIST_STACKTRACE_DEPTH 16 +#define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long)) +#define HIST_STACKTRACE_SKIP 5 + +#define HITCOUNT_IDX 0 +#define HIST_KEY_SIZE_MAX (MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE) + +enum hist_field_flags { + HIST_FIELD_FL_HITCOUNT = 1, + HIST_FIELD_FL_KEY = 2, + HIST_FIELD_FL_STRING = 4, + HIST_FIELD_FL_HEX = 8, + HIST_FIELD_FL_SYM = 16, + HIST_FIELD_FL_SYM_OFFSET = 32, + HIST_FIELD_FL_EXECNAME = 64, + HIST_FIELD_FL_SYSCALL = 128, + HIST_FIELD_FL_STACKTRACE = 256, + HIST_FIELD_FL_LOG2 = 512, +}; + +struct hist_trigger_attrs { + char *keys_str; + char *vals_str; + char *sort_key_str; + char *name; + bool pause; + bool cont; + bool clear; + unsigned int map_bits; +}; + +struct hist_trigger_data { + struct hist_field *fields[TRACING_MAP_FIELDS_MAX]; + unsigned int n_vals; + unsigned int n_keys; + unsigned int n_fields; + unsigned int key_size; + struct tracing_map_sort_key sort_keys[TRACING_MAP_SORT_KEYS_MAX]; + unsigned int n_sort_keys; + struct trace_event_file *event_file; + struct hist_trigger_attrs *attrs; + struct tracing_map *map; +}; + +static hist_field_fn_t select_value_fn(int field_size, int field_is_signed) +{ + hist_field_fn_t fn = NULL; + + switch (field_size) { + case 8: + if (field_is_signed) + fn = hist_field_s64; + else + fn = hist_field_u64; + break; + case 4: + if (field_is_signed) + fn = hist_field_s32; + else + fn = hist_field_u32; + break; + case 2: + if (field_is_signed) + fn = hist_field_s16; + else + fn = hist_field_u16; + break; + case 1: + if (field_is_signed) + fn = hist_field_s8; + else + fn = hist_field_u8; + break; + } + + return fn; +} + +static int parse_map_size(char *str) +{ + unsigned long size, map_bits; + int ret; + + strsep(&str, "="); + if (!str) { + ret = -EINVAL; + goto out; + } + + ret = kstrtoul(str, 0, &size); + if (ret) + goto out; + + map_bits = ilog2(roundup_pow_of_two(size)); + if (map_bits < TRACING_MAP_BITS_MIN || + map_bits > TRACING_MAP_BITS_MAX) + ret = -EINVAL; + else + ret = map_bits; + out: + return ret; +} + +static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs) +{ + if (!attrs) + return; + + kfree(attrs->name); + kfree(attrs->sort_key_str); + kfree(attrs->keys_str); + kfree(attrs->vals_str); + kfree(attrs); +} + +static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str) +{ + struct hist_trigger_attrs *attrs; + int ret = 0; + + attrs = kzalloc(sizeof(*attrs), GFP_KERNEL); + if (!attrs) + return ERR_PTR(-ENOMEM); + + while (trigger_str) { + char *str = strsep(&trigger_str, ":"); + + if ((strncmp(str, "key=", strlen("key=")) == 0) || + (strncmp(str, "keys=", strlen("keys=")) == 0)) + attrs->keys_str = kstrdup(str, GFP_KERNEL); + else if ((strncmp(str, "val=", strlen("val=")) == 0) || + (strncmp(str, "vals=", strlen("vals=")) == 0) || + (strncmp(str, "values=", strlen("values=")) == 0)) + attrs->vals_str = kstrdup(str, GFP_KERNEL); + else if (strncmp(str, "sort=", strlen("sort=")) == 0) + attrs->sort_key_str = kstrdup(str, GFP_KERNEL); + else if (strncmp(str, "name=", strlen("name=")) == 0) + attrs->name = kstrdup(str, GFP_KERNEL); + else if (strcmp(str, "pause") == 0) + attrs->pause = true; + else if ((strcmp(str, "cont") == 0) || + (strcmp(str, "continue") == 0)) + attrs->cont = true; + else if (strcmp(str, "clear") == 0) + attrs->clear = true; + else if (strncmp(str, "size=", strlen("size=")) == 0) { + int map_bits = parse_map_size(str); + + if (map_bits < 0) { + ret = map_bits; + goto free; + } + attrs->map_bits = map_bits; + } else { + ret = -EINVAL; + goto free; + } + } + + if (!attrs->keys_str) { + ret = -EINVAL; + goto free; + } + + return attrs; + free: + destroy_hist_trigger_attrs(attrs); + + return ERR_PTR(ret); +} + +static inline void save_comm(char *comm, struct task_struct *task) +{ + if (!task->pid) { + strcpy(comm, "<idle>"); + return; + } + + if (WARN_ON_ONCE(task->pid < 0)) { + strcpy(comm, "<XXX>"); + return; + } + + memcpy(comm, task->comm, TASK_COMM_LEN); +} + +static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt) +{ + kfree((char *)elt->private_data); +} + +static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt) +{ + struct hist_trigger_data *hist_data = elt->map->private_data; + struct hist_field *key_field; + unsigned int i; + + for_each_hist_key_field(i, hist_data) { + key_field = hist_data->fields[i]; + + if (key_field->flags & HIST_FIELD_FL_EXECNAME) { + unsigned int size = TASK_COMM_LEN + 1; + + elt->private_data = kzalloc(size, GFP_KERNEL); + if (!elt->private_data) + return -ENOMEM; + break; + } + } + + return 0; +} + +static void hist_trigger_elt_comm_copy(struct tracing_map_elt *to, + struct tracing_map_elt *from) +{ + char *comm_from = from->private_data; + char *comm_to = to->private_data; + + if (comm_from) + memcpy(comm_to, comm_from, TASK_COMM_LEN + 1); +} + +static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt) +{ + char *comm = elt->private_data; + + if (comm) + save_comm(comm, current); +} + +static const struct tracing_map_ops hist_trigger_elt_comm_ops = { + .elt_alloc = hist_trigger_elt_comm_alloc, + .elt_copy = hist_trigger_elt_comm_copy, + .elt_free = hist_trigger_elt_comm_free, + .elt_init = hist_trigger_elt_comm_init, +}; + +static void destroy_hist_field(struct hist_field *hist_field) +{ + kfree(hist_field); +} + +static struct hist_field *create_hist_field(struct ftrace_event_field *field, + unsigned long flags) +{ + struct hist_field *hist_field; + + if (field && is_function_field(field)) + return NULL; + + hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL); + if (!hist_field) + return NULL; + + if (flags & HIST_FIELD_FL_HITCOUNT) { + hist_field->fn = hist_field_counter; + goto out; + } + + if (flags & HIST_FIELD_FL_STACKTRACE) { + hist_field->fn = hist_field_none; + goto out; + } + + if (flags & HIST_FIELD_FL_LOG2) { + hist_field->fn = hist_field_log2; + goto out; + } + + if (WARN_ON_ONCE(!field)) + goto out; + + if (is_string_field(field)) { + flags |= HIST_FIELD_FL_STRING; + + if (field->filter_type == FILTER_STATIC_STRING) + hist_field->fn = hist_field_string; + else if (field->filter_type == FILTER_DYN_STRING) + hist_field->fn = hist_field_dynstring; + else + hist_field->fn = hist_field_pstring; + } else { + hist_field->fn = select_value_fn(field->size, + field->is_signed); + if (!hist_field->fn) { + destroy_hist_field(hist_field); + return NULL; + } + } + out: + hist_field->field = field; + hist_field->flags = flags; + + return hist_field; +} + +static void destroy_hist_fields(struct hist_trigger_data *hist_data) +{ + unsigned int i; + + for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) { + if (hist_data->fields[i]) { + destroy_hist_field(hist_data->fields[i]); + hist_data->fields[i] = NULL; + } + } +} + +static int create_hitcount_val(struct hist_trigger_data *hist_data) +{ + hist_data->fields[HITCOUNT_IDX] = + create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT); + if (!hist_data->fields[HITCOUNT_IDX]) + return -ENOMEM; + + hist_data->n_vals++; + + if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX)) + return -EINVAL; + + return 0; +} + +static int create_val_field(struct hist_trigger_data *hist_data, + unsigned int val_idx, + struct trace_event_file *file, + char *field_str) +{ + struct ftrace_event_field *field = NULL; + unsigned long flags = 0; + char *field_name; + int ret = 0; + + if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX)) + return -EINVAL; + + field_name = strsep(&field_str, "."); + if (field_str) { + if (strcmp(field_str, "hex") == 0) + flags |= HIST_FIELD_FL_HEX; + else { + ret = -EINVAL; + goto out; + } + } + + field = trace_find_event_field(file->event_call, field_name); + if (!field) { + ret = -EINVAL; + goto out; + } + + hist_data->fields[val_idx] = create_hist_field(field, flags); + if (!hist_data->fields[val_idx]) { + ret = -ENOMEM; + goto out; + } + + ++hist_data->n_vals; + + if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX)) + ret = -EINVAL; + out: + return ret; +} + +static int create_val_fields(struct hist_trigger_data *hist_data, + struct trace_event_file *file) +{ + char *fields_str, *field_str; + unsigned int i, j; + int ret; + + ret = create_hitcount_val(hist_data); + if (ret) + goto out; + + fields_str = hist_data->attrs->vals_str; + if (!fields_str) + goto out; + + strsep(&fields_str, "="); + if (!fields_str) + goto out; + + for (i = 0, j = 1; i < TRACING_MAP_VALS_MAX && + j < TRACING_MAP_VALS_MAX; i++) { + field_str = strsep(&fields_str, ","); + if (!field_str) + break; + if (strcmp(field_str, "hitcount") == 0) + continue; + ret = create_val_field(hist_data, j++, file, field_str); + if (ret) + goto out; + } + if (fields_str && (strcmp(fields_str, "hitcount") != 0)) + ret = -EINVAL; + out: + return ret; +} + +static int create_key_field(struct hist_trigger_data *hist_data, + unsigned int key_idx, + unsigned int key_offset, + struct trace_event_file *file, + char *field_str) +{ + struct ftrace_event_field *field = NULL; + unsigned long flags = 0; + unsigned int key_size; + int ret = 0; + + if (WARN_ON(key_idx >= TRACING_MAP_FIELDS_MAX)) + return -EINVAL; + + flags |= HIST_FIELD_FL_KEY; + + if (strcmp(field_str, "stacktrace") == 0) { + flags |= HIST_FIELD_FL_STACKTRACE; + key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH; + } else { + char *field_name = strsep(&field_str, "."); + + if (field_str) { + if (strcmp(field_str, "hex") == 0) + flags |= HIST_FIELD_FL_HEX; + else if (strcmp(field_str, "sym") == 0) + flags |= HIST_FIELD_FL_SYM; + else if (strcmp(field_str, "sym-offset") == 0) + flags |= HIST_FIELD_FL_SYM_OFFSET; + else if ((strcmp(field_str, "execname") == 0) && + (strcmp(field_name, "common_pid") == 0)) + flags |= HIST_FIELD_FL_EXECNAME; + else if (strcmp(field_str, "syscall") == 0) + flags |= HIST_FIELD_FL_SYSCALL; + else if (strcmp(field_str, "log2") == 0) + flags |= HIST_FIELD_FL_LOG2; + else { + ret = -EINVAL; + goto out; + } + } + + field = trace_find_event_field(file->event_call, field_name); + if (!field) { + ret = -EINVAL; + goto out; + } + + if (is_string_field(field)) + key_size = MAX_FILTER_STR_VAL; + else + key_size = field->size; + } + + hist_data->fields[key_idx] = create_hist_field(field, flags); + if (!hist_data->fields[key_idx]) { + ret = -ENOMEM; + goto out; + } + + key_size = ALIGN(key_size, sizeof(u64)); + hist_data->fields[key_idx]->size = key_size; + hist_data->fields[key_idx]->offset = key_offset; + hist_data->key_size += key_size; + if (hist_data->key_size > HIST_KEY_SIZE_MAX) { + ret = -EINVAL; + goto out; + } + + hist_data->n_keys++; + + if (WARN_ON(hist_data->n_keys > TRACING_MAP_KEYS_MAX)) + return -EINVAL; + + ret = key_size; + out: + return ret; +} + +static int create_key_fields(struct hist_trigger_data *hist_data, + struct trace_event_file *file) +{ + unsigned int i, key_offset = 0, n_vals = hist_data->n_vals; + char *fields_str, *field_str; + int ret = -EINVAL; + + fields_str = hist_data->attrs->keys_str; + if (!fields_str) + goto out; + + strsep(&fields_str, "="); + if (!fields_str) + goto out; + + for (i = n_vals; i < n_vals + TRACING_MAP_KEYS_MAX; i++) { + field_str = strsep(&fields_str, ","); + if (!field_str) + break; + ret = create_key_field(hist_data, i, key_offset, + file, field_str); + if (ret < 0) + goto out; + key_offset += ret; + } + if (fields_str) { + ret = -EINVAL; + goto out; + } + ret = 0; + out: + return ret; +} + +static int create_hist_fields(struct hist_trigger_data *hist_data, + struct trace_event_file *file) +{ + int ret; + + ret = create_val_fields(hist_data, file); + if (ret) + goto out; + + ret = create_key_fields(hist_data, file); + if (ret) + goto out; + + hist_data->n_fields = hist_data->n_vals + hist_data->n_keys; + out: + return ret; +} + +static int is_descending(const char *str) +{ + if (!str) + return 0; + + if (strcmp(str, "descending") == 0) + return 1; + + if (strcmp(str, "ascending") == 0) + return 0; + + return -EINVAL; +} + +static int create_sort_keys(struct hist_trigger_data *hist_data) +{ + char *fields_str = hist_data->attrs->sort_key_str; + struct ftrace_event_field *field = NULL; + struct tracing_map_sort_key *sort_key; + int descending, ret = 0; + unsigned int i, j; + + hist_data->n_sort_keys = 1; /* we always have at least one, hitcount */ + + if (!fields_str) + goto out; + + strsep(&fields_str, "="); + if (!fields_str) { + ret = -EINVAL; + goto out; + } + + for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) { + char *field_str, *field_name; + + sort_key = &hist_data->sort_keys[i]; + + field_str = strsep(&fields_str, ","); + if (!field_str) { + if (i == 0) + ret = -EINVAL; + break; + } + + if ((i == TRACING_MAP_SORT_KEYS_MAX - 1) && fields_str) { + ret = -EINVAL; + break; + } + + field_name = strsep(&field_str, "."); + if (!field_name) { + ret = -EINVAL; + break; + } + + if (strcmp(field_name, "hitcount") == 0) { + descending = is_descending(field_str); + if (descending < 0) { + ret = descending; + break; + } + sort_key->descending = descending; + continue; + } + + for (j = 1; j < hist_data->n_fields; j++) { + field = hist_data->fields[j]->field; + if (field && (strcmp(field_name, field->name) == 0)) { + sort_key->field_idx = j; + descending = is_descending(field_str); + if (descending < 0) { + ret = descending; + goto out; + } + sort_key->descending = descending; + break; + } + } + if (j == hist_data->n_fields) { + ret = -EINVAL; + break; + } + } + hist_data->n_sort_keys = i; + out: + return ret; +} + +static void destroy_hist_data(struct hist_trigger_data *hist_data) +{ + destroy_hist_trigger_attrs(hist_data->attrs); + destroy_hist_fields(hist_data); + tracing_map_destroy(hist_data->map); + kfree(hist_data); +} + +static int create_tracing_map_fields(struct hist_trigger_data *hist_data) +{ + struct tracing_map *map = hist_data->map; + struct ftrace_event_field *field; + struct hist_field *hist_field; + int i, idx; + + for_each_hist_field(i, hist_data) { + hist_field = hist_data->fields[i]; + if (hist_field->flags & HIST_FIELD_FL_KEY) { + tracing_map_cmp_fn_t cmp_fn; + + field = hist_field->field; + + if (hist_field->flags & HIST_FIELD_FL_STACKTRACE) + cmp_fn = tracing_map_cmp_none; + else if (is_string_field(field)) + cmp_fn = tracing_map_cmp_string; + else + cmp_fn = tracing_map_cmp_num(field->size, + field->is_signed); + idx = tracing_map_add_key_field(map, + hist_field->offset, + cmp_fn); + + } else + idx = tracing_map_add_sum_field(map); + + if (idx < 0) + return idx; + } + + return 0; +} + +static bool need_tracing_map_ops(struct hist_trigger_data *hist_data) +{ + struct hist_field *key_field; + unsigned int i; + + for_each_hist_key_field(i, hist_data) { + key_field = hist_data->fields[i]; + + if (key_field->flags & HIST_FIELD_FL_EXECNAME) + return true; + } + + return false; +} + +static struct hist_trigger_data * +create_hist_data(unsigned int map_bits, + struct hist_trigger_attrs *attrs, + struct trace_event_file *file) +{ + const struct tracing_map_ops *map_ops = NULL; + struct hist_trigger_data *hist_data; + int ret = 0; + + hist_data = kzalloc(sizeof(*hist_data), GFP_KERNEL); + if (!hist_data) + return ERR_PTR(-ENOMEM); + + hist_data->attrs = attrs; + + ret = create_hist_fields(hist_data, file); + if (ret) + goto free; + + ret = create_sort_keys(hist_data); + if (ret) + goto free; + + if (need_tracing_map_ops(hist_data)) + map_ops = &hist_trigger_elt_comm_ops; + + hist_data->map = tracing_map_create(map_bits, hist_data->key_size, + map_ops, hist_data); + if (IS_ERR(hist_data->map)) { + ret = PTR_ERR(hist_data->map); + hist_data->map = NULL; + goto free; + } + + ret = create_tracing_map_fields(hist_data); + if (ret) + goto free; + + ret = tracing_map_init(hist_data->map); + if (ret) + goto free; + + hist_data->event_file = file; + out: + return hist_data; + free: + hist_data->attrs = NULL; + + destroy_hist_data(hist_data); + + hist_data = ERR_PTR(ret); + + goto out; +} + +static void hist_trigger_elt_update(struct hist_trigger_data *hist_data, + struct tracing_map_elt *elt, + void *rec) +{ + struct hist_field *hist_field; + unsigned int i; + u64 hist_val; + + for_each_hist_val_field(i, hist_data) { + hist_field = hist_data->fields[i]; + hist_val = hist_field->fn(hist_field, rec); + tracing_map_update_sum(elt, i, hist_val); + } +} + +static inline void add_to_key(char *compound_key, void *key, + struct hist_field *key_field, void *rec) +{ + size_t size = key_field->size; + + if (key_field->flags & HIST_FIELD_FL_STRING) { + struct ftrace_event_field *field; + + field = key_field->field; + if (field->filter_type == FILTER_DYN_STRING) + size = *(u32 *)(rec + field->offset) >> 16; + else if (field->filter_type == FILTER_PTR_STRING) + size = strlen(key); + else if (field->filter_type == FILTER_STATIC_STRING) + size = field->size; + + /* ensure NULL-termination */ + if (size > key_field->size - 1) + size = key_field->size - 1; + } + + memcpy(compound_key + key_field->offset, key, size); +} + +static void event_hist_trigger(struct event_trigger_data *data, void *rec) +{ + struct hist_trigger_data *hist_data = data->private_data; + bool use_compound_key = (hist_data->n_keys > 1); + unsigned long entries[HIST_STACKTRACE_DEPTH]; + char compound_key[HIST_KEY_SIZE_MAX]; + struct stack_trace stacktrace; + struct hist_field *key_field; + struct tracing_map_elt *elt; + u64 field_contents; + void *key = NULL; + unsigned int i; + + memset(compound_key, 0, hist_data->key_size); + + for_each_hist_key_field(i, hist_data) { + key_field = hist_data->fields[i]; + + if (key_field->flags & HIST_FIELD_FL_STACKTRACE) { + stacktrace.max_entries = HIST_STACKTRACE_DEPTH; + stacktrace.entries = entries; + stacktrace.nr_entries = 0; + stacktrace.skip = HIST_STACKTRACE_SKIP; + + memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE); + save_stack_trace(&stacktrace); + + key = entries; + } else { + field_contents = key_field->fn(key_field, rec); + if (key_field->flags & HIST_FIELD_FL_STRING) { + key = (void *)(unsigned long)field_contents; + use_compound_key = true; + } else + key = (void *)&field_contents; + } + + if (use_compound_key) + add_to_key(compound_key, key, key_field, rec); + } + + if (use_compound_key) + key = compound_key; + + elt = tracing_map_insert(hist_data->map, key); + if (elt) + hist_trigger_elt_update(hist_data, elt, rec); +} + +static void hist_trigger_stacktrace_print(struct seq_file *m, + unsigned long *stacktrace_entries, + unsigned int max_entries) +{ + char str[KSYM_SYMBOL_LEN]; + unsigned int spaces = 8; + unsigned int i; + + for (i = 0; i < max_entries; i++) { + if (stacktrace_entries[i] == ULONG_MAX) + return; + + seq_printf(m, "%*c", 1 + spaces, ' '); + sprint_symbol(str, stacktrace_entries[i]); + seq_printf(m, "%s\n", str); + } +} + +static void +hist_trigger_entry_print(struct seq_file *m, + struct hist_trigger_data *hist_data, void *key, + struct tracing_map_elt *elt) +{ + struct hist_field *key_field; + char str[KSYM_SYMBOL_LEN]; + bool multiline = false; + unsigned int i; + u64 uval; + + seq_puts(m, "{ "); + + for_each_hist_key_field(i, hist_data) { + key_field = hist_data->fields[i]; + + if (i > hist_data->n_vals) + seq_puts(m, ", "); + + if (key_field->flags & HIST_FIELD_FL_HEX) { + uval = *(u64 *)(key + key_field->offset); + seq_printf(m, "%s: %llx", + key_field->field->name, uval); + } else if (key_field->flags & HIST_FIELD_FL_SYM) { + uval = *(u64 *)(key + key_field->offset); + sprint_symbol_no_offset(str, uval); + seq_printf(m, "%s: [%llx] %-45s", + key_field->field->name, uval, str); + } else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) { + uval = *(u64 *)(key + key_field->offset); + sprint_symbol(str, uval); + seq_printf(m, "%s: [%llx] %-55s", + key_field->field->name, uval, str); + } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) { + char *comm = elt->private_data; + + uval = *(u64 *)(key + key_field->offset); + seq_printf(m, "%s: %-16s[%10llu]", + key_field->field->name, comm, uval); + } else if (key_field->flags & HIST_FIELD_FL_SYSCALL) { + const char *syscall_name; + + uval = *(u64 *)(key + key_field->offset); + syscall_name = get_syscall_name(uval); + if (!syscall_name) + syscall_name = "unknown_syscall"; + + seq_printf(m, "%s: %-30s[%3llu]", + key_field->field->name, syscall_name, uval); + } else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) { + seq_puts(m, "stacktrace:\n"); + hist_trigger_stacktrace_print(m, + key + key_field->offset, + HIST_STACKTRACE_DEPTH); + multiline = true; + } else if (key_field->flags & HIST_FIELD_FL_LOG2) { + seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name, + *(u64 *)(key + key_field->offset)); + } else if (key_field->flags & HIST_FIELD_FL_STRING) { + seq_printf(m, "%s: %-50s", key_field->field->name, + (char *)(key + key_field->offset)); + } else { + uval = *(u64 *)(key + key_field->offset); + seq_printf(m, "%s: %10llu", key_field->field->name, + uval); + } + } + + if (!multiline) + seq_puts(m, " "); + + seq_puts(m, "}"); + + seq_printf(m, " hitcount: %10llu", + tracing_map_read_sum(elt, HITCOUNT_IDX)); + + for (i = 1; i < hist_data->n_vals; i++) { + if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) { + seq_printf(m, " %s: %10llx", + hist_data->fields[i]->field->name, + tracing_map_read_sum(elt, i)); + } else { + seq_printf(m, " %s: %10llu", + hist_data->fields[i]->field->name, + tracing_map_read_sum(elt, i)); + } + } + + seq_puts(m, "\n"); +} + +static int print_entries(struct seq_file *m, + struct hist_trigger_data *hist_data) +{ + struct tracing_map_sort_entry **sort_entries = NULL; + struct tracing_map *map = hist_data->map; + int i, n_entries; + + n_entries = tracing_map_sort_entries(map, hist_data->sort_keys, + hist_data->n_sort_keys, + &sort_entries); + if (n_entries < 0) + return n_entries; + + for (i = 0; i < n_entries; i++) + hist_trigger_entry_print(m, hist_data, + sort_entries[i]->key, + sort_entries[i]->elt); + + tracing_map_destroy_sort_entries(sort_entries, n_entries); + + return n_entries; +} + +static void hist_trigger_show(struct seq_file *m, + struct event_trigger_data *data, int n) +{ + struct hist_trigger_data *hist_data; + int n_entries, ret = 0; + + if (n > 0) + seq_puts(m, "\n\n"); + + seq_puts(m, "# event histogram\n#\n# trigger info: "); + data->ops->print(m, data->ops, data); + seq_puts(m, "#\n\n"); + + hist_data = data->private_data; + n_entries = print_entries(m, hist_data); + if (n_entries < 0) { + ret = n_entries; + n_entries = 0; + } + + seq_printf(m, "\nTotals:\n Hits: %llu\n Entries: %u\n Dropped: %llu\n", + (u64)atomic64_read(&hist_data->map->hits), + n_entries, (u64)atomic64_read(&hist_data->map->drops)); +} + +static int hist_show(struct seq_file *m, void *v) +{ + struct event_trigger_data *data; + struct trace_event_file *event_file; + int n = 0, ret = 0; + + mutex_lock(&event_mutex); + + event_file = event_file_data(m->private); + if (unlikely(!event_file)) { + ret = -ENODEV; + goto out_unlock; + } + + list_for_each_entry_rcu(data, &event_file->triggers, list) { + if (data->cmd_ops->trigger_type == ETT_EVENT_HIST) + hist_trigger_show(m, data, n++); + } + + out_unlock: + mutex_unlock(&event_mutex); + + return ret; +} + +static int event_hist_open(struct inode *inode, struct file *file) +{ + return single_open(file, hist_show, file); +} + +const struct file_operations event_hist_fops = { + .open = event_hist_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static const char *get_hist_field_flags(struct hist_field *hist_field) +{ + const char *flags_str = NULL; + + if (hist_field->flags & HIST_FIELD_FL_HEX) + flags_str = "hex"; + else if (hist_field->flags & HIST_FIELD_FL_SYM) + flags_str = "sym"; + else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET) + flags_str = "sym-offset"; + else if (hist_field->flags & HIST_FIELD_FL_EXECNAME) + flags_str = "execname"; + else if (hist_field->flags & HIST_FIELD_FL_SYSCALL) + flags_str = "syscall"; + else if (hist_field->flags & HIST_FIELD_FL_LOG2) + flags_str = "log2"; + + return flags_str; +} + +static void hist_field_print(struct seq_file *m, struct hist_field *hist_field) +{ + seq_printf(m, "%s", hist_field->field->name); + if (hist_field->flags) { + const char *flags_str = get_hist_field_flags(hist_field); + + if (flags_str) + seq_printf(m, ".%s", flags_str); + } +} + +static int event_hist_trigger_print(struct seq_file *m, + struct event_trigger_ops *ops, + struct event_trigger_data *data) +{ + struct hist_trigger_data *hist_data = data->private_data; + struct hist_field *key_field; + unsigned int i; + + seq_puts(m, "hist:"); + + if (data->name) + seq_printf(m, "%s:", data->name); + + seq_puts(m, "keys="); + + for_each_hist_key_field(i, hist_data) { + key_field = hist_data->fields[i]; + + if (i > hist_data->n_vals) + seq_puts(m, ","); + + if (key_field->flags & HIST_FIELD_FL_STACKTRACE) + seq_puts(m, "stacktrace"); + else + hist_field_print(m, key_field); + } + + seq_puts(m, ":vals="); + + for_each_hist_val_field(i, hist_data) { + if (i == HITCOUNT_IDX) + seq_puts(m, "hitcount"); + else { + seq_puts(m, ","); + hist_field_print(m, hist_data->fields[i]); + } + } + + seq_puts(m, ":sort="); + + for (i = 0; i < hist_data->n_sort_keys; i++) { + struct tracing_map_sort_key *sort_key; + + sort_key = &hist_data->sort_keys[i]; + + if (i > 0) + seq_puts(m, ","); + + if (sort_key->field_idx == HITCOUNT_IDX) + seq_puts(m, "hitcount"); + else { + unsigned int idx = sort_key->field_idx; + + if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX)) + return -EINVAL; + + hist_field_print(m, hist_data->fields[idx]); + } + + if (sort_key->descending) + seq_puts(m, ".descending"); + } + + seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits)); + + if (data->filter_str) + seq_printf(m, " if %s", data->filter_str); + + if (data->paused) + seq_puts(m, " [paused]"); + else + seq_puts(m, " [active]"); + + seq_putc(m, '\n'); + + return 0; +} + +static int event_hist_trigger_init(struct event_trigger_ops *ops, + struct event_trigger_data *data) +{ + struct hist_trigger_data *hist_data = data->private_data; + + if (!data->ref && hist_data->attrs->name) + save_named_trigger(hist_data->attrs->name, data); + + data->ref++; + + return 0; +} + +static void event_hist_trigger_free(struct event_trigger_ops *ops, + struct event_trigger_data *data) +{ + struct hist_trigger_data *hist_data = data->private_data; + + if (WARN_ON_ONCE(data->ref <= 0)) + return; + + data->ref--; + if (!data->ref) { + if (data->name) + del_named_trigger(data); + trigger_data_free(data); + destroy_hist_data(hist_data); + } +} + +static struct event_trigger_ops event_hist_trigger_ops = { + .func = event_hist_trigger, + .print = event_hist_trigger_print, + .init = event_hist_trigger_init, + .free = event_hist_trigger_free, +}; + +static int event_hist_trigger_named_init(struct event_trigger_ops *ops, + struct event_trigger_data *data) +{ + data->ref++; + + save_named_trigger(data->named_data->name, data); + + event_hist_trigger_init(ops, data->named_data); + + return 0; +} + +static void event_hist_trigger_named_free(struct event_trigger_ops *ops, + struct event_trigger_data *data) +{ + if (WARN_ON_ONCE(data->ref <= 0)) + return; + + event_hist_trigger_free(ops, data->named_data); + + data->ref--; + if (!data->ref) { + del_named_trigger(data); + trigger_data_free(data); + } +} + +static struct event_trigger_ops event_hist_trigger_named_ops = { + .func = event_hist_trigger, + .print = event_hist_trigger_print, + .init = event_hist_trigger_named_init, + .free = event_hist_trigger_named_free, +}; + +static struct event_trigger_ops *event_hist_get_trigger_ops(char *cmd, + char *param) +{ + return &event_hist_trigger_ops; +} + +static void hist_clear(struct event_trigger_data *data) +{ + struct hist_trigger_data *hist_data = data->private_data; + + if (data->name) + pause_named_trigger(data); + + synchronize_sched(); + + tracing_map_clear(hist_data->map); + + if (data->name) + unpause_named_trigger(data); +} + +static bool compatible_field(struct ftrace_event_field *field, + struct ftrace_event_field *test_field) +{ + if (field == test_field) + return true; + if (field == NULL || test_field == NULL) + return false; + if (strcmp(field->name, test_field->name) != 0) + return false; + if (strcmp(field->type, test_field->type) != 0) + return false; + if (field->size != test_field->size) + return false; + if (field->is_signed != test_field->is_signed) + return false; + + return true; +} + +static bool hist_trigger_match(struct event_trigger_data *data, + struct event_trigger_data *data_test, + struct event_trigger_data *named_data, + bool ignore_filter) +{ + struct tracing_map_sort_key *sort_key, *sort_key_test; + struct hist_trigger_data *hist_data, *hist_data_test; + struct hist_field *key_field, *key_field_test; + unsigned int i; + + if (named_data && (named_data != data_test) && + (named_data != data_test->named_data)) + return false; + + if (!named_data && is_named_trigger(data_test)) + return false; + + hist_data = data->private_data; + hist_data_test = data_test->private_data; + + if (hist_data->n_vals != hist_data_test->n_vals || + hist_data->n_fields != hist_data_test->n_fields || + hist_data->n_sort_keys != hist_data_test->n_sort_keys) + return false; + + if (!ignore_filter) { + if ((data->filter_str && !data_test->filter_str) || + (!data->filter_str && data_test->filter_str)) + return false; + } + + for_each_hist_field(i, hist_data) { + key_field = hist_data->fields[i]; + key_field_test = hist_data_test->fields[i]; + + if (key_field->flags != key_field_test->flags) + return false; + if (!compatible_field(key_field->field, key_field_test->field)) + return false; + if (key_field->offset != key_field_test->offset) + return false; + } + + for (i = 0; i < hist_data->n_sort_keys; i++) { + sort_key = &hist_data->sort_keys[i]; + sort_key_test = &hist_data_test->sort_keys[i]; + + if (sort_key->field_idx != sort_key_test->field_idx || + sort_key->descending != sort_key_test->descending) + return false; + } + + if (!ignore_filter && data->filter_str && + (strcmp(data->filter_str, data_test->filter_str) != 0)) + return false; + + return true; +} + +static int hist_register_trigger(char *glob, struct event_trigger_ops *ops, + struct event_trigger_data *data, + struct trace_event_file *file) +{ + struct hist_trigger_data *hist_data = data->private_data; + struct event_trigger_data *test, *named_data = NULL; + int ret = 0; + + if (hist_data->attrs->name) { + named_data = find_named_trigger(hist_data->attrs->name); + if (named_data) { + if (!hist_trigger_match(data, named_data, named_data, + true)) { + ret = -EINVAL; + goto out; + } + } + } + + if (hist_data->attrs->name && !named_data) + goto new; + + list_for_each_entry_rcu(test, &file->triggers, list) { + if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { + if (!hist_trigger_match(data, test, named_data, false)) + continue; + if (hist_data->attrs->pause) + test->paused = true; + else if (hist_data->attrs->cont) + test->paused = false; + else if (hist_data->attrs->clear) + hist_clear(test); + else + ret = -EEXIST; + goto out; + } + } + new: + if (hist_data->attrs->cont || hist_data->attrs->clear) { + ret = -ENOENT; + goto out; + } + + if (named_data) { + destroy_hist_data(data->private_data); + data->private_data = named_data->private_data; + set_named_trigger_data(data, named_data); + data->ops = &event_hist_trigger_named_ops; + } + + if (hist_data->attrs->pause) + data->paused = true; + + if (data->ops->init) { + ret = data->ops->init(data->ops, data); + if (ret < 0) + goto out; + } + + list_add_rcu(&data->list, &file->triggers); + ret++; + + update_cond_flag(file); + + if (trace_event_trigger_enable_disable(file, 1) < 0) { + list_del_rcu(&data->list); + update_cond_flag(file); + ret--; + } + out: + return ret; +} + +static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops, + struct event_trigger_data *data, + struct trace_event_file *file) +{ + struct hist_trigger_data *hist_data = data->private_data; + struct event_trigger_data *test, *named_data = NULL; + bool unregistered = false; + + if (hist_data->attrs->name) + named_data = find_named_trigger(hist_data->attrs->name); + + list_for_each_entry_rcu(test, &file->triggers, list) { + if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { + if (!hist_trigger_match(data, test, named_data, false)) + continue; + unregistered = true; + list_del_rcu(&test->list); + trace_event_trigger_enable_disable(file, 0); + update_cond_flag(file); + break; + } + } + + if (unregistered && test->ops->free) + test->ops->free(test->ops, test); +} + +static void hist_unreg_all(struct trace_event_file *file) +{ + struct event_trigger_data *test; + + list_for_each_entry_rcu(test, &file->triggers, list) { + if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { + list_del_rcu(&test->list); + trace_event_trigger_enable_disable(file, 0); + update_cond_flag(file); + if (test->ops->free) + test->ops->free(test->ops, test); + } + } +} + +static int event_hist_trigger_func(struct event_command *cmd_ops, + struct trace_event_file *file, + char *glob, char *cmd, char *param) +{ + unsigned int hist_trigger_bits = TRACING_MAP_BITS_DEFAULT; + struct event_trigger_data *trigger_data; + struct hist_trigger_attrs *attrs; + struct event_trigger_ops *trigger_ops; + struct hist_trigger_data *hist_data; + char *trigger; + int ret = 0; + + if (!param) + return -EINVAL; + + /* separate the trigger from the filter (k:v [if filter]) */ + trigger = strsep(¶m, " \t"); + if (!trigger) + return -EINVAL; + + attrs = parse_hist_trigger_attrs(trigger); + if (IS_ERR(attrs)) + return PTR_ERR(attrs); + + if (attrs->map_bits) + hist_trigger_bits = attrs->map_bits; + + hist_data = create_hist_data(hist_trigger_bits, attrs, file); + if (IS_ERR(hist_data)) { + destroy_hist_trigger_attrs(attrs); + return PTR_ERR(hist_data); + } + + trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger); + + ret = -ENOMEM; + trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL); + if (!trigger_data) + goto out_free; + + trigger_data->count = -1; + trigger_data->ops = trigger_ops; + trigger_data->cmd_ops = cmd_ops; + + INIT_LIST_HEAD(&trigger_data->list); + RCU_INIT_POINTER(trigger_data->filter, NULL); + + trigger_data->private_data = hist_data; + + /* if param is non-empty, it's supposed to be a filter */ + if (param && cmd_ops->set_filter) { + ret = cmd_ops->set_filter(param, trigger_data, file); + if (ret < 0) + goto out_free; + } + + if (glob[0] == '!') { + cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file); + ret = 0; + goto out_free; + } + + ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); + /* + * The above returns on success the # of triggers registered, + * but if it didn't register any it returns zero. Consider no + * triggers registered a failure too. + */ + if (!ret) { + if (!(attrs->pause || attrs->cont || attrs->clear)) + ret = -ENOENT; + goto out_free; + } else if (ret < 0) + goto out_free; + /* Just return zero, not the number of registered triggers */ + ret = 0; + out: + return ret; + out_free: + if (cmd_ops->set_filter) + cmd_ops->set_filter(NULL, trigger_data, NULL); + + kfree(trigger_data); + + destroy_hist_data(hist_data); + goto out; +} + +static struct event_command trigger_hist_cmd = { + .name = "hist", + .trigger_type = ETT_EVENT_HIST, + .flags = EVENT_CMD_FL_NEEDS_REC, + .func = event_hist_trigger_func, + .reg = hist_register_trigger, + .unreg = hist_unregister_trigger, + .unreg_all = hist_unreg_all, + .get_trigger_ops = event_hist_get_trigger_ops, + .set_filter = set_trigger_filter, +}; + +__init int register_trigger_hist_cmd(void) +{ + int ret; + + ret = register_event_command(&trigger_hist_cmd); + WARN_ON(ret < 0); + + return ret; +} + +static void +hist_enable_trigger(struct event_trigger_data *data, void *rec) +{ + struct enable_trigger_data *enable_data = data->private_data; + struct event_trigger_data *test; + + list_for_each_entry_rcu(test, &enable_data->file->triggers, list) { + if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { + if (enable_data->enable) + test->paused = false; + else + test->paused = true; + } + } +} + +static void +hist_enable_count_trigger(struct event_trigger_data *data, void *rec) +{ + if (!data->count) + return; + + if (data->count != -1) + (data->count)--; + + hist_enable_trigger(data, rec); +} + +static struct event_trigger_ops hist_enable_trigger_ops = { + .func = hist_enable_trigger, + .print = event_enable_trigger_print, + .init = event_trigger_init, + .free = event_enable_trigger_free, +}; + +static struct event_trigger_ops hist_enable_count_trigger_ops = { + .func = hist_enable_count_trigger, + .print = event_enable_trigger_print, + .init = event_trigger_init, + .free = event_enable_trigger_free, +}; + +static struct event_trigger_ops hist_disable_trigger_ops = { + .func = hist_enable_trigger, + .print = event_enable_trigger_print, + .init = event_trigger_init, + .free = event_enable_trigger_free, +}; + +static struct event_trigger_ops hist_disable_count_trigger_ops = { + .func = hist_enable_count_trigger, + .print = event_enable_trigger_print, + .init = event_trigger_init, + .free = event_enable_trigger_free, +}; + +static struct event_trigger_ops * +hist_enable_get_trigger_ops(char *cmd, char *param) +{ + struct event_trigger_ops *ops; + bool enable; + + enable = (strcmp(cmd, ENABLE_HIST_STR) == 0); + + if (enable) + ops = param ? &hist_enable_count_trigger_ops : + &hist_enable_trigger_ops; + else + ops = param ? &hist_disable_count_trigger_ops : + &hist_disable_trigger_ops; + + return ops; +} + +static void hist_enable_unreg_all(struct trace_event_file *file) +{ + struct event_trigger_data *test; + + list_for_each_entry_rcu(test, &file->triggers, list) { + if (test->cmd_ops->trigger_type == ETT_HIST_ENABLE) { + list_del_rcu(&test->list); + update_cond_flag(file); + trace_event_trigger_enable_disable(file, 0); + if (test->ops->free) + test->ops->free(test->ops, test); + } + } +} + +static struct event_command trigger_hist_enable_cmd = { + .name = ENABLE_HIST_STR, + .trigger_type = ETT_HIST_ENABLE, + .func = event_enable_trigger_func, + .reg = event_enable_register_trigger, + .unreg = event_enable_unregister_trigger, + .unreg_all = hist_enable_unreg_all, + .get_trigger_ops = hist_enable_get_trigger_ops, + .set_filter = set_trigger_filter, +}; + +static struct event_command trigger_hist_disable_cmd = { + .name = DISABLE_HIST_STR, + .trigger_type = ETT_HIST_ENABLE, + .func = event_enable_trigger_func, + .reg = event_enable_register_trigger, + .unreg = event_enable_unregister_trigger, + .unreg_all = hist_enable_unreg_all, + .get_trigger_ops = hist_enable_get_trigger_ops, + .set_filter = set_trigger_filter, +}; + +static __init void unregister_trigger_hist_enable_disable_cmds(void) +{ + unregister_event_command(&trigger_hist_enable_cmd); + unregister_event_command(&trigger_hist_disable_cmd); +} + +__init int register_trigger_hist_enable_disable_cmds(void) +{ + int ret; + + ret = register_event_command(&trigger_hist_enable_cmd); + if (WARN_ON(ret < 0)) + return ret; + ret = register_event_command(&trigger_hist_disable_cmd); + if (WARN_ON(ret < 0)) + unregister_trigger_hist_enable_disable_cmds(); + + return ret; +} diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index d67992f3bb0e..a975571cde24 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -347,7 +347,7 @@ __init int register_event_command(struct event_command *cmd) * Currently we only unregister event commands from __init, so mark * this __init too. */ -static __init int unregister_event_command(struct event_command *cmd) +__init int unregister_event_command(struct event_command *cmd) { struct event_command *p, *n; int ret = -ENODEV; @@ -641,6 +641,7 @@ event_trigger_callback(struct event_command *cmd_ops, trigger_data->ops = trigger_ops; trigger_data->cmd_ops = cmd_ops; INIT_LIST_HEAD(&trigger_data->list); + INIT_LIST_HEAD(&trigger_data->named_list); if (glob[0] == '!') { cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file); @@ -764,6 +765,148 @@ int set_trigger_filter(char *filter_str, return ret; } +static LIST_HEAD(named_triggers); + +/** + * find_named_trigger - Find the common named trigger associated with @name + * @name: The name of the set of named triggers to find the common data for + * + * Named triggers are sets of triggers that share a common set of + * trigger data. The first named trigger registered with a given name + * owns the common trigger data that the others subsequently + * registered with the same name will reference. This function + * returns the common trigger data associated with that first + * registered instance. + * + * Return: the common trigger data for the given named trigger on + * success, NULL otherwise. + */ +struct event_trigger_data *find_named_trigger(const char *name) +{ + struct event_trigger_data *data; + + if (!name) + return NULL; + + list_for_each_entry(data, &named_triggers, named_list) { + if (data->named_data) + continue; + if (strcmp(data->name, name) == 0) + return data; + } + + return NULL; +} + +/** + * is_named_trigger - determine if a given trigger is a named trigger + * @test: The trigger data to test + * + * Return: true if 'test' is a named trigger, false otherwise. + */ +bool is_named_trigger(struct event_trigger_data *test) +{ + struct event_trigger_data *data; + + list_for_each_entry(data, &named_triggers, named_list) { + if (test == data) + return true; + } + + return false; +} + +/** + * save_named_trigger - save the trigger in the named trigger list + * @name: The name of the named trigger set + * @data: The trigger data to save + * + * Return: 0 if successful, negative error otherwise. + */ +int save_named_trigger(const char *name, struct event_trigger_data *data) +{ + data->name = kstrdup(name, GFP_KERNEL); + if (!data->name) + return -ENOMEM; + + list_add(&data->named_list, &named_triggers); + + return 0; +} + +/** + * del_named_trigger - delete a trigger from the named trigger list + * @data: The trigger data to delete + */ +void del_named_trigger(struct event_trigger_data *data) +{ + kfree(data->name); + data->name = NULL; + + list_del(&data->named_list); +} + +static void __pause_named_trigger(struct event_trigger_data *data, bool pause) +{ + struct event_trigger_data *test; + + list_for_each_entry(test, &named_triggers, named_list) { + if (strcmp(test->name, data->name) == 0) { + if (pause) { + test->paused_tmp = test->paused; + test->paused = true; + } else { + test->paused = test->paused_tmp; + } + } + } +} + +/** + * pause_named_trigger - Pause all named triggers with the same name + * @data: The trigger data of a named trigger to pause + * + * Pauses a named trigger along with all other triggers having the + * same name. Because named triggers share a common set of data, + * pausing only one is meaningless, so pausing one named trigger needs + * to pause all triggers with the same name. + */ +void pause_named_trigger(struct event_trigger_data *data) +{ + __pause_named_trigger(data, true); +} + +/** + * unpause_named_trigger - Un-pause all named triggers with the same name + * @data: The trigger data of a named trigger to unpause + * + * Un-pauses a named trigger along with all other triggers having the + * same name. Because named triggers share a common set of data, + * unpausing only one is meaningless, so unpausing one named trigger + * needs to unpause all triggers with the same name. + */ +void unpause_named_trigger(struct event_trigger_data *data) +{ + __pause_named_trigger(data, false); +} + +/** + * set_named_trigger_data - Associate common named trigger data + * @data: The trigger data of a named trigger to unpause + * + * Named triggers are sets of triggers that share a common set of + * trigger data. The first named trigger registered with a given name + * owns the common trigger data that the others subsequently + * registered with the same name will reference. This function + * associates the common trigger data from the first trigger with the + * given trigger. + */ +void set_named_trigger_data(struct event_trigger_data *data, + struct event_trigger_data *named_data) +{ + data->named_data = named_data; +} + static void traceon_trigger(struct event_trigger_data *data, void *rec) { @@ -1062,15 +1205,6 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void) unregister_event_command(&trigger_traceoff_cmd); } -/* Avoid typos */ -#define ENABLE_EVENT_STR "enable_event" -#define DISABLE_EVENT_STR "disable_event" - -struct enable_trigger_data { - struct trace_event_file *file; - bool enable; -}; - static void event_enable_trigger(struct event_trigger_data *data, void *rec) { @@ -1100,14 +1234,16 @@ event_enable_count_trigger(struct event_trigger_data *data, void *rec) event_enable_trigger(data, rec); } -static int -event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops, - struct event_trigger_data *data) +int event_enable_trigger_print(struct seq_file *m, + struct event_trigger_ops *ops, + struct event_trigger_data *data) { struct enable_trigger_data *enable_data = data->private_data; seq_printf(m, "%s:%s:%s", - enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR, + enable_data->hist ? + (enable_data->enable ? ENABLE_HIST_STR : DISABLE_HIST_STR) : + (enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR), enable_data->file->event_call->class->system, trace_event_name(enable_data->file->event_call)); @@ -1124,9 +1260,8 @@ event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops, return 0; } -static void -event_enable_trigger_free(struct event_trigger_ops *ops, - struct event_trigger_data *data) +void event_enable_trigger_free(struct event_trigger_ops *ops, + struct event_trigger_data *data) { struct enable_trigger_data *enable_data = data->private_data; @@ -1171,10 +1306,9 @@ static struct event_trigger_ops event_disable_count_trigger_ops = { .free = event_enable_trigger_free, }; -static int -event_enable_trigger_func(struct event_command *cmd_ops, - struct trace_event_file *file, - char *glob, char *cmd, char *param) +int event_enable_trigger_func(struct event_command *cmd_ops, + struct trace_event_file *file, + char *glob, char *cmd, char *param) { struct trace_event_file *event_enable_file; struct enable_trigger_data *enable_data; @@ -1183,6 +1317,7 @@ event_enable_trigger_func(struct event_command *cmd_ops, struct trace_array *tr = file->tr; const char *system; const char *event; + bool hist = false; char *trigger; char *number; bool enable; @@ -1207,8 +1342,15 @@ event_enable_trigger_func(struct event_command *cmd_ops, if (!event_enable_file) goto out; - enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; +#ifdef CONFIG_HIST_TRIGGERS + hist = ((strcmp(cmd, ENABLE_HIST_STR) == 0) || + (strcmp(cmd, DISABLE_HIST_STR) == 0)); + enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) || + (strcmp(cmd, ENABLE_HIST_STR) == 0)); +#else + enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; +#endif trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger); ret = -ENOMEM; @@ -1228,6 +1370,7 @@ event_enable_trigger_func(struct event_command *cmd_ops, INIT_LIST_HEAD(&trigger_data->list); RCU_INIT_POINTER(trigger_data->filter, NULL); + enable_data->hist = hist; enable_data->enable = enable; enable_data->file = event_enable_file; trigger_data->private_data = enable_data; @@ -1305,10 +1448,10 @@ event_enable_trigger_func(struct event_command *cmd_ops, goto out; } -static int event_enable_register_trigger(char *glob, - struct event_trigger_ops *ops, - struct event_trigger_data *data, - struct trace_event_file *file) +int event_enable_register_trigger(char *glob, + struct event_trigger_ops *ops, + struct event_trigger_data *data, + struct trace_event_file *file) { struct enable_trigger_data *enable_data = data->private_data; struct enable_trigger_data *test_enable_data; @@ -1318,6 +1461,8 @@ static int event_enable_register_trigger(char *glob, list_for_each_entry_rcu(test, &file->triggers, list) { test_enable_data = test->private_data; if (test_enable_data && + (test->cmd_ops->trigger_type == + data->cmd_ops->trigger_type) && (test_enable_data->file == enable_data->file)) { ret = -EEXIST; goto out; @@ -1343,10 +1488,10 @@ out: return ret; } -static void event_enable_unregister_trigger(char *glob, - struct event_trigger_ops *ops, - struct event_trigger_data *test, - struct trace_event_file *file) +void event_enable_unregister_trigger(char *glob, + struct event_trigger_ops *ops, + struct event_trigger_data *test, + struct trace_event_file *file) { struct enable_trigger_data *test_enable_data = test->private_data; struct enable_trigger_data *enable_data; @@ -1356,6 +1501,8 @@ static void event_enable_unregister_trigger(char *glob, list_for_each_entry_rcu(data, &file->triggers, list) { enable_data = data->private_data; if (enable_data && + (data->cmd_ops->trigger_type == + test->cmd_ops->trigger_type) && (enable_data->file == test_enable_data->file)) { unregistered = true; list_del_rcu(&data->list); @@ -1375,8 +1522,12 @@ event_enable_get_trigger_ops(char *cmd, char *param) struct event_trigger_ops *ops; bool enable; +#ifdef CONFIG_HIST_TRIGGERS + enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) || + (strcmp(cmd, ENABLE_HIST_STR) == 0)); +#else enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; - +#endif if (enable) ops = param ? &event_enable_count_trigger_ops : &event_enable_trigger_ops; @@ -1447,6 +1598,8 @@ __init int register_trigger_cmds(void) register_trigger_snapshot_cmd(); register_trigger_stacktrace_cmd(); register_trigger_enable_disable_cmds(); + register_trigger_hist_enable_disable_cmds(); + register_trigger_hist_cmd(); return 0; } diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c new file mode 100644 index 000000000000..0a689bbb78ef --- /dev/null +++ b/kernel/trace/tracing_map.c @@ -0,0 +1,1062 @@ +/* + * tracing_map - lock-free map for tracing + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com> + * + * tracing_map implementation inspired by lock-free map algorithms + * originated by Dr. Cliff Click: + * + * http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable + * http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf + */ + +#include <linux/vmalloc.h> +#include <linux/jhash.h> +#include <linux/slab.h> +#include <linux/sort.h> + +#include "tracing_map.h" +#include "trace.h" + +/* + * NOTE: For a detailed description of the data structures used by + * these functions (such as tracing_map_elt) please see the overview + * of tracing_map data structures at the beginning of tracing_map.h. + */ + +/** + * tracing_map_update_sum - Add a value to a tracing_map_elt's sum field + * @elt: The tracing_map_elt + * @i: The index of the given sum associated with the tracing_map_elt + * @n: The value to add to the sum + * + * Add n to sum i associated with the specified tracing_map_elt + * instance. The index i is the index returned by the call to + * tracing_map_add_sum_field() when the tracing map was set up. + */ +void tracing_map_update_sum(struct tracing_map_elt *elt, unsigned int i, u64 n) +{ + atomic64_add(n, &elt->fields[i].sum); +} + +/** + * tracing_map_read_sum - Return the value of a tracing_map_elt's sum field + * @elt: The tracing_map_elt + * @i: The index of the given sum associated with the tracing_map_elt + * + * Retrieve the value of the sum i associated with the specified + * tracing_map_elt instance. The index i is the index returned by the + * call to tracing_map_add_sum_field() when the tracing map was set + * up. + * + * Return: The sum associated with field i for elt. + */ +u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i) +{ + return (u64)atomic64_read(&elt->fields[i].sum); +} + +int tracing_map_cmp_string(void *val_a, void *val_b) +{ + char *a = val_a; + char *b = val_b; + + return strcmp(a, b); +} + +int tracing_map_cmp_none(void *val_a, void *val_b) +{ + return 0; +} + +static int tracing_map_cmp_atomic64(void *val_a, void *val_b) +{ + u64 a = atomic64_read((atomic64_t *)val_a); + u64 b = atomic64_read((atomic64_t *)val_b); + + return (a > b) ? 1 : ((a < b) ? -1 : 0); +} + +#define DEFINE_TRACING_MAP_CMP_FN(type) \ +static int tracing_map_cmp_##type(void *val_a, void *val_b) \ +{ \ + type a = *(type *)val_a; \ + type b = *(type *)val_b; \ + \ + return (a > b) ? 1 : ((a < b) ? -1 : 0); \ +} + +DEFINE_TRACING_MAP_CMP_FN(s64); +DEFINE_TRACING_MAP_CMP_FN(u64); +DEFINE_TRACING_MAP_CMP_FN(s32); +DEFINE_TRACING_MAP_CMP_FN(u32); +DEFINE_TRACING_MAP_CMP_FN(s16); +DEFINE_TRACING_MAP_CMP_FN(u16); +DEFINE_TRACING_MAP_CMP_FN(s8); +DEFINE_TRACING_MAP_CMP_FN(u8); + +tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size, + int field_is_signed) +{ + tracing_map_cmp_fn_t fn = tracing_map_cmp_none; + + switch (field_size) { + case 8: + if (field_is_signed) + fn = tracing_map_cmp_s64; + else + fn = tracing_map_cmp_u64; + break; + case 4: + if (field_is_signed) + fn = tracing_map_cmp_s32; + else + fn = tracing_map_cmp_u32; + break; + case 2: + if (field_is_signed) + fn = tracing_map_cmp_s16; + else + fn = tracing_map_cmp_u16; + break; + case 1: + if (field_is_signed) + fn = tracing_map_cmp_s8; + else + fn = tracing_map_cmp_u8; + break; + } + + return fn; +} + +static int tracing_map_add_field(struct tracing_map *map, + tracing_map_cmp_fn_t cmp_fn) +{ + int ret = -EINVAL; + + if (map->n_fields < TRACING_MAP_FIELDS_MAX) { + ret = map->n_fields; + map->fields[map->n_fields++].cmp_fn = cmp_fn; + } + + return ret; +} + +/** + * tracing_map_add_sum_field - Add a field describing a tracing_map sum + * @map: The tracing_map + * + * Add a sum field to the key and return the index identifying it in + * the map and associated tracing_map_elts. This is the index used + * for instance to update a sum for a particular tracing_map_elt using + * tracing_map_update_sum() or reading it via tracing_map_read_sum(). + * + * Return: The index identifying the field in the map and associated + * tracing_map_elts, or -EINVAL on error. + */ +int tracing_map_add_sum_field(struct tracing_map *map) +{ + return tracing_map_add_field(map, tracing_map_cmp_atomic64); +} + +/** + * tracing_map_add_key_field - Add a field describing a tracing_map key + * @map: The tracing_map + * @offset: The offset within the key + * @cmp_fn: The comparison function that will be used to sort on the key + * + * Let the map know there is a key and that if it's used as a sort key + * to use cmp_fn. + * + * A key can be a subset of a compound key; for that purpose, the + * offset param is used to describe where within the the compound key + * the key referenced by this key field resides. + * + * Return: The index identifying the field in the map and associated + * tracing_map_elts, or -EINVAL on error. + */ +int tracing_map_add_key_field(struct tracing_map *map, + unsigned int offset, + tracing_map_cmp_fn_t cmp_fn) + +{ + int idx = tracing_map_add_field(map, cmp_fn); + + if (idx < 0) + return idx; + + map->fields[idx].offset = offset; + + map->key_idx[map->n_keys++] = idx; + + return idx; +} + +void tracing_map_array_clear(struct tracing_map_array *a) +{ + unsigned int i; + + if (!a->pages) + return; + + for (i = 0; i < a->n_pages; i++) + memset(a->pages[i], 0, PAGE_SIZE); +} + +void tracing_map_array_free(struct tracing_map_array *a) +{ + unsigned int i; + + if (!a) + return; + + if (!a->pages) { + kfree(a); + return; + } + + for (i = 0; i < a->n_pages; i++) { + if (!a->pages[i]) + break; + free_page((unsigned long)a->pages[i]); + } +} + +struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts, + unsigned int entry_size) +{ + struct tracing_map_array *a; + unsigned int i; + + a = kzalloc(sizeof(*a), GFP_KERNEL); + if (!a) + return NULL; + + a->entry_size_shift = fls(roundup_pow_of_two(entry_size) - 1); + a->entries_per_page = PAGE_SIZE / (1 << a->entry_size_shift); + a->n_pages = n_elts / a->entries_per_page; + if (!a->n_pages) + a->n_pages = 1; + a->entry_shift = fls(a->entries_per_page) - 1; + a->entry_mask = (1 << a->entry_shift) - 1; + + a->pages = kcalloc(a->n_pages, sizeof(void *), GFP_KERNEL); + if (!a->pages) + goto free; + + for (i = 0; i < a->n_pages; i++) { + a->pages[i] = (void *)get_zeroed_page(GFP_KERNEL); + if (!a->pages[i]) + goto free; + } + out: + return a; + free: + tracing_map_array_free(a); + a = NULL; + + goto out; +} + +static void tracing_map_elt_clear(struct tracing_map_elt *elt) +{ + unsigned i; + + for (i = 0; i < elt->map->n_fields; i++) + if (elt->fields[i].cmp_fn == tracing_map_cmp_atomic64) + atomic64_set(&elt->fields[i].sum, 0); + + if (elt->map->ops && elt->map->ops->elt_clear) + elt->map->ops->elt_clear(elt); +} + +static void tracing_map_elt_init_fields(struct tracing_map_elt *elt) +{ + unsigned int i; + + tracing_map_elt_clear(elt); + + for (i = 0; i < elt->map->n_fields; i++) { + elt->fields[i].cmp_fn = elt->map->fields[i].cmp_fn; + + if (elt->fields[i].cmp_fn != tracing_map_cmp_atomic64) + elt->fields[i].offset = elt->map->fields[i].offset; + } +} + +static void tracing_map_elt_free(struct tracing_map_elt *elt) +{ + if (!elt) + return; + + if (elt->map->ops && elt->map->ops->elt_free) + elt->map->ops->elt_free(elt); + kfree(elt->fields); + kfree(elt->key); + kfree(elt); +} + +static struct tracing_map_elt *tracing_map_elt_alloc(struct tracing_map *map) +{ + struct tracing_map_elt *elt; + int err = 0; + + elt = kzalloc(sizeof(*elt), GFP_KERNEL); + if (!elt) + return ERR_PTR(-ENOMEM); + + elt->map = map; + + elt->key = kzalloc(map->key_size, GFP_KERNEL); + if (!elt->key) { + err = -ENOMEM; + goto free; + } + + elt->fields = kcalloc(map->n_fields, sizeof(*elt->fields), GFP_KERNEL); + if (!elt->fields) { + err = -ENOMEM; + goto free; + } + + tracing_map_elt_init_fields(elt); + + if (map->ops && map->ops->elt_alloc) { + err = map->ops->elt_alloc(elt); + if (err) + goto free; + } + return elt; + free: + tracing_map_elt_free(elt); + + return ERR_PTR(err); +} + +static struct tracing_map_elt *get_free_elt(struct tracing_map *map) +{ + struct tracing_map_elt *elt = NULL; + int idx; + + idx = atomic_inc_return(&map->next_elt); + if (idx < map->max_elts) { + elt = *(TRACING_MAP_ELT(map->elts, idx)); + if (map->ops && map->ops->elt_init) + map->ops->elt_init(elt); + } + + return elt; +} + +static void tracing_map_free_elts(struct tracing_map *map) +{ + unsigned int i; + + if (!map->elts) + return; + + for (i = 0; i < map->max_elts; i++) { + tracing_map_elt_free(*(TRACING_MAP_ELT(map->elts, i))); + *(TRACING_MAP_ELT(map->elts, i)) = NULL; + } + + tracing_map_array_free(map->elts); + map->elts = NULL; +} + +static int tracing_map_alloc_elts(struct tracing_map *map) +{ + unsigned int i; + + map->elts = tracing_map_array_alloc(map->max_elts, + sizeof(struct tracing_map_elt *)); + if (!map->elts) + return -ENOMEM; + + for (i = 0; i < map->max_elts; i++) { + *(TRACING_MAP_ELT(map->elts, i)) = tracing_map_elt_alloc(map); + if (IS_ERR(*(TRACING_MAP_ELT(map->elts, i)))) { + *(TRACING_MAP_ELT(map->elts, i)) = NULL; + tracing_map_free_elts(map); + + return -ENOMEM; + } + } + + return 0; +} + +static inline bool keys_match(void *key, void *test_key, unsigned key_size) +{ + bool match = true; + + if (memcmp(key, test_key, key_size)) + match = false; + + return match; +} + +static inline struct tracing_map_elt * +__tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only) +{ + u32 idx, key_hash, test_key; + struct tracing_map_entry *entry; + + key_hash = jhash(key, map->key_size, 0); + if (key_hash == 0) + key_hash = 1; + idx = key_hash >> (32 - (map->map_bits + 1)); + + while (1) { + idx &= (map->map_size - 1); + entry = TRACING_MAP_ENTRY(map->map, idx); + test_key = entry->key; + + if (test_key && test_key == key_hash && entry->val && + keys_match(key, entry->val->key, map->key_size)) { + atomic64_inc(&map->hits); + return entry->val; + } + + if (!test_key) { + if (lookup_only) + break; + + if (!cmpxchg(&entry->key, 0, key_hash)) { + struct tracing_map_elt *elt; + + elt = get_free_elt(map); + if (!elt) { + atomic64_inc(&map->drops); + entry->key = 0; + break; + } + + memcpy(elt->key, key, map->key_size); + entry->val = elt; + atomic64_inc(&map->hits); + + return entry->val; + } + } + + idx++; + } + + return NULL; +} + +/** + * tracing_map_insert - Insert key and/or retrieve val from a tracing_map + * @map: The tracing_map to insert into + * @key: The key to insert + * + * Inserts a key into a tracing_map and creates and returns a new + * tracing_map_elt for it, or if the key has already been inserted by + * a previous call, returns the tracing_map_elt already associated + * with it. When the map was created, the number of elements to be + * allocated for the map was specified (internally maintained as + * 'max_elts' in struct tracing_map), and that number of + * tracing_map_elts was created by tracing_map_init(). This is the + * pre-allocated pool of tracing_map_elts that tracing_map_insert() + * will allocate from when adding new keys. Once that pool is + * exhausted, tracing_map_insert() is useless and will return NULL to + * signal that state. There are two user-visible tracing_map + * variables, 'hits' and 'drops', which are updated by this function. + * Every time an element is either successfully inserted or retrieved, + * the 'hits' value is incrememented. Every time an element insertion + * fails, the 'drops' value is incremented. + * + * This is a lock-free tracing map insertion function implementing a + * modified form of Cliff Click's basic insertion algorithm. It + * requires the table size be a power of two. To prevent any + * possibility of an infinite loop we always make the internal table + * size double the size of the requested table size (max_elts * 2). + * Likewise, we never reuse a slot or resize or delete elements - when + * we've reached max_elts entries, we simply return NULL once we've + * run out of entries. Readers can at any point in time traverse the + * tracing map and safely access the key/val pairs. + * + * Return: the tracing_map_elt pointer val associated with the key. + * If this was a newly inserted key, the val will be a newly allocated + * and associated tracing_map_elt pointer val. If the key wasn't + * found and the pool of tracing_map_elts has been exhausted, NULL is + * returned and no further insertions will succeed. + */ +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key) +{ + return __tracing_map_insert(map, key, false); +} + +/** + * tracing_map_lookup - Retrieve val from a tracing_map + * @map: The tracing_map to perform the lookup on + * @key: The key to look up + * + * Looks up key in tracing_map and if found returns the matching + * tracing_map_elt. This is a lock-free lookup; see + * tracing_map_insert() for details on tracing_map and how it works. + * Every time an element is retrieved, the 'hits' value is + * incrememented. There is one user-visible tracing_map variable, + * 'hits', which is updated by this function. Every time an element + * is successfully retrieved, the 'hits' value is incrememented. The + * 'drops' value is never updated by this function. + * + * Return: the tracing_map_elt pointer val associated with the key. + * If the key wasn't found, NULL is returned. + */ +struct tracing_map_elt *tracing_map_lookup(struct tracing_map *map, void *key) +{ + return __tracing_map_insert(map, key, true); +} + +/** + * tracing_map_destroy - Destroy a tracing_map + * @map: The tracing_map to destroy + * + * Frees a tracing_map along with its associated array of + * tracing_map_elts. + * + * Callers should make sure there are no readers or writers actively + * reading or inserting into the map before calling this. + */ +void tracing_map_destroy(struct tracing_map *map) +{ + if (!map) + return; + + tracing_map_free_elts(map); + + tracing_map_array_free(map->map); + kfree(map); +} + +/** + * tracing_map_clear - Clear a tracing_map + * @map: The tracing_map to clear + * + * Resets the tracing map to a cleared or initial state. The + * tracing_map_elts are all cleared, and the array of struct + * tracing_map_entry is reset to an initialized state. + * + * Callers should make sure there are no writers actively inserting + * into the map before calling this. + */ +void tracing_map_clear(struct tracing_map *map) +{ + unsigned int i; + + atomic_set(&map->next_elt, -1); + atomic64_set(&map->hits, 0); + atomic64_set(&map->drops, 0); + + tracing_map_array_clear(map->map); + + for (i = 0; i < map->max_elts; i++) + tracing_map_elt_clear(*(TRACING_MAP_ELT(map->elts, i))); +} + +static void set_sort_key(struct tracing_map *map, + struct tracing_map_sort_key *sort_key) +{ + map->sort_key = *sort_key; +} + +/** + * tracing_map_create - Create a lock-free map and element pool + * @map_bits: The size of the map (2 ** map_bits) + * @key_size: The size of the key for the map in bytes + * @ops: Optional client-defined tracing_map_ops instance + * @private_data: Client data associated with the map + * + * Creates and sets up a map to contain 2 ** map_bits number of + * elements (internally maintained as 'max_elts' in struct + * tracing_map). Before using, map fields should be added to the map + * with tracing_map_add_sum_field() and tracing_map_add_key_field(). + * tracing_map_init() should then be called to allocate the array of + * tracing_map_elts, in order to avoid allocating anything in the map + * insertion path. The user-specified map size reflects the maximum + * number of elements that can be contained in the table requested by + * the user - internally we double that in order to keep the table + * sparse and keep collisions manageable. + * + * A tracing_map is a special-purpose map designed to aggregate or + * 'sum' one or more values associated with a specific object of type + * tracing_map_elt, which is attached by the map to a given key. + * + * tracing_map_create() sets up the map itself, and provides + * operations for inserting tracing_map_elts, but doesn't allocate the + * tracing_map_elts themselves, or provide a means for describing the + * keys or sums associated with the tracing_map_elts. All + * tracing_map_elts for a given map have the same set of sums and + * keys, which are defined by the client using the functions + * tracing_map_add_key_field() and tracing_map_add_sum_field(). Once + * the fields are defined, the pool of elements allocated for the map + * can be created, which occurs when the client code calls + * tracing_map_init(). + * + * When tracing_map_init() returns, tracing_map_elt elements can be + * inserted into the map using tracing_map_insert(). When called, + * tracing_map_insert() grabs a free tracing_map_elt from the pool, or + * finds an existing match in the map and in either case returns it. + * The client can then use tracing_map_update_sum() and + * tracing_map_read_sum() to update or read a given sum field for the + * tracing_map_elt. + * + * The client can at any point retrieve and traverse the current set + * of inserted tracing_map_elts in a tracing_map, via + * tracing_map_sort_entries(). Sorting can be done on any field, + * including keys. + * + * See tracing_map.h for a description of tracing_map_ops. + * + * Return: the tracing_map pointer if successful, ERR_PTR if not. + */ +struct tracing_map *tracing_map_create(unsigned int map_bits, + unsigned int key_size, + const struct tracing_map_ops *ops, + void *private_data) +{ + struct tracing_map *map; + unsigned int i; + + if (map_bits < TRACING_MAP_BITS_MIN || + map_bits > TRACING_MAP_BITS_MAX) + return ERR_PTR(-EINVAL); + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (!map) + return ERR_PTR(-ENOMEM); + + map->map_bits = map_bits; + map->max_elts = (1 << map_bits); + atomic_set(&map->next_elt, -1); + + map->map_size = (1 << (map_bits + 1)); + map->ops = ops; + + map->private_data = private_data; + + map->map = tracing_map_array_alloc(map->map_size, + sizeof(struct tracing_map_entry)); + if (!map->map) + goto free; + + map->key_size = key_size; + for (i = 0; i < TRACING_MAP_KEYS_MAX; i++) + map->key_idx[i] = -1; + out: + return map; + free: + tracing_map_destroy(map); + map = ERR_PTR(-ENOMEM); + + goto out; +} + +/** + * tracing_map_init - Allocate and clear a map's tracing_map_elts + * @map: The tracing_map to initialize + * + * Allocates a clears a pool of tracing_map_elts equal to the + * user-specified size of 2 ** map_bits (internally maintained as + * 'max_elts' in struct tracing_map). Before using, the map fields + * should be added to the map with tracing_map_add_sum_field() and + * tracing_map_add_key_field(). tracing_map_init() should then be + * called to allocate the array of tracing_map_elts, in order to avoid + * allocating anything in the map insertion path. The user-specified + * map size reflects the max number of elements requested by the user + * - internally we double that in order to keep the table sparse and + * keep collisions manageable. + * + * See tracing_map.h for a description of tracing_map_ops. + * + * Return: the tracing_map pointer if successful, ERR_PTR if not. + */ +int tracing_map_init(struct tracing_map *map) +{ + int err; + + if (map->n_fields < 2) + return -EINVAL; /* need at least 1 key and 1 val */ + + err = tracing_map_alloc_elts(map); + if (err) + return err; + + tracing_map_clear(map); + + return err; +} + +static int cmp_entries_dup(const struct tracing_map_sort_entry **a, + const struct tracing_map_sort_entry **b) +{ + int ret = 0; + + if (memcmp((*a)->key, (*b)->key, (*a)->elt->map->key_size)) + ret = 1; + + return ret; +} + +static int cmp_entries_sum(const struct tracing_map_sort_entry **a, + const struct tracing_map_sort_entry **b) +{ + const struct tracing_map_elt *elt_a, *elt_b; + struct tracing_map_sort_key *sort_key; + struct tracing_map_field *field; + tracing_map_cmp_fn_t cmp_fn; + void *val_a, *val_b; + int ret = 0; + + elt_a = (*a)->elt; + elt_b = (*b)->elt; + + sort_key = &elt_a->map->sort_key; + + field = &elt_a->fields[sort_key->field_idx]; + cmp_fn = field->cmp_fn; + + val_a = &elt_a->fields[sort_key->field_idx].sum; + val_b = &elt_b->fields[sort_key->field_idx].sum; + + ret = cmp_fn(val_a, val_b); + if (sort_key->descending) + ret = -ret; + + return ret; +} + +static int cmp_entries_key(const struct tracing_map_sort_entry **a, + const struct tracing_map_sort_entry **b) +{ + const struct tracing_map_elt *elt_a, *elt_b; + struct tracing_map_sort_key *sort_key; + struct tracing_map_field *field; + tracing_map_cmp_fn_t cmp_fn; + void *val_a, *val_b; + int ret = 0; + + elt_a = (*a)->elt; + elt_b = (*b)->elt; + + sort_key = &elt_a->map->sort_key; + + field = &elt_a->fields[sort_key->field_idx]; + + cmp_fn = field->cmp_fn; + + val_a = elt_a->key + field->offset; + val_b = elt_b->key + field->offset; + + ret = cmp_fn(val_a, val_b); + if (sort_key->descending) + ret = -ret; + + return ret; +} + +static void destroy_sort_entry(struct tracing_map_sort_entry *entry) +{ + if (!entry) + return; + + if (entry->elt_copied) + tracing_map_elt_free(entry->elt); + + kfree(entry); +} + +/** + * tracing_map_destroy_sort_entries - Destroy an array of sort entries + * @entries: The entries to destroy + * @n_entries: The number of entries in the array + * + * Destroy the elements returned by a tracing_map_sort_entries() call. + */ +void tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries, + unsigned int n_entries) +{ + unsigned int i; + + for (i = 0; i < n_entries; i++) + destroy_sort_entry(entries[i]); + + vfree(entries); +} + +static struct tracing_map_sort_entry * +create_sort_entry(void *key, struct tracing_map_elt *elt) +{ + struct tracing_map_sort_entry *sort_entry; + + sort_entry = kzalloc(sizeof(*sort_entry), GFP_KERNEL); + if (!sort_entry) + return NULL; + + sort_entry->key = key; + sort_entry->elt = elt; + + return sort_entry; +} + +static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt) +{ + struct tracing_map_elt *dup_elt; + unsigned int i; + + dup_elt = tracing_map_elt_alloc(elt->map); + if (IS_ERR(dup_elt)) + return NULL; + + if (elt->map->ops && elt->map->ops->elt_copy) + elt->map->ops->elt_copy(dup_elt, elt); + + dup_elt->private_data = elt->private_data; + memcpy(dup_elt->key, elt->key, elt->map->key_size); + + for (i = 0; i < elt->map->n_fields; i++) { + atomic64_set(&dup_elt->fields[i].sum, + atomic64_read(&elt->fields[i].sum)); + dup_elt->fields[i].cmp_fn = elt->fields[i].cmp_fn; + } + + return dup_elt; +} + +static int merge_dup(struct tracing_map_sort_entry **sort_entries, + unsigned int target, unsigned int dup) +{ + struct tracing_map_elt *target_elt, *elt; + bool first_dup = (target - dup) == 1; + int i; + + if (first_dup) { + elt = sort_entries[target]->elt; + target_elt = copy_elt(elt); + if (!target_elt) + return -ENOMEM; + sort_entries[target]->elt = target_elt; + sort_entries[target]->elt_copied = true; + } else + target_elt = sort_entries[target]->elt; + + elt = sort_entries[dup]->elt; + + for (i = 0; i < elt->map->n_fields; i++) + atomic64_add(atomic64_read(&elt->fields[i].sum), + &target_elt->fields[i].sum); + + sort_entries[dup]->dup = true; + + return 0; +} + +static int merge_dups(struct tracing_map_sort_entry **sort_entries, + int n_entries, unsigned int key_size) +{ + unsigned int dups = 0, total_dups = 0; + int err, i, j; + void *key; + + if (n_entries < 2) + return total_dups; + + sort(sort_entries, n_entries, sizeof(struct tracing_map_sort_entry *), + (int (*)(const void *, const void *))cmp_entries_dup, NULL); + + key = sort_entries[0]->key; + for (i = 1; i < n_entries; i++) { + if (!memcmp(sort_entries[i]->key, key, key_size)) { + dups++; total_dups++; + err = merge_dup(sort_entries, i - dups, i); + if (err) + return err; + continue; + } + key = sort_entries[i]->key; + dups = 0; + } + + if (!total_dups) + return total_dups; + + for (i = 0, j = 0; i < n_entries; i++) { + if (!sort_entries[i]->dup) { + sort_entries[j] = sort_entries[i]; + if (j++ != i) + sort_entries[i] = NULL; + } else { + destroy_sort_entry(sort_entries[i]); + sort_entries[i] = NULL; + } + } + + return total_dups; +} + +static bool is_key(struct tracing_map *map, unsigned int field_idx) +{ + unsigned int i; + + for (i = 0; i < map->n_keys; i++) + if (map->key_idx[i] == field_idx) + return true; + return false; +} + +static void sort_secondary(struct tracing_map *map, + const struct tracing_map_sort_entry **entries, + unsigned int n_entries, + struct tracing_map_sort_key *primary_key, + struct tracing_map_sort_key *secondary_key) +{ + int (*primary_fn)(const struct tracing_map_sort_entry **, + const struct tracing_map_sort_entry **); + int (*secondary_fn)(const struct tracing_map_sort_entry **, + const struct tracing_map_sort_entry **); + unsigned i, start = 0, n_sub = 1; + + if (is_key(map, primary_key->field_idx)) + primary_fn = cmp_entries_key; + else + primary_fn = cmp_entries_sum; + + if (is_key(map, secondary_key->field_idx)) + secondary_fn = cmp_entries_key; + else + secondary_fn = cmp_entries_sum; + + for (i = 0; i < n_entries - 1; i++) { + const struct tracing_map_sort_entry **a = &entries[i]; + const struct tracing_map_sort_entry **b = &entries[i + 1]; + + if (primary_fn(a, b) == 0) { + n_sub++; + if (i < n_entries - 2) + continue; + } + + if (n_sub < 2) { + start = i + 1; + n_sub = 1; + continue; + } + + set_sort_key(map, secondary_key); + sort(&entries[start], n_sub, + sizeof(struct tracing_map_sort_entry *), + (int (*)(const void *, const void *))secondary_fn, NULL); + set_sort_key(map, primary_key); + + start = i + 1; + n_sub = 1; + } +} + +/** + * tracing_map_sort_entries - Sort the current set of tracing_map_elts in a map + * @map: The tracing_map + * @sort_key: The sort key to use for sorting + * @sort_entries: outval: pointer to allocated and sorted array of entries + * + * tracing_map_sort_entries() sorts the current set of entries in the + * map and returns the list of tracing_map_sort_entries containing + * them to the client in the sort_entries param. The client can + * access the struct tracing_map_elt element of interest directly as + * the 'elt' field of a returned struct tracing_map_sort_entry object. + * + * The sort_key has only two fields: idx and descending. 'idx' refers + * to the index of the field added via tracing_map_add_sum_field() or + * tracing_map_add_key_field() when the tracing_map was initialized. + * 'descending' is a flag that if set reverses the sort order, which + * by default is ascending. + * + * The client should not hold on to the returned array but should use + * it and call tracing_map_destroy_sort_entries() when done. + * + * Return: the number of sort_entries in the struct tracing_map_sort_entry + * array, negative on error + */ +int tracing_map_sort_entries(struct tracing_map *map, + struct tracing_map_sort_key *sort_keys, + unsigned int n_sort_keys, + struct tracing_map_sort_entry ***sort_entries) +{ + int (*cmp_entries_fn)(const struct tracing_map_sort_entry **, + const struct tracing_map_sort_entry **); + struct tracing_map_sort_entry *sort_entry, **entries; + int i, n_entries, ret; + + entries = vmalloc(map->max_elts * sizeof(sort_entry)); + if (!entries) + return -ENOMEM; + + for (i = 0, n_entries = 0; i < map->map_size; i++) { + struct tracing_map_entry *entry; + + entry = TRACING_MAP_ENTRY(map->map, i); + + if (!entry->key || !entry->val) + continue; + + entries[n_entries] = create_sort_entry(entry->val->key, + entry->val); + if (!entries[n_entries++]) { + ret = -ENOMEM; + goto free; + } + } + + if (n_entries == 0) { + ret = 0; + goto free; + } + + if (n_entries == 1) { + *sort_entries = entries; + return 1; + } + + ret = merge_dups(entries, n_entries, map->key_size); + if (ret < 0) + goto free; + n_entries -= ret; + + if (is_key(map, sort_keys[0].field_idx)) + cmp_entries_fn = cmp_entries_key; + else + cmp_entries_fn = cmp_entries_sum; + + set_sort_key(map, &sort_keys[0]); + + sort(entries, n_entries, sizeof(struct tracing_map_sort_entry *), + (int (*)(const void *, const void *))cmp_entries_fn, NULL); + + if (n_sort_keys > 1) + sort_secondary(map, + (const struct tracing_map_sort_entry **)entries, + n_entries, + &sort_keys[0], + &sort_keys[1]); + + *sort_entries = entries; + + return n_entries; + free: + tracing_map_destroy_sort_entries(entries, n_entries); + + return ret; +} diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h new file mode 100644 index 000000000000..618838f5f30a --- /dev/null +++ b/kernel/trace/tracing_map.h @@ -0,0 +1,283 @@ +#ifndef __TRACING_MAP_H +#define __TRACING_MAP_H + +#define TRACING_MAP_BITS_DEFAULT 11 +#define TRACING_MAP_BITS_MAX 17 +#define TRACING_MAP_BITS_MIN 7 + +#define TRACING_MAP_KEYS_MAX 2 +#define TRACING_MAP_VALS_MAX 3 +#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \ + TRACING_MAP_VALS_MAX) +#define TRACING_MAP_SORT_KEYS_MAX 2 + +typedef int (*tracing_map_cmp_fn_t) (void *val_a, void *val_b); + +/* + * This is an overview of the tracing_map data structures and how they + * relate to the tracing_map API. The details of the algorithms + * aren't discussed here - this is just a general overview of the data + * structures and how they interact with the API. + * + * The central data structure of the tracing_map is an initially + * zeroed array of struct tracing_map_entry (stored in the map field + * of struct tracing_map). tracing_map_entry is a very simple data + * structure containing only two fields: a 32-bit unsigned 'key' + * variable and a pointer named 'val'. This array of struct + * tracing_map_entry is essentially a hash table which will be + * modified by a single function, tracing_map_insert(), but which can + * be traversed and read by a user at any time (though the user does + * this indirectly via an array of tracing_map_sort_entry - see the + * explanation of that data structure in the discussion of the + * sorting-related data structures below). + * + * The central function of the tracing_map API is + * tracing_map_insert(). tracing_map_insert() hashes the + * arbitrarily-sized key passed into it into a 32-bit unsigned key. + * It then uses this key, truncated to the array size, as an index + * into the array of tracing_map_entries. If the value of the 'key' + * field of the tracing_map_entry found at that location is 0, then + * that entry is considered to be free and can be claimed, by + * replacing the 0 in the 'key' field of the tracing_map_entry with + * the new 32-bit hashed key. Once claimed, that tracing_map_entry's + * 'val' field is then used to store a unique element which will be + * forever associated with that 32-bit hashed key in the + * tracing_map_entry. + * + * That unique element now in the tracing_map_entry's 'val' field is + * an instance of tracing_map_elt, where 'elt' in the latter part of + * that variable name is short for 'element'. The purpose of a + * tracing_map_elt is to hold values specific to the particular + * 32-bit hashed key it's assocated with. Things such as the unique + * set of aggregated sums associated with the 32-bit hashed key, along + * with a copy of the full key associated with the entry, and which + * was used to produce the 32-bit hashed key. + * + * When tracing_map_create() is called to create the tracing map, the + * user specifies (indirectly via the map_bits param, the details are + * unimportant for this discussion) the maximum number of elements + * that the map can hold (stored in the max_elts field of struct + * tracing_map). This is the maximum possible number of + * tracing_map_entries in the tracing_map_entry array which can be + * 'claimed' as described in the above discussion, and therefore is + * also the maximum number of tracing_map_elts that can be associated + * with the tracing_map_entry array in the tracing_map. Because of + * the way the insertion algorithm works, the size of the allocated + * tracing_map_entry array is always twice the maximum number of + * elements (2 * max_elts). This value is stored in the map_size + * field of struct tracing_map. + * + * Because tracing_map_insert() needs to work from any context, + * including from within the memory allocation functions themselves, + * both the tracing_map_entry array and a pool of max_elts + * tracing_map_elts are pre-allocated before any call is made to + * tracing_map_insert(). + * + * The tracing_map_entry array is allocated as a single block by + * tracing_map_create(). + * + * Because the tracing_map_elts are much larger objects and can't + * generally be allocated together as a single large array without + * failure, they're allocated individually, by tracing_map_init(). + * + * The pool of tracing_map_elts are allocated by tracing_map_init() + * rather than by tracing_map_create() because at the time + * tracing_map_create() is called, there isn't enough information to + * create the tracing_map_elts. Specifically,the user first needs to + * tell the tracing_map implementation how many fields the + * tracing_map_elts contain, and which types of fields they are (key + * or sum). The user does this via the tracing_map_add_sum_field() + * and tracing_map_add_key_field() functions, following which the user + * calls tracing_map_init() to finish up the tracing map setup. The + * array holding the pointers which make up the pre-allocated pool of + * tracing_map_elts is allocated as a single block and is stored in + * the elts field of struct tracing_map. + * + * There is also a set of structures used for sorting that might + * benefit from some minimal explanation. + * + * struct tracing_map_sort_key is used to drive the sort at any given + * time. By 'any given time' we mean that a different + * tracing_map_sort_key will be used at different times depending on + * whether the sort currently being performed is a primary or a + * secondary sort. + * + * The sort key is very simple, consisting of the field index of the + * tracing_map_elt field to sort on (which the user saved when adding + * the field), and whether the sort should be done in an ascending or + * descending order. + * + * For the convenience of the sorting code, a tracing_map_sort_entry + * is created for each tracing_map_elt, again individually allocated + * to avoid failures that might be expected if allocated as a single + * large array of struct tracing_map_sort_entry. + * tracing_map_sort_entry instances are the objects expected by the + * various internal sorting functions, and are also what the user + * ultimately receives after calling tracing_map_sort_entries(). + * Because it doesn't make sense for users to access an unordered and + * sparsely populated tracing_map directly, the + * tracing_map_sort_entries() function is provided so that users can + * retrieve a sorted list of all existing elements. In addition to + * the associated tracing_map_elt 'elt' field contained within the + * tracing_map_sort_entry, which is the object of interest to the + * user, tracing_map_sort_entry objects contain a number of additional + * fields which are used for caching and internal purposes and can + * safely be ignored. +*/ + +struct tracing_map_field { + tracing_map_cmp_fn_t cmp_fn; + union { + atomic64_t sum; + unsigned int offset; + }; +}; + +struct tracing_map_elt { + struct tracing_map *map; + struct tracing_map_field *fields; + void *key; + void *private_data; +}; + +struct tracing_map_entry { + u32 key; + struct tracing_map_elt *val; +}; + +struct tracing_map_sort_key { + unsigned int field_idx; + bool descending; +}; + +struct tracing_map_sort_entry { + void *key; + struct tracing_map_elt *elt; + bool elt_copied; + bool dup; +}; + +struct tracing_map_array { + unsigned int entries_per_page; + unsigned int entry_size_shift; + unsigned int entry_shift; + unsigned int entry_mask; + unsigned int n_pages; + void **pages; +}; + +#define TRACING_MAP_ARRAY_ELT(array, idx) \ + (array->pages[idx >> array->entry_shift] + \ + ((idx & array->entry_mask) << array->entry_size_shift)) + +#define TRACING_MAP_ENTRY(array, idx) \ + ((struct tracing_map_entry *)TRACING_MAP_ARRAY_ELT(array, idx)) + +#define TRACING_MAP_ELT(array, idx) \ + ((struct tracing_map_elt **)TRACING_MAP_ARRAY_ELT(array, idx)) + +struct tracing_map { + unsigned int key_size; + unsigned int map_bits; + unsigned int map_size; + unsigned int max_elts; + atomic_t next_elt; + struct tracing_map_array *elts; + struct tracing_map_array *map; + const struct tracing_map_ops *ops; + void *private_data; + struct tracing_map_field fields[TRACING_MAP_FIELDS_MAX]; + unsigned int n_fields; + int key_idx[TRACING_MAP_KEYS_MAX]; + unsigned int n_keys; + struct tracing_map_sort_key sort_key; + atomic64_t hits; + atomic64_t drops; +}; + +/** + * struct tracing_map_ops - callbacks for tracing_map + * + * The methods in this structure define callback functions for various + * operations on a tracing_map or objects related to a tracing_map. + * + * For a detailed description of tracing_map_elt objects please see + * the overview of tracing_map data structures at the beginning of + * this file. + * + * All the methods below are optional. + * + * @elt_alloc: When a tracing_map_elt is allocated, this function, if + * defined, will be called and gives clients the opportunity to + * allocate additional data and attach it to the element + * (tracing_map_elt->private_data is meant for that purpose). + * Element allocation occurs before tracing begins, when the + * tracing_map_init() call is made by client code. + * + * @elt_copy: At certain points in the lifetime of an element, it may + * need to be copied. The copy should include a copy of the + * client-allocated data, which can be copied into the 'to' + * element from the 'from' element. + * + * @elt_free: When a tracing_map_elt is freed, this function is called + * and allows client-allocated per-element data to be freed. + * + * @elt_clear: This callback allows per-element client-defined data to + * be cleared, if applicable. + * + * @elt_init: This callback allows per-element client-defined data to + * be initialized when used i.e. when the element is actually + * claimed by tracing_map_insert() in the context of the map + * insertion. + */ +struct tracing_map_ops { + int (*elt_alloc)(struct tracing_map_elt *elt); + void (*elt_copy)(struct tracing_map_elt *to, + struct tracing_map_elt *from); + void (*elt_free)(struct tracing_map_elt *elt); + void (*elt_clear)(struct tracing_map_elt *elt); + void (*elt_init)(struct tracing_map_elt *elt); +}; + +extern struct tracing_map * +tracing_map_create(unsigned int map_bits, + unsigned int key_size, + const struct tracing_map_ops *ops, + void *private_data); +extern int tracing_map_init(struct tracing_map *map); + +extern int tracing_map_add_sum_field(struct tracing_map *map); +extern int tracing_map_add_key_field(struct tracing_map *map, + unsigned int offset, + tracing_map_cmp_fn_t cmp_fn); + +extern void tracing_map_destroy(struct tracing_map *map); +extern void tracing_map_clear(struct tracing_map *map); + +extern struct tracing_map_elt * +tracing_map_insert(struct tracing_map *map, void *key); +extern struct tracing_map_elt * +tracing_map_lookup(struct tracing_map *map, void *key); + +extern tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size, + int field_is_signed); +extern int tracing_map_cmp_string(void *val_a, void *val_b); +extern int tracing_map_cmp_none(void *val_a, void *val_b); + +extern void tracing_map_update_sum(struct tracing_map_elt *elt, + unsigned int i, u64 n); +extern u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i); +extern void tracing_map_set_field_descr(struct tracing_map *map, + unsigned int i, + unsigned int key_offset, + tracing_map_cmp_fn_t cmp_fn); +extern int +tracing_map_sort_entries(struct tracing_map *map, + struct tracing_map_sort_key *sort_keys, + unsigned int n_sort_keys, + struct tracing_map_sort_entry ***sort_entries); + +extern void +tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries, + unsigned int n_entries); +#endif /* __TRACING_MAP_H */ diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions index 5d8cd06d920f..c37262f6c269 100644 --- a/tools/testing/selftests/ftrace/test.d/functions +++ b/tools/testing/selftests/ftrace/test.d/functions @@ -14,3 +14,12 @@ enable_tracing() { # start trace recording reset_tracer() { # reset the current tracer echo nop > current_tracer } + +reset_trigger() { # reset all current setting triggers + grep -v ^# events/*/*/trigger | + while read line; do + cmd=`echo $line | cut -f2- -d: | cut -f1 -d" "` + echo "!$cmd" > `echo $line | cut -f1 -d:` + done +} + diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc new file mode 100644 index 000000000000..1a9445021bf1 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc @@ -0,0 +1,64 @@ +#!/bin/sh +# description: event trigger - test event enable/disable trigger + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep enable_event events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "event enable/disable trigger is not supported" + exit_unsupported +fi + +echo "Test enable_event trigger" +echo 0 > events/sched/sched_switch/enable +echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger +( echo "forked") +if [ `cat events/sched/sched_switch/enable` != '1*' ]; then + fail "enable_event trigger on sched_process_fork did not work" +fi + +reset_trigger + +echo "Test disable_event trigger" +echo 1 > events/sched/sched_switch/enable +echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger +( echo "forked") +if [ `cat events/sched/sched_switch/enable` != '0*' ]; then + fail "disable_event trigger on sched_process_fork did not work" +fi + +reset_trigger + +echo "Test semantic error for event enable/disable trigger" +! echo 'enable_event:nogroup:noevent' > events/sched/sched_process_fork/trigger +! echo 'disable_event+1' > events/sched/sched_process_fork/trigger +echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger +! echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger +! echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc new file mode 100644 index 000000000000..514e466e198b --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc @@ -0,0 +1,59 @@ +#!/bin/sh +# description: event trigger - test trigger filter + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +echo "Test trigger filter" +echo 1 > tracing_on +echo 'traceoff if child_pid == 0' > events/sched/sched_process_fork/trigger +( echo "forked") +if [ `cat tracing_on` -ne 1 ]; then + fail "traceoff trigger on sched_process_fork did not work" +fi + +reset_trigger + +echo "Test semantic error for trigger filter" +! echo 'traceoff if a' > events/sched/sched_process_fork/trigger +! echo 'traceoff if common_pid=0' > events/sched/sched_process_fork/trigger +! echo 'traceoff if common_pid==b' > events/sched/sched_process_fork/trigger +echo 'traceoff if common_pid == 0' > events/sched/sched_process_fork/trigger +echo '!traceoff' > events/sched/sched_process_fork/trigger +! echo 'traceoff if common_pid == child_pid' > events/sched/sched_process_fork/trigger +echo 'traceoff if common_pid <= 0' > events/sched/sched_process_fork/trigger +echo '!traceoff' > events/sched/sched_process_fork/trigger +echo 'traceoff if common_pid >= 0' > events/sched/sched_process_fork/trigger +echo '!traceoff' > events/sched/sched_process_fork/trigger +echo 'traceoff if parent_pid >= 0 && child_pid >= 0' > events/sched/sched_process_fork/trigger +echo '!traceoff' > events/sched/sched_process_fork/trigger +echo 'traceoff if parent_pid >= 0 || child_pid >= 0' > events/sched/sched_process_fork/trigger +echo '!traceoff' > events/sched/sched_process_fork/trigger + + + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc new file mode 100644 index 000000000000..c2b61c4fda11 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc @@ -0,0 +1,75 @@ +#!/bin/sh +# description: event trigger - test histogram modifiers + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep hist events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + +echo "Test histogram with execname modifier" + +echo 'hist:keys=common_pid.execname' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +COMM=`cat /proc/$$/comm` +grep "common_pid: $COMM" events/sched/sched_process_fork/hist > /dev/null || \ + fail "execname modifier on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with hex modifier" + +echo 'hist:keys=parent_pid.hex' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +# Note that $$ is the parent pid. $PID is current PID. +HEX=`printf %x $PID` +grep "parent_pid: $HEX" events/sched/sched_process_fork/hist > /dev/null || \ + fail "hex modifier on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with syscall modifier" + +echo 'hist:keys=id.syscall' > events/raw_syscalls/sys_exit/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep "id: sys_" events/raw_syscalls/sys_exit/hist > /dev/null || \ + fail "syscall modifier on raw_syscalls/sys_exit did not work" + + +reset_trigger + +echo "Test histgram with log2 modifier" + +echo 'hist:keys=bytes_req.log2' > events/kmem/kmalloc/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep 'bytes_req: ~ 2^[0-9]*' events/kmem/kmalloc/hist > /dev/null || \ + fail "log2 modifier on kmem/kmalloc did not work" + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc new file mode 100644 index 000000000000..b2902d42a537 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc @@ -0,0 +1,83 @@ +#!/bin/sh +# description: event trigger - test histogram trigger + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep hist events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + +echo "Test histogram basic tigger" + +echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \ + fail "hist trigger on sched_process_fork did not work" +grep child events/sched/sched_process_fork/hist > /dev/null || \ + fail "hist trigger on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with compound keys" + +echo 'hist:keys=parent_pid,child_pid' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep '^{ parent_pid:.*, child_pid:.*}' events/sched/sched_process_fork/hist > /dev/null || \ + fail "compound keys on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with string key" + +echo 'hist:keys=parent_comm' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +COMM=`cat /proc/$$/comm` +grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \ + fail "string key on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with sort key" + +echo 'hist:keys=parent_pid,child_pid:sort=child_pid.ascending' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done + +check_inc() { + while [ $# -gt 1 ]; do + [ $1 -gt $2 ] && return 1 + shift 1 + done + return 0 +} +check_inc `grep -o "child_pid:[[:space:]]*[[:digit:]]*" \ + events/sched/sched_process_fork/hist | cut -d: -f2 ` || + fail "sort param on sched_process_fork did not work" + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc new file mode 100644 index 000000000000..03c4a46561fc --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc @@ -0,0 +1,73 @@ +#!/bin/sh +# description: event trigger - test multiple histogram triggers + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep hist events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + +reset_trigger + +echo "Test histogram multiple tiggers" + +echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger +echo 'hist:keys=parent_comm:vals=child_pid' >> events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \ + fail "hist trigger on sched_process_fork did not work" +grep child events/sched/sched_process_fork/hist > /dev/null || \ + fail "hist trigger on sched_process_fork did not work" +COMM=`cat /proc/$$/comm` +grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \ + fail "string key on sched_process_fork did not work" + +reset_trigger + +echo "Test histogram with its name" + +echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_fork/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep test_hist events/sched/sched_process_fork/hist > /dev/null || \ + fail "named event on sched_process_fork did not work" + +echo "Test same named histogram on different events" + +echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_exit/trigger +for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done +grep test_hist events/sched/sched_process_exit/hist > /dev/null || \ + fail "named event on sched_process_fork did not work" + +diffs=`diff events/sched/sched_process_exit/hist events/sched/sched_process_fork/hist | wc -l` +test $diffs -eq 0 || fail "Same name histograms are not same" + +reset_trigger + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc new file mode 100644 index 000000000000..f84b80d551a2 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc @@ -0,0 +1,56 @@ +#!/bin/sh +# description: event trigger - test snapshot-trigger + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep snapshot events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "snapshot trigger is not supported" + exit_unsupported +fi + +echo "Test snapshot tigger" +echo 0 > snapshot +echo 1 > events/sched/sched_process_fork/enable +( echo "forked") +echo 'snapshot:1' > events/sched/sched_process_fork/trigger +( echo "forked") +grep sched_process_fork snapshot > /dev/null || \ + fail "snapshot trigger on sched_process_fork did not work" + +reset_trigger +echo 0 > snapshot +echo 0 > events/sched/sched_process_fork/enable + +echo "Test snapshot semantic errors" + +! echo "snapshot+1" > events/sched/sched_process_fork/trigger +echo "snapshot" > events/sched/sched_process_fork/trigger +! echo "snapshot" > events/sched/sched_process_fork/trigger + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc new file mode 100644 index 000000000000..9fa23b085def --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc @@ -0,0 +1,53 @@ +#!/bin/sh +# description: event trigger - test stacktrace-trigger + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +FEATURE=`grep stacktrace events/sched/sched_process_fork/trigger` +if [ -z "$FEATURE" ]; then + echo "stacktrace trigger is not supported" + exit_unsupported +fi + +echo "Test stacktrace tigger" +echo 0 > trace +echo 0 > options/stacktrace +echo 'stacktrace' > events/sched/sched_process_fork/trigger +( echo "forked") +grep "<stack trace>" trace > /dev/null || \ + fail "stacktrace trigger on sched_process_fork did not work" + +reset_trigger + +echo "Test stacktrace semantic errors" + +! echo "stacktrace:foo" > events/sched/sched_process_fork/trigger +echo "stacktrace" > events/sched/sched_process_fork/trigger +! echo "stacktrace" > events/sched/sched_process_fork/trigger + +do_reset + +exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc new file mode 100644 index 000000000000..87648e5f987c --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc @@ -0,0 +1,58 @@ +#!/bin/sh +# description: event trigger - test traceon/off trigger + +do_reset() { + reset_trigger + echo > set_event + clear_trace +} + +fail() { #msg + do_reset + echo $1 + exit $FAIL +} + +if [ ! -f set_event -o ! -d events/sched ]; then + echo "event tracing is not supported" + exit_unsupported +fi + +if [ ! -f events/sched/sched_process_fork/trigger ]; then + echo "event trigger is not supported" + exit_unsupported +fi + +reset_tracer +do_reset + +echo "Test traceoff trigger" +echo 1 > tracing_on +echo 'traceoff' > events/sched/sched_process_fork/trigger +( echo "forked") +if [ `cat tracing_on` -ne 0 ]; then + fail "traceoff trigger on sched_process_fork did not work" +fi + +reset_trigger + +echo "Test traceon trigger" +echo 0 > tracing_on +echo 'traceon' > events/sched/sched_process_fork/trigger +( echo "forked") +if [ `cat tracing_on` -ne 1 ]; then + fail "traceoff trigger on sched_process_fork did not work" +fi + +reset_trigger + +echo "Test semantic error for traceoff/on trigger" +! echo 'traceoff:badparam' > events/sched/sched_process_fork/trigger +! echo 'traceoff+0' > events/sched/sched_process_fork/trigger +echo 'traceon' > events/sched/sched_process_fork/trigger +! echo 'traceon' > events/sched/sched_process_fork/trigger +! echo 'traceoff' > events/sched/sched_process_fork/trigger + +do_reset + +exit 0 |