summaryrefslogtreecommitdiffstats
path: root/tools/perf/util/hist.c
Commit message (Collapse)AuthorAgeFilesLines
* perf sort: Add sort option for data page sizeKan Liang2020-12-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new sort option "data_page_size" for --mem-mode sort. With this option applied, perf can sort and report by sample's data page size. Here is an example: perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 9K of event 'mem-loads:uP' # Total weight : 9028 # Sort order : comm,symbol,phys_daddr,data_page_size # # Overhead Command Symbol Data Physical # Address # Data Page Size # ........ ....... ............................ # ...................... ...................... # 11.19% dtlb [.] touch_buffer [.] 0x00000003fec82ea8 4K 8.61% dtlb [.] GetTickCount [.] 0x00000003c4f2c8a8 4K 4.52% dtlb [.] GetTickCount [.] 0x00000003fec82f58 4K 4.33% dtlb [.] __gettimeofday [.] 0x00000003fec82f48 4K 4.32% dtlb [.] GetTickCount [.] 0x00000003fec82f78 4K 4.28% dtlb [.] GetTickCount [.] 0x00000003fec82f50 4K 4.23% dtlb [.] GetTickCount [.] 0x00000003fec82f70 4K 4.11% dtlb [.] GetTickCount [.] 0x00000003fec82f68 4K 4.00% dtlb [.] Calibrate [.] 0x00000003fec82f98 4K 3.91% dtlb [.] Calibrate [.] 0x00000003fec82f90 4K 3.43% dtlb [.] touch_buffer [.] 0x00000003fec82e98 4K 3.42% dtlb [.] touch_buffer [.] 0x00000003fec82e90 4K 0.09% dtlb [.] DoDependentLoads [.] 0x000000036ea084c0 2M 0.08% dtlb [.] DoDependentLoads [.] 0x000000032b010b80 2M Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Use the right prefix for 'struct evlist' print methodsArnaldo Carvalho de Melo2020-11-301-1/+1
| | | | | | | | | | | perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/, go on completing this split. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename perf_evsel__resort*() to evsel__resort*()Arnaldo Carvalho de Melo2020-05-281-4/+4
| | | | | | | | | | As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename perf_evsel__object_config() to evsel__object_config()Arnaldo Carvalho de Melo2020-05-281-3/+2
| | | | | | | | | | As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename perf_evsel__is_*() to evsel__is*()Arnaldo Carvalho de Melo2020-05-051-1/+1
| | | | | | | | | | As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()Arnaldo Carvalho de Melo2020-05-051-1/+1
| | | | | | | | | | As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename *perf_evsel__*name() to *evsel__*name()Arnaldo Carvalho de Melo2020-05-051-2/+2
| | | | | | | | | | As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Add fast path for duplicate entries checkKan Liang2020-04-181-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perf checks the duplicate entries in a callchain before adding an entry. However the check is very slow especially with deeper call stack. Almost ~50% elapsed time of perf report is spent on the check when the call stack is always depth of 32. The hist_entry__cmp() is used to compare the new entry with the old entries. It will go through all the available sorts in the sort_list, and call the specific cmp of each sort, which is very slow. Actually, for most cases, there are no duplicate entries in callchain. The symbols are usually different. It's much faster to do a quick check for symbols first. Only do the full cmp when the symbols are exactly the same. The quick check is only to check symbols, not dso. Export _sort__sym_cmp. $ perf record --call-graph lbr ./tchain_edit_64 Without the patch $time perf report --stdio real 0m21.142s user 0m21.110s sys 0m0.033s With the patch $time perf report --stdio real 0m10.977s user 0m10.948s sys 0m0.027s Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Add 'cgroup' sort keyNamhyung Kim2020-04-031-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cgroup sort key is to show cgroup membership of each task. Currently it shows full path in the cgroupfs (not relative to the root of cgroup namespace) since it'd be more intuitive IMHO. Otherwise root cgroup in different namespaces will all show same name - "/". The cgroup sort key should come before cgroup_id otherwise sort_dimension__add() will match it to cgroup_id as it only matches with the given substring. For example it will look like following. Note that record patch adding --all-cgroups patch will come later. $ perf record -a --namespace --all-cgroups cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ] $ perf report -s cgroup_id,cgroup,pid ... # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 93.96% 0/0x0 / 0:swapper 1.25% 3/0xeffffffb / 278:looper0 0.86% 3/0xf000015f /sub/cgrp1 280:cgtest 0.37% 3/0xf0000160 /sub/cgrp2 281:cgtest 0.34% 3/0xf0000163 /sub/cgrp3 282:cgtest 0.22% 3/0xeffffffb /sub 278:looper0 0.20% 3/0xeffffffb / 280:cgtest 0.15% 3/0xf0000163 /sub/cgrp3 285:looper3 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add hw_idx in struct branch_stackKan Liang2020-03-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf map_symbol: Rename ms->mg to ms->mapsArnaldo Carvalho de Melo2019-11-261-4/+4
| | | | | | | | | | | One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf addr_location: Rename al->mg to al->mapsArnaldo Carvalho de Melo2019-11-261-4/+4
| | | | | | | | | | | One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-foo95pyyp3bhocbt7yd8qrvq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol'Arnaldo Carvalho de Melo2019-11-121-0/+4
| | | | | | | | | | | | And fill it whenever we setup a a 'struct map_symbol', now we need to use it, next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fzwfcnddenz1o7uj1fzw3g46@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol'Arnaldo Carvalho de Melo2019-11-121-27/+27
| | | | | | | | | | | | | | | | | | | So that we pass that substructure around and with it consolidate lots of functions that receive a (map, symbol) pair and now can receive just a 'struct map_symbol' pointer. This further paves the way to add 'struct map_groups' to 'struct map_symbol' so that we can have all we need for annotation so that we can ditch 'struct map'->groups, i.e. have the map_groups pointer in a more central place, avoiding the pointer in the 'struct map' that have tons of instances. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge tag 'perf-core-for-mingo-5.5-20191107' of ↵Ingo Molnar2019-11-121-1/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf report: Jin Yao: - Introduce --total-cycles, for basic block profiling, further using data obtained from LBR, an example should suffice: # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Sampled Avg Avg # Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object # ....... ...... ....... ..... .................................... ................ # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] perf record: Adrian Hunter: - Allow storing perf.data in a directory together with a copy of /proc/kcore. Jiwei Sun: - Add support for limit perf output file size, i.e.: # perf record --all-cpus -F 10000 --max-size=4M sleep 10h [ perf record: perf size limit reached (4097 KB), stopping session ] [ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ] Terminated # ls -lah perf.data -rw-------. 1 root root 4.1M Nov 7 15:27 perf.data # perf stat: Jiri Olsa: - Add --per-node agregation support: In live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles ... perf probe: Masami Hiramatsu: Various fixes related to recent additions to the DWARF format: - Fix to find range-only function instance - Walk function lines in lexical blocks - Fix to show function entry line as probe-able - Fix wrong address verification - Fix to probe a function which has no entry pc - Fix to probe an inline function which has no entry pc - Fix to list probe event with correct line number - Fix to show inlined function callsite without entry_pc - Fix to show ranges of variables in functions without entry_pc - Return a better scope DIE if there is no best scope - Skip end-of-sequence and non statement lines - Filter out instances except for inlined subroutine and subprogram - Fix to show calling lines of inlined functions - Skip overlapped location on searching variables perf inject: Adrian Hunter: - Do not strip evsels with --strip, as they are needed for create_gcov (see the autofdo example in tools/perf/Documentation/intel-pt.txt). Intel PT: Adrian Hunter: - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid repeated decoding. Add an auxtrace_cache__remove to handle text poke events. core: Andi Kleen: - Always preserve errno while cleaning up perf_event_open failures. llvm: Arnaldo Carvalho de Melo: - No need to tell that the request for saving a .o file for BPF events, as expressed in ~/.perfconfig was satisfied, make that a debug message. perf vendor events: Intel: Haiyan Song: - Update CascadelakeX events to v1.05. - Update all the Intel JSON metrics from TMAM 3.6. Treewide: Ian Rogers: - Improve error paths, plugging leaks found using LLVM tools such as libFuzzer. jevents: Yunfeng Ye: - Fix resource leak in process_mapfile() and main() perf kvm: Igor Lubashev: - Use evlist layer api when possible. libsubcmd: James Clark: - Move EXTRA_FLAGS to the end to allow overriding existing flags. - Use -O0 with DEBUG=1 perf diff: Jin Yao: - Don't use hack to skip column length calculation CoreSight ETM: Leo yan: - Fix definition of macro TO_CS_QUEUE_NR ARM64: John Garry: - Do not try to include libelf header files when its feature detection failed, fixing the cross build for ARM64. perf tests: Leo Yan: - Fix out of bounds memory access in the backward ring buffer test. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf hist: Support block formats with compare/sort/displayJin Yao2019-11-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry. Because this info is needed for reporting the block range (i.e. source line) Committer notes: Remove unused set_fmt() function, some build were not completing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hist: Count the total cycles of all samplesJin Yao2019-11-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can get the per sample cycles by hist__account_cycles(). It's also useful to know the total cycles of all samples in order to get the cycles coverage for a single program block in further. For example: coverage = per block sampled cycles / total sampled cycles This patch creates a new argument 'total_cycles' in hist__account_cycles(), which will be added with the cycles of each sample. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf block: Cleanup and refactor block info functionsJin Yao2019-11-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. Remove the nasty hack for skipping calculation of column length Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf diff: Don't use hack to skip column length calculationJin Yao2019-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we use a nasty hack to skip the hists__calc_col_len for block since this function is not very suitable for block column length calculation. This patch removes the hack code and add a check at the entry of hists__calc_col_len to skip for block case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Fix time sortingJiri Olsa2019-11-051-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | The final sort might get confused when the comparison is done over bigger numbers than int like for -s time. Check the following report for longer workloads: $ perf report -s time -F time,overhead --stdio Fix hist_entry__sort() to properly return int64_t and not possible cut int. Fixes: 043ca389a318 ("perf tools: Use hpp formats to sort final output") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org # v3.16+ Link: http://lore.kernel.org/lkml/20191104232711.16055-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf symbols: Move mem_info and branch_info out of symbol.hArnaldo Carvalho de Melo2019-08-311-0/+3
| | | | | | | | | | | | The mem_info struct goes to mem-events.h and branch_info goes to branch.h, where they belong, this way we can remove several headers from symbols.h and trim the include dependency tree more. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-aupw71xnravcsu2xoabfmhpc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove needless sort.h include directivesArnaldo Carvalho de Melo2019-08-311-0/+1
| | | | | | | | | | | | Now that sort.h isn't included by any other header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-tom8k0lbsxd9joprr8zpu6w1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf dsos: Move the dsos struct and its methods to separate source filesArnaldo Carvalho de Melo2019-08-311-0/+1
| | | | | | | | | | | | | So that we can reduce the header dependency tree further, in the process noticed that lots of places were getting even things like build-id routines and 'struct perf_tool' definition indirectly, so fix all those too. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf debug: Remove needless include directives from debug.hArnaldo Carvalho de Melo2019-08-311-0/+1
| | | | | | | | | | | | All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove debug.h from header files not needing itArnaldo Carvalho de Melo2019-08-291-0/+1
| | | | | | | | | | | And fix the fallout, adding it to places that must have it since they use its definitions. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-1s3jel4i26chq2g0lydoz7i3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf top: Fix event group with more than two eventsNamhyung Kim2019-08-281-18/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | The event group feature links relevant hist entries among events so that they can be displayed together. During the link process, each hist entry in non-leader events is connected to a hist entry in the leader event. This is done in order of events specified in the command line so it assumes that events are linked in the order. But 'perf top' can break the assumption since it does the link process multiple times. For example, a hist entry can be in the third event only at first so it's linked after the leader. Some time later, second event has a hist entry for it and it'll be linked after the entry of the third event. This makes the code compilicated to deal with such unordered entries. This patch simply unlink all the entries after it's printed so that they can assume the correct order after the repeated link process. Also it'd be easy to deal with decaying old entries IMHO. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190827231555.121411-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Fix --ns time sort key outputAndi Kleen2019-08-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user specified --ns, the column to print the sort time stamp wasn't wide enough to actually print the full nanoseconds. Widen the time key column width when --ns is specified. Before: % perf record -a sleep 1 % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.10000 [k] smp_call_function_single - - 1.53% 187851.10000 [k] intel_idle - - 0.59% 187851.10000 [.] __wcscmp_ifunc - - 0.33% 187851.10000 [.] 0000000000000000 - - 0.28% 187851.10000 [k] cpuidle_enter_state - - After: % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.100000000 [k] smp_call_function_single - - 1.53% 187851.100000000 [k] intel_idle - - 0.59% 187851.100000000 [.] __wcscmp_ifunc - - 0.33% 187851.100000000 [.] 0000000000000000 - - 0.28% 187851.100000000 [k] cpuidle_enter_state - - Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Remove dummy entries when finding real ones.Arnaldo Carvalho de Melo2019-08-121-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When he have an event group we have multiple struct hist instances, one per evsel, and in each of these hists we may have hist_entries that point to the same thing being observed, say a symbol, i.e. if we're looking at instructions and cycles, then we'll have one hist_entry in the "instructions" evsel and another in the "cycles" evsel. We need to link those to then show one column for each. When we're looking at some other pair of events, say instructions and cache misses, we may have just the "instructions" hist entry and not one for "cache misses", as instructions not necessarily generate cache misses, as the logic expects one hist_entry per evsel, we end up adding "dummy" hist_entries. This is enough for 'perf report', that does this matching operation (hists__match()) just once after processing all events, but for 'perf top', we do this at each refresh, so we may finally find events matching and then we need to trow away the dummies and link with the real events. So if we find a match, traverse the link of matches and trow away dummies for that hists. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-dwvtjqqifsbsczeb35q6mqkk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hists: Do not link a pair if already linkedArnaldo Carvalho de Melo2019-08-121-1/+1
| | | | | | | | | | | | | | | | | | | When we have multiple events in a group we link hist_entries in the non-leader evsel hists to the one in the leader that points to the same sorting criteria, in hists__match(). For 'perf report' we do this just once and then print the results, but for 'perf top' we need to look if this was already done in the previous refresh of the screen, so check for that and don't try to link again. This is part of having 'perf top' using the hists browser for showing multiple events in multiple columns. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-iwvb37rgb7upswhruwpcdnhw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* libperf: Move nr_members from perf's evsel to libperf's perf_evselJiri Olsa2019-07-291-1/+1
| | | | | | | | | | | | | | Move the nr_members member from perf's evsel to libperf's perf_evsel. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-60-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* libperf: Move perf_event_attr field from perf's evsel to libperf's perf_evselJiri Olsa2019-07-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the perf_event_attr struct fron 'struct evsel' to 'struct perf_evsel'. Committer notes: Fixed up these: tools/perf/arch/arm/util/auxtrace.c tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/arm-spe.c tools/perf/arch/s390/util/auxtrace.c tools/perf/util/cs-etm.c Also cc1: warnings being treated as errors tests/sample-parsing.c: In function 'do_test': tests/sample-parsing.c:162: error: missing initializer tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus') struct evsel evsel = { .needs_swap = false, - .core.attr = { - .sample_type = sample_type, - .read_format = read_format, + .core = { + . attr = { + .sample_type = sample_type, + .read_format = read_format, + }, [perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1 gcc (GCC) 4.4.7 Also we don't need to include perf_event.h in tools/perf/lib/include/perf/evsel.h, forward declaring 'struct perf_event_attr' is enough. And this even fixes the build in some systems where things are used somewhere down the include path from perf_event.h without defining __always_inline. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Rename struct perf_evlist to struct evlistJiri Olsa2019-07-291-1/+1
| | | | | | | | | | | | | | | | | | | | Rename struct perf_evlist to struct evlist, so we don't have a name clash when we add struct perf_evlist in libperf. Committer notes: Added fixes to build on arm64, from Jiri and from me (tools/perf/util/cs-etm.c) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename struct perf_evsel to struct evselJiri Olsa2019-07-291-13/+13
| | | | | | | | | | | | | | | | | | | Rename struct perf_evsel to struct evsel, so we don't have a name clash when we add struct perf_evsel in libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Use list_del_init() more thorouglyArnaldo Carvalho de Melo2019-07-091-2/+2
| | | | | | | | | | | | To allow for destructors to check if they're operating on a object still in a list, and to avoid going from use after free list entries into still valid, or even also other already removed from list entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Use zfree() where applicableArnaldo Carvalho de Melo2019-07-091-7/+7
| | | | | | | | | | | | | | | | | | In places where the equivalent was already being done, i.e.: free(a); a = NULL; And in placs where struct members are being freed so that if we have some erroneous reference to its struct, then accesses to freed members will result in segfaults, which we can detect faster than use after free to areas that may still have something seemingly valid. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo2019-07-091-1/+1
| | | | | | | | | | Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf diff: Print the basic block cycles diffJin Yao2019-07-021-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | $ perf record -b ./div $ perf record -b ./div Following is the default perf diff output $ perf diff # Event 'cycles' # # Baseline Delta Abs Shared Object Symbol # ........ ......... ................ .................................. # 48.75% +0.33% div [.] main 8.21% -0.20% div [.] compute_flag 19.02% -0.12% libc-2.23.so [.] __random_r 16.17% -0.09% libc-2.23.so [.] __random 2.27% -0.03% div [.] rand@plt +0.02% [i915] [k] gen8_irq_handler 5.52% +0.02% libc-2.23.so [.] rand This patch creates a new computation selection 'cycles'. $ perf diff -c cycles # Event 'cycles' # # Baseline [Program Block Range] Cycles Diff Shared Object Symbol # ........ ....................................... ......................................... # 48.75% [div.c:42 -> div.c:45] 147 div [.] main 48.75% [div.c:31 -> div.c:40] 4 div [.] main 48.75% [div.c:40 -> div.c:40] 0 div [.] main 48.75% [div.c:42 -> div.c:42] 0 div [.] main 48.75% [div.c:42 -> div.c:44] 0 div [.] main 19.02% [random_r.c:357 -> random_r.c:360] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:373] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:376] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:380] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:392] 0 libc-2.23.so [.] __random_r 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:295] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:297] 0 libc-2.23.so [.] __random 16.17% [random.c:291 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:293 -> random.c:293] 0 libc-2.23.so [.] __random 8.21% [div.c:22 -> div.c:22] 148 div [.] compute_flag 8.21% [div.c:22 -> div.c:25] 0 div [.] compute_flag 8.21% [div.c:27 -> div.c:28] 0 div [.] compute_flag 5.52% [rand.c:26 -> rand.c:27] 0 libc-2.23.so [.] rand 5.52% [rand.c:26 -> rand.c:28] 0 libc-2.23.so [.] rand 2.27% [rand@plt+0 -> rand@plt+0] 0 div [.] rand@plt 0.01% [entry_64.S:694 -> entry_64.S:694] 16 [vmlinux] [k] native_irq_return_iret 0.00% [fair.c:7676 -> fair.c:7665] 162 [vmlinux] [k] update_blocked_averages "[Program Block Range]" indicates the range of program basic block (start -> end). If we can find the source line it prints the source line otherwise it prints the symbol+offset instead. v4: --- Use source lines or symbol+offset to indicate the basic block. It should be easier to understand. v3: --- Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf. Use symbol_conf.report_block to check if executing hist_entry__block_fprintf. v2: --- Keep standard perf diff format and display the 'Baseline' and 'Shared Object'. The output is sorted by "Baseline" and the basic blocks in the same function are sorted by cycles diff. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf diff: Use hists to manage basic blocks per symbolJin Yao2019-07-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hist__account_cycles() can account cycles per basic block. The basic block information is saved in cycles_hist structure. This patch processes each symbol, get basic blocks from cycles_hist and add the basic block entries to a new hists (in 'struct block_hist'). Using a hists is because we need to compare, sort and print the basic blocks later. v6: --- Since 'ops' argument is removed from hists__add_entry_block, update the code accordingly. No functional change. v5: --- Since now we still carry block_info in 'struct hist_entry' we don't need to use our own new/free ops for hist entries. And the block_info is released in hist_entry__delete. v3: --- 1. In v2, we put block stuffs in 'struct hist_entry', but it's not a good design. In v3, we create a new 'struct block_hist' and cast the 'struct hist_entry' to 'struct block_hist' in some places, which can avoid adding new stuffs in 'struct hist_entry'. 2. abs() -> labs(), in block_cycles_diff_cmp(). v2: --- v1 adds the basic block entries to per data-file hists but v2 adds the basic block entries to per symbol hists. That is to keep current perf-diff format. Will show the result in next patches. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1561713784-30533-5-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hists: Add block_info in hist_entryJin Yao2019-07-021-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | The block_info contains the program basic block information, i.e, contains the start address and the end address of this basic block and how much cycles it takes. We need to compare, sort and even print out the basic block by some orders, i.e. sort by cycles. For this purpose, we add block_info field to hist_entry. In order not to impact current interface, we creates a new function hists__add_entry_block. v6: --- Remove the 'ops' argument in hists__add_entry_block Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1561713784-30533-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove const from thread read accessorsNamhyung Kim2019-05-281-1/+1
| | | | | | | | | | | | | | The namespaces and comm fields of a thread are protected by rwsem and require write access for it. So it ended up using a cast to remove the const qualifier. Let's get rid of the const then. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Link: http://lkml.kernel.org/r/20190527061149.168640-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Add missing map__put() in error caseChangbin Du2019-03-191-1/+3
| | | | | | | | | | | | | | | | | | | We need to map__put() before returning from failure of sample__resolve_callchain(). Detected with gcc's ASan. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 9c68ae98c6f7 ("perf callchain: Reference count maps") Link: http://lkml.kernel.org/r/20190316080556.3075-10-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Implement browsing of individual samplesAndi Kleen2019-03-111-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now 'perf report' can show whole time periods with 'perf script', but the user still has to find individual samples of interest manually. It would be expensive and complicated to search for the right samples in the whole perf file. Typically users only need to look at a small number of samples for useful analysis. Also the full scripts tend to show samples of all CPUs and all threads mixed up, which can be very confusing on larger systems. Add a new --samples option to save a small random number of samples per hist entry. Use a reservoir sample technique to select a representatve number of samples. Then allow browsing the samples using 'perf script' as part of the hist entry context menu. This automatically adds the right filters, so only the thread or cpu of the sample is displayed. Then we use less' search functionality to directly jump the to the time stamp of the selected sample. It uses different menus for assembler and source display. Assembler needs xed installed and source needs debuginfo. Currently it only supports as many samples as fit on the screen due to some limitations in the slang ui code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Support time sort keyAndi Kleen2019-03-111-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a time sort key to perf report to display samples for different time quantums separately. This allows easier analysis of workloads that change over time, and also will allow looking at the context of samples. % perf record ... % perf report --sort time,overhead,symbol --time-quantum 1ms --stdio ... 0.67% 277061.87300 [.] _dl_start 0.50% 277061.87300 [.] f1 0.50% 277061.87300 [.] f2 0.33% 277061.87300 [.] main 0.29% 277061.87300 [.] _dl_lookup_symbol_x 0.29% 277061.87300 [.] dl_main 0.29% 277061.87300 [.] do_lookup_x 0.17% 277061.87300 [.] _dl_debug_initialize 0.17% 277061.87300 [.] _dl_init_paths 0.08% 277061.87300 [.] check_match 0.04% 277061.87300 [.] _dl_count_modids 1.33% 277061.87400 [.] f1 1.33% 277061.87400 [.] f2 1.33% 277061.87400 [.] main 1.17% 277061.87500 [.] main 1.08% 277061.87500 [.] f1 1.08% 277061.87500 [.] f2 1.00% 277061.87600 [.] main 0.83% 277061.87600 [.] f1 0.83% 277061.87600 [.] f2 1.00% 277061.87700 [.] main Committer notes: Rename 'time' argument to hist_time() to htime to overcome this in older distros: cc1: warnings being treated as errors util/hist.c: In function 'hist_time': util/hist.c:251: error: declaration of 'time' shadows a global declaration /usr/include/time.h:186: error: shadowed declaration is here Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190311144502.15423-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Fix memory leak of srclineJiri Olsa2019-03-061-2/+12
| | | | | | | | | | | | | | | | | | We can't allocate he->srcline unconditionaly, only when new hist_entry is created. Moving he->srcline allocation into hist_entry__init function. Original-patch-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de> Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nageswara R Sastry <nasastry@in.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lkml.kernel.org/r/20190305152536.21035-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Add error path into hist_entry__initJiri Olsa2019-03-061-20/+19
| | | | | | | | | | | | | | | | Adding error path into hist_entry__init to unify error handling, so every new member does not need to free everything else. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: nageswara r sastry <nasastry@in.ibm.com> Link: http://lkml.kernel.org/r/20190305152536.21035-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Add output_resort_cb methodJiri Olsa2019-02-061-2/+8
| | | | | | | | | | | | | | Add perf_evsel__output_resort_cb() so we have an interface with a callback for each hist entry. It will be used in the following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190204141808.23031-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hists: Add argument to hists__resort_cb_t callbackJiri Olsa2019-02-061-5/+6
| | | | | | | | | | | | | | | Add argument to hists__resort_cb_t so that we can pass data from upper layers to the callback function. It will be used in the following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add missing include <callchain.h> in various placesArnaldo Carvalho de Melo2019-02-061-0/+1
| | | | | | | | | | | Its getting it from hist.h and that will go away, as that header doesn't need callchain.h at all. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-6ebl3mwwiqocl79yts44qltu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add missing include for symbols.hArnaldo Carvalho de Melo2019-02-061-0/+1
| | | | | | | | | | | | | Several places were using definitions found in symbols.h but not including it, getting it by sheer luck from some other headers that now are in the process of removing that include because they don't need it or because simply having struct forward declarations is enough, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xbcvvx296d70kpg9wb0qmeq9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Use cached rbtreesDavidlohr Bueso2019-01-251-87/+112
| | | | | | | | | | | | | | | | | | | | | | At the cost of an extra pointer, we can avoid the O(logN) cost of finding the first element in the tree (smallest node), which is something heavily required for histograms. Specifically, the following are converted to rb_root_cached, and users accordingly: hist::entries_in_array hist::entries_in hist::entries hist::entries_collapsed hist_entry::hroot_in hist_entry::hroot_out Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net [ Added some missing conversions to rb_first_cached() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>