summaryrefslogtreecommitdiffstats
path: root/tools/perf/util
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'perf-tools-for-v5.13-2021-04-29' of ↵Linus Torvalds2021-05-0190-300/+2588
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "perf stat: - Add support for hybrid PMUs to support systems such as Intel Alderlake and its BIG/little core/atom cpus. - Introduce 'bperf' to share hardware PMCs with BPF. - New --iostat option to collect and present IO stats on Intel hardware. This functionality is based on recently introduced sysfs attributes for Intel® Xeon® Scalable processor family (code name Skylake-SP) in commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") It is intended to provide four I/O performance metrics in MB per each PCIe root port: - Inbound Read: I/O devices below root port read from the host memory - Inbound Write: I/O devices below root port write to the host memory - Outbound Read: CPU reads from I/O devices below root port - Outbound Write: CPU writes to I/O devices below root port - Align CSV output for summary. - Clarify --null use cases: Assess raw overhead of 'perf stat' or measure just wall clock time. - Improve readability of shadow stats. perf record: - Change the COMM when starting tha workload so that --exclude-perf doesn't seem to be not honoured. - Improve 'Workload failed' message printing events + what was exec'ed. - Fix cross-arch support for TIME_CONV. perf report: - Add option to disable raw event ordering. - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'. - Improvements to --stat output, that shows information about PERF_RECORD_ events. - Preserve identifier id in OCaml demangler. perf annotate: - Show full source location with 'l' hotkey in the 'perf annotate' TUI. - Add line number like in TUI and source location at EOL to the 'perf annotate' --stdio mode. - Add --demangle and --demangle-kernel to 'perf annotate'. - Allow configuring annotate.demangle{,_kernel} in 'perf config'. - Fix sample events lost in stdio mode. perf data: - Allow converting a perf.data file to JSON. libperf: - Add support for user space counter access. - Update topdown documentation to permit rdpmc calls. perf test: - Add 'perf test' for 'perf stat' CSV output. - Add 'perf test' entries to test the hybrid PMU support. - Cleanup 'perf test daemon' if its 'perf test' is interrupted. - Handle metric reuse in pmu-events parsing 'perf test' entry. - Add test for PE executable support. - Add timeout for wait for daemon start in its 'perf test' entries. Build: - Enable libtraceevent dynamic linking. - Improve feature detection output. - Fix caching of feature checks caching. - First round of updates for tools copies of kernel headers. - Enable warnings when compiling BPF programs. Vendor specific events: - Intel: - Add missing skylake & icelake model numbers. - arm64: - Add Hisi hip08 L1, L2 and L3 metrics. - Add Fujitsu A64FX PMU events. - PowerPC: - Initial JSON/events list for power10 platform. - Remove unsupported power9 metrics. - AMD: - Add Zen3 events. - Fix broken L2 Cache Hits from L2 HWPF metric. - Use lowercases for all the eventcodes and umasks. Hardware tracing: - arm64: - Update CoreSight ETM metadata format. - Fix bitmap for CS-ETM option. - Support PID tracing in config. - Detect pid in VMID for kernel running at EL2. Arch specific updates: - MIPS: - Support MIPS unwinding and dwarf-regs. - Generate mips syscalls_n64.c syscall table. - PowerPC: - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC. - Support pipeline stage cycles for powerpc. libbeauty: - Fix fsconfig generator" * tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits) perf build: Defer printing detected features to the end of all feature checks tools build: Allow deferring printing the results of feature detection perf build: Regenerate the FEATURE_DUMP file after extra feature checks perf session: Dump PERF_RECORD_TIME_CONV event perf session: Add swap operation for event TIME_CONV perf jit: Let convert_timestamp() to be backwards-compatible perf tools: Change fields type in perf_record_time_conv perf tools: Enable libtraceevent dynamic linking perf Documentation: Document intel-hybrid support perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid perf tests: Support 'Convert perf time to TSC' test for hybrid perf tests: Support 'Session topology' test for hybrid perf tests: Support 'Parse and process metrics' test for hybrid perf tests: Support 'Track with sched_switch' test for hybrid perf tests: Skip 'Setup struct perf_event_attr' test for hybrid perf tests: Add hybrid cases for 'Roundtrip evsel->name' test perf tests: Add hybrid cases for 'Parse event definition strings' test perf record: Uniquify hybrid event name perf stat: Warn group events from different hybrid PMU perf stat: Filter out unmatched aggregation for hybrid event ...
| * perf session: Dump PERF_RECORD_TIME_CONV eventLeo Yan2021-04-293-1/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now perf tool uses the common stub function process_event_op2_stub() for dumping TIME_CONV event, thus it doesn't output the clock parameters contained in the event. This patch adds the callback function for dumping the hardware clock parameters in TIME_CONV event. Before: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV : unhandled! [...] After: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV ... Time Shift 21 ... Time Muliplier 20971520 ... Time Zero 18446743935180835206 ... Time Cycles 13852918225 ... Time Mask 0xffffffffffffff ... Cap Time Zero 1 ... Cap Time Short 1 : unhandled! [...] Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve MacLean <Steve.MacLean@Microsoft.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20210428120915.7123-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf session: Add swap operation for event TIME_CONVLeo Yan2021-04-291-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV"), the event PERF_RECORD_TIME_CONV has extended the data structure for clock parameters. To be backwards-compatible, this patch adds a dedicated swap operation for the event PERF_RECORD_TIME_CONV, based on checking if the event contains field "time_cycles", it can support both for the old and new event formats. Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve MacLean <Steve.MacLean@Microsoft.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20210428120915.7123-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf jit: Let convert_timestamp() to be backwards-compatibleLeo Yan2021-04-291-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d110162cafc80dad ("perf tsc: Support cap_user_time_short for event TIME_CONV") supports the extended parameters for event TIME_CONV, but it broke the backwards compatibility, so any perf data file with old event format fails to convert timestamp. This patch introduces a helper event_contains() to check if an event contains a specific member or not. For the backwards-compatibility, if the event size confirms the extended parameters are supported in the event TIME_CONV, then copies these parameters. Committer notes: To make this compiler backwards compatible add this patch: - struct perf_tsc_conversion tc = { 0 }; + struct perf_tsc_conversion tc = { .time_shift = 0, }; Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve MacLean <Steve.MacLean@Microsoft.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20210428120915.7123-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Warn group events from different hybrid PMUJin Yao2021-04-295-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a group has events which are from different hybrid PMUs, shows a warning: "WARNING: events in group from different hybrid PMUs!" This is to remind the user not to put the core event and atom event into one group. Next, just disable grouping. # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1 WARNING: events in group from different hybrid PMUs! WARNING: grouped events cpus do not match, disabling group: anon group { cpu_core/cycles/, cpu_atom/cycles/ } Performance counter stats for 'system wide': 5,438,125 cpu_core/cycles/ 3,914,586 cpu_atom/cycles/ 1.004250966 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-17-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Filter out unmatched aggregation for hybrid eventJin Yao2021-04-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf-stat has supported some aggregation modes, such as --per-core, --per-socket and etc. While for hybrid event, it may only available on part of cpus. So for --per-core, we need to filter out the unavailable cores, for --per-socket, filter out the unavailable sockets, and so on. Before: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 479,530 cpu_core/cycles/ S0-D0-C4 2 175,007 cpu_core/cycles/ S0-D0-C8 2 166,240 cpu_core/cycles/ S0-D0-C12 2 704,673 cpu_core/cycles/ S0-D0-C16 2 865,835 cpu_core/cycles/ S0-D0-C20 2 2,958,461 cpu_core/cycles/ S0-D0-C24 2 163,988 cpu_core/cycles/ S0-D0-C28 2 164,729 cpu_core/cycles/ S0-D0-C32 0 <not counted> cpu_core/cycles/ S0-D0-C33 0 <not counted> cpu_core/cycles/ S0-D0-C34 0 <not counted> cpu_core/cycles/ S0-D0-C35 0 <not counted> cpu_core/cycles/ S0-D0-C36 0 <not counted> cpu_core/cycles/ S0-D0-C37 0 <not counted> cpu_core/cycles/ S0-D0-C38 0 <not counted> cpu_core/cycles/ S0-D0-C39 0 <not counted> cpu_core/cycles/ 1.003597211 seconds time elapsed After: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 210,428 cpu_core/cycles/ S0-D0-C4 2 444,830 cpu_core/cycles/ S0-D0-C8 2 435,241 cpu_core/cycles/ S0-D0-C12 2 423,976 cpu_core/cycles/ S0-D0-C16 2 859,350 cpu_core/cycles/ S0-D0-C20 2 1,559,589 cpu_core/cycles/ S0-D0-C24 2 163,924 cpu_core/cycles/ S0-D0-C28 2 376,610 cpu_core/cycles/ 1.003621290 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Co-developed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf record: Create two hybrid 'cycles' events by defaultJin Yao2021-04-296-5/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When evlist is empty, for example no '-e' specified in perf record, one default 'cycles' event is added to evlist. While on hybrid platform, it needs to create two default 'cycles' events. One is for cpu_core, the other is for cpu_atom. This patch actually calls evsel__new_cycles() two times to create two 'cycles' events. # ./perf record -vv -a -- sleep 1 ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 21 ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 27 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 28 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 29 ------------------------------------------------------------ We have to create evlist-hybrid.c otherwise due to the symbol dependency the perf test python would be failed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf parse-events: Support event inside hybrid pmuJin Yao2021-04-291-0/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On hybrid platform, user may want to enable events on one pmu. Following syntax are supported: cpu_core/<event>/ cpu_atom/<event>/ But the syntax doesn't work for cache event. Before: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 event syntax error: 'cpu_core/LLC-loads/' \___ unknown term 'LLC-loads' for pmu 'cpu_core' Cache events are a bit complex. We can't create aliases for them. We use another solution. For example, if we use "cpu_core/LLC-loads/", in parse_events_add_pmu(), term->config is "LLC-loads". Then we create a new parser to scan "LLC-loads". The parse_events_add_cache() would be called during parsing. The parse_state->hybrid_pmu_name is used to identify the pmu where the event should be enabled on. After: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 Performance counter stats for 'system wide': 24,593 cpu_core/LLC-loads/ 1.003911601 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-13-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf parse-events: Compare with hybrid pmu nameJin Yao2021-04-295-8/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On hybrid platform, user may want to enable event only on one pmu. Following syntax will be supported: cpu_core/<event>/ cpu_atom/<event>/ For hardware event, hardware cache event and raw event, two events are created by default. We pass the specified pmu name in parse_state and it would be checked before event creation. So next only the event with the specified pmu would be created. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-12-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf parse-events: Create two hybrid raw eventsJin Yao2021-04-291-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On hybrid platform, same raw event is possible to be available on both cpu_core pmu and cpu_atom pmu. It's supported to create two raw events for one event encoding. For raw events, the attr.type is PMU type. # perf stat -e r3c -a -vv -- sleep 1 Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 ------------------------------------------------------------ perf_event_attr: type 8 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 8 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 r3c: 0: 434449 1001412521 1001412521 r3c: 1: 173162 1001482031 1001482031 r3c: 2: 231710 1001524974 1001524974 r3c: 3: 110012 1001563523 1001563523 r3c: 4: 191517 1001593221 1001593221 r3c: 5: 956458 1001628147 1001628147 r3c: 6: 416969 1001715626 1001715626 r3c: 7: 1047527 1001596650 1001596650 r3c: 8: 103877 1001633520 1001633520 r3c: 9: 70571 1001637898 1001637898 r3c: 10: 550284 1001714398 1001714398 r3c: 11: 1257274 1001738349 1001738349 r3c: 12: 107797 1001801432 1001801432 r3c: 13: 67471 1001836281 1001836281 r3c: 14: 286782 1001923161 1001923161 r3c: 15: 815509 1001952550 1001952550 r3c: 0: 95994 1002071117 1002071117 r3c: 1: 105570 1002142438 1002142438 r3c: 2: 115921 1002189147 1002189147 r3c: 3: 72747 1002238133 1002238133 r3c: 4: 103519 1002276753 1002276753 r3c: 5: 121382 1002315131 1002315131 r3c: 6: 80298 1002248050 1002248050 r3c: 7: 466790 1002278221 1002278221 r3c: 6821369 16026754282 16026754282 r3c: 1162221 8017758990 8017758990 Performance counter stats for 'system wide': 6,821,369 cpu_core/r3c/ 1,162,221 cpu_atom/r3c/ 1.002289965 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-11-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf parse-events: Create two hybrid cache eventsJin Yao2021-04-293-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For cache events, they have pre-defined configs. The kernel needs to know where the cache event comes from (e.g. from cpu_core pmu or from cpu_atom pmu). But the perf type PERF_TYPE_HW_CACHE can't carry pmu information. Now the type PERF_TYPE_HW_CACHE is extended to be PMU aware type. The PMU type ID is stored at attr.config[63:32]. When enabling a hybrid cache event without specified pmu, such as, 'perf stat -e LLC-loads -a', two events are created automatically. One is for atom, the other is for core. # perf stat -e LLC-loads -a -vv -- sleep 1 Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 3 size 120 config 0x400000002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 3 size 120 config 0x400000002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 ------------------------------------------------------------ perf_event_attr: type 3 size 120 config 0x800000002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 3 size 120 config 0x800000002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 LLC-loads: 0: 1507 1001800280 1001800280 LLC-loads: 1: 666 1001812250 1001812250 LLC-loads: 2: 3353 1001813453 1001813453 LLC-loads: 3: 514 1001848795 1001848795 LLC-loads: 4: 627 1001952832 1001952832 LLC-loads: 5: 4399 1001451154 1001451154 LLC-loads: 6: 1240 1001481052 1001481052 LLC-loads: 7: 478 1001520348 1001520348 LLC-loads: 8: 691 1001551236 1001551236 LLC-loads: 9: 310 1001578945 1001578945 LLC-loads: 10: 1018 1001594354 1001594354 LLC-loads: 11: 3656 1001622355 1001622355 LLC-loads: 12: 882 1001661416 1001661416 LLC-loads: 13: 506 1001693963 1001693963 LLC-loads: 14: 3547 1001721013 1001721013 LLC-loads: 15: 1399 1001734818 1001734818 LLC-loads: 0: 1314 1001793826 1001793826 LLC-loads: 1: 2857 1001752764 1001752764 LLC-loads: 2: 646 1001830694 1001830694 LLC-loads: 3: 1612 1001864861 1001864861 LLC-loads: 4: 2244 1001912381 1001912381 LLC-loads: 5: 1255 1001943889 1001943889 LLC-loads: 6: 4624 1002021109 1002021109 LLC-loads: 7: 2703 1001959302 1001959302 LLC-loads: 24793 16026838264 16026838264 LLC-loads: 17255 8015078826 8015078826 Performance counter stats for 'system wide': 24,793 cpu_core/LLC-loads/ 17,255 cpu_atom/LLC-loads/ 1.001970988 seconds time elapsed 0x4 in 0x400000002 indicates the cpu_core pmu. 0x8 in 0x800000002 indicates the cpu_atom pmu. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-10-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf parse-events: Create two hybrid hardware eventsJin Yao2021-04-295-0/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current hardware events has special perf types PERF_TYPE_HARDWARE. But it doesn't pass the PMU type in the user interface. For a hybrid system, the perf kernel doesn't know which PMU the events belong to. So now this type is extended to be PMU aware type. The PMU type ID is stored at attr.config[63:32]. PMU type ID is retrieved from sysfs. root@lkp-adl-d01:/sys/devices/cpu_atom# cat type 8 root@lkp-adl-d01:/sys/devices/cpu_core# cat type 4 When enabling a hybrid hardware event without specified pmu, such as, 'perf stat -e cycles -a', two events are created automatically. One is for atom, the other is for core. # perf stat -e cycles -a -vv -- sleep 1 Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 cycles: 0: 836272 1001525722 1001525722 cycles: 1: 628564 1001580453 1001580453 cycles: 2: 872693 1001605997 1001605997 cycles: 3: 70417 1001641369 1001641369 cycles: 4: 88593 1001726722 1001726722 cycles: 5: 470495 1001752993 1001752993 cycles: 6: 484733 1001840440 1001840440 cycles: 7: 1272477 1001593105 1001593105 cycles: 8: 209185 1001608616 1001608616 cycles: 9: 204391 1001633962 1001633962 cycles: 10: 264121 1001661745 1001661745 cycles: 11: 826104 1001689904 1001689904 cycles: 12: 89935 1001728861 1001728861 cycles: 13: 70639 1001756757 1001756757 cycles: 14: 185266 1001784810 1001784810 cycles: 15: 171094 1001825466 1001825466 cycles: 0: 129624 1001854843 1001854843 cycles: 1: 122533 1001840421 1001840421 cycles: 2: 90055 1001882506 1001882506 cycles: 3: 139607 1001896463 1001896463 cycles: 4: 141791 1001907838 1001907838 cycles: 5: 530927 1001883880 1001883880 cycles: 6: 143246 1001852529 1001852529 cycles: 7: 667769 1001872626 1001872626 cycles: 6744979 16026956922 16026956922 cycles: 1965552 8014991106 8014991106 Performance counter stats for 'system wide': 6,744,979 cpu_core/cycles/ 1,965,552 cpu_atom/cycles/ 1.001882711 seconds time elapsed 0x4 in 0x400000000 indicates the cpu_core pmu. 0x8 in 0x800000000 indicates the cpu_atom pmu. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-9-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Uniquify hybrid event nameJin Yao2021-04-293-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would be useful to let user know the pmu which the event belongs to. perf-stat has supported '--no-merge' option and it can print the pmu name after the event name, such as: "cycles [cpu_core]" Now this option is enabled by default for hybrid platform but change the format to: "cpu_core/cycles/" If user configs the name, we still use the user specified name. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> ink: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf pmu: Add hybrid helper functionsJin Yao2021-04-294-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu can be used to identify the hybrid platform and return the found hybrid cpu pmu. All the detected hybrid pmus have been saved in 'perf_pmu__hybrid_pmus' list. So we just need to search this list. perf_pmu__hybrid_type_to_pmu converts the user specified string to hybrid pmu name. This is used to support the '--cputype' option in next patches. perf_pmu__has_hybrid checks the existing of hybrid pmu. Note that, we have to define it in pmu.c (make pmu-hybrid.c no more symbol dependency), otherwise perf test python would be failed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf pmu: Save detected hybrid pmus to a global pmu listJin Yao2021-04-295-1/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We identify the cpu_core pmu and cpu_atom pmu by explicitly checking following files: For cpu_core, checks: "/sys/bus/event_source/devices/cpu_core/cpus" For cpu_atom, checks: "/sys/bus/event_source/devices/cpu_atom/cpus" If the 'cpus' file exists and it has data, the pmu exists. But in order not to hardcode the "cpu_core" and "cpu_atom", and make the code in a generic way. So if the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists, the hybrid pmu exists. All the detected hybrid pmus are linked to a global list 'perf_pmu__hybrid_pmus' and then next we just need to iterate the list to get all hybrid pmu by using perf_pmu__for_each_hybrid_pmu. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf pmu: Save pmu nameJin Yao2021-04-292-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On hybrid platform, one event is available on one pmu (such as, available on cpu_core or on cpu_atom). This patch saves the pmu name to the pmu field of struct perf_pmu_alias. Then next we can know the pmu which the event can be enabled on. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf pmu: Simplify arguments of __perf_pmu__new_aliasJin Yao2021-04-291-20/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the arguments of __perf_pmu__new_alias() by passing the whole 'struct pme_event' pointer. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf report: Add --skip-empty option to suppress 0 event statNamhyung Kim2021-04-295-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make the output more readable, I think it's better to remove 0's in the output. Also the dummy event has no event stats so it just wasts the space. Let's use the --skip-empty option to suppress it. $ perf report --stat --skip-empty Aggregated stats: TOTAL events: 16530 MMAP events: 226 COMM events: 1596 EXIT events: 2 THROTTLE events: 121 UNTHROTTLE events: 117 FORK events: 1595 SAMPLE events: 719 MMAP2 events: 12147 CGROUP events: 2 FINISHED_ROUND events: 2 THREAD_MAP events: 1 CPU_MAP events: 1 TIME_CONV events: 1 cycles stats: SAMPLE events: 719 Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427013717.1651674-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hists: Split hists_stats from events_statsNamhyung Kim2021-04-293-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each struct hists have events_stats but most of the fields were not used. It's to count number of samples and periods whether filtered or not. And other fields are used only by evlist. So it'd be better to split hists_stats and events_stats to reduce wasted memory in the struct hists. This makes the output of event statistics in the perf report compact by skipping 0 events in each evsel/hists. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427013717.1651674-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf data: Add JSON exportNicholas Fraser2021-04-295-12/+396
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a feature to export perf data to JSON. The resolved symbols are exported into the JSON so that external tools don't need to load the dsos themselves (or even have access to them at all.) This makes it easy to load and analyze perf data with standalone tools where direct perf or libbabeltrace integration is impractical. The exporter uses a minimal inline JSON encoding without any external dependencies. Currently it only outputs some headers and sample metadata but it's easily extensible. Use it like this: $ perf data convert --to-json out.json Committer notes: Fixup a __printf() bug that broke the build: util/data-convert-json.c:103:11: error: expected ‘)’ before numeric constant 103 | __(printf, 5, 6) | ^~ | ) util/data-convert-json.c: In function ‘output_sample_callchain_entry’: util/data-convert-json.c:124:2: error: implicit declaration of function ‘output_json_key_format’; did you mean ‘output_json_format’? [-Werror=implicit-function-declaration] 124 | output_json_key_format(out, false, 5, "ip", "\"0x%" PRIx64 "\"", ip); | ^~~~~~~~~~~~~~~~~~~~~~ | output_json_format Also had to add this patch to fix errors reported by various versions of clang: - if (al && al->sym && al->sym->name && strlen(al->sym->name) > 0) { + if (al && al->sym && al->sym->namelen) { al->sym->name is a zero sized array, to avoid one extra alloc in the symbol__new() constructor, sym->namelen carries its strlen. Committer testing: $ ls -la out.json ls: cannot access 'out.json': No such file or directory $ perf record sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf report --stats | grep -w SAMPLE SAMPLE events: 8 $ perf data convert --to-json out.json [ perf data convert: Converted 'perf.data' into JSON data 'out.json' ] [ perf data convert: Converted and wrote 0.002 MB (8 samples) ] $ ls -la out.json -rw-rw-r--. 1 acme acme 2017 Apr 26 17:29 out.json $ cat out.json { "linux-perf-json-version": 1, "headers": { "header-version": 1, "captured-on": "2021-04-26T20:28:57Z", "data-offset": 432, "data-size": 1016, "feat-offset": 1448, "hostname": "five", "os-release": "5.11.14-200.fc33.x86_64", "arch": "x86_64", "cpu-desc": "AMD Ryzen 9 3900X 12-Core Processor", "cpuid": "AuthenticAMD,23,113,0", "nrcpus-online": 24, "nrcpus-avail": 24, "perf-version": "5.12.gee134f3189bd", "cmdline": [ "/home/acme/bin/perf", "record", "sleep", "0.1" ] }, "samples": [ { "timestamp": 170517539043684, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0xffffffffa6268827" } ] }, { "timestamp": 170517539048443, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0xffffffffa661359d" } ] }, { "timestamp": 170517539051018, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0xffffffffa6311e18" } ] }, { "timestamp": 170517539053652, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0x7fdb77b4812b", "symbol": "_dl_start", "dso": "ld-2.32.so" } ] }, { "timestamp": 170517539055306, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0xffffffffa6269286" } ] }, { "timestamp": 170517539057590, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0xffffffffa62abd8b" } ] }, { "timestamp": 170517539067559, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0x7fdb77b5e9e9", "symbol": "__GI___tunables_init", "dso": "ld-2.32.so" } ] }, { "timestamp": 170517539282452, "pid": 375844, "tid": 375844, "comm": "sleep", "callchain": [ { "ip": "0x7fdb779978d2", "symbol": "getenv", "dso": "libc-2.32.so" } ] } ] } $ Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/3884969f-804d-2f53-c648-e2b0bd85edff@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Introduce bpf_counter_ops->disable()Song Liu2021-04-294-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce bpf_counter_ops->disable(), which is used stop counting the event. Committer notes: Added a dummy bpf_counter__disable() to the python binding to avoid having 'perf test python' failing. bpf_counter isn't supported in the python binding. Signed-off-by: Song Liu <song@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: kernel-team@fb.com Link: https://lore.kernel.org/r/20210425214333.1090950-6-song@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Introduce ':b' modifierSong Liu2021-04-294-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce 'b' modifier to event parser, which means use BPF program to manage this event. This is the same as --bpf-counters option, but only applies to this event. For example, perf stat -e cycles:b,cs # use bpf for cycles, but not cs perf stat -e cycles,cs --bpf-counters # use bpf for both cycles and cs Suggested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20210425214333.1090950-5-song@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Introduce config stat.bpf-counter-eventsSong Liu2021-04-295-6/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, to use BPF to aggregate perf event counters, the user uses --bpf-counters option. Enable "use bpf by default" events with a config option, stat.bpf-counter-events. Events with name in the option will use BPF. This also enables mixed BPF event and regular event in the same sesssion. For example: perf config stat.bpf-counter-events=instructions perf stat -e instructions,cs The second command will use BPF for "instructions" but not "cs". Signed-off-by: Song Liu <song@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20210425214333.1090950-4-song@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf bpf: check perf_attr_map is compatible with the perf binarySong Liu2021-04-291-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_attr_map could be shared among different version of perf binary. Add bperf_attr_map_compatible() to check whether the existing attr_map is compatible with current perf binary. Signed-off-by: Song Liu <song@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: kernel-team@fb.com Link: https://lore.kernel.org/r/20210425214333.1090950-3-song@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf util: Move bpf_perf definitions to a libperf headerSong Liu2021-04-291-24/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By following the same protocol, other tools can share hardware PMCs with perf. Move perf_event_attr_map_entry and BPF_PERF_DEFAULT_ATTR_MAP_PATH to bpf_perf.h for other tools to use. Signed-off-by: Song Liu <song@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: kernel-team@fb.com Link: https://lore.kernel.org/r/20210425214333.1090950-2-song@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo2021-04-262-3/+6
| |\ | | | | | | | | | | | | | | | To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf data: Fix error return code in perf_data__create_dir()Zhen Lei2021-04-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although 'ret' has been initialized to -1, but it will be reassigned by the "ret = open(...)" statement in the for loop. So that, the value of 'ret' is unknown when asprintf() failed. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf annotate: Add line number like in TUI and source location at EOLMartin Liška2021-04-201-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch changes the output format in 2 ways: - line number is displayed for all source lines (matching TUI mode) - source locations for the hottest lines are printed at the line end in order to preserve layout Before: 0.00 : 405ef1: inc %r15 : tmpsd * (TD + tmpsd * TDD))); 0.01 : 405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3 # 4318b0 <_IO_stdin_used+0x8b0> : tmpsd * (TC + eff.c:1811 0.67 : 405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3 # 4318b8 <_IO_stdin_used+0x8b8> : TA + tmpsd * (TB + 0.35 : 405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3 # 4318c0 <_IO_stdin_used+0x8c0> : dumbo = eff.c:1809 1.41 : 405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3 # 4318c8 <_IO_stdin_used+0x8c8> : sumi -= sj * tmpsd * dij2i * dumbo; eff.c:1813 2.58 : 405f18: vmulsd %xmm3,%xmm0,%xmm0 2.81 : 405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0 3.78 : 405f23: vmovsd %xmm0,0x30(%rsp) : for (k = 0; k < lpears[i] + upears[i]; k++) { eff.c:1761 0.90 : 405f29: cmp %r15d,%r12d After: 0.00 : 405ef1: inc %r15 : 1812 tmpsd * (TD + tmpsd * TDD))); 0.01 : 405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3 # 4318b0 <_IO_stdin_used+0x8b0> : 1811 tmpsd * (TC + 0.67 : 405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3 # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811 : 1810 TA + tmpsd * (TB + 0.35 : 405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3 # 4318c0 <_IO_stdin_used+0x8c0> : 1809 dumbo = 1.41 : 405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3 # 4318c8 <_IO_stdin_used+0x8c8> // eff.c:1809 : 1813 sumi -= sj * tmpsd * dij2i * dumbo; 2.58 : 405f18: vmulsd %xmm3,%xmm0,%xmm0 // eff.c:1813 2.81 : 405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0 3.78 : 405f23: vmovsd %xmm0,0x30(%rsp) : 1761 for (k = 0; k < lpears[i] + upears[i]; k++) { Where e.g. '// eff.c:1811' shares the same color as the percentantage at the line beginning. Signed-off-by: Martin Liška <mliska@suse.cz> Link: http://lore.kernel.org/lkml/a0d53f31-f633-5013-c386-a4452391b081@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf stat: Basic support for iostat in perfAlexander Antonov2021-04-206-11/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic flow for a new iostat mode in perf. Mode is intended to provide four I/O performance metrics per each PCIe root port: Inbound Read, Inbound Write, Outbound Read, Outbound Write. The actual code to compute the metrics and attribute it to root port is in follow-on patches. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210419094147.15909-2-alexander.antonov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf evlist: Add a method to return the list of evsels as a stringArnaldo Carvalho de Melo2021-04-152-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a 'scnprintf' method to obtain the list of evsels in a evlist as a string, excluding the "dummy" event used for things like receiving metadata events (PERF_RECORD_FORK, MMAP, etc) when synthesizing preexisting threads. Will be used to improve the error message for workload failure in 'perf record. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20210414131628.2064862-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo2021-04-132-4/+6
| |\ \ | | | | | | | | | | | | | | | | | | | | To pick up fixes from perf/urgent that got into upstream. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf pmu: Add pmu_events_map__find() function to find the common PMU map for ↵John Garry2021-04-084-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the system Add a function to find the common PMU map for the system. For arm64, a special variant is added. This is because arm64 supports heterogeneous CPU systems. As such, it cannot be guaranteed that the cpumap is same for all CPUs. So in case of heterogeneous systems, don't return a cpumap. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Paul A. Clarke <pc@us.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1617791570-165223-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf metricgroup: Make find_metric() public with name changeJohn Garry2021-04-082-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function find_metric() is required for the metric processing in the pmu-events testcase, so make it public. Also change the name to include "metricgroup". Tested-by: Paul A. Clarke <pc@us.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/1617791570-165223-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf mem-events: Remove unnecessary 'struct mem_info' forward declarationWan Jiabing2021-04-061-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'struct mem_info' is defined at 22nd line. The declaration here is unnecessary. Remove it. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kael_w@yeah.net Link: http://lore.kernel.org/lkml/20210406105104.675879-1-wanjiabing@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf evsel: Remove duplicate 'struct target' forward declarationWan Jiabing2021-04-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'struct target' is declared twice. One has been declared at 21st line. Remove the duplicate. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kael_w@yeah.net Link: http://lore.kernel.org/lkml/20210401062424.991737-1-wanjiabing@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf tools: Preserve identifier id in OCaml demanglerFabian Hemmer2021-03-301-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some OCaml developers reported that this bit of information is sometimes useful for disambiguating functions for which the OCaml compiler assigns the same name, e.g. nested or inlined functions. Signed-off-by: Fabian Hemmer <copy@copy.sh> Link: http://lore.kernel.org/lkml/20210226075223.p3s5oz4jbxwnqjtv@nyu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo2021-03-297-12/+57
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | To pick up fixes sent via perf/urgent and in the BPF tools/ directories. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf sort: Display sort dimension p_stage_cyc only on supported archsAthira Rajeev2021-03-262-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sort dimension "p_stage_cyc" is used to represent pipeline stage cycle information. Presently, this is used only in powerpc. For unsupported platforms, we don't want to display it in the perf report output columns. Hence add check in sort_dimension__add() and skip the sort key incase it is not applicable for the particular arch. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf tools: Support pipeline stage cycles for powerpcAthira Rajeev2021-03-266-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pipeline stage cycles details can be recorded on powerpc from the contents of Performance Monitor Unit (PMU) registers. On ISA v3.1 platform, sampling registers exposes the cycles spent in different pipeline stages. Patch adds perf tools support to present two of the cycle counter information along with memory latency (weight). Re-use the field 'ins_lat' for storing the first pipeline stage cycle. This is stored in 'var2_w' field of 'perf_sample_weight'. Add a new field 'p_stage_cyc' to store the second pipeline stage cycle which is stored in 'var3_w' field of perf_sample_weight. Add new sort function 'Pipeline Stage Cycle' and include this in default_mem_sort_order[]. This new sort function may be used to denote some other pipeline stage in another architecture. So add this to list of sort entries that can have dynamic header string. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf sort: Add dynamic headers for perf report columnsAthira Rajeev2021-03-262-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the header string for different columns in perf report is fixed. Some fields of perf sample could have different meaning for different architectures than the meaning conveyed by the header string. An example is the new field 'var2_w' of perf_sample_weight structure. This is presently captured as 'Local INSTR Latency' in perf mem report. But this could be used to denote a different latency cycle in another architecture. Introduce a weak function arch_perf_header_entry() to set the arch specific header string for the fields which can contain dynamic header. If the architecture do not have this function, fall back to the default header string value. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-3-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf tools: Remove duplicate struct forward declarationsWan Jiabing2021-03-252-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'struct evlist' has been declared at 10th line. 'struct comm' has been declared at 15th line. Remove the duplicates Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kael_w@yeah.net Link: http://lore.kernel.org/lkml/20210325043947.846093-1-wanjiabing@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf stat: Align CSV output for summary modeJin Yao2021-03-243-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'perf stat' subcommand supports the request for a summary of the interval counter readings. But the summary lines break the CSV output so it's hard for scripts to parse the result. Before: # perf stat -x, -I1000 --interval-count 1 --summary 1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized 1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec 1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec 1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec 1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz 1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle 1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec 1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches 8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized 270,,context-switches,8013513297,100.00,0.034,K/sec 13,,cpu-migrations,8013530032,100.00,0.002,K/sec 184,,page-faults,8013546992,100.00,0.023,K/sec 20574191,,cycles,8013551506,100.00,0.003,GHz 10562267,,instructions,8013564958,100.00,0.51,insn per cycle 2019244,,branches,8013575673,100.00,0.252,M/sec 106152,,branch-misses,8013585776,100.00,5.26,of all branches The summary line loses the timestamp column, which breaks the CSV output. We add a column at the original 'timestamp' position and it just says 'summary' for the summary line. After: # perf stat -x, -I1000 --interval-count 1 --summary 1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized 1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec 1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec 1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec 1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz 1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle 1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec 1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized summary,218,,context-switches,8012753271,100.00,0.027,K/sec summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec summary,0,,page-faults,8012786257,100.00,0.000,K/sec summary,15004518,,cycles,8012790637,100.00,0.002,GHz summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle summary,1590259,,branches,8012814766,100.00,0.198,M/sec summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches Now it's easy for script to analyse the summary lines. Of course, we also consider not to break possible existing scripts which can continue to use the broken CSV format by using a new '--no-csv-summary.' option. # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary 1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized 1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec 1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec 1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec 1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz 1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle 1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec 1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches 8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized 197,,context-switches,8012703742,100.00,24.586,/sec 9,,cpu-migrations,8012720902,100.00,1.123,/sec 644,,page-faults,8012738266,100.00,80.373,/sec 18350698,,cycles,8012744109,100.00,0.002,GHz 12745021,,instructions,8012759001,100.00,0.69,insn per cycle 2458033,,branches,8012770864,100.00,306.768,K/sec 102107,,branch-misses,8012781751,100.00,4.15,of all branches This option can be enabled in perf config by setting the variable 'stat.no-csv-summary'. # perf config stat.no-csv-summary=true # perf config -l stat.no-csv-summary=true # perf stat -x, -I1000 --interval-count 1 --summary 1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized 1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec 1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec 1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec 1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz 1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle 1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec 1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches 8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized 205,,context-switches,8013308394,100.00,25.583,/sec 10,,cpu-migrations,8013324681,100.00,1.248,/sec 0,,page-faults,8013340926,100.00,0.000,/sec 8027742,,cycles,8013344503,100.00,0.001,GHz 2871717,,instructions,8013356501,100.00,0.36,insn per cycle 553564,,branches,8013366204,100.00,69.081,K/sec 54021,,branch-misses,8013375952,100.00,9.76,of all branches Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf stat: Introduce 'bperf' to share hardware PMCs with BPFSong Liu2021-03-237-7/+679
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf tool uses performance monitoring counters (PMCs) to monitor system performance. The PMCs are limited hardware resources. For example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu. Modern data center systems use these PMCs in many different ways: system level monitoring, (maybe nested) container level monitoring, per process monitoring, profiling (in sample mode), etc. In some cases, there are more active perf_events than available hardware PMCs. To allow all perf_events to have a chance to run, it is necessary to do expensive time multiplexing of events. On the other hand, many monitoring tools count the common metrics (cycles, instructions). It is a waste to have multiple tools create multiple perf_events of "cycles" and occupy multiple PMCs. bperf tries to reduce such wastes by allowing multiple perf_events of "cycles" or "instructions" (at different scopes) to share PMUs. Instead of having each perf-stat session to read its own perf_events, bperf uses BPF programs to read the perf_events and aggregate readings to BPF maps. Then, the perf-stat session(s) reads the values from these BPF maps. Please refer to the comment before the definition of bperf_ops for the description of bperf architecture. bperf is off by default. To enable it, pass --bpf-counters option to perf-stat. bperf uses a BPF hashmap to share information about BPF programs and maps used by bperf. This map is pinned to bpffs. The default path is /sys/fs/bpf/perf_attr_map. The user could change the path with option --bpf-attr-map. Committer testing: # dmesg|grep "Performance Events" -A5 [ 0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver. [ 0.225280] ... version: 0 [ 0.225280] ... bit width: 48 [ 0.225281] ... generic registers: 6 [ 0.225281] ... value mask: 0000ffffffffffff [ 0.225281] ... max period: 00007fffffffffff # # for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done [1] 2436231 [2] 2436232 [3] 2436233 [4] 2436234 [5] 2436235 [6] 2436236 # perf stat -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 310,326,987 cycles (41.87%) 236,143,290 instructions # 0.76 insn per cycle (41.87%) 0.100800885 seconds time elapsed # We can see that the counters were enabled for this workload 41.87% of the time. Now with --bpf-counters: # for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done [1] 2436514 [2] 2436515 [3] 2436516 [4] 2436517 [5] 2436518 [6] 2436519 [7] 2436520 [8] 2436521 [9] 2436522 [10] 2436523 [11] 2436524 [12] 2436525 [13] 2436526 [14] 2436527 [15] 2436528 [16] 2436529 [17] 2436530 [18] 2436531 [19] 2436532 [20] 2436533 [21] 2436534 [22] 2436535 [23] 2436536 [24] 2436537 [25] 2436538 [26] 2436539 [27] 2436540 [28] 2436541 [29] 2436542 [30] 2436543 [31] 2436544 [32] 2436545 # # ls -la /sys/fs/bpf/perf_attr_map -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map # bpftool map | grep bperf | wc -l 64 # # bpftool map | tail 1265: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1266: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1267: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 996 pids perf(2436545) 1268: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1269: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1270: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 997 pids perf(2436541) 1285: array name pid_iter.rodata flags 0x480 key 4B value 4B max_entries 1 memlock 4096B btf_id 1017 frozen pids bpftool(2437504) 1286: array flags 0x0 key 4B value 32B max_entries 1 memlock 4096B # # bpftool map dump id 1268 | tail value (CPU 21): 8f f3 bc ca 00 00 00 00 80 fd 2a d1 4d 00 00 00 80 fd 2a d1 4d 00 00 00 value (CPU 22): 7e d5 64 4d 00 00 00 00 a4 8a 2e ee 4d 00 00 00 a4 8a 2e ee 4d 00 00 00 value (CPU 23): a7 78 3e 06 01 00 00 00 b2 34 94 f6 4d 00 00 00 b2 34 94 f6 4d 00 00 00 Found 1 element # bpftool map dump id 1268 | tail value (CPU 21): c6 8b d9 ca 00 00 00 00 20 c6 fc 83 4e 00 00 00 20 c6 fc 83 4e 00 00 00 value (CPU 22): 9c b4 d2 4d 00 00 00 00 3e 0c df 89 4e 00 00 00 3e 0c df 89 4e 00 00 00 value (CPU 23): 18 43 66 06 01 00 00 00 5b 69 ed 83 4e 00 00 00 5b 69 ed 83 4e 00 00 00 Found 1 element # bpftool map dump id 1268 | tail value (CPU 21): f2 6e db ca 00 00 00 00 92 67 4c ba 4e 00 00 00 92 67 4c ba 4e 00 00 00 value (CPU 22): dc 8e e1 4d 00 00 00 00 d9 32 7a c5 4e 00 00 00 d9 32 7a c5 4e 00 00 00 value (CPU 23): bd 2b 73 06 01 00 00 00 7c 73 87 bf 4e 00 00 00 7c 73 87 bf 4e 00 00 00 Found 1 element # # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 119,410,122 cycles 152,105,479 instructions # 1.27 insn per cycle 0.101395093 seconds time elapsed # See? We had the counters enabled all the time. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf tools: Fix various typos in commentsIngo Molnar2021-03-2338-73/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf stat: Improve readability of shadow statsChangbin Du2021-03-153-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds function convert_unit_double() and selects appropriate unit for shadow stats between K/M/G. $ sudo perf stat -a -- sleep 1 Before: Unit 'M' is selected even the number is very small. Performance counter stats for 'system wide': 4,003.06 msec cpu-clock # 3.998 CPUs utilized 16,179 context-switches # 0.004 M/sec 161 cpu-migrations # 0.040 K/sec 4,699 page-faults # 0.001 M/sec 6,135,801,925 cycles # 1.533 GHz (83.21%) 5,783,308,491 stalled-cycles-frontend # 94.26% frontend cycles idle (83.21%) 4,543,694,050 stalled-cycles-backend # 74.05% backend cycles idle (66.49%) 4,720,130,587 instructions # 0.77 insn per cycle # 1.23 stalled cycles per insn (83.28%) 753,848,078 branches # 188.318 M/sec (83.61%) 37,457,747 branch-misses # 4.97% of all branches (83.48%) 1.001283725 seconds time elapsed After: $ sudo perf stat -a -- sleep 2 Performance counter stats for 'system wide': 8,005.52 msec cpu-clock # 3.999 CPUs utilized 10,715 context-switches # 1.338 K/sec 785 cpu-migrations # 98.057 /sec 102 page-faults # 12.741 /sec 1,948,202,279 cycles # 0.243 GHz 2,816,470,932 stalled-cycles-frontend # 144.57% frontend cycles idle 2,661,172,207 stalled-cycles-backend # 136.60% backend cycles idle 464,172,105 instructions # 0.24 insn per cycle # 6.07 stalled cycles per insn 91,567,662 branches # 11.438 M/sec 7,756,054 branch-misses # 8.47% of all branches 2.002040043 seconds time elapsed v2: o do not change 'sec' to 'cpu-sec'. o use convert_unit_double to implement convert_unit. Signed-off-by: Changbin Du <changbin.du@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf evlist: Change the COMM when preparing the workloadArnaldo Carvalho de Melo2021-03-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was reported that --exclude-perf wasn't working, as tracepoints were appearing in 'perf script' output as having the 'perf' COMM, that is just the window in evlist__prepare_workload() after the fork() and before the execvp() call for workloads specified in the command line. Example: # perf record -e kmem:kmalloc --filter 'bytes_alloc<650 && bytes_alloc>620' --exclude-perf -e kmem:kfree --exclude-perf -aR sleep 30 Then: # perf script perf 15905 [009] 1498.356094: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356116: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf 15905 [009] 1498.356116: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00 perf 15905 [009] 1498.356138: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356148: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf 15905 [009] 1498.356148: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00 perf 15905 [009] 1498.356168: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356176: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) <SNIP> perf 15905 [009] 1498.356348: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [014] 1498.356386: kmem:kfree: call_site=security_compute_sid.part.0+0x3b2 ptr=(nil) perf 15905 [014] 1498.356423: kmem:kfree: call_site=load_elf_binary+0x207 ptr=0xffff9cf5b2a34220 perf 15905 [014] 1498.356694: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf6d0b3b000 sleep 15905 [014] 1498.356739: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) Use prctl() to show that that is just the preparation of the workload: # perf script perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf786459800 perf-exec 19036 [009] 2199.357630: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) <SNIP> perf-exec 19036 [000] 2199.358277: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786fb9c00 perf-exec 19036 [000] 2199.358278: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458200 perf-exec 19036 [000] 2199.358279: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458600 sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) sleep 19036 [000] 2199.358323: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) sleep 19036 [000] 2199.358330: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358337: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358339: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358341: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 Reporter: zhanweiw <wingfancy@hotmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212213 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf machine: Assign boolean values to a bool variableJiapeng Chong2021-03-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following coccicheck warnings: ./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in function 'symbol__match_regex' with return type bool. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/1615284669-82139-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf stat: Fixup __perf_stat_evsel__is() prefixArnaldo Carvalho de Melo2021-03-092-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a perf_stat_evsel method, so should have that as its prefix, previously it was swapped as __perf_evsel_stat__is(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of ↵Arnaldo Carvalho de Melo2021-03-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | printed chars The 'ret' variable was initialized to zero but then it was not updated from the fprintf() return, fix it. Reported-by: Yang Li <yang.lee@linux.alibaba.com> cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> cc: Ingo Molnar <mingo@redhat.com> cc: Jiri Olsa <jolsa@redhat.com> cc: Mark Rutland <mark.rutland@arm.com> cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Fixes: 90f18e63fbd00513 ("perf symbols: List symbols in a dso in ascending name order") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo2021-03-0811-21/+74
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the fixes sent for v5.12 and continue development based on v5.12-rc2, i.e. without the swap on file bug. This also gets a slightly newer and better tools/perf/arch/arm/util/cs-etm.c patch version, using the BIT() macro, that had already been slated to v5.13 but ended up going to v5.12-rc1 on an older version. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>