summaryrefslogtreecommitdiffstats
path: root/tools/perf/util
Commit message (Collapse)AuthorAgeFilesLines
* perf buildid-cache: Don't skip 16-byte build-idsNicholas Fraser2021-02-182-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | lsdir_bid_tail_filter() ignored any build-id that wasn't exactly 20 bytes. This worked only for SHA-1 build-ids. The build-id for a PE file is always a 16-byte GUID and ELF files can also have MD5 or UUID build-ids. This fix changes the filter to allow build-ids between 16 and 20 bytes. Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/597788e4-661d-633f-857c-3de700115d02@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf symbol: Remove redundant libbfd checksNicholas Fraser2021-02-181-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | This removes the redundant checks bfd_check_format() and bfd_target_elf_flavour. They were previously checking different files. Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: https://lore.kernel.org/r/94758ca1-0031-d7c6-6c6a-900fd77ef695@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix arm64 build error with gcc-11Jianlin Lv2021-02-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc version: 11.0.0 20210208 (experimental) (GCC) Following build error on arm64: ....... In function ‘printf’, inlined from ‘regs_dump__printf’ at util/session.c:1141:3, inlined from ‘regs__printf’ at util/session.c:1169:2: /usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \ error: ‘%-5s’ directive argument is null [-Werror=format-overflow=] 107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \ __va_arg_pack ()); ...... In function ‘fprintf’, inlined from ‘perf_sample__fprintf_regs.isra’ at \ builtin-script.c:622:14: /usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \ error: ‘%5s’ directive argument is null [-Werror=format-overflow=] 100 | return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt, 101 | __va_arg_pack ()); cc1: all warnings being treated as errors ....... This patch fixes Wformat-overflow warnings. Add helper function to convert NULL to "unknown". Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: iecedge@gmail.com Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: http://lore.kernel.org/lkml/20210218031245.2078492-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Split VM-Entry and VM-Exit branchesAdrian Hunter2021-02-181-1/+21
| | | | | | | | | | | | | | | | | | Events record a single cpumode so the tools cannot handle a branch from the host machine to a virtual machine, or vice versa. Split it in two so that each branch can have a different cpumode. E.g. host ip -> guest ip becomes: host ip -> 0 0 -> guest ip Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Adjust sample flags for VM-ExitAdrian Hunter2021-02-181-4/+7
| | | | | | | | | | | | | | Use the change of NR to detect whether an asynchronous branch is a VM-Exit. Note VM-Entry is determined from the vmlaunch or vmresume instruction, in which case, sample flags will show "VMentry" even if the VM-Entry fails. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Allow for a guest kernel address filterAdrian Hunter2021-02-181-1/+7
| | | | | | | | | | | | | Handling TIP.PGD for an address filter for a guest kernel is the same as a host kernel, but user space decoding, and hence address filters, are not supported. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Support decoding of guest kernelAdrian Hunter2021-02-181-12/+69
| | | | | | | | | | | | | | | | | | The guest kernel can be found from any guest thread belonging to the guest machine. The guest machine is associated with the current host process pid. An idle thread (pid=tid=0) is created as a vehicle from which to find the guest kernel map. Decoding guest user space is not supported. Synthesized samples just need the cpumode set for the guest. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf machine: Factor out machine__idle_thread()Adrian Hunter2021-02-183-22/+22
| | | | | | | | | | | | | | Factor out machine__idle_thread() so it can be re-used for guest machines. A thread is needed to find executable code, even for the guest kernel. To avoid possible future pid number conflicts, the idle thread can be used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf machine: Factor out machines__find_guest()Adrian Hunter2021-02-183-6/+11
| | | | | | | | | | | Factor out machines__find_guest() so it can be re-used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Amend decoder to track the NR flagAdrian Hunter2021-02-182-9/+53
| | | | | | | | | | | | | | | | | | | | | | | The PIP packet NR (non-root) flag indicates whether or not a virtual machine is being traced (NR=1 => VM). Add support for tracking its value. In particular note that the PIP packet (outside of PSB+) will be associated with a TIP packet from which address the NR value takes effect. At that point, there is a branch from_ip, to_ip with corresponding from_nr and to_nr. In the event of VM-Entry failure, there should still PIP and TIP packets that can be followed in the same way. Also note that this assumes that a host VMM is not employing VMX controls that affect Intel PT, e.g. to hide the host from a guest using Intel PT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Retain the last PIP packet payload as isAdrian Hunter2021-02-184-16/+14
| | | | | | | | | | | | Retain the PIP packet payload as is, instead of just the CR3, because it contains also the VMX NR flag which is needed to track VM-Entry. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel_pt: Add vmlaunch and vmresume as branchesAdrian Hunter2021-02-182-0/+16
| | | | | | | | | | | | | | In preparation to support Intel PT decoding of virtual machine traces, add vmlaunch and vmresume as branch instructions. Note, sample flags will show "VMentry" even if the VM-Entry fails. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf script: Add branch types for VM-Entry and VM-ExitAdrian Hunter2021-02-182-1/+7
| | | | | | | | | | | | | | | In preparation to support Intel PT decoding of virtual machine traces, add branch types for VM-Entry and VM-Exit. Note they are both treated as "calls" because the VM-Exit transfers control to a different address. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf auxtrace: Automatically group aux-output eventsAdrian Hunter2021-02-182-0/+21
| | | | | | | | | | | | | | | | | | | aux-output events need to have an AUX area event as the group leader. However, grouping events does not allow the AUX area event to be given an address filter because the --filter option must come after the event, which conflicts with the grouping syntax. To allow filtering in that case, automatically create a group since that is the requirement anyway. Example: (requires Intel Tremont) perf record -c 500 -e 'intel_pt//u' --filter 'filter main @ /bin/ls' -e 'cycles/aux-output/pp' ls Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210121140418.14705-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processingKan Liang2021-02-183-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency. Current perf forces the var2_w to the data->ins_lat in the generic code. It works well for now because X86 is the only architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may bring problems once other architectures support the sample type. For example, the var2_w may be used to capture something else on PowerPC. Create two architecture specific functions to parse and synthesize the weight related samples. Move the X86 specific codes to the X86 version functions. Other architectures can implement their own functions later separately. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Add PSB eventsAdrian Hunter2021-02-184-51/+230
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis of code with Intel PT, it is useful to know if a timing bubble was caused by Intel PT or not. Add reporting of PSB events via perf script. PSB events are printed with the existing itrace 'p' option which also prints power and frequency changes. The PSB event contains the trace offset at which the PSB occurs, to allow easy reference back to the PSB+ packets. The PSB event timestamp is always the timestamp from the PSB+ TSC packet, and the ip is always the address from the PSB+ FUP packet. The code changes are non-trivial because the decoder must walk to the PSB+ FUP address before outputting the PSB event. Example: $ perf record -e intel_pt/cyc,psb_period=0/u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] $ perf script --itrace=p --ns perf 17981 [006] 25617.510820383: psb: psb offs: 0 0 [unknown] ([unknown]) perf 17981 [006] 25617.510820383: cbr: cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) uname 17981 [006] 25617.510889753: psb: psb offs: 0xb50 7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510899162: psb: psb offs: 0x12d0 7f78c128af1c dl_main+0x93c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510939242: psb: psb offs: 0x1a50 7f78c128eefc _dl_map_object_from_fd+0x13c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510981274: psb: psb offs: 0x21c8 7f78c1296307 _dl_relocate_object+0x927 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510993034: psb: psb offs: 0x2948 7f78c12940e4 _dl_lookup_symbol_x+0x14 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511003871: psb: psb offs: 0x30c8 7f78c12937b3 do_lookup_x+0x2f3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511019854: psb: psb offs: 0x3850 7f78c1295eed _dl_relocate_object+0x50d (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511029015: psb: psb offs: 0x4390 7f78c12a855a strcmp+0xf6a (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511064876: psb: psb offs: 0x4b10 0 [unknown] ([unknown]) uname 17981 [006] 25617.511080762: psb: psb offs: 0x5290 7f78c11db53d _dl_addr+0x13d (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511086035: psb: psb offs: 0x5a08 7f78c11db538 _dl_addr+0x138 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511091381: psb: psb offs: 0x6190 7f78c11db534 _dl_addr+0x134 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511096681: psb: psb offs: 0x6910 7f78c11db4c3 _dl_addr+0xc3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511119520: psb: psb offs: 0x7090 7f78c10ada5e _nl_intern_locale_data+0x12e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511126584: psb: psb offs: 0x7818 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511132775: psb: psb offs: 0x8358 7f78c10c20c0 getenv+0xa0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511134598: psb: psb offs: 0x8ad0 7f78c10ada09 _nl_intern_locale_data+0xd9 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511135685: psb: psb offs: 0x9258 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511138322: psb: psb offs: 0x99d0 7f78c11fffd9 __strncmp_avx2+0x39 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511158907: psb: psb offs: 0xa150 0 [unknown] ([unknown]) Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Fix IPC with CYC thresholdAdrian Hunter2021-02-183-0/+41
| | | | | | | | | | | | | The code assumed every CYC-eligible packet has a CYC packet, which is not the case when CYC thresholds are used. Fix by checking if a CYC packet is actually present in that case. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Fix premature IPCAdrian Hunter2021-02-183-11/+17
| | | | | | | | | | | | | | | The code assumed a change in cycle count means accurate IPC. That is not correct, for example when sampling both branches and instructions, or at a FUP packet (which is not CYC-eligible) address. Fix by using an explicit flag to indicate when IPC can be sampled. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20210205175350.23817-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Fix missing CYC processing in PSBAdrian Hunter2021-02-181-0/+3
| | | | | | | | | | | | | Add missing CYC packet processing when walking through PSB+. This improves the accuracy of timestamps that follow PSB+, until the next MTC. Fixes: 3d49807870f08 ("perf tools: Add new Intel PT packet definitions") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf unwind: Set userdata for all __report_module() pathsDave Rigby2021-02-181-3/+8
| | | | | | | | | | | | | | | | | | | | When locating the DWARF module for a given address, __find_debuginfo() requires a 'struct dso' passed via the userdata argument. However, this field is only set in __report_module() if the module is found in via dwfl_addrmodule(), not if it is found later via dwfl_report_elf(). Set userdata irrespective of how the DWARF module was found, as long as we found a module. Fixes: bf53fc6b5f41 ("perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder") Signed-off-by: Dave Rigby <d.rigby@me.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211801 Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/linux-perf-users/20210218165654.36604-1-d.rigby@me.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Fix continue profiling after draining the bufferYang Jihong2021-02-182-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") uses eventfd() to solve a rare race where the setting and checking of 'done' which add done_fd to pollfd. When draining buffer, revents of done_fd is 0 and evlist__filter_pollfd function returns a non-zero value. As a result, perf record does not stop profiling. The following simple scenarios can trigger this condition: # sleep 10 & # perf record -p $! After the sleep process exits, perf record should stop profiling and exit. However, perf record keeps running. If pollfd revents contains only POLLERR or POLLHUP, perf record indicates that buffer is draining and need to stop profiling. Use fdarray_flag__nonfilterable() to set done eventfd to nonfilterable objects, so that evlist__filter_pollfd() does not filter and check done eventfd. Fixes: da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhangjinhao2@huawei.com Link: http://lore.kernel.org/lkml/20210205065001.23252-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Simplify the calculation of variablesJiapeng Chong2021-02-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following coccicheck warnings: ./tools/perf/util/header.c:3809:18-20: WARNING !A || A && B is equivalent to !A || B. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/1612497255-87189-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf metricgroup: Remove unneeded semicolonYang Li2021-02-171-1/+1
| | | | | | | | | | | | | | | | Eliminate the following coccicheck warning: ./tools/perf/util/metricgroup.c:382:3-4: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yang li <yang.lee@linux.alibaba.com> Link: http://lore.kernel.org/lkml/1612165277-95878-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add OCaml demanglingFabian Hemmer2021-02-176-2/+106
| | | | | | | | | | | | | | | | | | | | | | | | Detect symbols generated by the OCaml compiler based on their prefix. Demangle OCaml symbols, returning a newly allocated string (like the existing Java demangling functionality). Move a helper function (hex) from tests/code-reading.c to util/string.c To test: echo 'Printf.printf "%d\n" (Random.int 42)' > test.ml perf record ocamlopt.opt test.ml perf report -d ocamlopt.opt Signed-off-by: Fabian Hemmer <copy@copy.sh> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> LPU-Reference: 20210203211537.b25ytjb6dq5jfbwx@nyu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf symbols: Resolve symbols against debug file firstJiri Slaby2021-02-171-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With LTO, there are symbols like these: /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug 10305: 0000000000955fa4 0 NOTYPE LOCAL DEFAULT 29 Predicate.cpp.2bc410e7 This comes from a runtime/debug split done by the standard way: objcopy --only-keep-debug $runtime $debug objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime perf currently cannot resolve such symbols (relicts of LTO), as section 29 exists only in the debug file (29 is .debug_info). And perf resolves symbols only against runtime file. This results in all symbols from such a library being unresolved: 0.38% main2 libantlr4-runtime.so.4.8 [.] 0x00000000000671e0 So try resolving against the debug file first. And only if it fails (the section has NOBITS set), try runtime file. We can do this, as "objcopy --only-keep-debug" per documentation preserves all sections, but clears data of some of them (the runtime ones) and marks them as NOBITS. The correct result is now: 0.38% main2 libantlr4-runtime.so.4.8 [.] antlr4::IntStream::~IntStream Note that these LTO symbols are properly skipped anyway as they belong neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr, secstrs) is true). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210217122125.26416-1-jslaby@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'perf/urgent' into perf/coreArnaldo Carvalho de Melo2021-02-164-10/+24
|\ | | | | | | | | | | To get some fixes that didn't made into 5.11. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf probe: Fix kretprobe issue caused by GCC bugJianlin Lv2021-02-121-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perf failed to add a kretprobe event with debuginfo of vmlinux which is compiled by gcc with -fpatchable-function-entry option enabled. The same issue with kernel module. Issue: # perf probe -v 'kernel_clone%return $retval' ...... Writing event: r:probe/kernel_clone__return _text+599624 $retval Failed to write event: Invalid argument Error: Failed to add events. Reason: Invalid argument (Code: -22) # cat /sys/kernel/debug/tracing/error_log [156.75] trace_kprobe: error: Retprobe address must be an function entry Command: r:probe/kernel_clone__return _text+599624 $retval ^ # llvm-dwarfdump vmlinux |grep -A 10 -w 0x00df2c2b 0x00df2c2b: DW_TAG_subprogram DW_AT_external (true) DW_AT_name ("kernel_clone") DW_AT_decl_file ("/home/code/linux-next/kernel/fork.c") DW_AT_decl_line (2423) DW_AT_decl_column (0x07) DW_AT_prototyped (true) DW_AT_type (0x00dcd492 "pid_t") DW_AT_low_pc (0xffff800010092648) DW_AT_high_pc (0xffff800010092b9c) DW_AT_frame_base (DW_OP_call_frame_cfa) # cat /proc/kallsyms |grep kernel_clone ffff800010092640 T kernel_clone # readelf -s vmlinux |grep -i kernel_clone 183173: ffff800010092640 1372 FUNC GLOBAL DEFAULT 2 kernel_clone # objdump -d vmlinux |grep -A 10 -w \<kernel_clone\>: ffff800010092640 <kernel_clone>: ffff800010092640: d503201f nop ffff800010092644: d503201f nop ffff800010092648: d503233f paciasp ffff80001009264c: a9b87bfd stp x29, x30, [sp, #-128]! ffff800010092650: 910003fd mov x29, sp ffff800010092654: a90153f3 stp x19, x20, [sp, #16] The entry address of kernel_clone converted by debuginfo is _text+599624 (0x92648), which is consistent with the value of DW_AT_low_pc attribute. But the symbolic address of kernel_clone from /proc/kallsyms is ffff800010092640. This issue is found on arm64, -fpatchable-function-entry=2 is enabled when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y; Just as objdump displayed the assembler contents of kernel_clone, GCC generate 2 NOPs at the beginning of each function. kprobe_on_func_entry detects that (_text+599624) is not the entry address of the function, which leads to the failure of adding kretprobe event. kprobe_on_func_entry ->_kprobe_addr ->kallsyms_lookup_size_offset ->arch_kprobe_on_func_entry // FALSE The cause of the issue is that the first instruction in the compile unit indicated by DW_AT_low_pc does not include NOPs. This issue exists in all gcc versions that support -fpatchable-function-entry option. I have reported it to the GCC community: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776 Currently arm64 and PA-RISC may enable fpatchable-function-entry option. The kernel compiled with clang does not have this issue. FIX: This GCC issue only cause the registration failure of the kretprobe event which doesn't need debuginfo. So, stop using debuginfo for retprobe. map will be used to query the probe function address. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: http://lore.kernel.org/lkml/20210210062646.2377995-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf symbols: Fix return value when loading PE DSONicholas Fraser2021-02-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first time dso__load() was called on a PE file it always returned -1 error. This caused the first call to map__find_symbol() to always fail on a PE file so the first sample from each PE file always had symbol <unknown>. Subsequent samples succeed however because the DSO is already loaded. This fixes dso__load() to return 0 when successfully loading a DSO with libbfd. Fixes: eac9a4342e5447ca ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/1671b43b-09c3-1911-dbf8-7f030242fbf7@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf symbols: Make dso__load_bfd_symbols() load PE files from debug cache onlyNicholas Fraser2021-02-121-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dso__load_bfd_symbols() attempts to load a DSO at its original path, then closes it and loads the file in the debug cache. This is incorrect. It should ignore the original file and work with only the debug cache. The original file may have changed or may not even exist, for example if the debug cache has been transferred to another machine via "perf archive". This fix makes it only load the file in the debug cache. Further notes from Nicholas: dso__load_bfd_symbols() is called in a loop from dso__load() for a variety of paths. These are generated by the various DSO_BINARY_TYPEs in the binary_type_symtab list at the top of util/symbol.c. In each case the debugfile passed to dso__load_bfd_symbols() is the path to try. One of those iterations (the first one I believe) passes the original path as the debugfile. If the file still exists at the original path, this is the one that ends up being used in case the debugcache was deleted or the PE file doesn't have a build-id. A later iteration (BUILD_ID_CACHE) passes debugfile as the file in the debugcache if it has a build-id. Even if the file was previously loaded at its original path, (if I understand correctly) this load will override it so the debugcache file ends up being used. Committer notes: So if it fails to find in the cache, it will eventually hope for the best and look at the path in the local filesystem, which in many cases is enough. At some point we need to switch from this "hope for the best" approach to one that warns the user that there is no guarantee, if no buildid is present, that just by looking at the pathname the symbolisation will work. Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/e58e1237-94ab-e1c9-a7b9-473531906954@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf symbols: Use (long) for iterator for bfd symbolsDmitry Safonov2021-02-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC (GCC) 8.4.0 20200304 fails to build perf with: : util/symbol.c: In function 'dso__load_bfd_symbols': : util/symbol.c:1626:16: error: comparison of integer expressions of different signednes : for (i = 0; i < symbols_count; ++i) { : ^ : util/symbol.c:1632:16: error: comparison of integer expressions of different signednes : while (i + 1 < symbols_count && : ^ : util/symbol.c:1637:13: error: comparison of integer expressions of different signednes : if (i + 1 < symbols_count && : ^ : cc1: all warnings being treated as errors It's unlikely that the symtable will be that big, but the fix is an oneliner and as perf has CORE_CFLAGS += -Wextra, which makes build to fail together with CORE_CFLAGS += -Werror Fixes: eac9a4342e54 ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Jacek Caban <jacek@codeweavers.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Link: http://lore.kernel.org/lkml/20210209145148.178702-1-dima@arista.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf annotate: Fix jump parsing for C++ code.Martin Liška2021-02-112-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Considering the following testcase: int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } 'perf annotate' displays: 86.52 │40055e: → ja 40056c <foo(int, int)+0x26> 13.37 │400560: mov -0x18(%rbp),%eax │400563: add %eax,-0x14(%rbp) │400566: addl $0x1,-0x4(%rbp) 0.11 │40056a: → jmp 400557 <foo(int, int)+0x11> │40056c: mov -0x14(%rbp),%eax │40056f: pop %rbp and the 'ja 40056c' does not link to the location in the function. It's caused by fact that comma is wrongly parsed, it's part of function signature. With my patch I see: 86.52 │ ┌──ja 26 13.37 │ │ mov -0x18(%rbp),%eax │ │ add %eax,-0x14(%rbp) │ │ addl $0x1,-0x4(%rbp) 0.11 │ │↑ jmp 11 │26:└─→mov -0x14(%rbp),%eax and 'o' output prints: 86.52 │4005┌── ↓ ja 40056c <foo(int, int)+0x26> 13.37 │4005│0: mov -0x18(%rbp),%eax │4005│3: add %eax,-0x14(%rbp) │4005│6: addl $0x1,-0x4(%rbp) 0.11 │4005│a: ↑ jmp 400557 <foo(int, int)+0x11> │4005└─→ mov -0x14(%rbp),%eax On the contrary, compiling the very same file with gcc -x c, the parsing is fine because function arguments are not displayed: jmp 400543 <foo+0x1d> Committer testing: Before: $ cat cpp_args_annotate.c int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } $ gcc --version |& head -1 gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) $ gcc -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ] $ perf annotate --stdio2 foo Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo>: foo(): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) ↓ jmp 1d a += b; 13.45 13: mov -0x18(%rbp),%eax add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.09 1d: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.46 ↑ jbe 13 return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ I.e. works for C, now lets switch to C++: $ g++ -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ] $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 → ja 40112c <foo(int, int)+0x26> a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 → jmp 401117 <foo(int, int)+0x11> return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ Reproduced. Now with this patch: Reusing the C++ built binary, as we can see here: $ readelf -wi cpp_args_annotate | grep producer <c> DW_AT_producer : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g $ And furthermore: $ file cpp_args_annotate cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped $ perf buildid-list -i cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 $ perf buildid-list | grep cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate $ It now works: $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) 11: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 ↓ ja 26 a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 ↑ jmp 11 return a; 26: mov -0x14(%rbp),%eax } pop %rbp ← retq $ Signed-off-by: Martin Liška <mliska@suse.cz> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Set sample's data source fieldLeo Yan2021-02-161-9/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sample structure contains the field 'data_src' which is used to tell the data operation attributions, e.g. operation type is loading or storing, cache level, it's snooping or remote accessing, etc. At the end, the 'data_src' will be parsed by perf mem/c2c tools to display human readable strings. This patch is to fill the 'data_src' field in the synthesized samples base on different types. Currently perf tool can display statistics for L1/L2/L3 caches but it doesn't support the 'last level cache'. To fit to current implementation, 'data_src' field uses L3 cache for last level cache. Before this commit, perf mem report looks like this: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 81.56% 61945 0 N/A [.] 0x00000000000009d8 serial_c [.] 0000000000000000 [unknown] N/A N/A 18.44% 14003 0 N/A [.] 0x0000000000000828 serial_c [.] 0000000000000000 [unknown] N/A N/A Now on a system with Arm SPE, addresses and access types are displayed: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 0.43% 324 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794e00 anon N/A Walker hit 0.42% 322 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794580 anon N/A Walker hit Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Synthesize memory eventLeo Yan2021-02-161-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory event can deliver two benefits: - The first benefit is the memory event can give out global view for memory accessing, rather than organizing events with scatter mode (e.g. uses separate event for L1 cache, last level cache, etc) which which can only display a event for single memory type, memory events include all memory accessing so it can display the data accessing cross memory levels in the same view; - The second benefit is the sample generation might introduce a big overhead and need to wait for long time for Perf reporting, we can specify itrace option '--itrace=M' to filter out other events and only output memory events, this can significantly reduce the overhead caused by generating samples. This patch is to enable memory event for Arm SPE. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Fill address info for samplesLeo Yan2021-02-161-20/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To properly handle memory and branch samples, this patch divides into two functions for generating samples: arm_spe__synth_mem_sample() is for synthesizing memory and TLB samples; arm_spe__synth_branch_sample() is to synthesize branch samples. Arm SPE backend decoder has passed virtual and physical address through packets, the address info is stored into the synthesize samples in the function arm_spe__synth_mem_sample(). Committer notes: Fixed this: 36 46.77 fedora:27 : FAIL clang version 5.0.2 (tags/RELEASE_502/final) util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; ^ util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; By using = { .ip = 0, }; Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Store operation type in packetLeo Yan2021-02-122-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to store operation type in packet structure. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Store memory address in packetLeo Yan2021-02-122-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to store virtual and physical memory addresses in packet, which will be used for memory samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf arm-spe: Enable sample type PERF_SAMPLE_DATA_SRCLeo Yan2021-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in the perf data, when output the tracing data, it tells tools that it contains data source in the memory event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-1-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf env: Remove unneeded internal/cpumap inclusionsIan Rogers2021-02-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor cleanup. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210211183914.4093187-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Remove unused xyarray.c as it was moved to tools/lib/perfIan Rogers2021-02-121-33/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Migrated to libperf in: 4b247fa7314ce482 ("libperf: Adopt xyarray class from perf") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210212043803.365993-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Replace lkml.org links with loreKees Cook2021-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links with lore"), replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. Signed-off-by: Kees Kook <keescook@chromium.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kees Kook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: http://lore.kernel.org/lkml/20210210234220.2401035-1-keescook@chromium.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf script: Support filtering by hex addressJin Yao2021-02-083-1/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf script' supports '-S' or '--symbol' options to only list the records for these symbols. A symbol is typically a name or hex address. If it's hex address, it is the start address of one symbol. While it would be useful if we can filter trace records by any hex address (not only the start address of symbol). So now we support filtering trace records by more conditions, such as: - symbol name - start address of symbol - any hexadecimal address - address range The comparison order is defined as: 1. symbol name comparison 2. symbol start address comparison. 3. any hexadecimal address comparison. 4. address range comparison. The idea is if we can get a valid address from -S list, we add the address to addr_list for address comparison otherwise we still leave it to sym_list for symbol comparison. Some examples: root@kbl-ppc:~# ./perf script -S ffffffff9a477308 perf 8562 [000] 347303.578858: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [000] 347303.578860: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [000] 347303.578861: 11 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578903: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578905: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578906: 15 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [002] 347303.578952: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [002] 347303.578953: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) Filter the traced records by hex address ffffffff9a477308. root@kbl-ppc:~# ./perf script -S ffffffff9a4dd4ce,ffffffff9a4d2de9,ffffffff9a6bf9f4 perf 8562 [001] 347303.578911: 311706 cycles: ffffffff9a6bf9f4 __kmalloc_node+0x204 ([kernel.kallsyms]) perf 8562 [002] 347303.578960: 354477 cycles: ffffffff9a4d2de9 sched_setaffinity+0x49 ([kernel.kallsyms]) perf 8562 [003] 347303.579015: 450958 cycles: ffffffff9a4dd4ce dequeue_task_fair+0x1ae ([kernel.kallsyms]) Filter the traced records by hex address ffffffff9a4dd4ce, ffffffff9a4d2de9, ffffffff9a6bf9f4. root@kbl-ppc:~# ./perf script -S ffffffff9a477309 --addr-range 16 perf 8562 [000] 347303.578863: 291 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [001] 347303.578907: 411 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [002] 347303.578956: 462 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [003] 347303.579010: 497 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [004] 347303.579059: 429 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [005] 347303.579109: 408 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [006] 347303.579159: 460 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [007] 347303.579213: 436 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) Filter the traced records from address range [ffffffff9a477309, ffffffff9a477309 + 15]. root@kbl-ppc:~# ./perf script -S "ffffffff9b163046,rcu_nmi_exit" perf 8562 [004] 347303.579060: 12013 cycles: ffffffff9b163046 exc_nmi+0x166 ([kernel.kallsyms]) perf 8562 [007] 347303.579214: 12138 cycles: ffffffff9b165944 rcu_nmi_exit+0x34 ([kernel.kallsyms]) Filter by address + symbol Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210207080935.31784-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf intlist: Change 'struct intlist' int member to 'unsigned long'Jin Yao2021-02-083-17/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is to let intlist support addresses as its payload. One potential problem is it can't support negative number. But so far, there is no such kind of use case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210207080935.31784-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf stat: Use nftw() instead of ftw()Paul Cercueil2021-02-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftw() has been obsolete for about 12 years now. Committer notes: Further notes provided by the patch author: "NOTE: Not runtime-tested, I have no idea what I need to do in perf to test this. But at least it compiles now with my uClibc-based toolchain." I looked at the nftw()/ftw() man page and for the use made with cgroups in 'perf stat' the end result is equivalent. Fixes: bb1c15b60b98 ("perf stat: Support regex pattern in --for-each-cgroup") Signed-off-by: Paul Cercueil <paul@crapouillou.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: od@zcrc.me Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210208181157.1324550-1-paul@crapouillou.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf stat: Support L2 Topdown eventsKan Liang2021-02-083-0/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TMA method level 2 metrics is supported from the Intel Sapphire Rapids server, which expose four L2 Topdown metrics events to user space. There are eight L2 events in total. The other four L2 Topdown metrics events are calculated from the corresponding L1 and the exposed L2 events. Now, the --topdown prints the complete top-down metrics that supported by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events and 8 L2 events displyed in one line. Add a new option, --td-level, to display the top-down statistics that equal to or lower than the input level. The L2 event is marked only when both its L1 parent event and itself crosse the threshold. Here is an example: $ perf stat --topdown --td-level=2 --no-metric-only sleep 1 Topdown accuracy may decrease when measuring long periods. Please print the result regularly, e.g. -I1000 Performance counter stats for 'sleep 1': 16,734,390 slots 2,100,001 topdown-retiring # 12.6% retiring 2,034,376 topdown-bad-spec # 12.3% bad speculation 4,003,128 topdown-fe-bound # 24.1% frontend bound 328,125 topdown-heavy-ops # 2.0% heavy operations # 10.6% light operations 1,968,751 topdown-br-mispredict # 11.9% branch mispredict # 0.4% machine clears 2,953,127 topdown-fetch-lat # 17.8% fetch latency # 6.3% fetch bandwidth 5,906,255 topdown-mem-bound # 35.6% memory bound # 15.4% core bound Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Support instruction latencyKan Liang2021-02-089-10/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The instruction latency information can be recorded on some platforms, e.g., the Intel Sapphire Rapids server. With both memory latency (weight) and the new instruction latency information, users can easily locate the expensive load instructions, and also understand the time spent in different stages. The users can optimize their applications in different pipeline stages. The 'weight' field is shared among different architectures. Reusing the 'weight' field may impacts other architectures. Add a new field to store the instruction latency. Like the 'weight' support, introduce a 'ins_lat' for the global instruction latency, and a 'local_ins_lat' for the local instruction latency version. Add new sort functions, INSTR Latency and Local INSTR Latency, accordingly. Add local_ins_lat to the default_mem_sort_order[]. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Support PERF_SAMPLE_WEIGHT_STRUCTKan Liang2021-02-086-10/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. The new sample type shares the same space as the PERF_SAMPLE_WEIGHT sample type. The lower 32 bits are exactly the same for both sample type. The higher 32 bits may be different for different architecture. Add arch specific arch_evsel__set_sample_weight() to set the new sample type for X86. Only store the lower 32 bits for the sample->weight if the new sample type is applied. In practice, no memory access could last than 4G cycles. No data will be lost. If the kernel doesn't support the new sample type. Fall back to the PERF_SAMPLE_WEIGHT sample type. There is no impact for other architectures. Committer notes: Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core but not upstream yet. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf c2c: Support data block and addr blockKan Liang2021-02-082-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf c2c' is also a memory profiling tool. Apply the two new data source fields to 'perf c2c' as well. Extend 'perf c2c' to display the number of loads which blocked by data or address conflict. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Support data block and addr blockKan Liang2021-02-086-1/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two new data source fields, to indicate the block reasons of a load instruction, are introduced on the Intel Sapphire Rapids server. The fields can be used by the memory profiling. Add a new sort function, SORT_MEM_BLOCKED, for the two fields. For the previous platforms or the block reason is unknown, print "N/A" for the block reason. Add blocked as a default mem sort key for perf report and perf mem report. Committer testing: So in machines without this capability we get a "N/A" filling the new "Blocked" column: $ perf mem record ls arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools virt [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ] $ $ perf mem report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu' # Total weight : 1381 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... ....... # 32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A 25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A 22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A 8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A 6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A 3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A # Samples: 11 of event 'cpu/mem-stores/Pu' # Total weight : 11 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... ....... # 9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A # (Tip: Show user configuration overrides: perf config --user --list) $ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Support the auxiliary eventKan Liang2021-02-085-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the Intel Sapphire Rapids server, an auxiliary event has to be enabled simultaneously with the load latency event to retrieve complete Memory Info. Add X86 specific perf_mem_events__name() to handle the auxiliary event. - Users are only interested in the samples of the mem-loads event. Sample read the auxiliary event. - The auxiliary event must be in front of the load latency event in a group. Assume the second event to sample if the auxiliary event is the leader. - Add a weak is_mem_loads_aux_event() to check the auxiliary event for X86. For other ARCHs, it always return false. Parse the unique event name, mem-loads-aux, for the auxiliary event. Committer notes: According to 61b985e3e775a3a7 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids"), ENODATA is only returned by sys_perf_event_open() when used with these auxiliary events, with this in evsel__open_strerror(): case ENODATA: return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. " "Please add an auxiliary event in front of the load latency event."); This is Ok at this point in time, but fragile long term, I pointed this out in the e-mail thread, requesting a follow up patch to check if ENODATA is really for this specific case. Fixed up sizeof(MEM_LOADS_AUX_NAME) bug pointed out by Namhyung. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210205152648.GC920417@kernel.org Link: http://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf probe: Add protection to avoid endless loopJianlin Lv2021-02-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if dwarf_offdie() returns NULL, the continue statement forces the next iteration of the loop without updating the 'off' variable. It will cause an endless loop in the process of traversing the compile unit. So add exception protection for looping CUs. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: jianlin.lv@arm.com Link: http://lore.kernel.org/lkml/20210203145702.1219509-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>