| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b81162302001f41157f6e93654aaccc30e817e2a upstream.
We'll need to check if an warning option introduced in clang 19 is
available on the clang version being used, so cover the error message
emitted when testing for a -W option.
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9af2efee41b27a0f386fb5aa95d8d0b4b5d9fede upstream.
The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.
So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage. So it shouldn't access it
unconditionally.
I got a segfault, when I wanted to see cgroup profiles.
$ sudo perf record -a --all-cgroups --synth=cgroup true
$ sudo perf report -s cgroup
Program received signal SIGSEGV, Segmentation fault.
0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
48 return RC_CHK_ACCESS(map)->dso;
(gdb) bt
#0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
#1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
#2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
#3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
at util/hist.c:644
#4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
at util/hist.c:1260
#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
machine=0x5555560388e8) at builtin-report.c:334
#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
at util/session.c:780
#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
file_path=0x555556038ff0 "perf.data") at util/session.c:1406
As you can see the entry->ms.map was NULL even if he->ms.map has a
value. This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same. I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).
Fixes: ac01c8c4246546fd ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit ac01c8c4246546fd8340a232f3ada1921dc0ee48 upstream.
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.
==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
READ of size 1 at 0x60b00c48844b thread T193
#0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
#1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
#2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
#3 0x5650d804568f in __hists__add_entry util/hist.c:754
#4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
#5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
#6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
#7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
#8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
#9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
#10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
#11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
#12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
#13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().
While this bug was probably introduced with 5c24b67aae72f54c ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e9480cf33672 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.
Fixes: c087e9480cf33672 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <yunzhao@cloudflare.com>
Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com>
Cc: Ian Rogers <irogers@google.com>
Cc: kernel-team@cloudflare.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org # v5.13+
Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 00dc514612fe98cfa117193b9df28f15e7c9db9c upstream.
The -Wcast-function-type-mismatch option was introduced in clang 19 and
its enabled by default, since we use -Werror, and python bindings do
casts that are valid but trips this warning, disable it if present.
Closes: https://lore.kernel.org/all/CA+icZUXoJ6BS3GMhJHV3aZWyb5Cz2haFneX0C5pUMUUhG-UVKQ@mail.gmail.com
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # To allow building with the upcoming clang 19
Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 599c19397b17d197fc1184bbc950f163a292efc9 ]
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.
Do similar for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to
list_splice_init(), so the whole list is moved and nodes aren't leaked.
A reproduction of the memory leaks is possible with a leak sanitizer
build in the perf report command of:
```
$ perf record -e cycles --call-graph lbr perf test -w thloop
$ perf report --stitch-lbr
```
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <irogers@google.com>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5ad7db2c3f941cde3045ce38a9c4c40b0c7d56b9 ]
The p-core mem events are missed when launching 'perf mem record' on ADL
and RPL.
root@number:~# perf mem record sleep 1
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB perf.data ]
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
A variable 'record' in the 'struct perf_mem_event' is to indicate
whether a mem event in a mem_events[] should be recorded. The current
code only configure the variable for the first eligible PMU.
It's good enough for a non-hybrid machine or a hybrid machine which has
the same mem_events[].
However, if a different mem_events[] is used for different PMUs on a
hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never
get a chance to be set.
The mem_events[] of the second PMU are always ignored.
'perf mem' doesn't support the per-PMU configuration now. A per-PMU
mem_events[] 'record' variable doesn't make sense. Make it global.
That could also avoid searching for the per-PMU mem_events[] via
perf_pmu__mem_events_ptr every time.
Committer testing:
root@number:~# perf evlist -g
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
cpu_core/mem-stores/P
dummy:u
root@number:~#
The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
not being added by 'perf evlist -g', to be checked.
Fixes: abbdd79b786e036e ("perf mem: Clean up perf_mem_events__name()")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6e05d28ff232cf445cc6ae59336b7f2081ef9b96 ]
The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.
However, it may bring issues if a hybrid machine has a different
mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.
The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.
Fixes: abbdd79b786e036e ("perf mem: Clean up perf_mem_events__name()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 38e2648a81204c9fc5b4c87a8ffce93a6ed91b65 ]
The "time utils" test fails in 32-bit builds:
...
parse_nsec_time("18446744073.709551615")
Failed. ptime 4294967295709551615 expected 18446744073709551615
...
Switch strtoul to strtoull as an unsigned long in 32-bit build isn't
64-bits.
Fixes: c284d669a20d408b ("perf tools: Move parse_nsec_time to time-utils.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sched_in time
[ Upstream commit 39c243411bdb8fb35777adf49ee32549633c4e12 ]
If sched_in event for current task is not recorded, sched_in timestamp
will be set to end_time of time window interest, causing an error in
timestamp show. In this case, we choose to ignore this event.
Test scenario:
perf[1229608] does not record the first sched_in event, run time and sch delay are both 0
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
2090450.763231 [0000] perf[1229608] 0.000 0.000 0.000
2090450.763235 [0000] migration/0[15] 0.000 0.001 0.003
2090450.763263 [0001] perf[1229608] 0.000 0.000 0.000
2090450.763268 [0001] migration/1[21] 0.000 0.001 0.004
2090450.763302 [0002] perf[1229608] 0.000 0.000 0.000
2090450.763309 [0002] migration/2[27] 0.000 0.001 0.007
2090450.763338 [0003] perf[1229608] 0.000 0.000 0.000
2090450.763343 [0003] migration/3[33] 0.000 0.001 0.004
Before:
arbitrarily specify a time window of interest, timestamp will be set to an incorrect value
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
200.000000 [0000] perf[1229608] 0.000 0.000 0.000
200.000000 [0001] perf[1229608] 0.000 0.000 0.000
200.000000 [0002] perf[1229608] 0.000 0.000 0.000
200.000000 [0003] perf[1229608] 0.000 0.000 0.000
200.000000 [0004] perf[1229608] 0.000 0.000 0.000
200.000000 [0005] perf[1229608] 0.000 0.000 0.000
200.000000 [0006] perf[1229608] 0.000 0.000 0.000
200.000000 [0007] perf[1229608] 0.000 0.000 0.000
After:
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
Fixes: 853b74071110bed3 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit a11b4222bb579dcf9646f3c4ecd2212ae762a2c8 ]
The __die_find_member_offset_cb() missed to handle bitfield members
which don't have DW_AT_data_member_location. Like in adding member
types in __add_member_cb() it should fallback to check the bit offset
when it resolves the member type for an offset.
Fixes: 437683a9941891c1 ("perf dwarf-aux: Handle type transfer for memory access")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3ab0b8b238b5130ae3fa37ddaa329fc0e93b6b9a ]
The location list will have entries with half-open addressing like
[start, end) which means it doesn't include the end address. So it
should skip entries at the end address and match to the next entry.
An example location list looks like this (from readelf -wo):
00237876 ffffffff8110d32b (base address)
0023787f v000000000000000 v000000000000002 views at 00237868 for:
ffffffff8110d32b ffffffff8110d4eb (DW_OP_reg3 (rbx)) <<<--- 1
00237885 v000000000000002 v000000000000000 views at 0023786a for:
ffffffff8110d4eb ffffffff8110d50b (DW_OP_reg14 (r14)) <<<--- 2
0023788c v000000000000000 v000000000000001 views at 0023786c for:
ffffffff8110d50b ffffffff8110d7c4 (DW_OP_reg3 (rbx))
00237893 v000000000000000 v000000000000000 views at 0023786e for:
ffffffff8110d806 ffffffff8110d854 (DW_OP_reg3 (rbx))
0023789a v000000000000000 v000000000000000 views at 00237870 for:
ffffffff8110d876 ffffffff8110d88e (DW_OP_reg3 (rbx))
The first entry at 0023787f has [8110d32b, 8110d4eb) (omitting the
ffffffff at the beginning), and the second one has [8110d4eb, 8110d50b).
Fixes: 2bc3cf575a162a2c ("perf annotate-data: Improve debug message with location info")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e8bb03ed6850c6ed4ce2f1600ea73401fc2ebd95 ]
It missed to call check_allowed_ops() in __die_collect_vars_cb() so it
can take variables with complex location expression incorrectly.
For example, I found some variable has this expression.
015d8df8 ffffffff81aacfb3 (base address)
015d8e01 v000000000000004 v000000000000000 views at 015d8df2 for:
ffffffff81aacfb3 ffffffff81aacfd2 (DW_OP_fbreg: -176; DW_OP_deref;
DW_OP_plus_uconst: 332; DW_OP_deref_size: 4;
DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64;
DW_OP_minus; DW_OP_stack_value)
015d8e14 v000000000000000 v000000000000000 views at 015d8df4 for:
ffffffff81aacfd2 ffffffff81aacfd7 (DW_OP_reg3 (rbx))
015d8e19 v000000000000000 v000000000000000 views at 015d8df6 for:
ffffffff81aacfd7 ffffffff81aad020 (DW_OP_fbreg: -176; DW_OP_deref;
DW_OP_plus_uconst: 332; DW_OP_deref_size: 4;
DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64;
DW_OP_minus; DW_OP_stack_value)
015d8e2c <End of list>
It looks like '((int *)(-176(%rbp) + 332) >> 1) - 64' but the current
code thought it's just -176(%rbp) and processed the variable incorrectly.
It should reject such a complex expression if check_allowed_ops()
doesn't like it. :)
Fixes: 932dcc2c39aedf54 ("perf dwarf-aux: Add die_collect_vars()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 2615639352420e6e3115952c5b8f46846e1c6d0e ]
Currently we'll only print metric headers for metric leader in
aggregration mode. This will make `perf iostat` header not shown
since it'll aggregrated globally but don't have metric events:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000
Performance counter stats for 'system wide':
port
0000:00 0 0 0 0
0000:80 0 0 0 0
[...]
Fix this by excluding the iostat in the check of printing metric
headers. Then we can see the headers:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000
Performance counter stats for 'system wide':
port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB)
0000:00 0 0 0 0
0000:80 0 0 0 0
[...]
Fixes: 193a9e30207f5477 ("perf stat: Don't display metric header for non-leader uncore events")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: linuxarm@huawei.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Link: https://lore.kernel.org/r/20240802065800.48774-1-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6bdf5168b6fb19541b0c1862bdaa596d116c7bfb ]
When perf_time__parse_str() fails in perf_sched__timehist(),
need to free session that was previously created, fix it.
Fixes: 853b74071110bed3 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240806023533.1316348-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3ef44458071a19e5b5832cdfe6f75273aa521b6e ]
The --total-cycles may output wrong information with the --stdio.
For example:
# perf record -e "{cycles,instructions}",cache-misses -b sleep 1
# perf report --total-cycles --stdio
The total cycles output of {cycles,instructions} and cache-misses are
almost the same.
# Samples: 938 of events 'anon group { cycles, instructions }'
# Event count (approx.): 938
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range]
# ............... .............. ........... .......... ..................................................>
#
11.19% 2.6K 0.10% 21 [perf_iterate_ctx+48 -> >
5.79% 1.4K 0.45% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
5.11% 1.2K 0.33% 71 [native_write_msr+0 ->>
# Samples: 293 of event 'cache-misses'
# Event count (approx.): 293
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range]
# ............... .............. ........... .......... ..................................................>
#
11.19% 2.6K 0.13% 21 [perf_iterate_ctx+48 -> >
5.79% 1.4K 0.59% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
5.11% 1.2K 0.43% 71 [native_write_msr+0 ->>
With the symbol_conf.event_group, the 'perf report' should only report the
block information of the leader event in a group.
However, the current implementation retrieves the next event's block
information, rather than the next group leader's block information.
Make sure the index is updated even if the event is skipped.
With the patch,
# Samples: 293 of event 'cache-misses'
# Event count (approx.): 293
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range]
# ............... .............. ........... .......... ..................................................>
#
37.98% 9.0K 4.05% 299 [perf_event_addr_filters_exec+0 -> perf_event_a>
11.19% 2.6K 0.28% 21 [perf_iterate_ctx+48 -> >
5.79% 1.4K 1.32% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
Fixes: 6f7164fa231a5f36 ("perf report: Sort by sampled cycles percent per block for stdio")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 79bcd34e0f3da39fda841406ccc957405e724852 ]
The processing of leader samples would turn an individual sample with
a group of read values into multiple samples. 'perf inject' would pass
through the additional samples increasing the output data file size:
$ perf record -g -e "{instructions,cycles}:S" -o perf.orig.data true
$ perf script -D -i perf.orig.data | sed -e 's/perf.orig.data/perf.data/g' > orig.txt
$ perf inject -i perf.orig.data -o perf.new.data
$ perf script -D -i perf.new.data | sed -e 's/perf.new.data/perf.data/g' > new.txt
$ diff -u orig.txt new.txt
--- orig.txt 2024-07-29 14:29:40.606576769 -0700
+++ new.txt 2024-07-29 14:30:04.142737434 -0700
...
-0xc550@perf.data [0x30]: event: 3
+0xc550@perf.data [0xd0]: event: 9
+.
+. ... raw event: size 208 bytes
+. 0000: 09 00 00 00 01 00 d0 00 fc 72 01 86 ff ff ff ff .........r......
+. 0010: 74 7d 2c 00 74 7d 2c 00 fb c3 79 f9 ba d5 05 00 t},.t},...y.....
+. 0020: e6 cb 1a 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
+. 0030: 02 00 00 00 00 00 00 00 76 01 00 00 00 00 00 00 ........v.......
+. 0040: e6 cb 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+. 0050: 62 18 00 00 00 00 00 00 f6 cb 1a 00 00 00 00 00 b...............
+. 0060: 00 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00 ................
+. 0070: 80 ff ff ff ff ff ff ff fc 72 01 86 ff ff ff ff .........r......
+. 0080: f3 0e 6e 85 ff ff ff ff 0c cb 7f 85 ff ff ff ff ..n.............
+. 0090: bc f2 87 85 ff ff ff ff 44 af 7f 85 ff ff ff ff ........D.......
+. 00a0: bd be 7f 85 ff ff ff ff 26 d0 7f 85 ff ff ff ff ........&.......
+. 00b0: 6d a4 ff 85 ff ff ff ff ea 00 20 86 ff ff ff ff m......... .....
+. 00c0: 00 fe ff ff ff ff ff ff 57 14 4f 43 fc 7e 00 00 ........W.OC.~..
+
+1642373909693435 0xc550 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 2915700/2915700: 0xffffffff860172fc period: 1 addr: 0
+... FP chain: nr:12
+..... 0: ffffffffffffff80
+..... 1: ffffffff860172fc
+..... 2: ffffffff856e0ef3
+..... 3: ffffffff857fcb0c
+..... 4: ffffffff8587f2bc
+..... 5: ffffffff857faf44
+..... 6: ffffffff857fbebd
+..... 7: ffffffff857fd026
+..... 8: ffffffff85ffa46d
+..... 9: ffffffff862000ea
+..... 10: fffffffffffffe00
+..... 11: 00007efc434f1457
+... sample_read:
+.... group nr 2
+..... id 00000000001acbe6, value 0000000000000176, lost 0
+..... id 00000000001acbf6, value 0000000000001862, lost 0
+
+0xc620@perf.data [0x30]: event: 3
...
This behavior is incorrect as in the case above 'perf inject' should
have done nothing. Fix this behavior by disabling separating samples
for a tool that requests it. Only request this for `perf inject` so as
to not affect other perf tools. With the patch and the test above
there are no differences between the orig.txt and new.txt.
Fixes: e4caec0d1af3d608 ("perf evsel: Add PERF_SAMPLE_READ sample related processing")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240729220620.2957754-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 040c0f887fdcfe747a3f63c94e9cd29e9ed0b872 ]
The bpf_get_stackid() helper returns a signed type to check whether it
failed to get a stacktrace or not. But it saved the result in u32 and
checked if the value is negative.
376 if (needs_callstack) {
377 pelem->stack_id = bpf_get_stackid(ctx, &stacks,
378 BPF_F_FAST_STACK_CMP | stack_skip);
--> 379 if (pelem->stack_id < 0)
./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin()
warn: unsigned 'pelem->stack_id' is never less than zero.
Let's change the type to s32 instead.
Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812172533.2015291-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3da209bb1177462b6fe8e3021a5527a5a49a9336 ]
The get_sort_order() returns either a new string (from strdup) or NULL
but it never gets freed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 2e7f545096f954a9 ("perf mem: Factor out a function to generate sort order")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-3-namhyung@kernel.org
[ Added Fixes tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ae8e4f4048b839c1cb333d9e3d20e634b430139e ]
The linked commit moved the early return on the first sample to before
the verbose log, so move the log earlier too. Now the first sample is
also logged and not skipped.
Fixes: 2d98dbb4c9c5b09c ("perf scripts python arm-cs-trace-disasm.py: Do not ignore disam first sample")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: coresight@lists.linaro.org
Cc: gankulkarni@os.amperecomputing.com
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20240723132858.12747-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 287bd5cf06e0f2c02293ce942777ad1f18059ed3 ]
The spinlock and rwlock use a single-element per-cpu array to track
current locks due to performance reason. But this means the key is
always available and it cannot simply account lock stats in the array
because some of them are invalid.
In fact, the contention_end() program in the BPF invalidates the entry
by setting the 'lock' value to 0 instead of deleting the entry for the
hashmap. So it should skip entries with the lock value of 0 in the
account_end_timestamp().
Otherwise, it'd have spurious high contention on an idle machine:
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
8 4.72 s 1.84 s 590.46 ms spinlock rcu_core+0xc7
8 1.87 s 1.87 s 233.48 ms spinlock process_one_work+0x1b5
2 1.87 s 1.87 s 933.92 ms spinlock worker_thread+0x1a2
3 1.81 s 1.81 s 603.93 ms spinlock tmigr_update_events+0x13c
2 1.72 s 1.72 s 861.98 ms spinlock tick_do_update_jiffies64+0x25
6 42.48 us 13.02 us 7.08 us spinlock futex_q_lock+0x2a
1 13.03 us 13.03 us 13.03 us spinlock futex_wake+0xce
1 11.61 us 11.61 us 11.61 us spinlock rcu_core+0xc7
I don't believe it has contention on a spinlock longer than 1 second.
After this change, it only reports some small contentions.
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
4 133.51 us 43.29 us 33.38 us spinlock tick_do_update_jiffies64+0x25
4 69.06 us 31.82 us 17.27 us spinlock process_one_work+0x1b5
2 50.66 us 25.77 us 25.33 us spinlock rcu_core+0xc7
1 28.45 us 28.45 us 28.45 us spinlock rcu_core+0xc7
1 24.77 us 24.77 us 24.77 us spinlock tmigr_update_events+0x13c
1 23.34 us 23.34 us 23.34 us spinlock raw_spin_rq_lock_nested+0x15
Fixes: b5711042a1c8 ("perf lock contention: Use per-cpu array map for spinlocks")
Reported-by: Xi Wang <xii@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 64e166099b69bfc09f667253358a15160b86ea43 ]
Commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")
removed the last use of the absolute kallsyms.
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20240221202655.2423854-1-jannh@google.com/
[masahiroy@kernel.org: rebase the code and reword the commit description]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 63ba5b0fb4f54db256ec43b3062b2606b383055d ]
Currently, the RISC-V firmware JSON file has duplicate event name
"FW_SFENCE_VMA_RECEIVED". According to the RISC-V SBI PMU extension[1],
the event name should be "FW_SFENCE_VMA_ASID_SENT".
Before this patch:
$ perf list
firmware:
fw_access_load
[Load access trap event. Unit: cpu]
fw_access_store
[Store access trap event. Unit: cpu]
....
fw_set_timer
[Set timer event. Unit: cpu]
fw_sfence_vma_asid_received
[Received SFENCE.VMA with ASID request from other HART event. Unit: cpu]
fw_sfence_vma_received
[Sent SFENCE.VMA with ASID request to other HART event. Unit: cpu]
After this patch:
$ perf list
firmware:
fw_access_load
[Load access trap event. Unit: cpu]
fw_access_store
[Store access trap event. Unit: cpu]
.....
fw_set_timer
[Set timer event. Unit: cpu]
fw_sfence_vma_asid_received
[Received SFENCE.VMA with ASID request from other HART event. Unit: cpu]
fw_sfence_vma_asid_sent
[Sent SFENCE.VMA with ASID request to other HART event. Unit: cpu]
fw_sfence_vma_received
[Received SFENCE.VMA request from other HART event. Unit: cpu]
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-pmu.adoc#event-firmware-events-type-15 [1]
Fixes: 8f0dcb4e7364 ("perf arch events: riscv sbi firmware std event files")
Fixes: c4f769d4093d ("perf vendor events riscv: add Sifive U74 JSON file")
Fixes: acbf6de674ef ("perf vendor events riscv: Add StarFive Dubhe-80 JSON file")
Fixes: 7340c6df49df ("perf vendor events riscv: add T-HEAD C9xx JSON file")
Fixes: f5102e31c209 ("riscv: andes: Support specifying symbolic firmware and hardware raw event")
Signed-off-by: Eric Lin <eric.lin@sifive.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Nikita Shubin <n.shubin@yadro.com>
Reviewed-by: Inochi Amaoto <inochiama@outlook.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240719115018.27356-1-eric.lin@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4c17736689ccfc44ec7dcc472577f25c34cf8724 ]
With 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions"),
when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR),
thread__find_map() could return with al->maps being NULL.
The path below could add a callchain_cursor_node with NULL ms.maps.
add_callchain_ip()
thread__find_symbol(.., &al)
thread__find_map(.., &al) // al->maps becomes NULL
ms.maps = maps__get(al.maps)
callchain_cursor_append(..., &ms, ...)
node->ms.maps = maps__get(ms->maps)
Then the path below would dereference NULL maps and get segfault.
fill_callchain_info()
maps__machine(node->ms.maps);
Fix it by checking if maps is NULL in fill_callchain_info().
Fixes: 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions")
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: yzhong@purestorage.com
Link: https://lore.kernel.org/r/20240722211548.61455-1-cachen@purestorage.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 92717bc077892d1ce60fee07aee3a33f33909b85 ]
Now that symsrc_filename is always accessed through an accessor, we also
need a free() function for it to avoid the following compilation error:
util/unwind-libunwind-local.c:416:12: error: lvalue required as unary
‘&’ operand
416 | zfree(&dso__symsrc_filename(dso));
Fixes: 1553419c3c10 ("perf dso: Fix address sanitizer build")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240715094715.3914813-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3612ca8e2935c4c142d99e33b8effa7045ce32b5 upstream.
The hard-coded metrics is wrongly calculated on the hybrid machine.
$ perf stat -e cycles,instructions -a sleep 1
Performance counter stats for 'system wide':
18,205,487 cpu_atom/cycles/
9,733,603 cpu_core/cycles/
9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle
4,268,965 cpu_core/instructions/ # 0.23 insn per cycle
The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.
When finding the metric events, the find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ is wrongly used to calculate
the IPC of the cpu_core.
In the hard-coded metrics, the events from a different PMU are only
SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type,
STAT_NSECS. Except the SW CLOCK events, check the PMU type as well.
Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value")
Reported-by: Khalil, Amiri <amiri.khalil@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240606180316.4122904-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 1553419c3c10cf386496e68b90b5d0ce966ac614 ]
Various files had been missed from having accessor functions added for
the sake of dso reference count checking. Add the function calls and
missing dso accessor functions.
Fixes: ee756ef7491e ("perf dso: Add reference count checking and accessor functions")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240704011745.1021288-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit b40934ae32232140e85dc7dc1c3ea0e296986723 ]
In the past, the exclude_guest setting has had no effect on Intel PT
tracing, but that may not be the case in the future.
Set the flag correctly based upon whether KVM is using Intel PT
"Host/Guest" mode, which is determined by the kvm_intel module
parameter pt_mode:
pt_mode=0 System-wide mode : host and guest output to host buffer
pt_mode=1 Host/Guest mode : host/guest output to host/guest
buffers respectively
Fixes: 6e86bfdc4a60 ("perf intel-pt: Support decoding of guest kernel")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240625104532.11990-3-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 36b4cd990a8fd3f5b748883050e9d8c69fe6398d ]
aux_watermark is a u32. For a 64-bit size, cap the aux_watermark
calculation at UINT_MAX instead of truncating it to 32-bits.
Fixes: 874fc35cdd55 ("perf intel-pt: Use aux_watermark")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240625104532.11990-2-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit caa463bb79a82ac5fce05a0dcf7893d94c84fc5e ]
The new --per-cluster option was added recently but it forgot to update
the aggr_header fields which are used for --metric-only option. And it
resulted in a segfault due to NULL string in fputs().
Fixes: cbc917a1b03b ("perf stat: Support per-cluster aggregation")
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240628000604.1296808-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit cb39d05e67dc24985ff9f5150e71040fa4d60ab8 ]
It's expected that both hist entries are in the same hists when
comparing two. But the current code in the function checks one without
dso sort key and other with the key. This would make the condition true
in any case.
I guess the intention of the original commit was to add '!' for the
right side too. But as it should be the same, let's just remove it.
Fixes: 69849fc5d2119 ("perf hists: Move sort__has_dso into struct perf_hpp_list")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240621170528.608772-2-namhyung@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit dd9a426eade634bf794c7e0f1b0c6659f556942f ]
In the previous loop, all the members in the aliases[j-1] have been freed
and set to NULL. But in this loop, the function pmu_alias_is_duplicate()
compares the aliases[j] with the aliases[j-1] that has already been
disposed, so the function will always return false and duplicate aliases
will never be discarded.
If we find duplicate aliases, it skips the zfree aliases[j], which is
accompanied by a memory leak.
We can use the next aliases[j+1] to theck for duplicate aliases to
fixes the aliases NULL pointer dereference, then goto zfree code snippet
to release it.
After patch testing:
$ perf list --unit=hisi_sicl,cpa pmu
uncore cpa:
cpa_p0_rd_dat_32b
[Number of read ops transmitted by the P0 port which size is 32 bytes.
Unit: hisi_sicl,cpa]
cpa_p0_rd_dat_64b
[Number of read ops transmitted by the P0 port which size is 64 bytes.
Unit: hisi_sicl,cpa]
Fixes: c3245d2093c1 ("perf pmu: Abstract alias/event struct")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Cc: ravi.bangoria@amd.com
Cc: james.clark@arm.com
Cc: prime.zeng@hisilicon.com
Cc: cuigaosheng1@huawei.com
Cc: jonathan.cameron@huawei.com
Cc: linuxarm@huawei.com
Cc: yangyicong@huawei.com
Cc: robh@kernel.org
Cc: renyu.zj@linux.alibaba.com
Cc: kjain@linux.ibm.com
Cc: john.g.garry@oracle.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240614094318.11607-1-hejunhao3@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit b0979f008f1352a44cd3c8877e3eb8a1e3e1c6f3 ]
Perf test for perf probe of function from different CU fails
as below:
./perf test -vv "test perf probe of function from different CU"
116: test perf probe of function from different CU:
--- start ---
test child forked, pid 2679
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile
Error: Failed to add events.
--- Cleaning up ---
"foo" does not hit any event.
Error: Failed to delete events.
---- end(-1) ----
116: test perf probe of function from different CU : FAILED!
The test does below to probe function "foo" :
# gcc -g -Og -flto -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
# gcc -g -Og -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
# gcc -g -Og -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
# ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
Error: Failed to add events.
Perf probe fails to find symbol foo in the executable placed in
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7
Simple reproduce:
# mktemp -d /tmp/perf-checkXXXXXXXXXX
/tmp/perf-checkcWpuLRQI8j
# gcc -g -o test test.c
# cp test /tmp/perf-checkcWpuLRQI8j/
# nm /tmp/perf-checkcWpuLRQI8j/test | grep foo
00000000100006bc T foo
# ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo
Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test
Error: Failed to add events.
But it works with any files like /tmp/perf/test. Only for
patterns with "/tmp/perf-", this fails.
Further debugging, commit 80d496be89ed ("perf report: Add support
for profiling JIT generated code") added support for profiling JIT
generated code. This patch handles dso's of form
"/tmp/perf-$PID.map" .
The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)"
to match "/tmp/perf-$PID.map". With this commit, any dso in
/tmp/perf- folder will be considered separately for processing
(not only JIT created map files ). Fix this by changing the
string pattern to check for "/tmp/perf-%d.map". Add a helper
function is_perf_pid_map_name to do this check. In "struct dso",
dso->long_name holds the long name of the dso file. Since the
/tmp/perf-$PID.map check uses the complete name, use dso___long_name for
the string name.
With the fix,
# ./perf test "test perf probe of function from different CU"
117: test perf probe of function from different CU : Ok
Fixes: 56cbeacf1435 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: akanksha@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240623064850.83720-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ff16aeb9b83441b8458d4235496cf320189a0c60 ]
The 2 second sleep can cause the test to fail on very slow network file
systems because Perf ends up being killed before it finishes starting
up.
Fix it by making the leafloop workload end after a fixed time like the
other workloads so there is no need to kill it after 2 seconds.
Also remove the 1 second start sampling delay because it is similarly
fragile. Instead, search through all samples for a matching one, rather
than just checking the first sample and hoping it's in the right place.
Fixes: cd6382d82752 ("perf test arm64: Test unwinding using fame-pointer (fp) mode")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240612140316.3006660-1-james.clark@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 0b90dfda222e38b7ca8dad6e098e36f5186f0b94 ]
In the case 'before' and 'after' are broken out from pos,
maps_by_address may be changed by __maps__insert, as such it needs
re-reading.
Don't ignore the return value from __maps_insert.
Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: Steinar H . Gunderson <sesse@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240521165109.708593-2-irogers@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
Fixes: 3f4ac23a9908 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716c6 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ingo reported that he was seeing these when hitting Control+C during a
perf tools build:
Makefile.perf:1149: *** Missing bpftool input for generating vmlinux.h. Stop.
The failure happens when you don't have vmlinux.h or vmlinux with BTF.
ifeq ($(VMLINUX_H),)
ifeq ($(VMLINUX_BTF),)
$(error Missing bpftool input for generating vmlinux.h)
endif
endif
VMLINUX_BTF can be empty if you didn't build a kernel or it doesn't have
a BTF section and the current kernel also has no BTF. This is totally
ok.
But VMLINUX_H should be set to the minimal version in the source tree
(unless you overwrite it manually) when you don't pass GEN_VMLINUX_H=1
(which requires VMLINUX_BTF should not be empty). The problem is that
it's defined in Makefile.config which is not included for `make clean`.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lore.kernel.org/lkml/CAM9d7ch5HTr+k+_GpbMrX0HUo5BZ11byh1xq0Two7B7RQACuNw@mail.gmail.com
Link: http://lore.kernel.org/lkml/ZjssGrj+abyC6mYP@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 7d1405c71df21f6c394b8a885aa8a133f749fa22.
This causes segfaults in some cases, as reported by Milian:
```
sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e
raw_syscalls:sys_enter ls
...
[ perf record: Woken up 3 times to write data ]
malloc(): invalid next size (unsorted)
Aborted
```
Backtrace with GDB + debuginfod:
```
malloc(): invalid next size (unsorted)
Thread 1 "perf" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO
(ret) : 0;
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>,
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>,
signo=6) at pthread_kill.c:78
#2 0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/
raise.c:26
#3 0x00007ffff6e384c3 in __GI_abort () at abort.c:79
#4 0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea
"%s\n") at ../sysdeps/posix/libc_fatal.c:132
#5 0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850
"malloc(): invalid next size (unsorted)") at malloc.c:5772
#6 0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0
<main_arena>, bytes=bytes@entry=368) at malloc.c:4081
#7 0x00007ffff6eb877e in __libc_calloc (n=<optimized out>,
elem_size=<optimized out>) at malloc.c:3754
#8 0x000055555569bdb6 in perf_session.do_write_header ()
#9 0x00005555555a373a in __cmd_record.constprop.0 ()
#10 0x00005555555a6846 in cmd_record ()
#11 0x000055555564db7f in run_builtin ()
#12 0x000055555558ed77 in main ()
```
Valgrind memcheck:
```
==45136== Invalid write of size 8
==45136== at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf)
==45136== by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf)
==45136== by 0x15A845: cmd_record (in /usr/bin/perf)
==45136== by 0x201B7E: run_builtin (in /usr/bin/perf)
==45136== by 0x142D76: main (in /usr/bin/perf)
==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
==45136== by 0x3574AB: zalloc (in /usr/bin/perf)
==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
==45136== by 0x15A845: cmd_record (in /usr/bin/perf)
==45136== by 0x201B7E: run_builtin (in /usr/bin/perf)
==45136== by 0x142D76: main (in /usr/bin/perf)
==45136==
==45136== Syscall param write(buf) points to unaddressable byte(s)
==45136== at 0x575953D: __libc_write (write.c:26)
==45136== by 0x575953D: write (write.c:24)
==45136== by 0x35761F: ion (in /usr/bin/perf)
==45136== by 0x357778: writen (in /usr/bin/perf)
==45136== by 0x1548F7: record__write (in /usr/bin/perf)
==45136== by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf)
==45136== by 0x15A845: cmd_record (in /usr/bin/perf)
==45136== by 0x201B7E: run_builtin (in /usr/bin/perf)
==45136== by 0x142D76: main (in /usr/bin/perf)
==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
==45136== by 0x3574AB: zalloc (in /usr/bin/perf)
==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
==45136== by 0x15A845: cmd_record (in /usr/bin/perf)
==45136== by 0x201B7E: run_builtin (in /usr/bin/perf)
==45136== by 0x142D76: main (in /usr/bin/perf)
==45136==
-----
Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/
Reported-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@kernel.org # 6.8+
Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
new 'mseal' syscall
But also to wire up shadow stacks on 32-bit x86, picking up those
changes from these csets:
ff388fe5c481d39c ("mseal: wire up mseal syscall")
2883f01ec37dd866 ("x86/shstk: Enable shadow stacks for x32")
This makes 'perf trace' support it, now its possible, for instance to
do:
# perf trace -e mseal --max-stack=16
Here is an example with the 'sendmmsg' syscall:
root@x1:~# perf trace -e sendmmsg --max-stack 16 --max-events=1
0.000 ( 0.062 ms): dbus-broker/1012 sendmmsg(fd: 150, mmsg: 0x7ffef57cca50, vlen: 1, flags: DONTWAIT|NOSIGNAL) = 1
syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
syscall_exit_to_user_mode ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64 ([kernel.kallsyms])
[0x117ce7] (/usr/lib64/libc.so.6 (deleted))
root@x1:~#
To do a system wide tracing of the new 'mseal' syscall with a backtrace
of at most 16 entries.
This addresses these perf tools build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H J Lu <hjl.tools@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlXlo4TNcba4wnVZ@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the kernel sources to pick POSTED_MSI_NOTIFICATION
To pick up the change in:
f5a3562ec9dd29e6 ("x86/irq: Reserve a per CPU IDT vector for posted MSIs")
That picks up this new vector:
$ cp arch/x86/include/asm/irq_vectors.h tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h
$ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh > after
$ diff -u before after
--- before 2024-05-27 12:50:47.708863932 -0300
+++ after 2024-05-27 12:51:15.335113123 -0300
@@ -1,6 +1,7 @@
static const char *x86_irq_vectors[] = {
[0x02] = "NMI",
[0x80] = "IA32_SYSCALL",
+ [0xeb] = "POSTED_MSI_NOTIFICATION",
[0xec] = "LOCAL_TIMER",
[0xed] = "HYPERV_STIMER0",
[0xee] = "HYPERV_REENLIGHTENMENT",
$
Now those will be known when pretty printing the irq_vectors:*
tracepoints.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZlS34M0x30EFVhbg@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To pick up the fixes in:
0645fbe760afcc53 ("net: have do_accept() take a struct proto_accept_arg argument")
That just changes a function prototype, not touching things used by the
perf scrape scripts such as:
$ tools/perf/trace/beauty/sockaddr.sh | head -5
static const char *socket_families[] = {
[0] = "UNSPEC",
[1] = "LOCAL",
[2] = "INET",
[3] = "AX25",
$
This addresses this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlSrceExgjrUiDb5@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no scrape script yet for those, but the warning pointed out we
need to update the array with the F_LINUX_SPECIFIC_BASE entries, do it.
Now 'perf trace' can decode that cmd and also use it in filter, as in:
root@number:~# perf trace -e syscalls:*enter_fcntl --filter 'cmd != SETFL && cmd != GETFL'
0.000 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLK, arg: 0x7fffdc6a8a50)
0.013 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a8aa0)
0.090 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a88e0)
^Croot@number:~#
This picks up the changes in:
c62b758bae6af16f ("fcntl: add F_DUPFD_QUERY fcntl()")
Addressing this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlSqNQH9mFw2bmjq@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To pick the changes in:
628d701f2de5b9a1 ("powerpc/dexcr: Add DEXCR prctl interface")
6b9391b581fddd85 ("riscv: Include riscv_set_icache_flush_ctx prctl")
That adds some PowerPC and a RISC-V specific prctl options:
$ tools/perf/trace/beauty/prctl_option.sh > before
$ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h
$ tools/perf/trace/beauty/prctl_option.sh > after
$ diff -u before after
--- before 2024-05-27 12:14:21.358032781 -0300
+++ after 2024-05-27 12:14:32.364530185 -0300
@@ -65,6 +65,9 @@
[68] = "GET_MEMORY_MERGE",
[69] = "RISCV_V_SET_CONTROL",
[70] = "RISCV_V_GET_CONTROL",
+ [71] = "RISCV_SET_ICACHE_FLUSH_CTX",
+ [72] = "PPC_GET_DEXCR",
+ [73] = "PPC_SET_DEXCR",
};
static const char *prctl_set_mm_options[] = {
[1] = "START_CODE",
$
That now will be used to decode the syscall option and also to compose
filters, for instance:
[root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME
0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee)
0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670)
7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10)
7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970)
8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10)
8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970)
^C[root@five ~]#
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/lkml/ZlSklGWp--v_Ije7@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To get the changes in:
2a82bb02941fb53d ("statx: stx_subvol")
To pick up this change and support it:
$ tools/perf/trace/beauty/statx_mask.sh > before
$ cp include/uapi/linux/stat.h tools/perf/trace/beauty/include/uapi/linux/stat.h
$ tools/perf/trace/beauty/statx_mask.sh > after
$ diff -u before after
--- before 2024-05-22 13:39:49.742470571 -0300
+++ after 2024-05-22 13:39:59.157883101 -0300
@@ -14,4 +14,5 @@
[ilog2(0x00001000) + 1] = "MNT_ID",
[ilog2(0x00002000) + 1] = "DIOALIGN",
[ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE",
+ [ilog2(0x00008000) + 1] = "SUBVOL",
};
$
Now we'll see it like we see these:
# perf trace -e statx
0.000 ( 0.015 ms): systemd-userwo/3982299 statx(dfd: 6, filename: ".", mask: TYPE|INO|MNT_ID, buffer: 0x7ffd8945e850) = 0
<SNIP>
180.559 ( 0.007 ms): (ostnamed)/3982957 statx(dfd: 4, filename: "sys", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: TYPE, buffer: 0x7fff13161190) = 0
180.918 ( 0.011 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/kernel/security", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0
180.956 ( 0.010 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/fs/cgroup", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0
<SNIP>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zk5nO9yT0oPezUoo@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493.
This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.
The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Ethan Adams <j.ethan.adams@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com
Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:
- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc
- Add 'uretprobe' variant in the 'perf bench uprobe' tool
- Add script to run instances of 'perf script' in parallel
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available
- Add Kan Liang to MAINTAINERS as a perf tools reviewer
- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers
Data-type profiling improvements:
- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members
- Support for global variables, keeping a cache to speed up lookups
- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types
- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc
- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler
- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields
- Optimize the 'perf annotate' data structures, reducing memory usage
- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1
- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX
Miscellaneous:
- Sync header copies with the kernel sources
- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools
- Introduce scrape script for several syscall 'flags'/'mask'
arguments
- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc
- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output
- Refactor the DSOs handling code, reducing memory usage
- Document the BPF event modifier and add a 'perf test' for it
- Improve the event parser, better error messages and add further
'perf test's for it
- Add reference count checking to 'struct comm_str' and 'struct
mem_info'
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust
- Tweak the ARM64's Coresight 'perf test's
- Improve ARM64's CoreSight ETM version detection and error reporting
- Fix handling of symbols when using kcore
- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'
- Fix -g/--call-graph option failure in 'perf sched timehist'
- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds
- Various 'perf test' and 'perf bench' fixes
- Improve 'perf probe' error message for long C++ probe names"
* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sys events are eagerly loaded as each event has a compat option that may
mean the event is or isn't associated with the PMU.
These shouldn't be counted as loaded_json_events as that is used for
JSON events matching the CPUID that may or may not have been loaded. The
mismatch causes issues on ARM64 that uses sys events.
Fixes: e6ff1eed3584362d ("perf pmu: Lazily add JSON events")
Closes: https://lore.kernel.org/lkml/20240510024729.1075732-1-justin.he@arm.com/
Reported-by: Jia He <justin.he@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On an Intel tigerlake laptop a metric like:
{
"BriefDescription": "Test",
"MetricExpr": "imc_free_running@data_read@ + imc_free_running@data_write@",
"MetricGroup": "Test",
"MetricName": "Test",
"ScaleUnit": "6.103515625e-5MiB"
},
Will have 4 events:
uncore_imc_free_running_0/data_read/
uncore_imc_free_running_0/data_write/
uncore_imc_free_running_1/data_read/
uncore_imc_free_running_1/data_write/
If aggregration is disabled with metric-only 2 column headers are
needed:
$ perf stat -M test --metric-only -A -a sleep 1
Performance counter stats for 'system wide':
MiB Test MiB Test
CPU0 1821.0 1820.5
But when not, the counts aggregated in the metric leader and only 1
column should be shown:
$ perf stat -M test --metric-only -a sleep 1
Performance counter stats for 'system wide':
MiB Test
5909.4
1.001258915 seconds time elapsed
Achieve this by skipping events that aren't metric leaders when
printing column headers and aggregation isn't disabled.
The bug is long standing, the fixes tag is set to a refactor as that
is as far back as is reasonable to backport.
Fixes: 088519f318be3a41 ("perf stat: Move the display functions to stat-display.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaige Ye <ye@kaige.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240510051309.2452468-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Arnaldo reported that there is a case where nr_histograms and histograms
don't agree each other.
It ended up in a segfault trying to access a NULL histograms array.
Let's make sure to update the nr_histograms when the histograms array is
changed.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240510210452.2449944-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A symbol can have no samples, then accessing the annotated_source->samples
hashmap will result in a segfault.
Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240510210452.2449944-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|