summaryrefslogtreecommitdiffstats
path: root/tools/perf/Documentation
diff options
context:
space:
mode:
authorRoberto Agostino Vitillo <ravitillo@lbl.gov>2012-02-09 23:21:03 +0100
committerIngo Molnar <mingo@elte.hu>2012-03-09 08:26:05 +0100
commitb50311dc2ac1c04ad19163c2359910b25e16caf6 (patch)
tree80a4489b2b268b7512dd4eb566858a6bae8bfffe /tools/perf/Documentation
parentbdfebd848f2a14e639031a0b0e61d7c7ee5e5fd2 (diff)
downloadlinux-b50311dc2ac1c04ad19163c2359910b25e16caf6.tar.gz
linux-b50311dc2ac1c04ad19163c2359910b25e16caf6.tar.bz2
linux-b50311dc2ac1c04ad19163c2359910b25e16caf6.zip
perf report: Add support for taken branch sampling
This patch adds support for taken branch sampling, i.e, the PERF_SAMPLE_BRANCH_STACK feature to perf report. In other words, to display histograms based on taken branches rather than executed instructions addresses. The new option is called -b and it takes no argument. To generate meaningful output, the perf.data must have been obtained using perf record -b xxx ... where xxx is a branch filter option. The output shows symbols, modules, sorted by 'who branches where' the most often. The percentages reported in the first column refer to the total number of branches captured and not the usual number of samples. Here is a quick example. Here branchy is simple test program which looks as follows: void f2(void) {} void f3(void) {} void f1(unsigned long n) { if (n & 1UL) f2(); else f3(); } int main(void) { unsigned long i; for (i=0; i < N; i++) f1(i); return 0; } Here is the output captured on Nehalem, if we are only interested in user level function calls. $ perf record -b any_call,u -e cycles:u branchy $ perf report -b --sort=symbol 52.34% [.] main [.] f1 24.04% [.] f1 [.] f3 23.60% [.] f1 [.] f2 0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow 0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn 0.01% [k] _IO_vfprintf_internal [k] strchrnul 0.01% [k] __printf [k] _IO_vfprintf_internal 0.01% [k] main [k] __printf About half (52%) of the call branches captured are from main() -> f1(). The second half (24%+23%) is split in two equal shares between f1() -> f2(), f1() ->f3(). The output is as expected given the code. It should be noted, that using -b in perf record does not eliminate information in the perf.data file. Consequently, a typical profile can also be obtained by perf report by simply not using its -b option. It is possible to sort on branch related columns: - dso_from, symbol_from - dso_to, symbol_to - mispredict Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r--tools/perf/Documentation/perf-report.txt7
1 files changed, 7 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 9b430e98712e..19b9092cf8b7 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -153,6 +153,13 @@ OPTIONS
information which may be very large and thus may clutter the display.
It currently includes: cpu and numa topology of the host system.
+-b::
+--branch-stack::
+ Use the addresses of sampled taken branches instead of the instruction
+ address to build the histograms. To generate meaningful output, the
+ perf.data file must have been obtained using perf record -b xxx where
+ xxx is a branch filter option.
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1]