summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* perf_counter: Provide a way to enable counters on execPaul Mackerras2009-06-303-4/+55
| | | | | | | | | | | | | | | This provides a way to mark a counter to be enabled on the next exec. This is useful for measuring the total activity of a program without including overhead from the process that launches it. This also changes the perf stat command to use this new facility. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Reduce perf stat measurement overhead/skewPaul Mackerras2009-06-291-14/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vince Weaver reported a 'perf stat' measurement overhead in the count of retired instructions, which can amount to a +6000 instructions inflated count in the reported count. At present, perf stat creates its counters on the perf process. Thus the counters count the fork and various other activity in both the parent and child, such as the resolver overhead for resolving PLT entries for any libc functions that haven't been called before, such as execvp. This reduces the overhead by creating the counters on the child process after the fork, using a couple of pipes to synchronize so that the child process waits until the parent has created the counters before doing the exec. To eliminate the PLT resolution overhead on calling execvp, this does a dummy execvp first which will always fail. With this, the overhead of executing a program goes down from over 4800 instructions to about 90 instructions on powerpc (32-bit). This was measured with a statically-linked program written in assembler which only does the 3 instructions needed to call _exit(0). Before: $ perf stat -e 0:1:u ./three Performance counter stats for './three': 4858 instructions 0.001274523 seconds time elapsed After: $ perf stat -e 0:1:u ./three Performance counter stats for './three': 92 instructions 0.000468153 seconds time elapsed Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Use percentages for scaling outputIngo Molnar2009-06-291-1/+2
| | | | | | | | | | | | | | Peter expressed a strong preference for percentage based display of scaled values - so revert to that from the recently introduced multiplication-factor unit. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Update x86_pmu after WARN()Yinghai Lu2009-06-291-2/+2
| | | | | | | | | The print out should read the value before changing the value. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A487017.4090007@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Micro-optimize the code: memcpy is only required if no event is ↵Jaswinder Singh Rajput2009-06-281-5/+6
| | | | | | | | | | | | | | | | | selected and !null_run Set attrs and nr_counters if no event is selected and !null_run. Setting of attrs should depend on number of counters, so we need to memcpy only for sizeof(default_attrs) Also set nr_counters as ARRAY_SIZE(default_attrs) in place of hardcoded value. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Improve outputJaswinder Singh Rajput2009-06-271-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase size for event name to handle bigger names like 'L1-d$-prefetch-misses' Changed scaled counters from percentage to a multiplicative factor because the latter is more expressive. Also aligned the scaling factor, otherwise sometimes it looks like: 384 iTLB-load-misses (4.74x scaled) 452029 branch-loads (8.00x scaled) 5892 branch-load-misses (20.39x scaled) 972315 iTLB-loads (3.24x scaled) Before: 150708 L1-d$-stores (scaled from 23.57%) 428804 L1-d$-prefetches (scaled from 23.47%) 314446 L1-d$-prefetch-misses (scaled from 23.42%) 252626137 L1-i$-loads (scaled from 23.24%) 5297550 dTLB-load-misses (scaled from 23.96%) 106992392 branch-loads (scaled from 23.67%) 5239561 branch-load-misses (scaled from 23.43%) After: 1731713 L1-d$-loads ( 14.25x scaled) 44241 L1-d$-prefetches ( 3.88x scaled) 21076 L1-d$-prefetch-misses ( 3.40x scaled) 5789421 L1-i$-loads ( 3.78x scaled) 29645 dTLB-load-misses ( 2.95x scaled) 461474 branch-loads ( 6.52x scaled) 7493 branch-load-misses ( 26.57x scaled) Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Fix multi-run statsIngo Molnar2009-06-271-4/+11
| | | | | | | | | | | | | | | In multi-run (-r/--repeat) printouts, print out the noise of the wall-clock average as well. Also, fix a bug in printing out scaled counters: if it was not scaled then we should not update the average with -1. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Add -n/--null option to run without countersIngo Molnar2009-06-271-1/+4
| | | | | | | | | | | | | Allow a no-counters run. This can be useful to measure just elapsed wall-clock time - or to assess the raw overhead of perf stat itself, without running any counters. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Remove dead codeIngo Molnar2009-06-275-132/+3
| | | | | | | | | | | | | | | | Vince Weaver reported that there's a handful of #ifdef __MINGW32__ sections in the code. Remove them as they are in essence dead code - as unlike upstream Git, the perf tool is unlikely to be ported to Windows. Reported-by: Vince Weaver <vince@deater.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Complete counter swapPeter Zijlstra2009-06-261-1/+6
| | | | | | | | | | Complete the counter swap by indeed switching the times too and updating the userpage after modifying the counter values. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1246014623.31755.195.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Print sorted callchains per histogram entriesFrederic Weisbecker2009-06-261-11/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the newly created callchains radix tree to gather the chains stats from the recorded events and then print the callchains for all of them, sorted by hits, using the "-c" parameter with perf report. Example: 66.15% [k] atm_clip_exit 63.08% 0xffffffffffffff80 0xffffffff810196a8 0xffffffff810c14c8 0xffffffff8101a79c 0xffffffff810194f3 0xffffffff8106ab7f 0xffffffff8106abe5 0xffffffff8106acde 0xffffffff8100d94b 0xffffffff8153e7ea [...] 1.54% 0xffffffffffffff80 0xffffffff810196a8 0xffffffff810c14c8 0xffffffff8101a79c [...] Symbols are not yet resolved. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1246026481-8314-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Prepare a small callchain frameworkFrederic Weisbecker2009-06-265-5/+213
| | | | | | | | | | | | | | | | | We plan to display the callchains depending on some user-configurable parameters. To gather the callchains stats from the recorded stream in a fast way, this patch introduces an ad hoc radix tree adapted for callchains and also a rbtree to sort these callchains once we have gathered every events from the stream. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf record: Fix unhandled io return valueFrederic Weisbecker2009-06-251-1/+4
| | | | | | | | | | | | | | | | | | Building latest perfcounter fails on the following error: builtin-record.c: In function ‘create_counter’: builtin-record.c:451: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result make: *** [builtin-record.o] Erreur 1 Just check if we successfully read the perf file descriptor. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1245961287-5327-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Add alias for 'l1d' and 'l1i'Jaswinder Singh Rajput2009-06-251-2/+2
| | | | | | | | | | | | Add 'l1d' and 'l1i' aliases again as shortcuts - just dont make them the primary display alias. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1245945462.9157.11.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf-report: Add bare minimum PERF_EVENT_READ parsingPeter Zijlstra2009-06-251-0/+24
| | | | | | | | Provide the basic infrastructure to provide per task stats. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf-report: Add modes for inherited stats and no-samplesPeter Zijlstra2009-06-251-2/+17
| | | | | | | | | Now that we can collect per task statistics, add modes that make use of that facility. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Rework the sample ABIPeter Zijlstra2009-06-255-48/+49
| | | | | | | | | | | | | | | | | The PERF_EVENT_READ implementation made me realize we don't actually need the sample_type int the output sample, since we already have that in the perf_counter_attr information. Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the event->type overloading, and imply put counter overflow samples in a PERF_EVENT_SAMPLE type. This also fixes the issue that event->type was only 32-bit and sample_type had 64 usable bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Implement more accurate per task statisticsPeter Zijlstra2009-06-252-4/+83
| | | | | | | | | | | | | | | | | | | | With the introduction of PERF_EVENT_READ we have the possibility to provide accurate counter values for individual tasks in a task hierarchy. However, due to the lazy context switching used for similar counter contexts our current per task counts are way off. In order to maintain some of the lazy switch benefits we don't disable it out-right, but simply iterate the active counters and flip the values between the contexts. This only reads the counters but does not need to reprogram the full PMU. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Add PERF_EVENT_READPeter Zijlstra2009-06-252-4/+80
| | | | | | | | | | | | | | Provide a read() like event which can be used to log the counter value at specific sites such as child->parent folding on exit. In order to be useful, we log the counter parent ID, not the actual counter ID, since userspace can only relate parent IDs to perf_counter_attr constructs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Add mmap counter read supportPeter Zijlstra2009-06-254-1/+20
| | | | | | | | | Update the mmap control page with the needed information to use the userspace RDPMC instruction for self monitoring. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Add scale information to the mmap control pagePeter Zijlstra2009-06-252-1/+9
| | | | | | | | Add the needed time scale to the self-profile mmap information. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Split the mmap control page in two partsPeter Zijlstra2009-06-251-0/+6
| | | | | | | | | Since there are two distinct sections to the control page, move them apart so that possible extentions don't overlap. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Rework the file formatPeter Zijlstra2009-06-259-54/+377
| | | | | | | | | | Create a structured file format that includes the full perf_counter_attr and all its relevant counter IDs so that the reporting program has full information. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Shorten names for eventsJaswinder Singh Rajput2009-06-251-17/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added new alias for events. On AMD box: $ ./perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses -- ls -lR /usr/include/ > /dev/null Before : Performance counter stats for 'ls -lR /usr/include/': 248064467 L1-data-Cache-Load-Referencees (scaled from 23.27%) 1001433 L1-data-Cache-Load-Misses (scaled from 23.34%) 153691 L1-data-Cache-Store-Referencees (scaled from 23.34%) 423248 L1-data-Cache-Prefetch-Referencees (scaled from 23.33%) 302138 L1-data-Cache-Prefetch-Misses (scaled from 23.25%) 251217546 L1-instruction-Cache-Load-Referencees (scaled from 23.25%) 5757005 L1-instruction-Cache-Load-Misses (scaled from 23.23%) 93435 L1-instruction-Cache-Prefetch-Referencees (scaled from 23.24%) 6496073 L2-Cache-Load-Referencees (scaled from 23.32%) 609485 L2-Cache-Load-Misses (scaled from 23.45%) 6876991 L2-Cache-Store-Referencees (scaled from 23.71%) 248922840 Data-TLB-Cache-Load-Referencees (scaled from 23.94%) 5828386 Data-TLB-Cache-Load-Misses (scaled from 24.17%) 257613506 Instruction-TLB-Cache-Load-Referencees (scaled from 24.20%) 6833 Instruction-TLB-Cache-Load-Misses (scaled from 23.88%) 109043606 Branch-Cache-Load-Referencees (scaled from 23.64%) 5552296 Branch-Cache-Load-Misses (scaled from 23.42%) 0.413702461 seconds time elapsed. After : Peformance counter stats for 'ls -lR /usr/include/': 266590464 L1-d$-loads (scaled from 23.03%) 1222273 L1-d$-load-misses (scaled from 23.58%) 146204 L1-d$-stores (scaled from 23.83%) 406344 L1-d$-prefetches (scaled from 24.09%) 283748 L1-d$-prefetch-misses (scaled from 24.10%) 249650965 L1-i$-loads (scaled from 23.80%) 3353961 L1-i$-load-misses (scaled from 23.82%) 104599 L1-i$-prefetches (scaled from 23.68%) 4836405 LLC-loads (scaled from 23.67%) 498214 LLC-load-misses (scaled from 23.66%) 4953994 LLC-stores (scaled from 23.64%) 243354097 dTLB-loads (scaled from 23.77%) 6468584 dTLB-load-misses (scaled from 23.74%) 249719549 iTLB-loads (scaled from 23.25%) 5060 iTLB-load-misses (scaled from 23.00%) 112343016 branch-loads (scaled from 22.76%) 5528876 branch-load-misses (scaled from 22.54%) 0.427154051 seconds time elapsed. Reported-by : Ingo Molnar <mingo@elte.hu> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245934522.5308.39.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Check for valid cache operationsJaswinder Singh Rajput2009-06-251-0/+33
| | | | | | | | | | | | | | | | | Made new table for cache operartion stat 'hw_cache_stat' as: L1I : Read and prefetch only ITLB and BPU : Read-only introduce is_cache_op_valid() for cache operation validity And checks for valid cache operations. Reported-by : Ingo Molnar <mingo@elte.hu> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245930367.5308.33.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf record: Fix filemap pathname parsing in /proc/pid/mapsJohannes Weiner2009-06-251-3/+2
| | | | | | | | | | | | | | | | | | | Looking backward for the first space from the end of a line in /proc/pid/maps does not find the start of the pathname of the mapped file if it contains a space. Since the only slashes we have in this file occur in the (absolute!) pathname column of file mappings, looking for the first slash in a line is a safe method to find the name. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Stefani Seibold <stefani@seibold.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090624190835.GA25548@cmpxchg.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Add CREDITS file for Git contributorsIngo Molnar2009-06-241-0/+30
| | | | | | | | | | | | | | | Much of perf's libraries comes from the Git project. I noticed that the files (in tools/perf/util/*.[ch] and elsewhere) are quite spartan wrt. credits, so lets add a CREDITS file that includes an (incomplete!) list of main contributors. Thanks guys, these libraries are really useful. Special thanks go to Johannes Schindelin and Junio C Hamano for coming up with this list. List-Composed-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Remove dead codeJaswinder Singh Rajput2009-06-241-31/+13
| | | | | | | | | Remove dead code and do some code alignment. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245847774.2681.2.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Set global control MSR correctlyYong Wang2009-06-241-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous code made an assumption that the power on value of global control MSR has enabled all fixed and general purpose counters properly. However, this is not the case for certain Intel processors, such as Atom - and it might also be firmware dependent. Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the enable bits for all privilege levels in the respective IA32_PERFEVTSELx or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of respective counters. Counting is enabled if the AND'ed results is true; counting is disabled when the result is false. The end result is that all fixed counters are always disabled on Atom processors because the assumption is just invalid. Fix this by not initializing the ctrl-mask out of the global MSR, but setting it to perf_counter_mask. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Fix strbuf_fread() error path handlingRoel Kluin2009-06-241-1/+1
| | | | | | | | | | | | | size_t res cannot be less than 0 - fread returns 0 on error. [ Updated by: René Scharfe <rene.scharfe@lsrfire.ath.cx> ] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Junio C Hamano <gitster@pobox.com> LKML-Reference: <4A3FB479.2090902@lsrfire.ath.cx> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Fix verbose for perf statJaswinder Singh Rajput2009-06-231-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | Error message should use stderr for verbose (-v), otherwise message will be lost for: $ ./perf stat -v <cmd> > /dev/null For example on AMD bus-cycles event is not available so now it looks like: $ ./perf stat -v -e bus-cycles ls > /dev/null Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument) Performance counter stats for 'ls': <not counted> bus-cycles 0.006765877 seconds time elapsed. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245757369.3776.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Fix help text typoIngo Molnar2009-06-231-1/+1
| | | | | | | | | | Reported-by: Brice Goglin <Brice.Goglin@inria.fr> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Optimize perf_counter_alloc()'s inherit casePeter Zijlstra2009-06-231-12/+20
| | | | | | | | | | | | | We don't need to add usage counts for swcounter and attr usage models for inherited counters since the parent counter will always have one, which suffices to generate the needed output. This avoids up to 3 global atomic increments per inherited counter. LKML-Reference: <new-submission> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Push inherit into perf_counter_alloc()Peter Zijlstra2009-06-231-8/+6
| | | | | | | | | | | | | | | | Teach perf_counter_alloc() about inheritance so that we can optimize the inherit path in the next patch. Remove the child_counter->atrr.inherit = 1 line because the only way to get there is if parent_counter->attr.inherit == 1 and we copy the attrs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Optimize perf_swcounter_event()Peter Zijlstra2009-06-232-4/+25
| | | | | | | | | | | | | | | Similar to tracepoints, use an enable variable to reduce overhead when unused. Only look for a counter of a particular event type when we know there is at least one in the system. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Handle overlapping MMAP eventsPeter Zijlstra2009-06-231-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Martin Schwidefsky reported "perf report" symbol resolution problems on S390. Since we only report MMAP, not MUNMAP, we have to deal with overlapping maps. We used to simply throw out the old map on the assumption whole maps got unmapped. This obviously doesn't deal with partial unmaps. However it appears some dynamic linkers do fancy partial unmaps (s390), so do something more elaborate and truncate the old maps, only removing them when they've been fully covered. This resolves (part of) the S390 symbol resolution problems. Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Fix command option / manpageJaswinder Singh Rajput2009-06-231-3/+3
| | | | | | | | | -l is not supported, it should be -S for scale. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245703959.6167.16.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Set alias for page-faultsJaswinder Singh Rajput2009-06-221-18/+18
| | | | | | | | | | | "faults" should be alias for "page-faults" Also fixed alignment and 80 characters issue Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245683846.12092.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Output more symbol related debug dataPeter Zijlstra2009-06-222-2/+7
| | | | | | | | | | | Print more symbol relocation related info under -vv. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Introduce alias member in event_symbolJaswinder Singh Rajput2009-06-221-25/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By introducing alias member in event_symbol : 1. duplicate lines are removed, like: cpu-cycles and cycles branch-instructions and branches context-switches and cs cpu-migrations and migrations 2. We can also add alias for another events. Now ./perf list looks like : List of pre-defined events (to be used in -e): cpu-cycles OR cycles [Hardware event] instructions [Hardware event] cache-references [Hardware event] cache-misses [Hardware event] branch-instructions OR branches [Hardware event] branch-misses [Hardware event] bus-cycles [Hardware event] cpu-clock [Software event] task-clock [Software event] page-faults [Software event] faults [Software event] minor-faults [Software event] major-faults [Software event] context-switches OR cs [Software event] cpu-migrations OR migrations [Software event] rNNN [raw hardware event descriptor] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245669268.17153.8.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Define separate declarations for H/W and S/W eventsJaswinder Singh Rajput2009-06-221-22/+22
| | | | | | | | | | | | | | | Define separate declarations for H/W and S/W events to: 1. Shorten name to save some space so that we can add more members 2. Fix alignment 3. Avoid declaring HARDWARE/SOFTWARE again and again. Removed unused CR(x, y) Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245669194.17153.6.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter tools: Fix vmlinux fallback when running on a different kernelIngo Molnar2009-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lucas De Marchi reported that perf report and perf annotate displays mismatching profile if a perf.data is analyzed on an older kernel - even if the correct vmlinux is specified via the -k option. The reason is the fallback path in util/symbol.c:dso__load_kernel(): int dso__load_kernel(struct dso *self, const char *vmlinux, symbol_filter_t filter, int verbose) { int err = -1; if (vmlinux) err = dso__load_vmlinux(self, vmlinux, filter, verbose); if (err) err = dso__load_kallsyms(self, filter, verbose); return err; } dso__load_vmlinux() returns negative on error, but on success it returns the number of symbols loaded - which confuses the function to load the kallsyms. This is normally harmless, as reporting is usually performed on the same kernel that is analyzed - but if there's a mismatch then we load the wrong kallsyms and create a non-sensical symbol tree. The fix is to only fall back to kallsyms on errors. Reported-by: Lucas De Marchi <lucas.de.marchi@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x8: Fix L1-data-Cache-Store-Referencees for AMDJaswinder Singh Rajput2009-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix AMD's Data Cache Refills from System event. After this patch : ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null Performance counter stats for 'ls /dev/': 2499484 L1-data-Cache-Load-Referencees (scaled from 3.97%) 70347 L1-data-Cache-Load-Misses (scaled from 7.30%) 9360 L1-data-Cache-Store-Referencees (scaled from 8.64%) 32804 L1-data-Cache-Prefetch-Referencees (scaled from 17.72%) 7693 L1-data-Cache-Prefetch-Misses (scaled from 22.97%) 2180945 L1-instruction-Cache-Load-Referencees (scaled from 28.48%) 14518 L1-instruction-Cache-Load-Misses (scaled from 35.00%) 2405 L1-instruction-Cache-Prefetch-Referencees (scaled from 34.89%) 71387 L2-Cache-Load-Referencees (scaled from 34.94%) 18732 L2-Cache-Load-Misses (scaled from 34.92%) 79918 L2-Cache-Store-Referencees (scaled from 36.02%) 1295294 Data-TLB-Cache-Load-Referencees (scaled from 35.99%) 30896 Data-TLB-Cache-Load-Misses (scaled from 33.36%) 1222030 Instruction-TLB-Cache-Load-Referencees (scaled from 29.46%) 357 Instruction-TLB-Cache-Load-Misses (scaled from 20.46%) 530888 Branch-Cache-Load-Referencees (scaled from 11.48%) 8638 Branch-Cache-Load-Misses (scaled from 5.09%) 0.011295149 seconds time elapsed. Earlier it always shows value 0. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* mm: page_alloc: clear PG_locked before checking flags on freeJohannes Weiner2009-06-201-5/+4
| | | | | | | | | | | | da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved the PG_mlocked clearing after the flag sanity checking which makes mlocked pages always trigger 'bad page'. Fix this by clearing the bit up front. Reported--and-debugged-by: Peter Chubb <peter.chubb@nicta.com.au> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86, 64-bit: Clean up user address maskingLinus Torvalds2009-06-204-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The discussion about using "access_ok()" in get_user_pages_fast() (see commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use 'access_ok()' as a range check in get_user_pages_fast()" for details and end result), made us notice that x86-64 was really being very sloppy about virtual address checking. So be way more careful and straightforward about masking x86-64 virtual addresses: - All the VIRTUAL_MASK* variants now cover half of the address space, it's not like we can use the full mask on a signed integer, and the larger mask just invites mistakes when applying it to either half of the 48-bit address space. - /proc/kcore's kc_offset_to_vaddr() becomes a lot more obvious when it transforms a file offset into a (kernel-half) virtual address. - Unify/simplify the 32-bit and 64-bit USER_DS definition to be based on TASK_SIZE_MAX. This cleanup and more careful/obvious user virtual address checking also uncovered a buglet in the x86-64 implementation of strnlen_user(): it would do an "access_ok()" check on the whole potential area, even if the string itself was much shorter, and thus return an error even for valid strings. Our sloppy checking had hidden this. So this fixes 'strnlen_user()' to do this properly, the same way we already handled user strings in 'strncpy_from_user()'. Namely by just checking the first byte, and then relying on fault handling for the rest. That always works, since we impose a guard page that cannot be mapped at the end of the user space address space (and even if we didn't, we'd have the address space hole). Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds2009-06-202-3/+3
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq, irq.h: Fix kernel-doc warnings genirq: fix comment to say IRQ_WAKE_THREAD
| * genirq, irq.h: Fix kernel-doc warningsRandy Dunlap2009-06-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc warnings in linux/irq.h: Warning(include/linux/irq.h:201): No description found for parameter 'node' Warning(include/linux/irq.h:201): Excess struct/union/enum/typedef member 'cpu' description in 'irq_desc' Warning(include/linux/irq.h:434): No description found for parameter 'node' Warning(include/linux/irq.h:434): Excess function parameter 'cpu' description in 'alloc_desc_masks' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <4A3467EC.50006@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * genirq: fix comment to say IRQ_WAKE_THREADSteven Rostedt2009-05-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to implement a driver to use threaded irqs, I was confused when the return value to use that was described in the comment above request_threaded_irq was not defined. Turns out that the enum is IRQ_WAKE_THREAD where as the comment said IRQ_THREAD_WAKE. [Impact: do not confuse developers with wrong comments ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <alpine.DEB.2.00.0905121431020.13338@gandalf.stny.rr.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds2009-06-2040-892/+2321
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...
| * | perfcounter: Handle some IO return valuesFrederic Weisbecker2009-06-202-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building perfcounter tools raises the following warnings: builtin-record.c: In function ‘atexit_header’: builtin-record.c:464: erreur: ignoring return value of ‘pwrite’, declared with attribute warn_unused_result builtin-record.c: In function ‘__cmd_record’: builtin-record.c:503: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result builtin-report.c: In function ‘__cmd_report’: builtin-report.c:1403: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result This patch handles these IO return values. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1245456100-5477-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>