diff options
author | Waiman Long <Waiman.Long@hp.com> | 2014-09-30 13:36:15 -0400 |
---|---|---|
committer | Arnaldo Carvalho de Melo <acme@redhat.com> | 2014-10-01 14:39:57 -0300 |
commit | 4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf (patch) | |
tree | fa1e74148d31382b71588a7fbbc679f165e4a6cd /tools/perf/util/dso.h | |
parent | 8fa7d87f91479f7124142ca4ad93a37b80f8c1c0 (diff) | |
download | linux-4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf.tar.gz linux-4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf.tar.bz2 linux-4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf.zip |
perf symbols: Improve DSO long names lookup speed with rbtree
With workload that spawns and destroys many threads and processes, it
was found that perf-mem could took a long time to post-process the perf
data after the target workload had completed its operation.
The performance bottleneck was found to be the lookup and insertion of
the new DSO structures (thousands of them in this case).
In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
perf profile below shows what perf was doing after the profiled AIM7
shared workload completed:
- 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
- __strcmp_sse42
- 99.82% map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
- 13.17% perf perf [.] __dsos__findnew
__dsos__findnew
map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
So about 97% of CPU times were spent in the map__new() function trying
to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.
The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.
To overcome this performance problem, the DSO code is modified to also
put the DSO structures in a RB tree sorted by its long name in
additional to being in a simple linked list. With this change, the
processing time will become proportional to n*log(n) which will be much
quicker for large n. However, the short name will still be searched
using the old linear searching method. With that patch in place, the
same perf-mem post-processing step took less than 30 seconds to
complete.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Diffstat (limited to 'tools/perf/util/dso.h')
-rw-r--r-- | tools/perf/util/dso.h | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h index b63dc98ad71d..acb651acc7fd 100644 --- a/tools/perf/util/dso.h +++ b/tools/perf/util/dso.h @@ -91,14 +91,17 @@ struct dso_cache { }; /* - * DSOs are put into a list for fast iteration. + * DSOs are put into both a list for fast iteration and rbtree for fast + * long name lookup. */ struct dsos { struct list_head head; + struct rb_root root; /* rbtree root sorted by long name */ }; struct dso { struct list_head node; + struct rb_node rb_node; /* rbtree node sorted by long name */ struct rb_root symbols[MAP__NR_TYPES]; struct rb_root symbol_names[MAP__NR_TYPES]; void *a2l; |