diff options
author | Denis Vlasenko <vda.linux@googlemail.com> | 2007-07-15 23:41:56 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-07-16 09:05:52 -0700 |
commit | 4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2 (patch) | |
tree | 88780b40c23883af5e9958a7f397f23ff5619ff7 /lib/locking-selftest-rlock.h | |
parent | b39a734097d5095d63eb9c709a6aaf965633bb01 (diff) | |
download | linux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.tar.gz linux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.tar.bz2 linux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.zip |
vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2
Optimize integer-to-string conversion in vsprintf.c for base 10. This is
by far the most used conversion, and in some use cases it impacts
performance. For example, top reads /proc/$PID/stat for every process, and
with 4000 processes decimal conversion alone takes noticeable time.
Using code from
http://www.cs.uiowa.edu/~jones/bcd/decimal.html
(with permission from the author, Douglas W. Jones)
binary-to-decimal-string conversion is done in groups of five digits at
once, using only additions/subtractions/shifts (with -O2; -Os throws in
some multiply instructions).
On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code.
This patch is run tested. Userspace benchmark/test is also attached.
I tested it on PIII and AMD64 and new code is generally ~2.5 times
faster. On AMD64:
# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv: .......... 62 ns per iteration
Testing correctness
12895992590592 ok... [Ctrl-C]
# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv: .......... 62 ns per iteration
Testing correctness
26025406464 ok... [Ctrl-C]
More realistic test: top from busybox project was modified to
report how many us it took to scan /proc (this does not account
any processing done after that, like sorting process list),
and then I test it with 4000 processes:
#!/bin/sh
i=4000
while test $i != 0; do
sleep 30 &
let i--
done
busybox top -b -n3 >/dev/null
on unpatched kernel:
top: 4120 processes took 102864 microseconds to scan
top: 4120 processes took 91757 microseconds to scan
top: 4120 processes took 92517 microseconds to scan
top: 4120 processes took 92581 microseconds to scan
on patched kernel:
top: 4120 processes took 75460 microseconds to scan
top: 4120 processes took 66451 microseconds to scan
top: 4120 processes took 67267 microseconds to scan
top: 4120 processes took 67618 microseconds to scan
The speedup comes from much faster generation of /proc/PID/stat
by sprintf() calls inside the kernel.
Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'lib/locking-selftest-rlock.h')
0 files changed, 0 insertions, 0 deletions