summaryrefslogtreecommitdiffstats
path: root/lib/locking-selftest-rlock.h
diff options
context:
space:
mode:
authorDenis Vlasenko <vda.linux@googlemail.com>2007-07-15 23:41:56 -0700
committerLinus Torvalds <torvalds@woody.linux-foundation.org>2007-07-16 09:05:52 -0700
commit4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2 (patch)
tree88780b40c23883af5e9958a7f397f23ff5619ff7 /lib/locking-selftest-rlock.h
parentb39a734097d5095d63eb9c709a6aaf965633bb01 (diff)
downloadlinux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.tar.gz
linux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.tar.bz2
linux-stable-4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2.zip
vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2
Optimize integer-to-string conversion in vsprintf.c for base 10. This is by far the most used conversion, and in some use cases it impacts performance. For example, top reads /proc/$PID/stat for every process, and with 4000 processes decimal conversion alone takes noticeable time. Using code from http://www.cs.uiowa.edu/~jones/bcd/decimal.html (with permission from the author, Douglas W. Jones) binary-to-decimal-string conversion is done in groups of five digits at once, using only additions/subtractions/shifts (with -O2; -Os throws in some multiply instructions). On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code. This patch is run tested. Userspace benchmark/test is also attached. I tested it on PIII and AMD64 and new code is generally ~2.5 times faster. On AMD64: # ./vsprintf_verify-O2 Original decimal conv: .......... 151 ns per iteration Patched decimal conv: .......... 62 ns per iteration Testing correctness 12895992590592 ok... [Ctrl-C] # ./vsprintf_verify-O2 Original decimal conv: .......... 151 ns per iteration Patched decimal conv: .......... 62 ns per iteration Testing correctness 26025406464 ok... [Ctrl-C] More realistic test: top from busybox project was modified to report how many us it took to scan /proc (this does not account any processing done after that, like sorting process list), and then I test it with 4000 processes: #!/bin/sh i=4000 while test $i != 0; do sleep 30 & let i-- done busybox top -b -n3 >/dev/null on unpatched kernel: top: 4120 processes took 102864 microseconds to scan top: 4120 processes took 91757 microseconds to scan top: 4120 processes took 92517 microseconds to scan top: 4120 processes took 92581 microseconds to scan on patched kernel: top: 4120 processes took 75460 microseconds to scan top: 4120 processes took 66451 microseconds to scan top: 4120 processes took 67267 microseconds to scan top: 4120 processes took 67618 microseconds to scan The speedup comes from much faster generation of /proc/PID/stat by sprintf() calls inside the kernel. Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'lib/locking-selftest-rlock.h')
0 files changed, 0 insertions, 0 deletions