linux-stable.git - Linux kernel stable tree


author	Denis Vlasenko <vda.linux@googlemail.com>	2007-07-15 23:41:56 -0700
committer	Linus Torvalds <torvalds@woody.linux-foundation.org>	2007-07-16 09:05:52 -0700
commit	4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2
tree	88780b40c23883af5e9958a7f397f23ff5619ff7 /lib/locking-selftest-rlock.h
parent	b39a734097d5095d63eb9c709a6aaf965633bb01

vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2

Optimize integer-to-string conversion in vsprintf.c for base 10. This is by
far the most used conversion, and in some use cases it impacts performance.
For example, top reads /proc/$PID/stat for every process, and with 4000
processes decimal conversion alone takes noticeable time.

Using code from http://www.cs.uiowa.edu/~jones/bcd/decimal.html (with
permission from the author, Douglas W. Jones), binary-to-decimal-string
conversion is done in groups of five digits at once, using only
additions/subtractions/shifts (with -O2; -Os throws in some multiply
instructions).

On i386, gcc 4.1.2 -O2 generates ~500 bytes of code.

This patch is run tested. A userspace benchmark/test is also attached. I
tested it on PIII and AMD64, and the new code is generally ~2.5 times faster.
On AMD64:

    # ./vsprintf_verify-O2
    Original decimal conv: .......... 151 ns per iteration
    Patched decimal conv: .......... 62 ns per iteration
    Testing correctness
    12895992590592 ok...
    [Ctrl-C]
    # ./vsprintf_verify-O2
    Original decimal conv: .......... 151 ns per iteration
    Patched decimal conv: .......... 62 ns per iteration
    Testing correctness
    26025406464 ok...
    [Ctrl-C]

More realistic test: top from the busybox project was modified to report how
many microseconds it took to scan /proc (this does not account for any
processing done after that, such as sorting the process list), and then I
tested it with 4000 processes:

    #!/bin/sh
    i=4000
    while test $i != 0; do
        sleep 30 &
        i=$((i - 1))
    done
    busybox top -b -n3 >/dev/null

On the unpatched kernel:

    top: 4120 processes took 102864 microseconds to scan
    top: 4120 processes took 91757 microseconds to scan
    top: 4120 processes took 92517 microseconds to scan
    top: 4120 processes took 92581 microseconds to scan

On the patched kernel:

    top: 4120 processes took 75460 microseconds to scan
    top: 4120 processes took 66451 microseconds to scan
    top: 4120 processes took 67267 microseconds to scan
    top: 4120 processes took 67618 microseconds to scan

The speedup comes from much faster generation of /proc/PID/stat by sprintf()
calls inside the kernel.

Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
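The groups-of-five decimal conversion described in this commit message can be illustrated with a small userspace C sketch. This is not the code from the patch or from Jones's page, and the names div10, put_five_digits and u64_to_dec are hypothetical. It splits a value in 0..99999 into four hex nibbles, re-weights each nibble in decimal (4096 = 4*1000 + 9*10 + 6, 256 = 2*100 + 5*10 + 6, 16 = 10 + 6), then propagates carries column by column, dividing by 10 with a multiply by 205 and a right shift by 11, which is exact for the small column sums that occur here. For simplicity it assumes a hosted C99 environment and splits larger values into five-digit groups with plain 64-bit / and %.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide x by 10 via multiply-by-reciprocal: (x * 205) >> 11.
 * Exact for 0 <= x <= 1028; every column sum below stays under 340. */
static uint32_t div10(uint32_t x)
{
    return (x * 205u) >> 11;
}

/* Write exactly five decimal digits of q (0 <= q <= 99999) to buf,
 * least significant digit first; return a pointer past the last digit.
 * q = 4096*d3 + 256*d2 + 16*d1 + d0 (hex nibbles), and
 *   4096 = 4*1000 + 0*100 + 9*10 + 6
 *    256 =          2*100 + 5*10 + 6
 *     16 =                  1*10 + 6
 * so each decimal column is a small weighted sum of nibbles plus a carry. */
static char *put_five_digits(char *buf, uint32_t q)
{
    uint32_t d0, d1, d2, d3, acc, carry;

    assert(q <= 99999);
    d0 = q & 0xf;
    d1 = (q >> 4) & 0xf;
    d2 = (q >> 8) & 0xf;
    d3 = q >> 12;                        /* at most 24 */

    acc = 6 * (d3 + d2 + d1) + d0;       /* units column */
    carry = div10(acc);
    *buf++ = (char)('0' + (acc - 10 * carry));

    acc = carry + 9 * d3 + 5 * d2 + d1;  /* tens column */
    carry = div10(acc);
    *buf++ = (char)('0' + (acc - 10 * carry));

    acc = carry + 2 * d2;                /* hundreds column */
    carry = div10(acc);
    *buf++ = (char)('0' + (acc - 10 * carry));

    acc = carry + 4 * d3;                /* thousands column */
    carry = div10(acc);
    *buf++ = (char)('0' + (acc - 10 * carry));

    *buf++ = (char)('0' + carry);        /* ten-thousands digit, at most 9 */
    return buf;
}

/* Format num in decimal into out (at least 21 bytes) and return out. */
static char *u64_to_dec(char *out, uint64_t num)
{
    char tmp[20];        /* up to 20 digits, least significant first */
    char *p = tmp;
    size_t n, i;

    /* Peel off full five-digit groups, keeping their leading zeros... */
    while (num >= 100000) {
        p = put_five_digits(p, (uint32_t)(num % 100000));
        num /= 100000;
    }
    /* ...then the most significant group, trimming its leading zeros
     * but keeping at least one digit so that 0 prints as "0". */
    p = put_five_digits(p, (uint32_t)num);
    while (p > tmp + 1 && p[-1] == '0')
        p--;

    /* Reverse into out, most significant digit first. */
    n = (size_t)(p - tmp);
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return out;
}
```

For example, u64_to_dec(buf, 100000) emits the low group as "00000" and the top group as "1" once its leading zeros are trimmed, giving "100000". Because put_five_digits only ever sees 100000 distinct inputs, checking it exhaustively against a reference formatter is cheap.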

Diffstat (limited to 'lib/locking-selftest-rlock.h')

0 files changed, 0 insertions, 0 deletions

