linux.git - Linux kernel mainline tree


author	Daniel Borkmann <daniel@iogearbox.net>	2018-05-14 23:22:32 +0200
committer	Alexei Starovoitov <ast@kernel.org>	2018-05-14 19:11:45 -0700
commit	6d2eea6fb0791bf191043a68937c9aa59c5ea678 (patch)
tree	24500a6739e431df2dab4aadc262cac4233a25cb /samples
parent	09ece3d0f2ee524b8d1d3c775d9eeb50c40558d8 (diff)
download	linux-6d2eea6fb0791bf191043a68937c9aa59c5ea678.tar.gz linux-6d2eea6fb0791bf191043a68937c9aa59c5ea678.tar.bz2 linux-6d2eea6fb0791bf191043a68937c9aa59c5ea678.zip

bpf, arm64: optimize 32/64 immediate emission

Improve the JIT to emit 64 and 32 bit immediates, the current algorithm is not optimal and we often emit more instructions than actually needed. arm64 has movz, movn, movk variants but for the current 64 bit immediates we only use movz with a series of movk when needed.

For example loading ffffffffffffabab emits the following 4 instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that are loaded via emit_a64_mov_i64() which is a similar optimization as done in 6fe8b9c1f41d ("bpf, x64: save several bytes by using mov over movabsq when possible"). On arm64, the latter allows to use a single instruction with movn due to zero extension where otherwise two would be needed. And last but not least add a missing optimization in emit_a64_mov_i() where movn is used but the subsequent movk not needed. With some of the Cilium programs in use, this shrinks the needed instructions by about three percent. Tested on Cavium ThunderX CN8890.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
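The instruction selection described in the commit message can be illustrated with a small user-space sketch. It is not the kernel's code: the names `mov_op`, `chunks_differing`, `plan_mov_i64` and `run_mov_ops` are hypothetical, the ops are kept as plain records rather than encoded A64 instructions, and the 32 bit shortcut through emit_a64_mov_i() is left out. The idea shown is only this: count how many 16 bit chunks differ from all-zeros versus all-ones, start with movz or movn accordingly on the highest chunk that differs, and patch the remaining differing chunks with movk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op record; a real JIT would emit encoded instructions. */
enum mov_kind { MOVZ, MOVN, MOVK };

struct mov_op {
	enum mov_kind kind;
	uint16_t imm;
	int shift;	/* 0, 16, 32 or 48 */
};

/* Count the 16 bit chunks of val that differ from the fill pattern. */
static int chunks_differing(uint64_t val, uint16_t fill)
{
	int n = 0;

	for (int shift = 0; shift < 64; shift += 16)
		n += ((val >> shift) & 0xffff) != fill;
	return n;
}

/* Plan a movz/movn + movk sequence for val; returns the op count (1..4). */
static int plan_mov_i64(uint64_t val, struct mov_op ops[4])
{
	/* Start from all-ones (movn) if that leaves fewer chunks to patch. */
	bool inverse = chunks_differing(val, 0xffff) <
		       chunks_differing(val, 0x0000);
	uint16_t fill = inverse ? 0xffff : 0x0000;
	uint16_t chunk;
	int first = 0, n = 0;

	/* The highest chunk that differs from the fill gets the movz/movn. */
	for (int shift = 48; shift >= 0; shift -= 16) {
		if (((val >> shift) & 0xffff) != fill) {
			first = shift;
			break;
		}
	}

	chunk = (val >> first) & 0xffff;
	ops[n].kind = inverse ? MOVN : MOVZ;
	/* movn takes the inverted chunk as its immediate. */
	ops[n].imm = inverse ? (uint16_t)~chunk : chunk;
	ops[n].shift = first;
	n++;

	/* Lower chunks only need a movk when they differ from the fill. */
	for (int shift = first - 16; shift >= 0; shift -= 16) {
		chunk = (val >> shift) & 0xffff;
		if (chunk != fill) {
			ops[n].kind = MOVK;
			ops[n].imm = chunk;
			ops[n].shift = shift;
			n++;
		}
	}
	return n;
}

/* Interpret the planned ops to check that they rebuild the immediate. */
static uint64_t run_mov_ops(const struct mov_op *ops, int n)
{
	uint64_t reg = 0;

	assert(n >= 1 && n <= 4);
	for (int i = 0; i < n; i++) {
		uint64_t imm = (uint64_t)ops[i].imm << ops[i].shift;
		uint64_t mask = (uint64_t)0xffff << ops[i].shift;

		switch (ops[i].kind) {
		case MOVZ: reg = imm; break;			/* others zeroed */
		case MOVN: reg = ~imm; break;			/* others all-ones */
		case MOVK: reg = (reg & ~mask) | imm; break;	/* others kept */
		}
	}
	return reg;
}
```

Calling `plan_mov_i64()` on the three immediates from the examples and feeding the result to `run_mov_ops()` is a quick way to check that the planned sequences match the "after" listings and reproduce the original value.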

Diffstat (limited to 'samples')

0 files changed, 0 insertions, 0 deletions
