linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	xsk: remove rebind support	Björn Töpel	2018-05-22	1	-21/+9
\| \| \| \| \| \| \| \| \| \|	Supporting rebind, i.e. after a successful bind the process can call bind again without closing the socket, makes the AF_XDP setup state machine more complex. Constrain the state space, by not supporting rebind. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'bpf-sk-msg-fields'	Daniel Borkmann	2018-05-18	6	-3/+244
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	John Fastabend says: ==================== In this series we add the ability for sk msg programs to read basic sock information about the sock they are attached to. The second patch adds the tests to the selftest test_verifier. One observation that I had from writing this seriess is lots of the ./net/core/filter.c code is almost duplicated across program types. I thought about building a template/macro that we could use as a single block of code to read sock data out for multiple programs, but I wasn't convinced it was worth it yet. The result was using a macro saved a couple lines of code per block but made the code a bit harder to read IMO. We can probably revisit the idea later if we get more duplication. v2: add errstr field to negative test_verifier test cases to ensure we get the expected err string back from the verifier. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	bpf: add sk_msg prog sk access tests to test_verifier	John Fastabend	2018-05-18	2	-0/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add tests for BPF_PROG_TYPE_SK_MSG to test_verifier for read access to new sk fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	bpf: allow sk_msg programs to read sock fields	John Fastabend	2018-05-18	4	-3/+121
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	Currently sk_msg programs only have access to the raw data. However, it is often useful when building policies to have the policies specific to the socket endpoint. This allows using the socket tuple as input into filters, etc. This patch adds ctx access to the sock fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'bpf-nfp-shift-insns'	Daniel Borkmann	2018-05-18	5	-31/+435
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jiong Wang says: ==================== NFP eBPF JIT is missing logic indirect shifts (both left and right) and arithmetic right shift (both indirect shift and shift by constant). This patch adds support for them. For indirect shifts, shift amount is not specified as constant, NFP needs to get the shift amount through the low 5 bits of source A operand in PREV_ALU, therefore extra instructions are needed compared with shifts by constants. Because NFP is 32-bit, so we are using register pair for 64-bit shifts and therefore would need different instruction sequences depending on whether shift amount is less than 32 or not. NFP branch-on-bit-test instruction emitter is added by this patch set and is used for efficient runtime check on shift amount. We'd think the shift amount is less than 32 if bit 5 is clear and greater or equal then 32 otherwise. Shift amount is greater than or equal to 64 will result in undefined behavior. This patch also use range info to avoid generating unnecessary runtime code if we are certain shift amount is less than 32 or not. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	nfp: bpf: support arithmetic indirect right shift (BPF_ARSH \| BPF_X)	Jiong Wang	2018-05-18	1	-10/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code logic is similar with arithmetic right shift by constant, and NFP get indirect shift amount through source A operand of PREV_ALU. It is possible to fall back to logic right shift if the MSB is known to be zero from range info, however there is no benefit to do this given logic indirect right shift use the same number and cycle of instruction sequence. Suppose the MSB of regX is the bit we want to replicate to fill in all the vacant positions, and regY contains the shift amount, then we could use single instruction to set up both. [alu, --, regY, OR, regX] -- NOTE: the PREV_ALU result doesn't need to write to any destination register. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	nfp: bpf: support arithmetic right shift by constant (BPF_ARSH \| BPF_K)	Jiong Wang	2018-05-18	2	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code logic is similar with logic right shift except we also need to set PREV_ALU result properly, the MSB of which is the bit that will be replicated to fill in all the vacant positions. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	nfp: bpf: support logic indirect shifts (BPF_[L\|R]SH \| BPF_X)	Jiong Wang	2018-05-18	5	-32/+322
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For indirect shifts, shift amount is not specified as constant, NFP needs to get the shift amount through the low 5 bits of source A operand in PREV_ALU, therefore extra instructions are needed compared with shifts by constants. Because NFP is 32-bit, so we are using register pair for 64-bit shifts and therefore would need different instruction sequences depending on whether shift amount is less than 32 or not. NFP branch-on-bit-test instruction emitter is added by this patch and is used for efficient runtime check on shift amount. We'd think the shift amount is less than 32 if bit 5 is clear and greater or equal than 32 otherwise. Shift amount is greater than or equal to 64 will result in undefined behavior. This patch also use range info to avoid generating unnecessary runtime code if we are certain shift amount is less than 32 or not. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'bpf-af-xdp-cleanups'	Daniel Borkmann	2018-05-18	11	-127/+34
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Björn Töpel says: ==================== This series contain "cosmetics only" follow-up patches for AF_XDP. Thanks to Daniel for suggesting them! ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	xsk: proper '=' alignment	Björn Töpel	2018-05-18	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Properly align xsk_proto_ops initialization. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	xsk: fixed some cases of unnecessary parentheses	Björn Töpel	2018-05-18	3	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed some cases of unnecessary parentheses. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	xsk: remove newline at end of file	Björn Töpel	2018-05-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Minor cleanup, remove newline at end of Makefile. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	xsk: clean up SPDX headers	Björn Töpel	2018-05-18	10	-102/+11
\|/ \| \| \| \| \| \|	Clean up SPDX-License-Identifier and removing licensing leftovers. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: sockmap, fix double-free	Gustavo A. R. Silva	2018-05-17	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	`e' is being freed twice. Fix this by removing one of the kfree() calls. Addresses-Coverity-ID: 1468983 ("Double free") Fixes: 81110384441a ("bpf: sockmap, add hash map support") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: sockmap, fix uninitialized variable	Gustavo A. R. Silva	2018-05-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	There is a potential execution path in which variable err is returned without being properly initialized previously. Fix this by initializing variable err to 0. Addresses-Coverity-ID: 1468964 ("Uninitialized scalar variable") Fixes: e5cd3abcb31a ("bpf: sockmap, refactor sockmap routines to work with hashmap") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: change eBPF helper doc parsing script to allow for smaller indent	Quentin Monnet	2018-05-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Documentation for eBPF helpers can be parsed from bpf.h and eventually turned into a man page. Commit 6f96674dbd8c ("bpf: relax constraints on formatting for eBPF helper documentation") changed the script used to parse it, in order to allow for different indent style and to ease the work for writing documentation for future helpers. The script currently considers that the first tab can be replaced by 6 to 8 spaces. But the documentation for bpf_fib_lookup() uses a mix of tabs (for the "Description" part) and of spaces ("Return" part), and only has 5 space long indent for the latter. We probably do not want to change the values accepted by the script each time a new helper gets a new indent style. However, it is worth noting that with those 5 spaces, the "Description" and "Return" part look aligned in the generated patch and in `git show`, so it is likely other helper authors will use the same length. Therefore, allow for helper documentation to use 5 spaces only for the first indent level. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	2018-05-16	114	-1865/+4600
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Daniel Borkmann says: ==================== pull-request: bpf-next 2018-05-17 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Provide a new BPF helper for doing a FIB and neighbor lookup in the kernel tables from an XDP or tc BPF program. The helper provides a fast-path for forwarding packets. The API supports IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are implemented in this initial work, from David (Ahern). 2) Just a tiny diff but huge feature enabled for nfp driver by extending the BPF offload beyond a pure host processing offload. Offloaded XDP programs are allowed to set the RX queue index and thus opening the door for defining a fully programmable RSS/n-tuple filter replacement. Once BPF decided on a queue already, the device data-path will skip the conventional RSS processing completely, from Jakub. 3) The original sockmap implementation was array based similar to devmap. However unlike devmap where an ifindex has a 1:1 mapping into the map there are use cases with sockets that need to be referenced using longer keys. Hence, sockhash map is added reusing as much of the sockmap code as possible, from John. 4) Introduce BTF ID. The ID is allocatd through an IDR similar as with BPF maps and progs. It also makes BTF accessible to user space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data through BPF_OBJ_GET_INFO_BY_FD, from Martin. 5) Enable BPF stackmap with build_id also in NMI context. Due to the up_read() of current->mm->mmap_sem build_id cannot be parsed. This work defers the up_read() via a per-cpu irq_work so that at least limited support can be enabled, from Song. 6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND JIT conversion as well as implementation of an optimized 32/64 bit immediate load in the arm64 JIT that allows to reduce the number of emitted instructions; in case of tested real-world programs they were shrinking by three percent, from Daniel. 7) Add ifindex parameter to the libbpf loader in order to enable BPF offload support. Right now only iproute2 can load offloaded BPF and this will also enable libbpf for direct integration into other applications, from David (Beckett). 8) Convert the plain text documentation under Documentation/bpf/ into RST format since this is the appropriate standard the kernel is moving to for all documentation. Also add an overview README.rst, from Jesper. 9) Add __printf verification attribute to the bpf_verifier_vlog() helper. Though it uses va_list we can still allow gcc to check the format string, from Mathieu. 10) Fix a bash reference in the BPF selftest's Makefile. The '\|& ...' is a bash 4.0+ feature which is not guaranteed to be available when calling out to shell, therefore use a more portable variant, from Joe. 11) Fix a 64 bit division in xdp_umem_reg() by using div_u64() instead of relying on the gcc built-in, from Björn. 12) Fix a sock hashmap kmalloc warning reported by syzbot when an overly large key size is used in hashmap then causing overflows in htab->elem_size. Reject bogus attr->key_size early in the sock_hash_alloc(), from Yonghong. 13) Ensure in BPF selftests when urandom_read is being linked that --build-id is always enabled so that test_stacktrace_build_id[_nmi] won't be failing, from Alexei. 14) Add bitsperlong.h as well as errno.h uapi headers into the tools header infrastructure which point to one of the arch specific uapi headers. This was needed in order to fix a build error on some systems for the BPF selftests, from Sirio. 15) Allow for short options to be used in the xdp_monitor BPF sample code. And also a bpf.h tools uapi header sync in order to fix a selftest build failure. Both from Prashant. 16) More formally clarify the meaning of ID in the direct packet access section of the BPF documentation, from Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: sockmap, on update propagate errors back to userspace	John Fastabend	2018-05-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an error happens in the update sockmap element logic also pass the err up to the user. Fixes: e5cd3abcb31a ("bpf: sockmap, refactor sockmap routines to work with hashmap") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	bpf: fix sock hashmap kmalloc warning	Yonghong Song	2018-05-17	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	syzbot reported a kernel warning below: WARNING: CPU: 0 PID: 4499 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 4499 Comm: syz-executor050 Not tainted 4.17.0-rc3+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 panic+0x22f/0x4de kernel/panic.c:184 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:kmalloc_slab+0x56/0x70 mm/slab_common.c:996 RSP: 0018:ffff8801d907fc58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801aeecb280 RCX: ffffffff8185ebd7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffe1 RBP: ffff8801d907fc58 R08: ffff8801adb5e1c0 R09: ffffed0035a84700 R10: ffffed0035a84700 R11: ffff8801ad423803 R12: ffff8801aeecb280 R13: 00000000fffffff4 R14: ffff8801ad891a00 R15: 00000000014200c0 __do_kmalloc mm/slab.c:3713 [inline] __kmalloc+0x25/0x760 mm/slab.c:3727 kmalloc include/linux/slab.h:517 [inline] map_get_next_key+0x24a/0x640 kernel/bpf/syscall.c:858 __do_sys_bpf kernel/bpf/syscall.c:2131 [inline] __se_sys_bpf kernel/bpf/syscall.c:2096 [inline] __x64_sys_bpf+0x354/0x4f0 kernel/bpf/syscall.c:2096 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe The test case is against sock hashmap with a key size 0xffffffe1. Such a large key size will cause the below code in function sock_hash_alloc() overflowing and produces a smaller elem_size, hence map creation will be successful. htab->elem_size = sizeof(struct htab_elem) + round_up(htab->map.key_size, 8); Later, when map_get_next_key is called and kernel tries to allocate the key unsuccessfully, it will issue the above warning. Similar to hashtab, ensure the key size is at most MAX_BPF_STACK for a successful map creation. Fixes: 81110384441a ("bpf: sockmap, add hash map support") Reported-by: syzbot+e4566d29080e7f3460ff@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	libbpf: add ifindex to enable offload support	David Beckett	2018-05-17	4	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BPF programs currently can only be offloaded using iproute2. This patch will allow programs to be offloaded using libbpf calls. Signed-off-by: David Beckett <david.beckett@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	bpf: add __printf verification to bpf_verifier_vlog	Mathieu Malaterre	2018-05-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__printf is useful to verify format and arguments. ‘bpf_verifier_vlog’ function is used twice in verifier.c in both cases the caller function already uses the __printf gcc attribute. Remove the following warning, triggered with W=1: kernel/bpf/verifier.c:176:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	samples/bpf: Decrement ttl in fib forwarding example	David Ahern	2018-05-16	1	-8/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Only consider forwarding packets if ttl in received packet is > 1 and decrement ttl before handing off to bpf_redirect_map. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	Merge branch 'bpf-sock-hashmap'	Daniel Borkmann	2018-05-16	17	-453/+1161
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	John Fastabend says: ==================== In the original sockmap implementation we got away with using an array similar to devmap. However, unlike devmap where an ifindex has a nice 1:1 function into the map we have found some use cases with sockets that need to be referenced using longer keys. This series adds support for a sockhash map reusing as much of the sockmap code as possible. I made the decision to add sockhash specific helpers vs trying to generalize the existing helpers because (a) they have sockmap in the name and (b) the keys are different types. I prefer to be explicit here rather than play type games or do something else tricky. To test this we duplicate all the sockmap testing except swap out the sockmap with a sockhash. v2: fix file stats and add v2 tag v3: move tool updates into test patch, move bpftool updates into its own patch, and fixup the test patch stats to catch the renamed file and provide only diffs ± on that. v4: Add documentation to UAPI bpf.h v5: Add documentation to tools UAPI bpf.h v6: 'git add' test_sockhash_kern.c which was previously missing but was not causing issues because of typo in test script, noticed by Daniel. After this the git format-patch -M option no longer tracks the rename of the test_sockmap_kern files for some reason. I guess the diff has exceeded some threshold. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| \| *	bpf: bpftool, support for sockhash	John Fastabend	2018-05-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the SOCKHASH map type to bpftools so that we get correct pretty printing. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| \| *	bpf: selftest additions for SOCKHASH	John Fastabend	2018-05-16	7	-349/+453
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This runs existing SOCKMAP tests with SOCKHASH map type. To do this we push programs into include file and build two BPF programs. One for SOCKHASH and one for SOCKMAP. We then run the entire test suite with each type. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| \| *	bpf: sockmap, add hash map support	John Fastabend	2018-05-15	7	-19/+611
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sockmap is currently backed by an array and enforces keys to be four bytes. This works well for many use cases and was originally modeled after devmap which also uses four bytes keys. However, this has become limiting in larger use cases where a hash would be more appropriate. For example users may want to use the 5-tuple of the socket as the lookup key. To support this add hash support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| \| *	bpf: sockmap, refactor sockmap routines to work with hashmap	John Fastabend	2018-05-15	4	-87/+98
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch only refactors the existing sockmap code. This will allow much of the psock initialization code path and bpf helper codes to work for both sockmap bpf map types that are backed by an array, the currently supported type, and the new hash backed bpf map type sockhash. Most the fallout comes from three changes, - Pushing bpf programs into an independent structure so we can use it from the htab struct in the next patch. - Generalizing helpers to use void *key instead of the hardcoded u32. - Instead of passing map/key through the metadata we now do the lookup inline. This avoids storing the key in the metadata which will be useful when keys can be longer than 4 bytes. We rename the sk pointers to sk_redir at this point as well to avoid any confusion between the current sk pointer and the redirect pointer sk_redir. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	selftests/bpf: make sure build-id is on	Alexei Starovoitov	2018-05-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	--build-id may not be a default linker config. Make sure it's used when linking urandom_read test program. Otherwise test_stacktrace_build_id[_nmi] tests will be failling. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	Merge branch 'convert-doc-to-rst'	Alexei Starovoitov	2018-05-14	5	-726/+897
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jesper Dangaard Brouer says: ==================== The kernel is moving files under Documentation to use the RST (reStructuredText) format and Sphinx [1]. This patchset converts the files under Documentation/bpf/ into RST format. The Sphinx integration is left as followup work. [1] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html This patchset have been uploaded as branch bpf_doc10 on github[2], so reviewers can see how GitHub renders this. [2] https://github.com/netoptimizer/linux/tree/bpf_doc10/Documentation/bpf ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, doc: howto use/run the BPF selftests	Jesper Dangaard Brouer	2018-05-14	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I always forget howto run the BPF selftests. Thus, lets add that info to the QA document. Documentation was based on Cilium's documentation: http://cilium.readthedocs.io/en/latest/bpf/#verifying-the-setup Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, doc: convert bpf_devel_QA.rst to use RST formatting	Jesper Dangaard Brouer	2018-05-14	1	-379/+420
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Same story as bpf_design_QA.rst RST format conversion. Again thanks to Quentin Monnet <quentin.monnet@netronome.com> for fixes and patches that have been squashed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, doc: convert bpf_design_QA.rst to use RST formatting	Jesper Dangaard Brouer	2018-05-14	1	-79/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RST formatting is done such that that when rendered or converted to different formats, an automatic index with links are created to the subsections. Thus, the questions are created as sections (or subsections), in-order to get the wanted auto-generated FAQ/QA index. Special thanks to Quentin Monnet <quentin.monnet@netronome.com> who have reviewed and corrected both RST formatting and GitHub rendering issues in this file. Those commits have been squashed. I've manually tested that this also renders nicely if included as part of the kernel 'make htmldocs'. As the end-goal is for this to become more integrated with kernel-doc project/movement. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, doc: rename txt files to rst files	Jesper Dangaard Brouer	2018-05-14	3	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will cause them to get auto rendered, e.g. when viewing them on GitHub. Followup patches will correct the content to be RST compliant. Also adjust README.rst to point to the renamed files. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, doc: add basic README.rst file	Jesper Dangaard Brouer	2018-05-14	1	-0/+36
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A README.rst file in a directory have special meaning for sites like github, which auto renders the contents. Plus search engines like Google also index these README.rst files. Auto rendering allow us to use links, for (re)directing eBPF users to other places where docs live. The end-goal would be to direct users towards https://www.kernel.org/doc/html/latest but we haven't written the full docs yet, so we start out small and take this incrementally. This directory itself contains some useful docs, which can be linked to from the README.rst file (verified this works for github). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	Merge branch 'fix-samples'	Alexei Starovoitov	2018-05-14	44	-147/+116
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Kicinski says: ==================== Following patches address build issues after recent move to libbpf. For out-of-tree builds we would see the following error: gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or directory libbpf build system is now always invoked explicitly rather than relying on building single objects most of the time. We need to resolve the friction between Kbuild and tools/ build system. Mini-library called libbpf.h in samples is renamed to bpf_insn.h, using linux/filter.h seems not completely trivial since some samples get upset when order on include search path in changed. We do have to rename libbpf.h, however, because otherwise it's hard to reliably get to libbpf's header in out-of-tree builds. v2: - fix the build error harder (patch 3); - add patch 5 (make clang less noisy). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	samples: bpf: make the build less noisy	Jakub Kicinski	2018-05-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Building samples with clang ignores the $(Q) setting, always printing full command to the output. Make it less verbose. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	samples: bpf: move libbpf from object dependencies to libs	Jakub Kicinski	2018-05-14	1	-94/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make complains that it doesn't know how to make libbpf.a: scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern Now that we have it as a dependency of the sources simply add libbpf.a to libraries not objects. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	samples: bpf: fix build after move to compiling full libbpf.a	Jakub Kicinski	2018-05-14	1	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are many ways users may compile samples, some of them got broken by commit 5f9380572b4b ("samples: bpf: compile and link against full libbpf"). Improve path resolution and make libbpf building a dependency of source files to force its build. Samples should now again build with any of: cd samples/bpf; make make samples/bpf/ make -C samples/bpf cd samples/bpf; make O=builddir make samples/bpf/ O=builddir make -C samples/bpf O=builddir export KBUILD_OUTPUT=builddir make samples/bpf/ make -C samples/bpf Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf") Reported-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	samples: bpf: rename libbpf.h to bpf_insn.h	Jakub Kicinski	2018-05-14	8	-12/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The libbpf.h file in samples is clashing with libbpf's header. Since it only includes a subset of filter.h instruction helpers rename it to bpf_insn.h. Drop the unnecessary include of bpf/bpf.h. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	samples: bpf: include bpf/bpf.h instead of local libbpf.h	Jakub Kicinski	2018-05-14	35	-35/+34
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two files in the tree called libbpf.h which is becoming problematic. Most samples don't actually need the local libbpf.h they simply include it to get to bpf/bpf.h. Include bpf/bpf.h directly instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	Merge branch 'bpf-jit-cleanups'	Alexei Starovoitov	2018-05-14	7	-98/+228
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Daniel Borkmann says: ==================== This series follows up mostly with with some minor cleanups on top of 'Move ld_abs/ld_ind to native BPF' as well as implements better 32/64 bit immediate load into register and saves tail call init on cBPF for the arm64 JIT. Last but not least we add a couple of test cases. For details please see individual patches. Thanks! v1 -> v2: - Minor fix in i64_i16_blocks() to remove 24 shift. - Added last two patches. - Added Acks from prior round. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf: add ld64 imm test cases	Daniel Borkmann	2018-05-14	2	-0/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add test cases where we combine semi-random imm values, mainly for testing JITs when they have different encoding options for 64 bit immediates in order to reduce resulting image size. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf	Daniel Borkmann	2018-05-14	1	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can trivially save 4 bytes in prologue for cBPF since tail calls can never be used from there. The register push/pop is pairwise, here, x25 (fp) and x26 (tcc), so no point in changing that, only reset to zero is not needed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, arm64: optimize 32/64 immediate emission	Daniel Borkmann	2018-05-14	1	-31/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve the JIT to emit 64 and 32 bit immediates, the current algorithm is not optimal and we often emit more instructions than actually needed. arm64 has movz, movn, movk variants but for the current 64 bit immediates we only use movz with a series of movk when needed. For example loading ffffffffffffabab emits the following 4 instructions in the JIT today: * movz: abab, shift: 0, result: 000000000000abab * movk: ffff, shift: 16, result: 00000000ffffabab * movk: ffff, shift: 32, result: 0000ffffffffabab * movk: ffff, shift: 48, result: ffffffffffffabab Whereas after the patch the same load only needs a single instruction: * movn: 5454, shift: 0, result: ffffffffffffabab Another example where two extra instructions can be saved: * movz: abab, shift: 0, result: 000000000000abab * movk: 1f2f, shift: 16, result: 000000001f2fabab * movk: ffff, shift: 32, result: 0000ffff1f2fabab * movk: ffff, shift: 48, result: ffffffff1f2fabab After the patch: * movn: e0d0, shift: 16, result: ffffffff1f2fffff * movk: abab, shift: 0, result: ffffffff1f2fabab Another example with movz, before: * movz: 0000, shift: 0, result: 0000000000000000 * movk: fea0, shift: 32, result: 0000fea000000000 After: * movz: fea0, shift: 32, result: 0000fea000000000 Moreover, reuse emit_a64_mov_i() for 32 bit immediates that are loaded via emit_a64_mov_i64() which is a similar optimization as done in 6fe8b9c1f41d ("bpf, x64: save several bytes by using mov over movabsq when possible"). On arm64, the latter allows to use a single instruction with movn due to zero extension where otherwise two would be needed. And last but not least add a missing optimization in emit_a64_mov_i() where movn is used but the subsequent movk not needed. With some of the Cilium programs in use, this shrinks the needed instructions by about three percent. Tested on Cavium ThunderX CN8890. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, arm64: save 4 bytes of unneeded stack space	Daniel Borkmann	2018-05-14	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Follow-up to 816d9ef32a8b ("bpf, arm64: remove ld_abs/ld_ind") in that the extra 4 byte JIT scratchpad is not needed anymore since it was in ld_abs/ld_ind as stack buffer for bpf_load_pointer(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, arm32: save 4 bytes of unneeded stack space	Daniel Borkmann	2018-05-14	1	-10/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The extra skb_copy_bits() buffer is not used anymore, therefore remove the extra 4 byte stack space requirement. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, x64: clean up retpoline emission slightly	Daniel Borkmann	2018-05-14	1	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by cleaning up the macro, aligning comments and spacing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, sparc: remove unused variable	Daniel Borkmann	2018-05-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since fe83963b7c38 ("bpf, sparc64: remove ld_abs/ld_ind") it's not used anymore therefore remove it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| \| *	bpf, mips: remove unused function	Daniel Borkmann	2018-05-14	1	-26/+0
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ool_skb_header_pointer() and size_to_len() is unused same as tmp_offset, therefore remove all of them. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	samples/bpf: xdp_monitor, accept short options	Prashant Bhole	2018-05-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Updated optstring parameter for getopt_long() to accept short options. Also updated usage() function. Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>