summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* flow_dissector: Fix potential use-after-free on BPF_PROG_DETACHJakub Sitnicki2019-08-241-1/+1
| | | | | | | | | | | | | | | | | | Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to free the program once a grace period has elapsed. The callback can run together with new RCU readers that started after the last grace period. New RCU readers can potentially see the "old" to-be-freed or already-freed pointer to the program object before the RCU update-side NULLs it. Reorder the operations so that the RCU update-side resets the protected pointer before the end of the grace period after which the program will be freed. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Reported-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* selftests/bpf: install files test_xdp_vlan.shAnders Roxell2019-08-211-1/+2
| | | | | | | | | | | | | | | | | When ./test_xdp_vlan_mode_generic.sh runs it complains that it can't find file test_xdp_vlan.sh. # selftests: bpf: test_xdp_vlan_mode_generic.sh # ./test_xdp_vlan_mode_generic.sh: line 9: ./test_xdp_vlan.sh: No such file or directory Rework so that test_xdp_vlan.sh gets installed, added to the variable TEST_PROGS_EXTENDED. Fixes: d35661fcf95d ("selftests/bpf: add wrapper scripts for test_xdp_vlan.sh") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* selftests/bpf: add config fragment BPF_JITAnders Roxell2019-08-211-0/+1
| | | | | | | | | | | | | | When running test_kmod.sh the following shows up # sysctl cannot stat /proc/sys/net/core/bpf_jit_enable No such file or directory cannot: stat_/proc/sys/net/core/bpf_jit_enable # # sysctl cannot stat /proc/sys/net/core/bpf_jit_harden No such file or directory cannot: stat_/proc/sys/net/core/bpf_jit_harden # Rework to enable CONFIG_BPF_JIT to solve "No such file or directory" Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* selftests/bpf: fix test_btf_dump with O=Ilya Leoshkevich2019-08-212-0/+10
| | | | | | | | | | | | | test_btf_dump fails when run with O=, because it needs to access source files and assumes they live in ./progs/, which is not the case in this scenario. Fix by instructing kselftest to copy btf_dump_test_case_*.c files to the test directory. Since kselftest does not preserve directory structure, adjust the test to look in ./progs/ and then in ./. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* selftests/bpf: fix test_cgroup_storage on s390Ilya Leoshkevich2019-08-211-3/+3
| | | | | | | | | | | | | | | test_cgroup_storage fails on s390 with an assertion failure: packets are dropped when they shouldn't. The problem is that BPF_DW packet count is accessed as BPF_W with an offset of 0, which is not correct on big-endian machines. Since the point of this test is not to verify narrow loads/stores, simply use BPF_DW when working with packet counts. Fixes: 68cfa3ac6b8d ("selftests/bpf: add a cgroup storage test") Fixes: 919646d2a3a9 ("selftests/bpf: extend the storage test to test per-cpu cgroup storage") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: unpin xdp umem pages in error pathIvan Khoronzhuk2019-08-201-1/+3
| | | | | | | | | Fix mem leak caused by missed unpin routine for umem pages. Fixes: 8aef7340ae9695 ("xsk: introduce xdp_umem_page") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tools: bpftool: close prog FD before exit on showing a single programQuentin Monnet2019-08-151-1/+3
| | | | | | | | | | | | When showing metadata about a single program by invoking "bpftool prog show PROG", the file descriptor referring to the program is not closed before returning from the function. Let's close it. Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* selftests/bpf: fix "bind{4, 6} deny specific IP & port" on s390Ilya Leoshkevich2019-08-141-2/+5
| | | | | | | | | | | | | | | "bind4 allow specific IP & port" and "bind6 deny specific IP & port" fail on s390 because of endianness issue: the 4 IP address bytes are loaded as a word and compared with a constant, but the value of this constant should be different on big- and little- endian machines, which is not the case right now. Use __bpf_constant_ntohl to generate proper value based on machine endianness. Fixes: 1d436885b23b ("selftests/bpf: Selftest for sys_bind post-hooks.") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* s390/bpf: use 32-bit index for tail callsIlya Leoshkevich2019-08-131-4/+6
| | | | | | | | | | | | | | | | | | | | "p runtime/jit: pass > 32bit index to tail_call" fails when bpf_jit_enable=1, because the tail call is not executed. This in turn is because the generated code assumes index is 64-bit, while it must be 32-bit, and as a result prog array bounds check fails, while it should pass. Even if bounds check would have passed, the code that follows uses 64-bit index to compute prog array offset. Fix by using clrj instead of clgrj for comparing index with array size, and also by using llgfr for truncating index to 32 bits before using it to compute prog array offset. Fixes: 6651ee070b31 ("s390/bpf: implement bpf_tail_call() helper") Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* s390/bpf: fix lcgr instruction encodingIlya Leoshkevich2019-08-121-1/+1
| | | | | | | | | | | | | | | | "masking, test in bounds 3" fails on s390, because BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0) ignores the top 32 bits of BPF_REG_2. The reason is that JIT emits lcgfr instead of lcgr. The associated comment indicates that the code was intended to emit lcgr in the first place, it's just that the wrong opcode was used. Fix by using the correct opcode. Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rxNathan Chancellor2019-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang warns: drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^ ~~~~~~~~~~~~ drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a bitwise operation if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^~ & drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to silence this warning if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ~^~~~~~~~~~~~~~~ 1 warning generated. Explicitly check that NET_IP_ALIGN is not zero, which matches how this is checked in other parts of the tree. Because NET_IP_ALIGN is a build time constant, this check will be constant folded away during optimization. Fixes: 82a9928db560 ("tc35815: Enable StripCRC feature") Link: https://github.com/ClangBuiltLinux/linux/issues/608 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: initialise addr_trail_end when setting node addressesChris Packham2019-08-111-0/+1
| | | | | | | | | | | We set the field 'addr_trial_end' to 'jiffies', instead of the current value 0, at the moment the node address is initialized. This guarantees we don't inadvertently enter an address trial period when the node address is explicitly set by the user. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: Check existence of .port_mdb_add callback before calling itChen-Yu Tsai2019-08-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dsa framework has optional .port_mdb_{prepare,add,del} callback fields for drivers to handle multicast database entries. When adding an entry, the framework goes through a prepare phase, then a commit phase. Drivers not providing these callbacks should be detected in the prepare phase. DSA core may still bypass the bridge layer and call the dsa_port_mdb_add function directly with no prepare phase or no switchdev trans object, and the framework ends up calling an undefined .port_mdb_add callback. This results in a NULL pointer dereference, as shown in the log below. The other functions seem to be properly guarded. Do the same for .port_mdb_add in dsa_switch_mdb_add_bitmap() as well. 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = (ptrval) [00000000] *pgd=00000000 Internal error: Oops: 80000005 [#1] SMP ARM Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211 CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1 Hardware name: Allwinner sun7i (A20) Family Workqueue: events switchdev_deferred_process_work PC is at 0x0 LR is at dsa_switch_event+0x570/0x620 pc : [<00000000>] lr : [<c08533ec>] psr: 80070013 sp : ee871db8 ip : 00000000 fp : ee98d0a4 r10: 0000000c r9 : 00000008 r8 : ee89f710 r7 : ee98d040 r6 : ee98d088 r5 : c0f04c48 r4 : ee98d04c r3 : 00000000 r2 : ee89f710 r1 : 00000008 r0 : ee98d040 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6deb406a DAC: 00000051 Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval)) Stack: (0xee871db8 to 0xee872000) 1da0: ee871e14 103ace2d 1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000 1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0 1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000 1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff 1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4 1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500 1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000 1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8 1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122 1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec 1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc 1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00 1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000 1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4 1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000 1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20) [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74) [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4) [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14) [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4) [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68) [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8) [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104) [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c) [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104) [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14) [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c) [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc) [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150) [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xee871fb0 to 0xee871ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: bad PC value ---[ end trace 1292c61abd17b130 ]--- [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) corresponds to $ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec linux/net/dsa/switch.c:156 linux/net/dsa/switch.c:178 linux/net/dsa/switch.c:328 Fixes: e6db98db8a95 ("net: dsa: add switch mdb bitmap functions") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_ptp: Keep unmatched entries in a linked listPetr Machata2019-08-111-83/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To identify timestamps for matching with their packets, Spectrum-1 uses a five-tuple of (port, direction, domain number, message type, sequence ID). If there are several clients from the same domain behind a single port sending Delay_Req's, the only thing differentiating these packets, as far as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between individual clients be similar, conflicts may arise. That is not a problem to hardware, which will simply deliver timestamps on a first comes, first served basis. However the driver uses a simple hash table to store the unmatched pieces. When a new conflicting piece arrives, it pushes out the previously stored one, which if it is a packet, is delivered without timestamp. Later on as the corresponding timestamps arrive, the first one is mismatched to the second packet, and the second one is never matched and eventually is GCd. To correct this issue, instead of using a simple rhashtable, use rhltable to keep the unmatched entries. Previously, a found unmatched entry would always be removed from the hash table. That is not the case anymore--an incompatible entry is left in the hash table. Therefore removal from the hash table cannot be used to confirm the validity of the looked-up pointer, instead the lookup would simply need to be redone. Therefore move it inside the critical section. This simplifies a lot of the code. Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: nps_enet: Fix function names in doc commentsJonathan Neuschäfer2019-08-111-2/+2
| | | | | | | | Adjust the function names in two doc comments to match the corresponding functions. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* rxrpc: Fix local refcountingDavid Howells2019-08-111-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called on an unbound socket on which rx->local is not yet set. The following reproduced (includes omitted): int main(void) { socket(AF_RXRPC, SOCK_DGRAM, AF_INET); return 0; } causes the following oops to occur: BUG: kernel NULL pointer dereference, address: 0000000000000010 ... RIP: 0010:rxrpc_unuse_local+0x8/0x1b ... Call Trace: rxrpc_release+0x2b5/0x338 __sock_release+0x37/0xa1 sock_close+0x14/0x17 __fput+0x115/0x1e9 task_work_run+0x72/0x98 do_exit+0x51b/0xa7a ? __context_tracking_exit+0x4e/0x10e do_group_exit+0xab/0xab __x64_sys_exit_group+0x14/0x17 do_syscall_64+0x89/0x1d4 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netdevsim: Restore per-network namespace accounting for fib entriesDavid Ahern2019-08-114-86/+98
| | | | | | | | | Prior to the commit in the fixes tag, the resource controller in netdevsim tracked fib entries and rules per network namespace. Restore that behavior. Fixes: 5fc494225c1e ("netdevsim: create devlink instance per netdevsim instance") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-08-114-21/+57
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-08-11 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei. 2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii. 3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak. 4) Fix bpftool to dump an error if pinning fails, from Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'bpf-bpftool-pinning-error-msg'Daniel Borkmann2019-08-091-2/+6
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jakub Kicinski says: ==================== First make sure we don't use "prog" in error messages because the pinning operation could be performed on a map. Second add back missing error message if pin syscall failed. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * tools: bpftool: add error message on pin failureJakub Kicinski2019-08-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No error message is currently printed if the pin syscall itself fails. It got lost in the loadall refactoring. Fixes: 77380998d91d ("bpftool: add loadall command") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * tools: bpftool: fix error message (prog -> object)Jakub Kicinski2019-08-091-1/+1
| |/ | | | | | | | | | | | | | | | | | | Change an error message to work for any object being pinned not just programs. Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: tests for jmp to 1st insnAlexei Starovoitov2019-08-011-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | Add 2 tests that check JIT code generation to jumps to 1st insn. 1st test is similar to syzbot reproducer. The backwards branch is never taken at runtime. 2nd test has branch to 1st insn that executes. The test is written as two bpf functions, since it's not possible to construct valid single bpf program that jumps to 1st insn. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com>
| * bpf: fix x64 JIT code generation for jmp to 1st insnAlexei Starovoitov2019-08-011-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduction of bounded loops exposed old bug in x64 JIT. JIT maintains the array of offsets to the end of all instructions to compute jmp offsets. addrs[0] - offset of the end of the 1st insn (that includes prologue). addrs[1] - offset of the end of the 2nd insn. JIT didn't keep the offset of the beginning of the 1st insn, since classic BPF didn't have backward jumps and valid extended BPF couldn't have a branch to 1st insn, because it didn't allow loops. With bounded loops it's possible to construct a valid program that jumps backwards to the 1st insn. Fix JIT by computing: addrs[0] - offset of the end of prologue == start of the 1st insn. addrs[1] - offset of the end of 1st insn. v1->v2: - Yonghong noticed a bug in jit linfo. Fix it by passing 'addrs + 1' to bpf_prog_fill_jited_linfo(), since it expects insn_to_jit_off array to be offsets to last byte. Reported-by: syzbot+35101610ff3e83119b1b@syzkaller.appspotmail.com Fixes: 2589726d12a1 ("bpf: introduce bounded loops") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com>
| * libbpf: set BTF FD for prog only when there is supported .BTF.ext dataAndrii Nakryiko2019-08-011-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5d01ab7bac46 ("libbpf: fix erroneous multi-closing of BTF FD") introduced backwards-compatibility issue, manifesting itself as -E2BIG error returned on program load due to unknown non-zero btf_fd attribute value for BPF_PROG_LOAD sys_bpf() sub-command. This patch fixes bug by ensuring that we only ever associate BTF FD with program if there is a BTF.ext data that was successfully loaded into kernel, which automatically means kernel supports func_info/line_info and associated BTF FD for progs (checked and ensured also by BTF sanitization code). Fixes: 5d01ab7bac46 ("libbpf: fix erroneous multi-closing of BTF FD") Reported-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * libbpf : make libbpf_num_possible_cpus function thread safeTakshak Chahande2019-07-311-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having static variable `cpus` in libbpf_num_possible_cpus function without guarding it with mutex makes this function thread-unsafe. If multiple threads accessing this function, in the current form; it leads to incrementing the static variable value `cpus` in the multiple of total available CPUs. Used local stack variable to calculate the number of possible CPUs and then updated the static variable using WRITE_ONCE(). Changes since v1: * added stack variable to calculate cpus * serialized static variable update using WRITE_ONCE() * fixed Fixes tag Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()") Signed-off-by: Takshak Chahande <ctakshak@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * libbpf: fix erroneous multi-closing of BTF FDAndrii Nakryiko2019-07-261-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Libbpf stores associated BTF FD per each instance of bpf_program. When program is unloaded, that FD is closed. This is wrong, because leads to a race and possibly closing of unrelated files, if application simultaneously opens new files while bpf_programs are unloaded. It's also unnecessary, because struct btf "owns" that FD, and btf__free(), called from bpf_object__close() will close it. Thus the fix is to never have per-program BTF FD and fetch it from obj->btf, when necessary. Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections") Reported-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | net/tls: swap sk_write_space on closeJakub Kicinski2019-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Now that we swap the original proto and clear the ULP pointer on close we have to make sure no callback will try to access the freed state. sk_write_space is not part of sk_prot, remember to swap it. Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | hv_netvsc: Fix a warning of suspicious RCU usageDexuan Cui2019-08-091-2/+7
| | | | | | | | | | | | | | | | | | This fixes a warning of "suspicious rcu_dereference_check() usage" when nload runs. Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'mlx5-fixes-2019-08-08' of ↵David S. Miller2019-08-0913-109/+101
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-08-08 This series introduces some fixes to mlx5 driver. Highlights: 1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs. 2) From Aya, Fixes to mlx5 tx devlink health reporter. 3) From Maxim, aRFs parsing to use flow dissector to avoid relying on invalid skb fields. Please pull and let me know if there is any problem. For -stable v4.3 ('net/mlx5e: Only support tx/rx pause setting for port owner') For -stable v4.9 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS') For -stable v5.1 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery') ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter') ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off') Note: when merged with net-next this minor conflict will pop up: ++<<<<<<< (net-next) + if (is_eswitch_flow) { + flow->esw_attr->match_level = match_level; + flow->esw_attr->tunnel_match_level = tunnel_match_level; ++======= + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { + flow->esw_attr->inner_match_level = inner_match_level; + flow->esw_attr->outer_match_level = outer_match_level; ++>>>>>>> (net) To resolve, use hunks from net (2nd) and replace: if (flow->flags & MLX5E_TC_FLOW_ESWITCH) with if (is_eswitch_flow) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/mlx5e: Remove redundant check in CQE recovery flow of tx reporterAya Levin2019-08-081-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove check of recovery bit, in the beginning of the CQE recovery function. This test is already performed right before the reporter is invoked, when CQE error is detected. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: Fix error flow of CQE recovery on tx reporterAya Levin2019-08-082-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CQE recovery function begins with test and set of recovery bit. Add an error flow which ensures clearing of this bit when leaving the recovery function, to allow further recoveries to take place. This allows removal of clearing recovery bit on sq activate. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: Fix false negative indication on tx reporter CQE recoveryAya Levin2019-08-081-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove wrong error return value when SQ is not in error state. CQE recovery on TX reporter queries the sq state. If the sq is not in error state, the sq is either in ready or reset state. Ready state is good state which doesn't require recovery and reset state is a temporal state which ends in ready state. With this patch, CQE recovery in this scenario is successful. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: kTLS, Fix tisn field placementTariq Toukan2019-08-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Shift the tisn field in the WQE control segment, per the HW specification. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: kTLS, Fix tisn field nameTariq Toukan2019-08-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: kTLS, Fix progress params context WQE layoutTariq Toukan2019-08-084-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TLS progress params context WQE should not include an Eth segment, drop it. In addition, align the tls_progress_params layout with the HW specification document: - fix the tisn field name. - remove the valid bit. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: kTLS, Fix wrong TIS opmod constantsTariq Toukan2019-08-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Fix the used constants for TLS TIS opmods, per the HW specification. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: crypto, Fix wrong offset in encryption key commandTariq Toukan2019-08-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the 128b key offset in key encryption key creation command, per the HW specification. Fixes: 45d3b55dc665 ("net/mlx5: Add crypto library to support create/destroy encryption key") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg offMohamad Heib2019-08-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting speed to 56GBASE is allowed only with auto-negotiation enabled. This patch prevent setting speed to 56GBASE when auto-negotiation disabled. Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: Only support tx/rx pause setting for port ownerHuy Nguyen2019-08-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only support changing tx/rx pause frame setting if the net device is the vport group manager. Fixes: 3c2d18ef22df ("net/mlx5e: Support ethtool get/set_pauseparam") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Support inner header match criteria for non decap flow actionHuy Nguyen2019-08-083-21/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have an issue that OVS application creates an offloaded drop rule that drops VXLAN traffic with both inner and outer header match criteria. mlx5_core driver detects correctly the inner and outer header match criteria but does not enable the inner header match criteria due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that only decap rule needs inner header criteria. Solution: Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add two new members: inner_match_level and outer_match_level. inner/outer_match_level is set to NONE if the inner/outer match criteria is not specified in the tc rule creation request. The decap assumption is removed and the code just needs to check for inner/outer_match_level to enable the corresponding bit in firmware's match_criteria_enable value. Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5e: Use flow keys dissector to parse packets for ARFSMaxim Mikityanskiy2019-08-081-63/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current ARFS code relies on certain fields to be set in the SKB (e.g. transport_header) and extracts IP addresses and ports by custom code that parses the packet. The necessary SKB fields, however, are not always set at that point, which leads to an out-of-bounds access. Use skb_flow_dissect_flow_keys() to get the necessary information reliably, fix the out-of-bounds access and reuse the code. Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | ixgbe: fix possible deadlock in ixgbe_service_task()Taehee Yoo2019-08-091-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ixgbe_service_task() calls unregister_netdev() under rtnl_lock(). But unregister_netdev() internally calls rtnl_lock(). So deadlock would occur. Fixes: 59dd45d550c5 ("ixgbe: firmware recovery mode") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'Fix-collisions-in-socket-cookie-generation'David S. Miller2019-08-094-8/+11
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== Fix collisions in socket cookie generation This change makes the socket cookie generator as a global counter instead of per netns in order to fix cookie collisions for BPF use cases we ran into. See main patch #1 for more details. Given the change is small/trivial and fixes an issue we're seeing my preference would be net tree (though it cleanly applies to net-next as well). Went for net tree instead of bpf tree here given the main change is in net/core/sock_diag.c, but either way would be fine with me. v1 -> v2: - Fix up commit description in patch #1, thanks Eric! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bpf: sync bpf.h to tools infrastructureDaniel Borkmann2019-08-091-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull in updates in BPF helper function description. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sock: make cookie generation global instead of per netnsDaniel Borkmann2019-08-093-4/+4
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generating and retrieving socket cookies are a useful feature that is exposed to BPF for various program types through bpf_get_socket_cookie() helper. The fact that the cookie counter is per netns is quite a limitation for BPF in practice in particular for programs in host namespace that use socket cookies as part of a map lookup key since they will be causing socket cookie collisions e.g. when attached to BPF cgroup hooks or cls_bpf on tc egress in host namespace handling container traffic from veth or ipvlan devices with peer in different netns. Change the counter to be global instead. Socket cookie consumers must assume the value as opqaue in any case. Not every socket must have a cookie generated and knowledge of the counter value itself does not provide much value either way hence conversion to global is fine. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Martynas Pumputis <m@lambda.lt> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'rxrpc-fixes-20190809' of ↵David S. Miller2019-08-097-83/+100
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here's a couple of fixes for rxrpc: (1) Fix refcounting of the local endpoint. (2) Don't calculate or report packet skew information. This has been obsolete since AFS 3.1 and so is a waste of resources. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rxrpc: Don't bother generating maxSkew in the ACK packetDavid Howells2019-08-096-44/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't bother generating maxSkew in the ACK packet as it has been obsolete since AFS 3.1. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
| * | | rxrpc: Fix local endpoint refcountingDavid Howells2019-08-094-39/+72
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The object lifetime management on the rxrpc_local struct is broken in that the rxrpc_local_processor() function is expected to clean up and remove an object - but it may get requeued by packets coming in on the backing UDP socket once it starts running. This may result in the assertion in rxrpc_local_rcu() firing because the memory has been scheduled for RCU destruction whilst still queued: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:468! Note that if the processor comes around before the RCU free function, it will just do nothing because ->dead is true. Fix this by adding a separate refcount to count active users of the endpoint that causes the endpoint to be destroyed when it reaches 0. The original refcount can then be used to refcount objects through the work processor and cause the memory to be rcu freed when that reaches 0. Fixes: 4f95dd78a77e ("rxrpc: Rework local endpoint management") Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
* | | net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ ↵Fuqian Huang2019-08-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | context As spin_unlock_irq will enable interrupts. Function tsi108_stat_carry is called from interrupt handler tsi108_irq. Interrupts are enabled in interrupt handler. Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in IRQ context to avoid this. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | team: Add vlan tx offload to hw_enc_featuresYueHaibing2019-08-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should also enable team's vlan tx offload in hw_enc_features, pass the vlan packets to the slave devices with vlan tci, let the slave handle vlan tunneling offload implementation. Fixes: 3268e5cb494d ("team: Advertise tunneling offload features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>