linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	bpf, sockmap: Allow inserting listening TCP sockets into sockmap	Jakub Sitnicki	2020-02-21	2	-20/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order for sockmap/sockhash types to become generic collections for storing TCP sockets we need to loosen the checks during map update, while tightening the checks in redirect helpers. Currently sock{map,hash} require the TCP socket to be in established state, which prevents inserting listening sockets. Change the update pre-checks so the socket can also be in listening state. Since it doesn't make sense to redirect with sock{map,hash} to listening sockets, add appropriate socket state checks to BPF redirect helpers too. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-5-jakub@cloudflare.com
*	tcp_bpf: Don't let child socket inherit parent protocol ops on copy	Jakub Sitnicki	2020-02-21	3	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prepare for cloning listening sockets that have their protocol callbacks overridden by sk_msg. Child sockets must not inherit parent callbacks that access state stored in sk_user_data owned by the parent. Restore the child socket protocol callbacks before it gets hashed and any of the callbacks can get invoked. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-4-jakub@cloudflare.com
*	net, sk_msg: Clear sk_user_data pointer on clone if tagged	Jakub Sitnicki	2020-02-21	3	-3/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sk_user_data can hold a pointer to an object that is not intended to be shared between the parent socket and the child that gets a pointer copy on clone. This is the case when sk_user_data points at reference-counted object, like struct sk_psock. One way to resolve it is to tag the pointer with a no-copy flag by repurposing its lowest bit. Based on the bit-flag value we clear the child sk_user_data pointer after cloning the parent socket. The no-copy flag is stored in the pointer itself as opposed to externally, say in socket flags, to guarantee that the pointer and the flag are copied from parent to child socket in an atomic fashion. Parent socket state is subject to change while copying, we don't hold any locks at that time. This approach relies on an assumption that sk_user_data holds a pointer to an object aligned at least 2 bytes. A manual audit of existing users of rcu_dereference_sk_user_data helper confirms our assumption. Also, an RCU-protected sk_user_data is not likely to hold a pointer to a char value or a pathological case of "struct { char c; }". To be safe, warn when the flag-bit is set when setting sk_user_data to catch any future misuses. It is worth considering why clearing sk_user_data unconditionally is not an option. There exist users, DRBD, NVMe, and Xen drivers being among them, that rely on the pointer being copied when cloning the listening socket. Potentially we could distinguish these users by checking if the listening socket has been created in kernel-space via sock_create_kern, and hence has sk_kern_sock flag set. However, this is not the case for NVMe and Xen drivers, which create sockets without marking them as belonging to the kernel. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com
*	net, sk_msg: Annotate lockless access to sk_prot on clone	Jakub Sitnicki	2020-02-21	5	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sk_msg and ULP frameworks override protocol callbacks pointer in sk->sk_prot, while tcp accesses it locklessly when cloning the listening socket, that is with neither sk_lock nor sk_callback_lock held. Once we enable use of listening sockets with sockmap (and hence sk_msg), there will be shared access to sk->sk_prot if socket is getting cloned while being inserted/deleted to/from the sockmap from another CPU: Read side: tcp_v4_rcv sk = __inet_lookup_skb(...) tcp_check_req(sk) inet_csk(sk)->icsk_af_ops->syn_recv_sock tcp_v4_syn_recv_sock tcp_create_openreq_child inet_csk_clone_lock sk_clone_lock READ_ONCE(sk->sk_prot) Write side: sock_map_ops->map_update_elem sock_map_update_elem sock_map_update_common sock_map_link_no_progs tcp_bpf_init tcp_bpf_update_sk_prot sk_psock_update_proto WRITE_ONCE(sk->sk_prot, ops) sock_map_ops->map_delete_elem sock_map_delete_elem __sock_map_delete sock_map_unref sk_psock_put sk_psock_drop sk_psock_restore_proto tcp_update_ulp WRITE_ONCE(sk->sk_prot, proto) Mark the shared access with READ_ONCE/WRITE_ONCE annotations. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
*	docs/bpf: Update bpf development Q/A file	Yonghong Song	2020-02-20	1	-17/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	bpf now has its own mailing list bpf@vger.kernel.org. Update the bpf_devel_QA.rst file to reflect this. Also llvm has switch to github with llvm and clang in the same repo https://github.com/llvm/llvm-project.git. Update the QA file with newer build instructions. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200221004354.930952-1-yhs@fb.com
*	selftests/bpf: Fix trampoline_count clean up logic	Andrii Nakryiko	2020-02-20	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \|	Libbpf's Travis CI tests caught this issue. Ensure bpf_link and bpf_object clean up is performed correctly. Fixes: d633d57902a5 ("selftest/bpf: Add test for allowed trampolines count") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20200220230546.769250-1-andriin@fb.com
*	Merge branch 'set_attach_target'	Alexei Starovoitov	2020-02-20	5	-9/+54
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eelco Chaudron says: ==================== Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) v1 -> v2: Remove requirement for attach type hint in API v2 -> v3: Moved common warning to __find_vmlinux_btf_id, requested by Andrii Updated the xdp_bpf2bpf test to use this new API v3 -> v4: Split up patch, update libbpf.map version v4 -> v5: Fix return code, and prog assignment in test case ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	selftests/bpf: Update xdp_bpf2bpf test to use new set_attach_target API	Eelco Chaudron	2020-02-20	2	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the new bpf_program__set_attach_target() API in the xdp_bpf2bpf selftest so it can be referenced as an example on how to use it. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158220520562.127661.14289388017034825841.stgit@xdp-tutorial
\| *	libbpf: Add support for dynamic program attach target	Eelco Chaudron	2020-02-20	3	-4/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220519486.127661.7964708960649051384.stgit@xdp-tutorial
\| *	libbpf: Bump libpf current version to v0.0.8	Eelco Chaudron	2020-02-20	1	-0/+3
\|/ \| \| \| \| \| \| \| \|	New development cycles starts, bump to v0.0.8. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220518424.127661.8278643006567775528.stgit@xdp-tutorial
*	libbpf: Relax check whether BTF is mandatory	Andrii Nakryiko	2020-02-20	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If BPF program is using BTF-defined maps, BTF is required only for libbpf itself to process map definitions. If after that BTF fails to be loaded into kernel (e.g., if it doesn't support BTF at all), this shouldn't prevent valid BPF program from loading. Existing retry-without-BTF logic for creating maps will succeed to create such maps without any problems. So, presence of .maps section shouldn't make BTF required for kernel. Update the check accordingly. Validated by ensuring simple BPF program with BTF-defined maps is still loaded on old kernel without BTF support and map is correctly parsed and created. Fixes: abd29c931459 ("libbpf: allow specifying map definitions using BTF") Reported-by: Julia Kartseva <hex@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200220062635.1497872-1-andriin@fb.com
*	selftests/bpf: Fix build of sockmap_ktls.c	Alexei Starovoitov	2020-02-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The selftests fails to build with: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c: In function ‘test_sockmap_ktls_disconnect_after_delete’: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c:72:37: error: ‘TCP_ULP’ undeclared (first use in this function) 72 \| err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls")); \| ^~~~~~~ Similar to commit that fixes build of sockmap_basic.c on systems with old /usr/include fix the build of sockmap_ktls.c Fixes: d1ba1204f2ee ("selftests/bpf: Test unhashing kTLS socket after removing from map") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200219205514.3353788-1-ast@kernel.org
*	selftests/bpf: Change llvm flag -mcpu=probe to -mcpu=v3	Yonghong Song	2020-02-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The latest llvm supports cpu version v3, which is cpu version v1 plus some additional 64bit jmp insns and 32bit jmp insn support. In selftests/bpf Makefile, the llvm flag -mcpu=probe did runtime probe into the host system. Depending on compilation environments, it is possible that runtime probe may fail, e.g., due to memlock issue. This will cause generated code with cpu version v1. This may cause confusion as the same compiler and the same C code generates different byte codes in different environment. Let us change the llvm flag -mcpu=probe to -mcpu=v3 so the generated code will be the same regardless of the compilation environment. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200219004236.2291125-1-yhs@fb.com
*	Merge branch 'bpf_read_branch_records'	Alexei Starovoitov	2020-02-19	5	-2/+309
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Daniel Xu says: ==================== Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Changes in v8: - Use globals instead of perf buffer - Call test_perf_branches__detach() before destroying skeleton - Fix typo in docs Changes in v7: - Const-ify and static-ify local var - Documentation formatting Changes in v6: - Move #ifdef a little to avoid unused variable warnings on !x86 - Test negative condition in selftest (-EINVAL on improperly configured perf event) - Skip positive condition selftest on setups that don't support branch records Changes in v5: - Rename bpf_perf_prog_read_branches() -> bpf_read_branch_records() - Rename BPF_F_GET_BR_SIZE -> BPF_F_GET_BRANCH_RECORDS_SIZE - Squash tools/ bpf.h sync into selftest commit Changes in v4: - Add BPF_F_GET_BR_SIZE flag - Return -ENOENT on unsupported architectures - Only accept initialized memory in helper - Check buffer size is multiple of sizeof(struct perf_branch_entry) - Use bpf skeleton in selftest - Add commit messages - Spelling and formatting Changes in v3: - Document filling unused buffer with zero - Formatting fixes - Rebase Changes in v2: - Change to a bpf helper instead of context access - Avoid mentioning Intel specific things ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	selftests/bpf: Add bpf_read_branch_records() selftest	Daniel Xu	2020-02-19	3	-1/+244
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a selftest to test: * default bpf_read_branch_records() behavior * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior * error path on non branch record perf events * using helper to write to stack * using helper to write to global On host with hardware counter support: # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED On host without hardware counter support (VM): # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED Also sync tools/include/uapi/linux/bpf.h. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-3-dxu@dxuuu.xyz
\| *	bpf: Add bpf_read_branch_records() helper	Daniel Xu	2020-02-19	2	-1/+65
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
*	Merge branch 'bpf-skmsg-simplify-restore'	Daniel Borkmann	2020-02-19	2	-16/+124
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Sitnicki says: ==================== This series has been split out from "Extend SOCKMAP to store listening sockets" [0]. I think it stands on its own, and makes the latter series smaller, which will make the review easier, hopefully. The essence is that we don't need to do a complicated dance in sk_psock_restore_proto, if we agree that the contract with tcp_update_ulp is to restore callbacks even when the socket doesn't use ULP. This is what tcp_update_ulp currently does, and we just make use of it. Series is accompanied by a test for a particularly tricky case of restoring callbacks when we have both sockmap and tls callbacks configured in sk->sk_prot. [0] https://lore.kernel.org/bpf/20200127131057.150941-1-jakub@cloudflare.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	selftests/bpf: Test unhashing kTLS socket after removing from map	Jakub Sitnicki	2020-02-19	1	-0/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a TCP socket gets inserted into a sockmap, its sk_prot callbacks get replaced with tcp_bpf callbacks built from regular tcp callbacks. If TLS gets enabled on the same socket, sk_prot callbacks get replaced once again, this time with kTLS callbacks built from tcp_bpf callbacks. Now, we allow removing a socket from a sockmap that has kTLS enabled. After removal, socket remains with kTLS configured. This is where things things get tricky. Since the socket has a set of sk_prot callbacks that are a mix of kTLS and tcp_bpf callbacks, we need to restore just the tcp_bpf callbacks to the original ones. At the moment, it comes down to the the unhash operation. We had a regression recently because tcp_bpf callbacks were not cleared in this particular scenario of removing a kTLS socket from a sockmap. It got fixed in commit 4da6a196f93b ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop"). Add a test that triggers the regression so that we don't reintroduce it in the future. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-4-jakub@cloudflare.com
\| *	bpf, sk_msg: Don't clear saved sock proto on restore	Jakub Sitnicki	2020-02-19	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to clear psock->sk_proto when restoring socket protocol callbacks in sk->sk_prot. The psock is about to get detached from the sock and eventually destroyed. At worst we will restore the protocol callbacks and the write callback twice. This makes reasoning about psock state easier. Once psock is initialized, we can count on psock->sk_proto always being set. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-3-jakub@cloudflare.com
\| *	bpf, sk_msg: Let ULP restore sk_proto and write_space callback	Jakub Sitnicki	2020-02-19	1	-10/+1
\|/ \| \| \| \| \| \| \| \| \| \|	We don't need a fallback for when the socket is not using ULP. tcp_update_ulp handles this case exactly the same as we do in sk_psock_restore_proto. Get rid of the duplicated code. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200217121530.754315-2-jakub@cloudflare.com
*	bpf: Allow bpf_perf_event_read_value in all BPF programs	Song Liu	2020-02-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	bpf_perf_event_read_value() is NMI safe. Enable it for all BPF programs. This can be used in fentry/fexit to profile BPF program and individual kernel function with hardware counters. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200214234146.2910011-1-songliubraving@fb.com
*	net: ena: remove set but not used variable 'hash_key'	YueHaibing	2020-02-17	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	drivers/net/ethernet/amazon/ena/ena_com.c: In function ena_com_hash_key_allocate: drivers/net/ethernet/amazon/ena/ena_com.c:1070:50: warning: variable hash_key set but not used [-Wunused-but-set-variable] commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") introduced this, but not used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: netlink: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: switchdev: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bpf, sockmap: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	NFC: digital: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	2020-02-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: usb: cdc-phonet: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: phy: allow bcm84881 to be a module	Russell King	2020-02-17	1	-2/+2
\| \| \| \| \| \| \| \| \|	Now that the phylib module loading issue has been resolved, we can allow this PHY driver to be built as a module. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'net-smc-next'	David S. Miller	2020-02-17	6	-40/+44
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ursula Braun says: ==================== net/smc: patches 2020-02-17 here are patches for SMC making termination tasks more perfect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: reduce port_event scheduling	Ursula Braun	2020-02-17	1	-15/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IB event handlers schedule the port event worker for further processing of port state changes. This patch reduces the number of schedules to avoid duplicate processing of the same port change. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: simplify normal link termination	Karsten Graul	2020-02-17	4	-13/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	smc_lgr_terminate() and smc_lgr_terminate_sched() both result in soft link termination, smc_lgr_terminate_sched() is scheduling a worker for this task. Take out complexity by always using the termination worker and getting rid of smc_lgr_terminate() completely. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: remove unused parameter of smc_lgr_terminate()	Karsten Graul	2020-02-17	4	-13/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The soft parameter of smc_lgr_terminate() is not used and obsolete. Remove it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: do not delete lgr from list twice	Karsten Graul	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When 2 callers call smc_lgr_terminate() at the same time for the same lgr, one gets the lgr_lock and deletes the lgr from the list and releases the lock. Then the second caller gets the lock and tries to delete it again. In smc_lgr_terminate() add a check if the link group lgr is already deleted from the link group list and prevent to try to delete it a second time. And add a check if the lgr is marked as freeing, which means that a termination is already pending. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: use termination worker under send_lock	Karsten Graul	2020-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	smc_tx_rdma_write() is called under the send_lock and should not call smc_lgr_terminate() directly. Call smc_lgr_terminate_sched() instead which schedules a worker. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/smc: improve smc_lgr_cleanup()	Karsten Graul	2020-02-17	1	-4/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	smc_lgr_cleanup() is called during termination processing, there is no need to send a DELETE_LINK at that time. A DELETE_LINK should have been sent before the termination is initiated, if needed. And remove the extra call to wake_up(&lnk->wr_reg_wait) because smc_llc_link_inactive() already calls the related helper function smc_wr_wakeup_reg_wait(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mlxsw-Reduce-dependency-between-bridge-and-router-code'	David S. Miller	2020-02-17	6	-132/+147
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ido Schimmel says: ==================== mlxsw: Reduce dependency between bridge and router code This patch set reduces the dependency between the bridge and the router code in preparation for RTNL removal from the route insertion path in mlxsw. The motivation and solution are explained in detail in patch #3. The main idea is that we need to stop special-casing the VXLAN devices with regards to the reference counting of the FIDs. Otherwise, we can bump into the situation described in patch #3, where the routing code calls into the bridge code which calls back into the routing code. After adding a mutex to protect router data structures to remove RTNL dependency, this can result in an AA deadlock. Patches #1 and #2 are preparations. They convert the FIDs to use 'refcount_t' for reference counting in order to catch over/under flows and add extack to the bridge creation function. Patches #3-#5 reduce the dependency between the bridge and the router code. First, by having the VXLAN device take a reference on the FID in patch #3 and then by removing unnecessary code following the change in patch #3. Patches #6-#10 adjust existing selftests and add new ones to exercise the new code paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	selftests: mlxsw: vxlan: Add test for error path	Ido Schimmel	2020-02-17	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test that when two VXLAN tunnels with conflicting configurations (i.e., different TTL) are enslaved to the same VLAN-aware bridge, then the enslavement of a port to the bridge is denied. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	selftests: mlxsw: vxlan: Adjust test to recent changes	Ido Schimmel	2020-02-17	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After recent changes, the VXLAN tunnel will be offloaded regardless if any local ports are member in the FID or not. Adjust the test to make sure the tunnel is offloaded in this case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	selftests: mlxsw: extack: Test creation of multiple VLAN-aware bridges	Ido Schimmel	2020-02-17	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The driver supports a single VLAN-aware bridge. Test that the enslavement of a port to the second VLAN-aware bridge fails with an extack. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	selftests: mlxsw: extack: Test bridge creation with VXLAN	Ido Schimmel	2020-02-17	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test that creation of a bridge (both VLAN-aware and VLAN-unaware) fails with an extack when a VXLAN device with an unsupported configuration is already enslaved to it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	selftests: mlxsw: Remove deprecated test	Ido Schimmel	2020-02-17	2	-45/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The addition of a VLAN on a bridge slave prompts the driver to have the local port in question join the FID corresponding to this VLAN. Before recent changes, the operation of joining the FID would also mean that the driver would enable VXLAN tunneling if a VXLAN device was also member in the VLAN. In case the configuration of the VXLAN tunnel was not supported, an extack error would be returned. Since the operation of joining the FID no longer means that VXLAN tunneling is potentially enabled, the test is no longer relevant. Remove it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum: Reduce dependency between bridge and router code	Ido Schimmel	2020-02-17	3	-20/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f40be47a3e40 ("mlxsw: spectrum_router: Do not force specific configuration order") added a call from the routing code to the bridge code in order to handle the case where VNI should be set on a FID following the joining of the router port to the FID. This is no longer required, as previous patches made VXLAN devices explicitly take a reference on the FID and set VNI on it. Therefore, remove the unnecessary call and simply have the RIF take a reference on the FID without checking if VNI should also be set on it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Remove VXLAN checks during FID membership	Ido Schimmel	2020-02-17	1	-58/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As explained in previous patch, VXLAN devices now take a reference on the FID and not only local ports. Therefore, there is no need for local ports to check if they need to set a VNI on the FID when they join the FID. Remove these unnecessary checks. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Have VXLAN device take reference on FID	Ido Schimmel	2020-02-17	1	-19/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Up until now only local ports and the router port (which is also a local port) took a reference on the corresponding FID (Filtering Identifier) when joining a bridge. For example: 192.0.2.1/24 br0 \| +------+------+ \| \| swp1 vxlan0 In this case the reference count of the FID will be '2'. Since the VXLAN device does not take a reference on the FID, whenever a local port joins the bridge it needs to check if a VXLAN device is already enslaved. If the VXLAN device should be mapped to the FID in question, then the VXLAN device's VNI is set on the FID. Beside the fact that this scheme special-cases the VXLAN device, it also creates an unnecessary dependency between the routing and bridge code: 1. [R] IP address is added on 'br0', which prompts the creation of a RIF and a backing FID 2. [B] VNI is enabled on backing FID 3. [R] Host route corresponding to VXLAN device's source address is promoted to perform NVE decapsulation [R] - Routing code [B] - Bridge code This back and forth dependency will become problematic when a lock is added in the routing code instead of relying on RTNL, as it will result in an AA deadlock. Instead, have the VXLAN device take a reference on the FID just like all the other netdev members of the bridge. In order to correctly handle the case where VXLAN devices are already enslaved to the bridge when it is offloaded, walk the bridge's slaves and replay the configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Propagate extack to bridge creation function	Ido Schimmel	2020-02-17	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Propagate extack to bridge creation function so that error messages could be passed to user space via netlink instead of printing them to kernel log. A subsequent patch will pass the new extack argument to more functions. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_fid: Use 'refcount_t' for FID reference counting	Ido Schimmel	2020-02-17	1	-6/+7
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	'refcount_t' is very useful for catching over/under flows. Convert the FID (Filtering Identifier) objects to use it instead of 'unsigned int' for their reference count. A subsequent patch in the series will change the way VXLAN devices hold / release the FID reference, which is why the conversion is made now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bridge: teach ndo_dflt_bridge_getlink() more brport flags	Julian Wiedmann	2020-02-17	1	-1/+5
\| \| \| \| \| \| \| \| \| \|	This enables ndo_dflt_bridge_getlink() to report a bridge port's offload settings for multicast and broadcast flooding. CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'sfc-couple-more-ARFS-tidy-ups'	David S. Miller	2020-02-17	2	-19/+24
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Edward Cree says: ==================== couple more ARFS tidy-ups Tie up some loose ends from the recent ARFS work. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sfc: move some ARFS code out of headers	Edward Cree	2020-02-17	2	-18/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	efx_filter_rfs_expire() is a work-function, so it being inline makes no sense. It's only ever used in efx_channels.c, so move it there. While we're at it, clean out some related unused cruft. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sfc: only schedule asynchronous filter work if needed	Edward Cree	2020-02-17	2	-2/+8
\|/ \| \| \| \| \| \| \| \| \|	Prevent excessive CPU time spent running a workitem with nothing to do. We avoid any races by keeping the same check in efx_filter_rfs_expire(). Suggested-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>