Commit message (Author, Age, Files, Lines)
* libbpf: Wrap source argument of BPF_CORE_READ macro in parentheses (Andrii Nakryiko, 2020-06-22, 1 file, -4/+4)

    Wrap source argument of BPF_CORE_READ family of macros into parentheses to allow uses like this:

        BPF_CORE_READ((struct cast_struct *)src, a, b, c);

    Fixes: 7db3822ab991 ("libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers")
    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-8-andriin@fb.com
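The reason a cast expression needs the extra parentheses can be shown with a small sketch. The macros below are hypothetical stand-ins written for illustration only; they are not the libbpf definitions of BPF_CORE_READ.

```c
#include <assert.h>
#include <stddef.h>

struct inner {
    int value;
};

/* Hypothetical macros, not libbpf's. The first one does not parenthesize
 * its source argument, the second one does. */
#define READ_FIELD_UNSAFE(src, field) (src->field)
#define READ_FIELD(src, field)        ((src)->field)

static int read_value(void *opaque)
{
    assert(opaque != NULL);
    /* READ_FIELD_UNSAFE((struct inner *)opaque, value) would expand to
     * ((struct inner *)opaque->value). Since -> binds tighter than the
     * cast, that dereferences the void pointer and fails to compile.
     * With the parenthesized macro the cast is applied first. */
    return READ_FIELD((struct inner *)opaque, value);
}
```

Calling `read_value()` on a pointer to a `struct inner` returns its `value` field, which is the kind of cast-then-read usage the commit enables.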
* tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h (Andrii Nakryiko, 2020-06-22, 7 files, -66/+45)

    Adapt the Makefile to support BPF skeleton generation beyond the single profiler.bpf.c case. Also add vmlinux.h generation and switch profiler.bpf.c to use it.

    The clang-bpf-global-var feature is extended and renamed to clang-bpf-co-re to check for support of the preserve_access_index attribute, which, together with BTF for global variables, is the minimum requirement for modern BPF programs.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-7-andriin@fb.com
* tools/bpftool: Minimize bootstrap bpftool (Andrii Nakryiko, 2020-06-22, 4 files, -32/+38)

    Build a minimal "bootstrap mode" bpftool to enable skeleton (and, later, vmlinux.h) generation, instead of building an almost complete, but slightly different (w/o skeletons, etc) bpftool to bootstrap the complete bpftool build.

    The current approach doesn't scale well (engineering-wise) when adding more BPF programs to bpftool and other complicated functionality, as it requires constant adjusting of the code to work in both bootstrapped mode and normal mode. So it's better to build only a minimal bpftool version that supports only BPF skeleton code generation and BTF-to-C conversion. Thankfully, this is quite easy to accomplish due to the internal modularity of bpftool commands. This will also allow new functionality to keep being added to bpftool in general, without the need to care about bootstrap mode for those new parts of bpftool.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-6-andriin@fb.com
* tools/bpftool: Move map/prog parsing logic into common (Andrii Nakryiko, 2020-06-22, 4 files, -308/+310)

    Move functions that parse map and prog by id/tag/name/etc outside of map.c/prog.c, respectively. These functions are used outside of those files and are generic enough to be in common. This also makes heavy-weight map.c and prog.c more decoupled from the rest of bpftool files and facilitates more lightweight bootstrap bpftool variant.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-5-andriin@fb.com
* selftests/bpf: Add __ksym extern selftest (Andrii Nakryiko, 2020-06-22, 2 files, -0/+103)

    Validate libbpf is able to handle weak and strong kernel symbol externs in BPF code correctly.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Hao Luo <haoluo@google.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-4-andriin@fb.com
* libbpf: Add support for extracting kernel symbol addresses (Andrii Nakryiko, 2020-06-22, 3 files, -6/+144)

    Add support for another (in addition to existing Kconfig) special kind of externs in BPF code, kernel symbol externs. Such externs allow BPF code to "know" a kernel symbol's address and either use it for comparisons with kernel data structures (e.g., struct file's f_op pointer, to distinguish different kinds of file), or, with the help of bpf_probe_read_kernel(), to follow pointers and read data from global variables. Kernel symbol addresses are found through /proc/kallsyms, which should be present in the system.

    Currently, such kernel symbol variables are typeless: they have to be defined as `extern const void <symbol>` and the only operation you can do (in C code) with them is to take their address. Such an extern should reside in a special section '.ksyms'. The bpf_helpers.h header provides the __ksym macro for this. Strong vs weak semantics stays the same as with Kconfig externs. If a symbol is not found in /proc/kallsyms, this will be a failure for a strong (non-weak) extern, but will be defaulted to 0 for weak externs.

    If the same symbol is defined multiple times in /proc/kallsyms, then it will be an error if any of the associated addresses differs. In that case, the address is ambiguous, so libbpf errs on the side of caution, rather than confusing the user with a randomly chosen address.

    In the future, once the kernel is extended with BTF information for variables, such ksym externs will be supported in a typed version, which will allow a BPF program to read a variable's contents directly, similarly to how it's done for fentry/fexit input arguments.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Hao Luo <haoluo@google.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-3-andriin@fb.com
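The resolution rules described in this commit (strong vs weak externs, and rejecting a symbol that appears with differing addresses) can be sketched in plain C. This is a minimal illustration and not libbpf's code: the helper name `resolve_ksym()`, its signature, and the specific error codes are assumptions, and it parses kallsyms-formatted text from a string instead of reading /proc/kallsyms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up `name` in kallsyms-formatted text.
 * Returns 0 on success, -EINVAL if the symbol is ambiguous, and -ENOENT
 * if a strong extern cannot be found (error codes are assumptions). */
static int resolve_ksym(const char *kallsyms, const char *name, bool is_weak,
                        unsigned long long *addr)
{
    const char *line = kallsyms;
    bool found = false;

    assert(kallsyms && name && addr);
    while (*line) {
        unsigned long long sym_addr;
        char sym_type, sym_name[256];

        /* Each line looks like "<hex address> <type> <name> [module]". */
        if (sscanf(line, "%llx %c %255s", &sym_addr, &sym_type, sym_name) == 3 &&
            strcmp(sym_name, name) == 0) {
            if (found && sym_addr != *addr)
                return -EINVAL; /* same name, different addresses */
            *addr = sym_addr;
            found = true;
        }
        line = strchr(line, '\n');
        if (!line)
            break;
        line++;
    }
    if (found)
        return 0;
    if (is_weak) {
        *addr = 0; /* weak externs default to 0 when not found */
        return 0;
    }
    return -ENOENT; /* strong externs must resolve */
}
```

A duplicate entry with an identical address is accepted, which matches the rule that only differing addresses make the symbol ambiguous.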
* libbpf: Generalize libbpf externs support (Andrii Nakryiko, 2020-06-22, 1 file, -140/+206)

    Switch existing Kconfig externs to be just one of few possible kinds of more generic externs. This refactoring is in preparation for ksymbol extern support, added in the follow up patch. There are no functional changes intended.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Hao Luo <haoluo@google.com>
    Link: https://lore.kernel.org/bpf/20200619231703.738941-2-andriin@fb.com
* libbpf: Add a bunch of attribute getters/setters for map definitions (Andrii Nakryiko, 2020-06-23, 3 files, -10/+134)

    Add a bunch of getters for various aspects of a BPF map. Some of these attributes (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getters allowing to fetch them individually. The bpf_map_def approach isn't very scalable when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features when each aspect of a BPF map has a separate getter/setter.

    Getters follow the common naming convention of not explicitly having "get" in their names: bpf_map__type() returns the map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have "set" in their names: bpf_map__set_type(), bpf_map__set_key_size().

    This patch ensures we now have a getter and a setter for the following map attributes:
      - type;
      - max_entries;
      - map_flags;
      - numa_node;
      - key_size;
      - value_size;
      - ifindex.

    bpf_map__resize() enforces an unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, a new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons.

    A map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this asymmetry as well. bpf_map__set_ifindex() itself is converted from a void function into an error-returning one, similar to other setters. The only error returned right now is -EBUSY, if the BPF map is already loaded and has a corresponding FD.

    One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and adds both a programmatic getter/setter and support for the numa_node field in BTF-defined maps.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
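The setter semantics described above can be modelled with a toy structure. This sketch does not use libbpf: `struct toy_map` and the `toy_map_*` functions are hypothetical stand-ins, and the -EINVAL returned by the resize model is an assumed error code. It only illustrates the three rules from the message: getters carry no "get" in their names, setters fail with -EBUSY once the map has an FD, and only the legacy resize path rejects max_entries == 0.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a BPF map object; not the libbpf struct. */
struct toy_map {
    int fd;                   /* -1 until the map is created in the kernel */
    unsigned int max_entries;
};

/* Getter: no "get" in the name, mirroring the naming convention. */
static unsigned int toy_map_max_entries(const struct toy_map *map)
{
    assert(map);
    return map->max_entries;
}

/* Setter: allows 0, refuses to change a map that is already loaded. */
static int toy_map_set_max_entries(struct toy_map *map, unsigned int max_entries)
{
    assert(map);
    if (map->fd >= 0)
        return -EBUSY;
    map->max_entries = max_entries;
    return 0;
}

/* Legacy resize path: keeps the max_entries > 0 restriction. */
static int toy_map_resize(struct toy_map *map, unsigned int max_entries)
{
    if (max_entries == 0)
        return -EINVAL; /* assumed error code */
    return toy_map_set_max_entries(map, max_entries);
}
```

Keeping the old entry point as a thin wrapper around the new setter is one way to preserve existing behavior while lifting the restriction for new callers.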
* selftests/bpf: Test access to bpf map pointer (Andrey Ignatov, 2020-06-22, 3 files, -0/+780)

    Add selftests to test access to map pointers from bpf program for all map types except struct_ops (that one would need additional work).

    verifier test focuses mostly on scenarios that must be rejected.

    prog_tests test focuses on accessing multiple fields both scalar and a nested struct from bpf program and verifies that those fields have expected values.

    Signed-off-by: Andrey Ignatov <rdna@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/139a6a17f8016491e39347849b951525335c6eb4.1592600985.git.rdna@fb.com
* bpf: Set map_btf_{name, id} for all map types (Andrey Ignatov, 2020-06-22, 14 files, -0/+72)

    Set map_btf_name and map_btf_id for all map types so that map fields can be accessed by bpf programs.

    Signed-off-by: Andrey Ignatov <rdna@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/a825f808f22af52b018dbe82f1c7d29dab5fc978.1592600985.git.rdna@fb.com
* bpf: Support access to bpf map fields (Andrey Ignatov, 2020-06-22, 7 files, -9/+131)

    There are multiple use-cases when it's convenient to have access to bpf map fields, both `struct bpf_map` and map type specific struct-s such as `struct bpf_array`, `struct bpf_htab`, etc.

    For example, while working with sock arrays it can be necessary to calculate the key based on map->max_entries (some_hash % max_entries). Currently this is solved by communicating max_entries via an "out-of-band" channel, e.g. via an additional map with a known key to get info about the target map. That works, but it is inconvenient and error-prone when working with many maps.

    In other cases the necessary data is dynamic (i.e. unknown at loading time) and it's impossible to get it at all. For example, while working with a hash table it can be convenient to know how much capacity is already used (bpf_htab.count.counter for the BPF_F_NO_PREALLOC case).

    At the same time the kernel knows this info and can provide it to the bpf program.

    Fill this gap by adding support to access bpf map fields from a bpf program for both `struct bpf_map` and map type specific fields.

    Support is implemented via btf_struct_access() so that a user can define their own `struct bpf_map` or map type specific struct in their program with only the necessary fields and the preserve_access_index attribute, cast a map to this struct and use a field.

    For example:

        struct bpf_map {
                __u32 max_entries;
        } __attribute__((preserve_access_index));

        struct bpf_array {
                struct bpf_map map;
                __u32 elem_size;
        } __attribute__((preserve_access_index));

        struct {
                __uint(type, BPF_MAP_TYPE_ARRAY);
                __uint(max_entries, 4);
                __type(key, __u32);
                __type(value, __u32);
        } m_array SEC(".maps");

        SEC("cgroup_skb/egress")
        int cg_skb(void *ctx)
        {
                struct bpf_array *array = (struct bpf_array *)&m_array;
                struct bpf_map *map = (struct bpf_map *)&m_array;

                /* .. use map->max_entries or array->map.max_entries .. */
        }

    Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of the corresponding struct. Only reading from map fields is supported.

    For btf_struct_access() to work there should be a way to know the btf id of the struct that corresponds to a map type. To get the btf id there should be a way to get a stringified name of the map-specific struct, such as "bpf_array", "bpf_htab", etc for a map type. Two new fields are added to `struct bpf_map_ops` to handle it:
      * .map_btf_name keeps the btf name of the struct returned by map_alloc();
      * .map_btf_id is used to cache the btf id of that struct.

    To make btf id calculation cheaper, the ids are calculated once while preparing btf_vmlinux and cached the same way as it's done for the btf_id field of `struct bpf_func_proto`.

    While calculating btf ids, struct names are NOT checked for collision. Collisions will be checked as a part of the work to prepare btf ids used in the verifier at compile time, which should land soon. The only known collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs net/core/sock_map.c) was fixed earlier.

    Both new fields .map_btf_name and .map_btf_id must be set for a map type for the feature to work. If neither is set for a map type, the verifier will return ENOTSUPP on an attempt to access map_ptr of the corresponding type. If just one of them is set, it's a verifier misconfiguration.

    Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be supported separately.

    The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by perfmon_capable() so that unpriv programs won't have access to bpf map fields.

    Signed-off-by: Andrey Ignatov <rdna@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com
* bpf: Rename bpf_htab to bpf_shtab in sock_map (Andrey Ignatov, 2020-06-22, 1 file, -41/+41)

    There are two different `struct bpf_htab` in bpf code in the following files:
      - kernel/bpf/hashtab.c
      - net/core/sock_map.c

    It makes it impossible to find the proper btf_id by name = "bpf_htab" and kind = BTF_KIND_STRUCT, which is needed to support access to map ptr so that a bpf program can access `struct bpf_htab` fields.

    To make it possible, one of the struct-s should be renamed. sock_map.c looks like the better candidate for a rename since it's a specialized version of hashtab.

    Rename it to bpf_shtab ("sh" stands for Sock Hash).

    Signed-off-by: Andrey Ignatov <rdna@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/c006a639e03c64ca50fc87c4bb627e0bfba90f4e.1592600985.git.rdna@fb.com
* bpf: Switch btf_parse_vmlinux to btf_find_by_name_kind (Andrey Ignatov, 2020-06-22, 1 file, -17/+6)

    btf_parse_vmlinux() implements a manual search for struct bpf_ctx_convert since btf_find_by_name_kind() was not available at the time it was implemented.

    Later btf_find_by_name_kind() was introduced in 27ae7997a661 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS"). It provides similar search functionality and can be leveraged in btf_parse_vmlinux(). Do it.

    Signed-off-by: Andrey Ignatov <rdna@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Andrii Nakryiko <andriin@fb.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/6e12d5c3e8a3d552925913ef73a695dd1bb27800.1592600985.git.rdna@fb.com
* tools/bpftool: Relicense bpftool's BPF profiler prog as dual-license GPL/BSD (Andrii Nakryiko, 2020-06-20, 1 file, -2/+2)

    Relicense it to be compatible with the rest of bpftool files.

    Suggested-by: Quentin Monnet <quentin@isovalent.com>
    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20200619222024.519774-1-andriin@fb.com
* tools/bpf: Add verifier tests for 32bit pointer/scalar arithmetic (Yonghong Song, 2020-06-19, 1 file, -0/+38)

    Added two test_verifier subtests for 32bit pointer/scalar arithmetic with BPF_SUB operator. They are passing verifier now.

    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20200618234632.3321367-1-yhs@fb.com
* bpf: Avoid verifier failure for 32bit pointer arithmetic (Yonghong Song, 2020-06-19, 1 file, -0/+5)

    When doing experiments with llvm (disabling instcombine and simplifyCFG), I hit the following error with test_seg6_loop.o.

        ; R1=pkt(id=0,off=0,r=48,imm=0), R7=pkt(id=0,off=40,r=48,imm=0)
        w2 = w7
        ; R2_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
        w2 -= w1
        R2 32-bit pointer arithmetic prohibited

    The corresponding source code is:

        uint32_t srh_off;
        // srh and skb->data are all packet pointers
        srh_off = (char *)srh - (char *)(long)skb->data;

    The verifier does not support 32-bit pointer/scalar arithmetic.

    Without my llvm change, the code looks like

        ; R3=pkt(id=0,off=40,r=48,imm=0), R8=pkt(id=0,off=0,r=48,imm=0)
        w3 -= w8
        ; R3_w=inv(id=0)

    This is explicitly allowed in the verifier if both registers are pointers and the opcode is BPF_SUB.

    To fix this problem, I changed the verifier to allow 32-bit pointer/scalar BPF_SUB operations.

    At the source level, the issue could be worked around with inline asm or by changing "uint32_t srh_off" to "uint64_t srh_off". But I feel that the verifier change might be the right thing to do.

    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20200618234631.3321118-1-yhs@fb.com
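The source-level workaround mentioned in the message (keeping the pointer difference in a 64-bit variable) looks roughly like the sketch below. The function name `header_offset()` is hypothetical, and the example is plain userspace C: it shows only the shape of the source change, not what instructions a BPF backend would emit, which depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of a header inside a packet buffer, kept in
 * a 64-bit variable so the subtraction is not narrowed to 32 bits in the
 * source. */
static uint64_t header_offset(const char *data, const char *hdr)
{
    assert(data != NULL && hdr != NULL && hdr >= data);
    return (uint64_t)(hdr - data);
}
```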
* bpf: sk_storage: Prefer to get a free cache_idx (Martin KaFai Lau, 2020-06-18, 1 file, -4/+37)

    The cache_idx is currently picked by RR. There is a chance that the same cache_idx will be picked by multiple sk_storage_maps while other cache_idx values are still unused, e.g. it could happen when the sk_storage_map is recreated during the restart of the user space process.

    This patch tracks the usage count for each cache_idx. There are 16 of them now (defined in BPF_SK_STORAGE_CACHE_SIZE). It will try to pick a free cache_idx. If none is found, it will pick the one with the minimal usage count.

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20200617174226.2301909-1-kafai@fb.com
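The selection policy described above (prefer a free slot, otherwise take the least-used one) can be sketched as follows. This is an illustrative userspace model, not the kernel code: the names `cache_idx_get()`, `cache_idx_put()` and `cache_idx_usage` are assumptions, and the locking a kernel implementation would need is omitted.

```c
#include <assert.h>

#define CACHE_SIZE 16 /* mirrors the 16 slots mentioned in the message */

/* Usage count per cache index; 0 means the slot is free. */
static unsigned int cache_idx_usage[CACHE_SIZE];

/* Pick a free index if there is one, else the least-used index. */
static unsigned int cache_idx_get(void)
{
    unsigned int best = 0;

    for (unsigned int i = 0; i < CACHE_SIZE; i++) {
        if (cache_idx_usage[i] == 0) {
            best = i; /* free slot: stop searching */
            break;
        }
        if (cache_idx_usage[i] < cache_idx_usage[best])
            best = i;
    }
    cache_idx_usage[best]++;
    return best;
}

/* Release an index when its map goes away. */
static void cache_idx_put(unsigned int idx)
{
    assert(idx < CACHE_SIZE && cache_idx_usage[idx] > 0);
    cache_idx_usage[idx]--;
}
```

With this policy, a map that is destroyed and recreated reuses the slot it just released instead of advancing to a slot that another map may already occupy.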
* libbpf: Bump version to 0.1.0 (Andrii Nakryiko, 2020-06-17, 1 file, -0/+3)

    Bump libbpf version to 0.1.0, as new development cycle starts.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200617183132.1970836-1-andriin@fb.com
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Linus Torvalds, 2020-06-16, 33 files, -314/+309)

    Pull networking fixes from David Miller:

     1) Don't get per-cpu pointer with preemption enabled in nft_set_pipapo, fix from Stefano Brivio.
     2) Fix memory leak in ctnetlink, from Pablo Neira Ayuso.
     3) Multiple definitions of MPTCP_PM_MAX_ADDR, from Geliang Tang.
     4) Accidentally disabling NAPI in non-error paths of macb_open(), from Charles Keepax.
     5) Fix races between alx_stop and alx_remove, from Zekun Shen.
     6) We forget to re-enable SRIOV during resume in bnxt_en driver, from Michael Chan.
     7) Fix memory leak in ipv6_mc_destroy_dev(), from Wang Hai.
     8) rxtx stats use wrong index in mvpp2 driver, from Sven Auhagen.
     9) Fix memory leak in mptcp_subflow_create_socket error path, from Wei Yongjun.
    10) We should not adjust the TCP window advertised when sending dup acks in non-SACK mode, because it won't be counted as a dup by the sender if the window size changes. From Eric Dumazet.
    11) Destroy the right number of queues during remove in mvpp2 driver, from Sven Auhagen.
    12) Various WOL and PM fixes to e1000 driver, from Chen Yu, Vaibhav Gupta, and Arnd Bergmann.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
      e1000e: fix unused-function warning
      e1000: use generic power management
      e1000e: Do not wake up the system via WOL if device wakeup is disabled
      lan743x: add MODULE_DEVICE_TABLE for module loading alias
      mlxsw: spectrum: Adjust headroom buffers for 8x ports
      bareudp: Fixed configuration to avoid having garbage values
      mvpp2: remove module bugfix
      tcp: grow window for OOO packets only for SACK flows
      mptcp: fix memory leak in mptcp_subflow_create_socket()
      netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline
      net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline
      net: dsa: sja1105: fix PTP timestamping with large tc-taprio cycles
      mvpp2: ethtool rxtx stats fix
      MAINTAINERS: switch to my private email for Renesas Ethernet drivers
      rocker: fix incorrect error handling in dma_rings_init
      test_objagg: Fix potential memory leak in error handling
      net: ethernet: mtk-star-emac: simplify interrupt handling
      mld: fix memory leak in ipv6_mc_destroy_dev()
      bnxt_en: Return from timer if interface is not in open state.
      bnxt_en: Fix AER reset logic on 57500 chips.
      ...
| * Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue (David S. Miller, 2020-06-16, 2 files, -51/+28)

      Jeff Kirsher says:

      ====================
      Intel Wired LAN Driver Updates 2020-06-16

      This series contains fixes to e1000 and e1000e.

      Chen fixes an e1000e issue where systems could be woken via WoL, even though the user has disabled the wakeup bit via sysfs.

      Vaibhav Gupta updates the e1000 driver to clean up the legacy Power Management hooks.

      Arnd Bergmann cleans up the inconsistent use of CONFIG_PM_SLEEP preprocessor tags, which also resolves the compiler warnings about the possibility of an unused function.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
| | * e1000e: fix unused-function warning (Arnd Bergmann, 2020-06-16, 1 file, -11/+5)

        The CONFIG_PM_SLEEP #ifdef checks in this file are inconsistent, leading to a warning about a sometimes unused function:

            drivers/net/ethernet/intel/e1000e/netdev.c:137:13: error: unused function 'e1000e_check_me' [-Werror,-Wunused-function]

        Rather than adding more #ifdefs, just remove them completely and mark the PM functions as __maybe_unused to let the compiler work it out on its own.

        Fixes: e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME systems")
        Signed-off-by: Arnd Bergmann <arnd@arndb.de>
        Tested-by: Aaron Brown <aaron.f.brown@intel.com>
        Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * e1000: use generic power management (Vaibhav Gupta, 2020-06-16, 1 file, -36/+13)

        With legacy PM hooks, it was the responsibility of a driver to manage PCI states and also the device's power state. The generic approach is to let the PCI core handle the work.

        e1000_suspend() calls __e1000_shutdown() to perform intermediate tasks. __e1000_shutdown() modifies the value of "wake" (whether the device should be wakeup-enabled or not), which is responsible for controlling the flow of legacy PM.

        Since the PCI core has no idea about the value of "wake", new code for generic PM may produce unexpected results. Thus, use "device_set_wakeup_enable()" to wakeup-enable the device accordingly.

        Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
        Tested-by: Aaron Brown <aaron.f.brown@intel.com>
        Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * e1000e: Do not wake up the system via WOL if device wakeup is disabled (Chen Yu, 2020-06-16, 1 file, -4/+10)

        Currently the system will be woken up via WOL (Wake On LAN) even if the device wakeup ability has been disabled via sysfs:

            cat /sys/devices/pci0000:00/0000:00:1f.6/power/wakeup
            disabled

        The system should not be woken up if the user has explicitly disabled the wake up ability for this device.

        This patch clears the WOL ability of this network device if the user has disabled the wake up ability in sysfs.

        Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver")
        Reported-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
        Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
        Cc: <Stable@vger.kernel.org>
        Signed-off-by: Chen Yu <yu.c.chen@intel.com>
        Tested-by: Aaron Brown <aaron.f.brown@intel.com>
        Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | lan743x: add MODULE_DEVICE_TABLE for module loading alias (Tim Harvey, 2020-06-16, 1 file, -0/+2)

      Without a MODULE_DEVICE_TABLE the attributes are missing that create an alias for auto-loading the module in userspace via hotplug.

      Signed-off-by: Tim Harvey <tharvey@gateworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mlxsw: spectrum: Adjust headroom buffers for 8x ports (Ido Schimmel, 2020-06-16, 4 files, -0/+17)

      The port's headroom buffers are used to store packets while they traverse the device's pipeline and also to store packets that are egress mirrored.

      On Spectrum-3, ports with eight lanes use two headroom buffers between which the configured headroom size is split.

      In order to prevent packet loss, multiply the calculated headroom size by two for 8x ports.

      Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bareudp: Fixed configuration to avoid having garbage values (Martin, 2020-06-16, 1 file, -0/+2)

      Code to initialize the conf structure while gathering the configuration of the device was missing.

      Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
      Signed-off-by: Martin <martin.varghese@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mvpp2: remove module bugfix (Sven Auhagen, 2020-06-16, 1 file, -2/+5)

      The remove function does not destroy all BM Pools when the per-cpu pool is active.

      When reloading mvpp2 as a module, the BM Pools are still active in hardware and, due to the bug, now have twice the size (old + new).

      This eventually leads to a kernel crash.

      v2:
      * add Fixes tag

      Fixes: 7d04b0b13b11 ("mvpp2: percpu buffers")
      Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: grow window for OOO packets only for SACK flows (Eric Dumazet, 2020-06-16, 1 file, -2/+10)

      Back in 2013, we made a change that broke fast retransmit for non SACK flows.

      Indeed, for these flows, a sender needs to receive three duplicate ACKs before starting fast retransmit. ACKs sent with a different receive window do not count.

      Even if enabling SACK is strongly recommended these days, there still are some cases where it has to be disabled.

      Not increasing the window seems better than having to rely on RTO.

      After the fix, the following packetdrill test gives:

          // Initialize connection
              0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
             +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
             +0 bind(3, ..., ...) = 0
             +0 listen(3, 1) = 0

             +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
             +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
             +0 < . 1:1(0) ack 1 win 514

             +0 accept(3, ..., ...) = 4

             +0 < . 1:1001(1000) ack 1 win 514
             // Quick ack
             +0 > . 1:1(0) ack 1001 win 264

             +0 < . 2001:3001(1000) ack 1 win 514
             // DUPACK : Normally we should not change the window
             +0 > . 1:1(0) ack 1001 win 264

             +0 < . 3001:4001(1000) ack 1 win 514
             // DUPACK : Normally we should not change the window
             +0 > . 1:1(0) ack 1001 win 264

             +0 < . 4001:5001(1000) ack 1 win 514
             // DUPACK : Normally we should not change the window
             +0 > . 1:1(0) ack 1001 win 264

             +0 < . 1001:2001(1000) ack 1 win 514
             // Hole is repaired.
             +0 > . 1:1(0) ack 5001 win 272

      Fixes: 4e4f1fc22681 ("tcp: properly increase rcv_ssthresh for ofo packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
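Why a changed window defeats fast retransmit on a non-SACK sender can be illustrated with a simplified duplicate-ACK counter. This is a sketch under stated assumptions, not the kernel's TCP code: `struct ack_info`, `struct dupack_state` and `on_ack()` are hypothetical names, and the check covers only the acknowledgment number and advertised window, leaving out the other conditions a real stack applies (no payload, outstanding data, no SYN/FIN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an incoming ACK that matter for this simplified check. */
struct ack_info {
    uint32_t ack_seq; /* acknowledgment number */
    uint32_t window;  /* advertised receive window */
};

struct dupack_state {
    struct ack_info last; /* most recent non-duplicate ACK */
    unsigned int dupacks; /* duplicates seen since then */
};

/* Returns true when the third duplicate ACK arrives, which is the point
 * where a non-SACK sender would start fast retransmit. */
static bool on_ack(struct dupack_state *st, struct ack_info ack)
{
    assert(st != NULL);
    if (ack.ack_seq == st->last.ack_seq && ack.window == st->last.window) {
        st->dupacks++;
    } else {
        /* A different window makes this a window update, not a duplicate. */
        st->last = ack;
        st->dupacks = 0;
    }
    return st->dupacks == 3;
}
```

Feeding it three ACKs for 1001 with window 264, as in the trace above, triggers fast retransmit; ACKs that alternate between windows 264 and 272 never do, so the sender would fall back to the retransmission timeout.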
| * mptcp: fix memory leak in mptcp_subflow_create_socket() (Wei Yongjun, 2020-06-15, 1 file, -1/+3)

      The socket allocated by sock_create_kern() should be released before returning in the error handling path; otherwise it causes a memory leak.

          unreferenced object 0xffff88810910c000 (size 1216):
            comm "00000003_test_m", pid 12238, jiffies 4295050289 (age 54.237s)
            hex dump (first 32 bytes):
              01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
            backtrace:
              [<00000000e877f89f>] sock_alloc_inode+0x18/0x1c0
              [<0000000093d1dd51>] alloc_inode+0x63/0x1d0
              [<000000005673fec6>] new_inode_pseudo+0x14/0xe0
              [<00000000b5db6be8>] sock_alloc+0x3c/0x260
              [<00000000e7e3cbb2>] __sock_create+0x89/0x620
              [<0000000023e48593>] mptcp_subflow_create_socket+0xc0/0x5e0
              [<00000000419795e4>] __mptcp_socket_create+0x1ad/0x3f0
              [<00000000b2f942e8>] mptcp_stream_connect+0x281/0x4f0
              [<00000000c80cd5cc>] __sys_connect_file+0x14d/0x190
              [<00000000dc761f11>] __sys_connect+0x128/0x160
              [<000000008b14e764>] __x64_sys_connect+0x6f/0xb0
              [<000000007b4f93bd>] do_syscall_64+0xa1/0x530
              [<00000000d3e770b6>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

      Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'remove-dependency-between-mlx5-act_ct-nf_flow_table' (David S. Miller, 2020-06-15, 4 files, -61/+55)

      Roi Dayan says:

      ====================
      remove dependency between mlx5, act_ct, nf_flow_table

      Some exported functions from act_ct and nf_flow_table are being used in mlx5_core. This means that the mlx5 module always requires the act_ct and nf_flow_table modules. Those small exported functions can be moved to the header files to avoid this module dependency.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
| | * netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline (Alaa Hleihel, 2020-06-15, 2 files, -49/+45)

        Currently, nf_flow_table_offload_add/del_cb are exported by nf_flow_table module, therefore modules using them will have hard-dependency on nf_flow_table and will require loading it all the time.

        This can lead to an unnecessary overhead on systems that do not use this API.

        To relax the hard-dependency between the modules, we unexport these functions and make them static inline.

        Fixes: 978703f42549 ("netfilter: flowtable: Add API for registering to flow table events")
        Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
        Reviewed-by: Roi Dayan <roid@mellanox.com>
        Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inlineAlaa Hleihel2020-06-152-12/+10
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, tcf_ct_flow_table_restore_skb is exported by the act_ct module, therefore modules using it will have a hard dependency on act_ct and will require loading it all the time. This can lead to unnecessary overhead on systems that do not use the hardware connection tracking action (ct_metadata action) in the first place. To relax the hard dependency between the modules, we unexport this function and make it a static inline one. Fixes: 30b0cf90c6dd ("net/sched: act_ct: Support restoring conntrack info on skbs") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: sja1105: fix PTP timestamping with large tc-taprio cyclesVladimir Oltean2020-06-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It isn't actually described clearly at all in UM10944.pdf, but on TX of a management frame (such as PTP), this needs to happen: - The destination MAC address (i.e. 01-80-c2-00-00-0e), along with the desired destination port, need to be installed in one of the 4 management slots of the switch, over SPI. - The host can poll over SPI for that management slot's ENFPORT field. That gets unset when the switch has matched the slot to the frame. And therein lies the problem. ENFPORT does not mean that the packet has been transmitted. Just that it has been received over the CPU port, and that the mgmt slot is yet again available. This is relevant because of what we are doing in sja1105_ptp_txtstamp_skb, which is called right after sja1105_mgmt_xmit. We are in a hard real-time deadline, since the hardware only gives us 24 bits of TX timestamp, so we need to read the full PTP clock to reconstruct it. Because we're in a hurry (in an attempt to make sure that we have a full 64-bit PTP time which is as close as possible to the actual transmission time of the frame, to avoid 24-bit wraparounds), first we read the PTP clock, then we poll for the TX timestamp to become available. But of course, we don't know for sure that the frame has been transmitted when we read the full PTP clock. We had assumed that ENFPORT means it has, but the assumption is incorrect. And while in most real-life scenarios this has never been caught due to software delays, nowhere is this fact more obvious than with a tc-taprio offload, where PTP traffic gets a small timeslot very rarely (example: 1 packet per 10 ms). 
In that case, we will be reading the PTP clock for timestamp reconstruction too early (before the packet has been transmitted), and this renders the reconstruction procedure incorrect (see the assumptions described in the comments found on function sja1105_tstamp_reconstruct). So the PTP TX timestamps will be off by 1<<24 clock ticks, or 135 ms (1 tick is 8 ns). So fix this case of premature optimization by simply reordering the sja1105_ptpegr_ts_poll and the sja1105_ptpclkval_read function calls. It turns out that in practice, the 135 ms hard deadline for PTP timestamp wraparound is not so hard, since even the most bandwidth-intensive PTP profiles, such as 802.1AS-2011, have a sync frame interval of 125 ms. So if we couldn't deliver a timestamp in 135 ms (which we can), we're toast and have much bigger problems anyway. Fixes: 47ed985e97f5 ("net: dsa: sja1105: Add logic for TX timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
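The reconstruction arithmetic described above can be sketched in user-space C. This is only an illustration under stated assumptions: the 24-bit width and the 8 ns tick come from the commit text, while `tstamp_reconstruct`, `wrap_period_ns` and the macro names are hypothetical and are not the driver's actual code. The sketch assumes the full clock value `now` is read after the event and less than one wrap period later.

```c
#include <stdint.h>

/* Values taken from the commit text: 24-bit partial timestamp, 8 ns per tick. */
#define PARTIAL_BITS 24
#define PARTIAL_MASK ((UINT64_C(1) << PARTIAL_BITS) - 1)
#define TICK_NS      8

/* Time until a 24-bit tick counter wraps around: 2^24 ticks * 8 ns. */
uint64_t wrap_period_ns(void)
{
	return (PARTIAL_MASK + 1) * TICK_NS;
}

/*
 * Hypothetical helper: rebuild a full timestamp from its low 24 bits.
 * Assumes `now` was read AFTER the event that produced `partial`,
 * and less than one wrap period later.
 */
uint64_t tstamp_reconstruct(uint64_t now, uint64_t partial)
{
	/* Borrow the upper bits from the full clock reading. */
	uint64_t ts = (now & ~PARTIAL_MASK) | partial;

	/* Low bits of `now` below the partial value: the counter wrapped in between. */
	if ((now & PARTIAL_MASK) < partial)
		ts -= PARTIAL_MASK + 1;

	return ts;
}
```

With these numbers the wrap period is 2^24 * 8 ns = 134,217,728 ns, which is the roughly 135 ms deadline mentioned in the commit message. If `now` is sampled before the frame actually leaves the port, the "read after the event" assumption is violated, and in the case where the 24-bit counter rolls over between the clock read and the transmission the result lands exactly one wrap period (1<<24 ticks) too early. Polling for the partial timestamp first and reading the full clock afterwards restores the assumption.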
| * mvpp2: ethtool rxtx stats fixSven Auhagen2020-06-151-2/+2
| | | | | | | | | | | | | | | | The ethtool rx and tx queue statistics are reporting wrong values. Fix this by reading out the correct ones. Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * MAINTAINERS: switch to my private email for Renesas Ethernet driversSergei Shtylyov2020-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | I no longer work for Cogent Embedded (but my old email still works :-)), and still would like to continue looking after the Renesas Ethernet drivers and bindings. Let's switch to my private email. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rocker: fix incorrect error handling in dma_rings_initAditya Pakki2020-06-151-2/+2
| | | | | | | | | | | | | | | | | | | | In rocker_dma_rings_init, the goto targets used when rocker_dma_cmd_ring_waits_alloc() and rocker_dma_ring_create() fail are incorrect. The patch fixes the order so that it is consistent with the cleanup in rocker_dma_rings_fini(). Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
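The general pattern behind this kind of fix, where error labels unwind in the exact reverse order of setup so that they mirror the teardown function, can be sketched as follows. This is a user-space illustration, not the rocker driver's code: `struct rings`, `stage_alloc`, `stage_free`, `fail_at` and `live_allocs` are all hypothetical names, and the failure injection hook exists only to make the sketch checkable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical three-stage setup; the stage names are illustrative only. */
struct rings {
	void *cmd;
	void *waits;
	void *event;
};

/* Illustration-only hooks: fail the Nth stage (0 = never) and count live buffers. */
int fail_at;
int live_allocs;

static void *stage_alloc(int stage)
{
	void *p;

	if (stage == fail_at)
		return NULL;
	p = malloc(16);
	if (p)
		live_allocs++;
	return p;
}

static void stage_free(void *p)
{
	free(p);
	live_allocs--;
}

int rings_init(struct rings *r)
{
	r->cmd = stage_alloc(1);
	if (!r->cmd)
		goto err_cmd;
	r->waits = stage_alloc(2);
	if (!r->waits)
		goto err_waits;
	r->event = stage_alloc(3);
	if (!r->event)
		goto err_event;
	return 0;

	/* Each label undoes only the stages that succeeded before the failure. */
err_event:
	stage_free(r->waits);
err_waits:
	stage_free(r->cmd);
err_cmd:
	return -ENOMEM;
}

/* Teardown releases everything in the same reverse order as the error path. */
void rings_fini(struct rings *r)
{
	stage_free(r->event);
	stage_free(r->waits);
	stage_free(r->cmd);
}
```

Jumping to the wrong label either leaks a stage that was already set up or frees one that never was, which is why keeping the labels in step with the teardown order matters.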
| * test_objagg: Fix potential memory leak in error handlingAditya Pakki2020-06-151-2/+2
| | | | | | | | | | | | | | | | | | In case of failure of check_expect_hints_stats(), the resources allocated by objagg_hints_get should be freed. The patch fixes this issue. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: mtk-star-emac: simplify interrupt handlingBartosz Golaszewski2020-06-151-89/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During development we tried to make the interrupt handling as fine-grained as possible with TX and RX interrupts being disabled/enabled independently and the counter registers reset from workqueue context. Unfortunately after thorough testing of current mainline, we noticed the driver has become unstable under heavy load. While this is hard to reproduce, it's quite consistent in the driver's current form. This patch proposes to go back to the previous approach of doing all processing in napi context with all interrupts masked in order to make the driver usable in mainline linux. This doesn't impact the performance on pumpkin boards at all and it's in line with what many ethernet drivers do in mainline linux anyway. At the same time we're adding a FIXME comment about the need to improve the interrupt handling. Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mld: fix memory leak in ipv6_mc_destroy_dev()Wang Hai2020-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a84d01647989 ("mld: fix memory leak in mld_del_delrec()") fixed the memory leak in MLD, but missed the ipv6_mc_destroy_dev() path, in which mca_sources are leaked after ma_put(). Use ip6_mc_clear_src() to take care of the missing free. BUG: memory leak unreferenced object 0xffff8881113d3180 (size 64): comm "syz-executor071", pid 389, jiffies 4294887985 (age 17.943s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff 02 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 ................ backtrace: [<000000002cbc483c>] kmalloc include/linux/slab.h:555 [inline] [<000000002cbc483c>] kzalloc include/linux/slab.h:669 [inline] [<000000002cbc483c>] ip6_mc_add1_src net/ipv6/mcast.c:2237 [inline] [<000000002cbc483c>] ip6_mc_add_src+0x7f5/0xbb0 net/ipv6/mcast.c:2357 [<0000000058b8b1ff>] ip6_mc_source+0xe0c/0x1530 net/ipv6/mcast.c:449 [<000000000bfc4fb5>] do_ipv6_setsockopt.isra.12+0x1b2c/0x3b30 net/ipv6/ipv6_sockglue.c:754 [<00000000e4e7a722>] ipv6_setsockopt+0xda/0x150 net/ipv6/ipv6_sockglue.c:950 [<0000000029260d9a>] rawv6_setsockopt+0x45/0x100 net/ipv6/raw.c:1081 [<000000005c1b46f9>] __sys_setsockopt+0x131/0x210 net/socket.c:2132 [<000000008491f7db>] __do_sys_setsockopt net/socket.c:2148 [inline] [<000000008491f7db>] __se_sys_setsockopt net/socket.c:2145 [inline] [<000000008491f7db>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2145 [<00000000c7bc11c5>] do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295 [<000000005fb7a3f3>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'bnxt_en-Bug-fixes'David S. Miller2020-06-151-18/+17
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Chan says: ==================== bnxt_en: Bug fixes. Four fixes related to the bnxt_en driver's resume path, AER reset, and the timer function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bnxt_en: Return from timer if interface is not in open state.Vasundhara Volam2020-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will avoid many unnecessary error logs when the driver or firmware is in reset. Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bnxt_en: Fix AER reset logic on 57500 chips.Michael Chan2020-06-151-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AER reset should follow the same steps as suspend/resume. We need to free context memory during AER reset and allocate new context memory during recovery by calling bnxt_hwrm_func_qcaps(). We also need to call bnxt_reenable_sriov() to restore the VFs. Fixes: bae361c54fb6 ("bnxt_en: Improve AER slot reset.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bnxt_en: Re-enable SRIOV during resume.Michael Chan2020-06-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If VFs are enabled, we need to re-configure them during resume because firmware has been reset while resuming. Otherwise, the VFs won't work after resume. Fixes: c16d4ee0e397 ("bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bnxt_en: Simplify bnxt_resume().Michael Chan2020-06-151-12/+2
| |/ | | | | | | | | | | | | | | | | | | | | The separate steps we do in bnxt_resume() can be done more simply by calling bnxt_hwrm_func_qcaps(). This change will add an extra __bnxt_hwrm_func_qcaps() call which is needed anyway on older firmware. Fixes: f9b69d7f6279 ("bnxt_en: Fix suspend/resume path on 57500 chips") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2020-06-154-25/+65
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix bogus EEXIST on element insertions to the rbtree with timeouts, from Stefano Brivio. 2) Preempt BUG splat in the pipapo element insertion path, also from Stefano. 3) Release filter from the ctnetlink error path. 4) Release flowtable hooks from the deletion path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * netfilter: nf_tables: hook list memleak in flowtable deletionPablo Neira Ayuso2020-06-121-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After looking up the flowtable hooks that need to be removed, release the hook objects in the deletion list. The error path needs to release these hook objects too. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Reported-by: syzbot+eb9d5924c51d6d59e094@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: ctnetlink: memleak in filter initialization error pathPablo Neira Ayuso2020-06-101-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | Release the filter object in case of error. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Reported-by: syzbot+38b8b548a851a01793c5@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nft_set_pipapo: Disable preemption before getting per-CPU pointerStefano Brivio2020-06-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lkp kernel test robot reports, with CONFIG_DEBUG_PREEMPT enabled: [ 165.316525] BUG: using smp_processor_id() in preemptible [00000000] code: nft/6247 [ 165.319547] caller is nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.321846] CPU: 1 PID: 6247 Comm: nft Not tainted 5.6.0-rc5-01595-ge32a4dc6512ce3 #1 [ 165.332128] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 165.334892] Call Trace: [ 165.336435] dump_stack+0x8f/0xcb [ 165.338128] debug_smp_processor_id+0xb2/0xc0 [ 165.340117] nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.342290] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.344420] ? rcu_read_lock_sched_held+0x52/0x80 [ 165.346460] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.348543] ? __mmu_interval_notifier_insert+0xa0/0xf0 [ 165.350629] nft_add_set_elem+0x5ff/0xa90 [nf_tables] [ 165.352699] ? __lock_acquire+0x241/0x1400 [ 165.354573] ? __lock_acquire+0x241/0x1400 [ 165.356399] ? reacquire_held_locks+0x12f/0x200 [ 165.358384] ? nf_tables_valid_genid+0x1f/0x40 [nf_tables] [ 165.360502] ? nla_strcmp+0x10/0x50 [ 165.362199] ? nft_table_lookup+0x4f/0xa0 [nf_tables] [ 165.364217] ? nla_strcmp+0x10/0x50 [ 165.365891] ? nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.367997] nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.370083] nfnetlink_rcv_batch+0x4fd/0x790 [nfnetlink] [ 165.372205] ? __lock_acquire+0x241/0x1400 [ 165.374058] ? __nla_validate_parse+0x57/0x8a0 [ 165.375989] ? cap_inode_getsecurity+0x230/0x230 [ 165.377954] ? 
security_capable+0x38/0x50 [ 165.379795] nfnetlink_rcv+0x11d/0x140 [nfnetlink] [ 165.381779] netlink_unicast+0x1b2/0x280 [ 165.383612] netlink_sendmsg+0x351/0x470 [ 165.385439] sock_sendmsg+0x5b/0x60 [ 165.387133] ____sys_sendmsg+0x200/0x280 [ 165.388871] ? copy_msghdr_from_user+0xd9/0x160 [ 165.390805] ___sys_sendmsg+0x88/0xd0 [ 165.392524] ? __might_fault+0x3e/0x90 [ 165.394273] ? sock_getsockopt+0x3d5/0xbb0 [ 165.396021] ? __handle_mm_fault+0x545/0x6a0 [ 165.397822] ? find_held_lock+0x2d/0x90 [ 165.399593] ? __sys_sendmsg+0x5e/0xa0 [ 165.401338] __sys_sendmsg+0x5e/0xa0 [ 165.402979] do_syscall_64+0x60/0x280 [ 165.404680] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 165.406621] RIP: 0033:0x7ff1fa46e783 [ 165.408299] Code: c7 c0 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 [ 165.414163] RSP: 002b:00007ffedf59ea78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 165.416804] RAX: ffffffffffffffda RBX: 00007ffedf59fc60 RCX: 00007ff1fa46e783 [ 165.419419] RDX: 0000000000000000 RSI: 00007ffedf59fb10 RDI: 0000000000000005 [ 165.421886] RBP: 00007ffedf59fc10 R08: 00007ffedf59ea54 R09: 0000000000000001 [ 165.424445] R10: 00007ff1fa630c6c R11: 0000000000000246 R12: 0000000000020000 [ 165.426954] R13: 0000000000000280 R14: 0000000000000005 R15: 00007ffedf59ea90 Disable preemption before accessing the lookup scratch area in nft_pipapo_insert(). Reported-by: kernel test robot <lkp@intel.com> Analysed-by: Florian Westphal <fw@strlen.de> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nft_set_rbtree: Don't account for expired elements on insertionStefano Brivio2020-06-081-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While checking the validity of insertion in __nft_rbtree_insert(), we currently ignore conflicting elements and intervals only if they are not active within the next generation. However, if we consider expired elements and intervals as potentially conflicting and overlapping, we'll return error for entries that should be added instead. This is particularly visible with garbage collection intervals that are comparable with the element timeout itself, as reported by Mike Dillinger. Other than the simple issue of denying insertion of valid entries, this might also result in insertion of a single element (opening or closing) out of a given interval. With single entries (that are inserted as intervals of size 1), this leads in turn to the creation of new intervals. For example: # nft add element t s { 192.0.2.1 } # nft list ruleset [...] elements = { 192.0.2.1-255.255.255.255 } Always ignore expired elements active in the next generation, while checking for conflicts. It might be more convenient to introduce a new macro that covers both inactive and expired items, as this type of check also appears quite frequently in other set back-ends. This is however beyond the scope of this fix and can be deferred to a separate patch. Other than the overlap detection cases introduced by commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), we also have to cover the original conflict check dealing with conflicts between two intervals of size 1, which was introduced before support for timeout was introduced. This won't return an error to the user as -EEXIST is masked by nft if NLM_F_EXCL is not given, but would result in a silent failure adding the entry. 
Reported-by: Mike Dillinger <miked@softtalker.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
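The core idea of the rbtree fix, that an expired element must never count as a conflict during insertion, can be sketched in a much simplified form. This is not the nft_set_rbtree code: it uses a flat array instead of an rbtree, has no intervals or generations, and every name (`struct set`, `set_insert`, `elem_expired`, `SET_MAX`) is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SET_MAX 8 /* arbitrary capacity for the sketch */

struct elem {
	uint32_t key;
	uint64_t expires; /* absolute time; 0 means "no timeout" */
	bool used;
};

struct set {
	struct elem slots[SET_MAX];
};

static bool elem_expired(const struct elem *e, uint64_t now)
{
	return e->expires != 0 && now >= e->expires;
}

/* Insert `key`; only live (unexpired) elements can cause -EEXIST. */
int set_insert(struct set *s, uint32_t key, uint64_t expires, uint64_t now)
{
	struct elem *free_slot = NULL;
	size_t i;

	for (i = 0; i < SET_MAX; i++) {
		struct elem *e = &s->slots[i];

		/* Unused and expired slots never conflict; remember the first for reuse. */
		if (!e->used || elem_expired(e, now)) {
			if (!free_slot)
				free_slot = e;
			continue;
		}
		if (e->key == key)
			return -EEXIST;
	}

	if (!free_slot)
		return -ENOSPC;

	free_slot->key = key;
	free_slot->expires = expires;
	free_slot->used = true;
	return 0;
}
```

In this sketch, re-adding a key whose previous entry has timed out succeeds even though garbage collection has not yet removed the stale entry, which is the behaviour the commit message asks for.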
| * | MAINTAINERS: merge entries for felix and ocelot driversVladimir Oltean2020-06-151-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ocelot switchdev driver also provides a set of library functions for the felix DSA driver, which in practice means that most of the patches will be of interest to both groups of driver maintainers. So, as also suggested in the discussion here, let's merge the 2 entries into a single larger one: https://www.spinics.net/lists/netdev/msg657412.html Note that the entry has been renamed into "OCELOT SWITCH" since neither Vitesse nor Microsemi exist any longer as company names, instead they are now named Microchip (which again might be subject to change in the future), so use the device family name instead. Suggested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>