summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-netdev' of ↵Jakub Kicinski2023-08-255-197/+495
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-25 We've added 87 non-merge commits during the last 8 day(s) which contain a total of 104 files changed, 3719 insertions(+), 4212 deletions(-). The main changes are: 1) Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds, from Jiri Olsa. 2) Add support BPF cpu v4 instructions for arm64 JIT compiler, from Xu Kuohai. 3) Add support BPF cpu v4 instructions for riscv64 JIT compiler, from Pu Lehui. 4) Fix LWT BPF xmit hooks wrt their return values where propagating the result from skb_do_redirect() would trigger a use-after-free, from Yan Zhai. 5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch, from Yonghong Song. 6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement, from Kumar Kartikeya Dwivedi. 7) Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak, from Yafang Shao. 8) Remove unmaintained XDP BPF samples as they are maintained in xdp-tools repository out of tree, from Toke Høiland-Jørgensen. 9) Batch of fixes for the tracing programs from BPF samples in order to make them more libbpf-aware, from Daniel T. Lee. 10) Fix a libbpf signedness determination bug in the CO-RE relocation handling logic, from Andrii Nakryiko. 11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up fixes for bpf_refcount shared ownership implementation, both from Dave Marchevsky. 12) Add a new bpf_object__unpin() API function to libbpf, from Daniel Xu. 13) Fix a memory leak in libbpf to also free btf_vmlinux when the bpf_object gets closed, from Hao Luo. 14) Small error output improvements to test_bpf module, from Helge Deller. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits) selftests/bpf: Add tests for rbtree API interaction in sleepable progs bpf: Allow bpf_spin_{lock,unlock} in sleepable progs bpf: Consider non-owning refs to refcounted nodes RCU protected bpf: Reenable bpf_refcount_acquire bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes bpf: Consider non-owning refs trusted bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire selftests/bpf: Enable cpu v4 tests for RV64 riscv, bpf: Support unconditional bswap insn riscv, bpf: Support signed div/mod insns riscv, bpf: Support 32-bit offset jmp insn riscv, bpf: Support sign-extension mov insns riscv, bpf: Support sign-extension load insns riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W samples/bpf: Add note to README about the XDP utilities moved to xdp-tools samples/bpf: Cleanup .gitignore samples/bpf: Remove the xdp_sample_pkts utility samples/bpf: Remove the xdp1 and xdp2 utilities samples/bpf: Remove the xdp_rxq_info utility samples/bpf: Remove the xdp_redirect* utilities ... ==================== Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * bpf: Allow bpf_spin_{lock,unlock} in sleepable progsDave Marchevsky2023-08-252-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks") disabled bpf_spin_lock usage in sleepable progs, stating: Sleepable LSM programs can be preempted which means that allowng spin locks will need more work (disabling preemption and the verifier ensuring that no sleepable helpers are called when a spin lock is held). This patch disables preemption before grabbing bpf_spin_lock. The second requirement above "no sleepable helpers are called when a spin lock is held" is implicitly enforced by current verifier logic due to helper calls in spin_lock CS being disabled except for a few exceptions, none of which sleep. Due to above preemption changes, bpf_spin_lock CS can also be considered a RCU CS, so verifier's in_rcu_cs check is modified to account for this. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230821193311.3290257-7-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Consider non-owning refs to refcounted nodes RCU protectedDave Marchevsky2023-08-251-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An earlier patch in the series ensures that the underlying memory of nodes with bpf_refcount - which can have multiple owners - is not reused until RCU grace period has elapsed. This prevents use-after-free with non-owning references that may point to recently-freed memory. While RCU read lock is held, it's safe to dereference such a non-owning ref, as by definition RCU GP couldn't have elapsed and therefore underlying memory couldn't have been reused. From the perspective of verifier "trustedness" non-owning refs to refcounted nodes are now trusted only in RCU CS and therefore should no longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them MEM_RCU in order to reflect this new state. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Reenable bpf_refcount_acquireDave Marchevsky2023-08-251-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all reported issues are fixed, bpf_refcount_acquire can be turned back on. Also reenable all bpf_refcount-related tests which were disabled. This a revert of: * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled") * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed") Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230821193311.3290257-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodesDave Marchevsky2023-08-251-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the final fix for the use-after-free scenario described in commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for non-owning refs"). That commit, by virtue of changing bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed the "refcount incr on 0" splat. The not_zero check in refcount_inc_not_zero, though, still occurs on memory that could have been free'd and reused, so the commit didn't properly fix the root cause. This patch actually fixes the issue by free'ing using the recently-added bpf_mem_free_rcu, which ensures that the memory is not reused until RCU grace period has elapsed. If that has happened then there are no non-owning references alive that point to the recently-free'd memory, so it can be safely reused. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230821193311.3290257-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Ensure kptr_struct_meta is non-NULL for collection insert and ↵Dave Marchevsky2023-08-251-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | refcount_acquire It's straightforward to prove that kptr_struct_meta must be non-NULL for any valid call to these kfuncs: * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any struct in user BTF with a special field (e.g. bpf_refcount, {rb,list}_node). These are stored in that BTF's struct_meta_tab. * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes have {rb,list}_node field and that it's at the correct offset. Similarly, check_kfunc_args ensures bpf_refcount field existence for node param to bpf_refcount_acquire. * So a btf_struct_meta must have been created for the struct type of node param to these kfuncs * That BTF and its struct_meta_tab are guaranteed to still be around. Any arbitrary {rb,list} node the BPF program interacts with either: came from bpf_obj_new or a collection removal kfunc in the same program, in which case the BTF is associated with the program and still around; or came from bpf_kptr_xchg, in which case the BTF was associated with the map and is still around Instead of silently continuing with NULL struct_meta, which caused confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"), let's error out. Then, at runtime, we can confidently say that the implementations of these kfuncs were given a non-NULL kptr_struct_meta, meaning that special-field-specific functionality like bpf_obj_free_fields and the bpf_obj_drop change introduced later in this series are guaranteed to execute. This patch doesn't change functionality, just makes it easier to reason about existing functionality. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230821193311.3290257-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Remove a WARN_ON_ONCE warning related to local kptrYonghong Song2023-08-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, in function bpf_obj_free_fields(), for local kptr, a warning will be issued if the struct does not contain any special fields. But actually the kernel seems totally okay with a local kptr without any special fields. Permitting no special fields also aligns with future percpu kptr which also allows no special fields. Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230824063417.201925-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Fix issue in verifying allow_ptr_leaksYafang Shao2023-08-231-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we converted the capabilities of our networking-bpf program from cap_sys_admin to cap_net_admin+cap_bpf, our networking-bpf program failed to start. Because it failed the bpf verifier, and the error log is "R3 pointer comparison prohibited". A simple reproducer as follows, SEC("cls-ingress") int ingress(struct __sk_buff *skb) { struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); if ((long)(iph + 1) > (long)skb->data_end) return TC_ACT_STOLEN; return TC_ACT_OK; } Per discussion with Yonghong and Alexei [1], comparison of two packet pointers is not a pointer leak. This patch fixes it. Our local kernel is 6.1.y and we expect this fix to be backported to 6.1.y, so stable is CCed. [1]. https://lore.kernel.org/bpf/CAADnVQ+Nmspr7Si+pxWn8zkE7hX-7s93ugwC+94aXSy4uQ9vBg@mail.gmail.com/ Suggested-by: Yonghong Song <yonghong.song@linux.dev> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230823020703.3790-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Fix check_func_arg_reg_off bug for graph root/nodeKumar Kartikeya Dwivedi2023-08-221-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit being fixed introduced a hunk into check_func_arg_reg_off that bypasses reg->off == 0 enforcement when offset points to a graph node or root. This might possibly be done for treating bpf_rbtree_remove and others as KF_RELEASE and then later check correct reg->off in helper argument checks. But this is not the case, those helpers are already not KF_RELEASE and permit non-zero reg->off and verify it later to match the subobject in BTF type. However, this logic leads to bpf_obj_drop permitting free of register arguments with non-zero offset when they point to a graph root or node within them, which is not ok. For instance: struct foo { int i; int j; struct bpf_rb_node node; }; struct foo *f = bpf_obj_new(typeof(*f)); if (!f) ... bpf_obj_drop(f); // OK bpf_obj_drop(&f->i); // still ok from verifier PoV bpf_obj_drop(&f->node); // Not OK, but permitted right now Fix this by dropping the whole part of code altogether. Fixes: 6a3cd3318ff6 ("bpf: Migrate release_on_unlock logic to non-owning ref semantics") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230822175140.1317749-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Fix a bpf_kptr_xchg() issue with local kptrYonghong Song2023-08-221-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reviewing local percpu kptr support, Alexei discovered a bug wherea bpf_kptr_xchg() may succeed even if the map value kptr type and locally allocated obj type do not match ([1]). Missed struct btf_id comparison is the reason for the bug. This patch added such struct btf_id comparison and will flag verification failure if types do not match. [1] https://lore.kernel.org/bpf/20230819002907.io3iphmnuk43xblu@macbook-pro-8.dhcp.thefacebook.com/#t Reported-by: Alexei Starovoitov <ast@kernel.org> Fixes: 738c96d5e2e3 ("bpf: Allow local kptrs to be exchanged via bpf_kptr_xchg") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230822050053.2886960-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add bpf_get_func_ip helper support for uprobe linkJiri Olsa2023-08-211-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Adding support for bpf_get_func_ip helper being called from ebpf program attached by uprobe_multi link. It returns the ip of the uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add pid filter support for uprobe_multi linkJiri Olsa2023-08-212-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support to specify pid for uprobe_multi link and the uprobes are created only for task with given pid value. Using the consumer.filter filter callback for that, so the task gets filtered during the uprobe installation. We still need to check the task during runtime in the uprobe handler, because the handler could get executed if there's another system wide consumer on the same uprobe (thanks Oleg for the insight). Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add cookies support for uprobe_multi linkJiri Olsa2023-08-212-5/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support to specify cookies array for uprobe_multi link. The cookies array share indexes and length with other uprobe_multi arrays (offsets/ref_ctr_offsets). The cookies[i] value defines cookie for i-the uprobe and will be returned by bpf_get_attach_cookie helper when called from ebpf program hooked to that specific uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add multi uprobe linkJiri Olsa2023-08-212-3/+244
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding new multi uprobe link that allows to attach bpf program to multiple uprobes. Uprobes to attach are specified via new link_create uprobe_multi union: struct { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; __u32 cnt; __u32 flags; } uprobe_multi; Uprobes are defined for single binary specified in path and multiple calling sites specified in offsets array with optional reference counters specified in ref_ctr_offsets array. All specified arrays have length of 'cnt'. The 'flags' supports single bit for now that marks the uprobe as return probe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add attach_type checks under bpf_prog_attach_check_attach_typeJiri Olsa2023-08-211-68/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | Add extra attach_type checks from link_create under bpf_prog_attach_check_attach_type. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf, cpumask: Clean up bpf_cpu_map_entry directly in cpu_map_freeHou Tao2023-08-211-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | After synchronous_rcu(), both the dettached XDP program and xdp_do_flush() are completed, and the only user of bpf_cpu_map_entry will be cpu_map_kthread_run(), so instead of calling __cpu_map_entry_replace() to stop kthread and cleanup entry after a RCU grace period, do these things directly. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf, cpumap: Use queue_rcu_work() to remove unnecessary rcu_barrier()Hou Tao2023-08-211-69/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As for now __cpu_map_entry_replace() uses call_rcu() to wait for the inflight xdp program to exit the RCU read critical section, and then launch kworker cpu_map_kthread_stop() to call kthread_stop() to flush all pending xdp frames or skbs. But it is unnecessary to use rcu_barrier() in cpu_map_kthread_stop() to wait for the completion of __cpu_map_entry_free(), because rcu_barrier() will wait for all pending RCU callbacks and cpu_map_kthread_stop() only needs to wait for the completion of a specific __cpu_map_entry_free(). So use queue_rcu_work() to replace call_rcu(), schedule_work() and rcu_barrier(). queue_rcu_work() will queue a __cpu_map_entry_free() kworker after a RCU grace period. Because __cpu_map_entry_free() is running in a kworker context, so it is OK to do all of these freeing procedures include kthread_stop() in it. After the update, there is no need to do reference-counting for bpf_cpu_map_entry, because bpf_cpu_map_entry is freed directly in __cpu_map_entry_free(), so just remove it. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-08-181-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge tag 'pm-6.5-rc6' of ↵Linus Torvalds2023-08-111-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an amd-pstate cpufreq driver issues and recently introduced hibernation-related breakage. Specifics: - Make amd-pstate use device_attributes as expected by the CPU root kobject (Thomas Weißschuh) - Restore the previous behavior of resume_store() when hibernation is not available which is to return the full number of bytes that were to be written by user space (Vlastimil Babka)" * tag 'pm-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: amd-pstate: fix global sysfs attribute type PM: hibernate: fix resume_store() return value when hibernation not available
| | * PM: hibernate: fix resume_store() return value when hibernation not availableVlastimil Babka2023-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a laptop with hibernation set up but not actively used, and with secure boot and lockdown enabled kernel, 6.5-rc1 gets stuck on boot with the following repeated messages: A start job is running for Resume from hibernation using device /dev/system/swap (24s / no limit) lockdown_is_locked_down: 25311154 callbacks suppressed Lockdown: systemd-hiberna: hibernation is restricted; see man kernel_lockdown.7 ... Checking the resume code leads to commit cc89c63e2fe3 ("PM: hibernate: move finding the resume device out of software_resume") which inadvertently changed the return value from resume_store() to 0 when !hibernation_available(). This apparently translates to userspace write() returning 0 as in number of bytes written, and userspace looping indefinitely in the attempt to write the intended value. Fix this by returning the full number of bytes that were to be written, as that's what was done before the commit. Fixes: cc89c63e2fe3 ("PM: hibernate: move finding the resume device out of software_resume") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()Yafang Shao2023-08-161-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") leads to the following Smatch static checker warning: kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe() error: uninitialized symbol 'type'. That can happens when uname is NULL. So fix it by verifying the uname when we really need to fill it. Fixes: 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Closes: https://lore.kernel.org/bpf/85697a7e-f897-4f74-8b43-82721bebc462@kili.mountain Link: https://lore.kernel.org/bpf/20230813141900.1268-2-laoar.shao@gmail.com
* | | bpf: Support default .validate() and .update() behavior for struct_ops linksDavid Vernet2023-08-141-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also define the .validate() and .update() callbacks in its corresponding struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful in its own right to ensure that the map is unloaded if an application crashes. For example, with sched_ext, we want to automatically unload the host-wide scheduler if the application crashes. We would likely never support updating elements of a sched_ext struct_ops map, so we'd have to implement these callbacks showing that they _can't_ support element updates just to benefit from the basic lifetime management of struct_ops links. Let's enable struct_ops maps to work with BPF_F_LINK even if they haven't defined these callbacks, by assuming that a struct_ops map element cannot be updated by default. Acked-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
* | | Merge tag 'for-netdev' of ↵Jakub Kicinski2023-08-109-25/+44
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-09 We've added 19 non-merge commits during the last 6 day(s) which contain a total of 25 files changed, 369 insertions(+), 141 deletions(-). The main changes are: 1) Fix array-index-out-of-bounds access when detaching from an already empty mprog entry from Daniel Borkmann. 2) Adjust bpf selftest because of a recent llvm change related to the cpu-v4 ISA from Eduard Zingerman. 3) Add uprobe support for the bpf_get_func_ip helper from Jiri Olsa. 4) Fix a KASAN splat due to the kernel incorrectly accepted an invalid program using the recent cpu-v4 instruction from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf: btf: Remove two unused function declarations bpf: lru: Remove unused declaration bpf_lru_promote() selftests/bpf: relax expected log messages to allow emitting BPF_ST selftests/bpf: remove duplicated functions bpf, docs: Fix small typo and define semantics of sign extension selftests/bpf: Add bpf_get_func_ip test for uprobe inside function selftests/bpf: Add bpf_get_func_ip tests for uprobe on function entry bpf: Add support for bpf_get_func_ip helper for uprobe program selftests/bpf: Add a movsx selftest for sign-extension of R10 bpf: Fix an incorrect verification success with movsx insn bpf, docs: Formalize type notation and function semantics in ISA standard bpf: change bpf_alu_sign_string and bpf_movsx_string to static libbpf: Use local includes inside the library bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR. bpf: fix inconsistent return types of bpf_xdp_copy_buf(). selftests/bpf: fix the incorrect verification of port numbers. selftests/bpf: Add test for detachment on empty mprog entry bpf: Fix mprog detachment for empty mprog entry bpf: bpf_struct_ops: Remove unnecessary initial values of variables ==================== Link: https://lore.kernel.org/r/20230810055123.109578-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | bpf: lru: Remove unused declaration bpf_lru_promote()Yue Haibing2023-08-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3a08c2fd7634 ("bpf: LRU List") declared but never implemented this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808145531.19692-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: Add support for bpf_get_func_ip helper for uprobe programJiri Olsa2023-08-073-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support for bpf_get_func_ip helper for uprobe program to return probed address for both uprobe and return uprobe. We discussed this in [1] and agreed that uprobe can have special use of bpf_get_func_ip helper that differs from kprobe. The kprobe bpf_get_func_ip returns: - address of the function if probe is attach on function entry for both kprobe and return kprobe - 0 if the probe is not attach on function entry The uprobe bpf_get_func_ip returns: - address of the probe for both uprobe and return uprobe The reason for this semantic change is that kernel can't really tell if the probe user space address is function entry. The uprobe program is actually kprobe type program attached as uprobe. One of the consequences of this design is that uprobes do not have its own set of helpers, but share them with kprobes. As we need different functionality for bpf_get_func_ip helper for uprobe, I'm adding the bool value to the bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code. The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe. Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to address that it's only used for uprobes and that it sets the run_ctx.is_uprobe as suggested by Yafang Shao. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Viktor Malik <vmalik@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: Fix an incorrect verification success with movsx insnYonghong Song2023-08-071-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports a verifier bug which triggers a runtime panic. The test bpf program is: 0: (62) *(u32 *)(r10 -8) = 553656332 1: (bf) r1 = (s16)r10 2: (07) r1 += -8 3: (b7) r2 = 3 4: (bd) if r2 <= r1 goto pc+0 5: (85) call bpf_trace_printk#-138320 6: (b7) r0 = 0 7: (95) exit At insn 1, the current implementation keeps 'r1' as a frame pointer, which caused later bpf_trace_printk helper call crash since frame pointer address is not valid any more. Note that at insn 4, the 'pointer vs. scalar' comparison is allowed for privileged prog run. To fix the problem with above insn 1, the fix in the patch adopts similar pattern to existing 'R1 = (u32) R2' handling. For unprivileged prog run, verification will fail with 'R<num> sign-extension part of pointer'. For privileged prog run, the dst_reg 'r1' will be marked as an unknown scalar, so later 'bpf_trace_pointk' helper will complain since it expected certain pointers. Reported-by: syzbot+d61b595e9205573133b3@syzkaller.appspotmail.com Fixes: 8100928c8814 ("bpf: Support new sign-extension mov insns") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230807175721.671696-1-yonghong.song@linux.dev Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: change bpf_alu_sign_string and bpf_movsx_string to staticYang Yingliang2023-08-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bpf_alu_sign_string and bpf_movsx_string introduced in commit f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions") are only used in disasm.c now, change them to static. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308050615.wxAn1v2J-lkp@intel.com/ Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230803023128.3753323-1-yangyingliang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR.Kui-Feng Lee2023-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Verify if the pointer obtained from bpf_xdp_pointer() is either an error or NULL before returning it. The function bpf_dynptr_slice() mistakenly returned an ERR_PTR. Instead of solely checking for NULL, it should also verify if the pointer returned by bpf_xdp_pointer() is an error or NULL. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/bpf/d1360219-85c3-4a03-9449-253ea905f9d1@moroto.mountain/ Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr") Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230803231206.1060485-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: Fix mprog detachment for empty mprog entryDaniel Borkmann2023-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported an UBSAN array-index-out-of-bounds access in bpf_mprog_read() upon bpf_mprog_detach(). While it did not have a reproducer, I was able to manually reproduce through an empty mprog entry which just has miniq present. The latter is important given otherwise we get an ENOENT error as tcx detaches the whole mprog entry. The index 4294967295 was triggered via NULL dtuple.prog which then attempts to detach from the back. bpf_mprog_fetch() in this case did hit the idx == total and therefore tried to grab the entry at idx -1. Fix it by adding an explicit bpf_mprog_total() check in bpf_mprog_detach() and bail out early with ENOENT. Fixes: 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs") Reported-by: syzbot+0c06ba0f831fe07a8f27@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230804131112.11012-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: bpf_struct_ops: Remove unnecessary initial values of variablesLi kunyu2023-08-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | err and tlinks is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li kunyu <kunyu@nfschina.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230804175929.2867-1-kunyu@nfschina.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-08-101-1/+42
|\ \ \ \ | |/ / / |/| / / | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Merge tag 'wq-for-6.5-rc5-fixes' of ↵Linus Torvalds2023-08-071-1/+42
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - The recently added cpu_intensive auto detection and warning mechanism was spuriously triggered on slow CPUs. While not causing serious issues, it's still a nuisance and can cause unintended concurrency management behaviors. Relax the threshold on machines with lower BogoMIPS. While BogoMIPS is not an accurate measure of performance by most measures, we don't have to be accurate and it has rough but strong enough correlation. - A correction in Kconfig help text * tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 workqueue: Fix cpu_intensive_thresh_us name in help text
| | * workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000Tejun Heo2023-07-251-1/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items. Once detected, they're excluded from concurrency management to prevent them from blocking other per-cpu work items. If CONFIG_WQ_CPU_INTENSIVE_REPORT is enabled, repeat offenders are also reported so that the code can be updated. The default threshold is 10ms which is long enough to do fair bit of work on modern CPUs while short enough to be usually not noticeable. This unfortunately leads to a lot of, arguable spurious, detections on very slow CPUs. Using the same threshold across CPUs whose performance levels may be apart by multiple levels of magnitude doesn't make whole lot of sense. This patch scales up wq_cpu_intensive_thresh_us upto 1 second when BogoMIPS is below 4000. This is obviously very inaccurate but it doesn't have to be accurate to be useful. The mechanism is still useful when the threshold is fully scaled up and the benefits of reports are usually shared with everyone regardless of who's reporting, so as long as there are sufficient number of fast machines reporting, we don't lose much. Some (or is it all?) ARM CPUs systemtically report significantly lower BogoMIPS. While this doesn't break anything, given how widespread ARM CPUs are, it's at least a missed opportunity and it probably would be a good idea to teach workqueue about it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
* | | Merge tag 'for-netdev' of ↵Jakub Kicinski2023-08-0310-131/+536
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-03 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 84 files changed, 4026 insertions(+), 562 deletions(-). The main changes are: 1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer, Daniel Borkmann 2) Support new insns from cpu v4 from Yonghong Song 3) Non-atomically allocate freelist during prefill from YiFei Zhu 4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu 5) Add tracepoint to xdp attaching failure from Leon Hwang 6) struct netdev_rx_queue and xdp.h reshuffling to reduce rebuild time from Jakub Kicinski * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) net: invert the netdevice.h vs xdp.h dependency net: move struct netdev_rx_queue out of netdevice.h eth: add missing xdp.h includes in drivers selftests/bpf: Add testcase for xdp attaching failure tracepoint bpf, xdp: Add tracepoint to xdp attaching failure selftests/bpf: fix static assert compilation issue for test_cls_*.c bpf: fix bpf_probe_read_kernel prototype mismatch riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework libbpf: fix typos in Makefile tracing: bpf: use struct trace_entry in struct syscall_tp_t bpf, devmap: Remove unused dtab field from bpf_dtab_netdev bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry netfilter: bpf: Only define get_proto_defrag_hook() if necessary bpf: Fix an array-index-out-of-bounds issue in disasm.c net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn docs/bpf: Fix malformed documentation bpf: selftests: Add defrag selftests bpf: selftests: Support custom type and proto for client sockets bpf: selftests: Support not connecting client socket netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link ... ==================== Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: invert the netdevice.h vs xdp.h dependencyJakub Kicinski2023-08-033-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xdp.h is far more specific and is included in only 67 other files vs netdevice.h's 1538 include sites. Make xdp.h include netdevice.h, instead of the other way around. This decreases the incremental allmodconfig builds size when xdp.h is touched from 5947 to 662 objects. Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h is a mega-header in its own right so it's nice to avoid xdp.h getting included there as well. The only unfortunate part is that the typedef for xdp_features_t has to move to netdevice.h, since its embedded in struct netdevice. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: fix bpf_probe_read_kernel prototype mismatchArnd Bergmann2023-08-022-20/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_probe_read_kernel() has a __weak definition in core.c and another definition with an incompatible prototype in kernel/trace/bpf_trace.c, when CONFIG_BPF_EVENTS is enabled. Since the two are incompatible, there cannot be a shared declaration in a header file, but the lack of a prototype causes a W=1 warning: kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes] On 32-bit architectures, the local prototype u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr) passes arguments in other registers as the one in bpf_trace.c BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr) which uses 64-bit arguments in pairs of registers. As both versions of the function are fairly simple and only really differ in one line, just move them into a header file as an inline function that does not add any overhead for the bpf_trace.c callers and actually avoids a function call for the other one. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | tracing: bpf: use struct trace_entry in struct syscall_tp_tYauheni Kaliuta2023-08-011-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf tracepoint program uses struct trace_event_raw_sys_enter as argument where trace_entry is the first field. Use the same instead of unsigned long long since if it's amended (for example by RT patch) it accesses data with wrong offset. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230801075222.7717-1-ykaliuta@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf, devmap: Remove unused dtab field from bpf_dtab_netdevHou Tao2023-07-311-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 96360004b862 ("xdp: Make devmap flush_list common for all map instances") removes the use of bpf_dtab_netdev::dtab in bq_enqueue(), so just remove dtab from bpf_dtab_netdev. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230728014942.892272-3-houtao@huaweicloud.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entryHou Tao2023-07-311-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit cdfafe98cabe ("xdp: Make cpumap flush_list common for all map instances"), cmap is no longer used, so just remove it. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230728014942.892272-2-houtao@huaweicloud.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * | | bpf: Fix an array-index-out-of-bounds issue in disasm.cYonghong Song2023-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported an array-index-out-of-bounds when printing out bpf insns. Further investigation shows the insn is illegal but is printed out due to log level 1 or 2 before actual insn verification in do_check(). This particular illegal insn is a MOVSX insn with offset value 2. The legal offset value for MOVSX should be 8, 16 and 32. The disasm sign-extension-size array index is calculated as (insn->off / 8) - 1 and offset value 2 gives an out-of-bound index -1. Tighten the checking for MOVSX insn in disasm.c to avoid array-index-out-of-bounds issue. Reported-by: syzbot+3758842a6c01012aa73b@syzkaller.appspotmail.com Fixes: f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230731204534.1975311-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Non-atomically allocate freelist during prefillYiFei Zhu2023-07-281-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In internal testing of test_maps, we sometimes observed failures like: test_maps: test_maps.c:173: void test_hashmap_percpu(unsigned int, void *): Assertion `bpf_map_update_elem(fd, &key, value, BPF_ANY) == 0' failed. where the errno is ENOMEM. After some troubleshooting and enabling the warnings, we saw: [ 91.304708] percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left [ 91.304716] CPU: 51 PID: 24145 Comm: test_maps Kdump: loaded Tainted: G N 6.1.38-smp-DEV #7 [ 91.304719] Hardware name: Google Astoria/astoria, BIOS 0.20230627.0-0 06/27/2023 [ 91.304721] Call Trace: [ 91.304724] <TASK> [ 91.304730] [<ffffffffa7ef83b9>] dump_stack_lvl+0x59/0x88 [ 91.304737] [<ffffffffa7ef83f8>] dump_stack+0x10/0x18 [ 91.304738] [<ffffffffa75caa0c>] pcpu_alloc+0x6fc/0x870 [ 91.304741] [<ffffffffa75ca302>] __alloc_percpu_gfp+0x12/0x20 [ 91.304743] [<ffffffffa756785e>] alloc_bulk+0xde/0x1e0 [ 91.304746] [<ffffffffa7566c02>] bpf_mem_alloc_init+0xd2/0x2f0 [ 91.304747] [<ffffffffa7547c69>] htab_map_alloc+0x479/0x650 [ 91.304750] [<ffffffffa751d6e0>] map_create+0x140/0x2e0 [ 91.304752] [<ffffffffa751d413>] __sys_bpf+0x5a3/0x6c0 [ 91.304753] [<ffffffffa751c3ec>] __x64_sys_bpf+0x1c/0x30 [ 91.304754] [<ffffffffa7ef847a>] do_syscall_64+0x5a/0x80 [ 91.304756] [<ffffffffa800009b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd This makes sense, because in atomic context, percpu allocation would not create new chunks; it would only create in non-atomic contexts. And if during prefill all precpu chunks are full, -ENOMEM would happen immediately upon next unit_alloc. Prefill phase does not actually run in atomic context, so we can use this fact to allocate non-atomically with GFP_KERNEL instead of GFP_NOWAIT. This avoids the immediate -ENOMEM. GFP_NOWAIT has to be used in unit_alloc when bpf program runs in atomic context. Even if bpf program runs in non-atomic context, in most cases, rcu read lock is enabled for the program so GFP_NOWAIT is still needed. This is often also the case for BPF_MAP_UPDATE_ELEM syscalls. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230728043359.3324347-1-zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Fix compilation warning with -WparenthesesYonghong Song2023-07-282-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel test robot reported compilation warnings when -Wparentheses is added to KBUILD_CFLAGS with gcc compiler. The following is the error message: .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_reg_to_size_sx’: .../bpf-next/kernel/bpf/verifier.c:5901:14: error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses] if (s64_max >= 0 == s64_min >= 0) { ~~~~~~~~^~~~ .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_subreg_to_size_sx’: .../bpf-next/kernel/bpf/verifier.c:5965:14: error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses] if (s32_min >= 0 == s32_max >= 0) { ~~~~~~~~^~~~ To fix the issue, add proper parentheses for the above '>=' condition to silence the warning/error. I tried a few clang compilers like clang16 and clang18 and they do not emit such warnings with -Wparentheses. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307281133.wi0c4SqG-lkp@intel.com/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230728055740.2284534-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Add kernel/bpftool asm support for new instructionsYonghong Song2023-07-271-6/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add asm support for new instructions so kernel verifier and bpftool xlated insn dumps can have proper asm syntax for new instructions. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Support new 32bit offset jmp instructionYonghong Song2023-07-272-13/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interpreter/jit/verifier support for 32bit offset jmp instruction. If a conditional jmp instruction needs more than 16bit offset, it can be simulated with a conditional jmp + a 32bit jmp insn. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Fix jit blinding with new sdiv/smov insnsYonghong Song2023-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle new insns properly in bpf_jit_blind_insn() function. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Support new signed div/mod instructions.Yonghong Song2023-07-272-18/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interpreter/jit support for new signed div/mod insns. The new signed div/mod instructions are encoded with unsigned div/mod instructions plus insn->off == 1. Also add basic verifier support to ensure new insns get accepted. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Support new unconditional bswap instructionYonghong Song2023-07-272-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing 'be' and 'le' insns will do conditional bswap depends on host endianness. This patch implements unconditional bswap insns. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Handle sign-extenstin ctx member accessesYonghong Song2023-07-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if user accesses a ctx member with signed types, the compiler will generate an unsigned load followed by necessary left and right shifts. With the introduction of sign-extension load, compiler may just emit a ldsx insn instead. Let us do a final movsx sign extension to the final unsigned ctx load result to satisfy original sign extension requirement. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Support new sign-extension mov insnsYonghong Song2023-07-272-28/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interpreter/jit support for new sign-extension mov insns. The original 'MOV' insn is extended to support reg-to-reg signed version for both ALU and ALU64 operations. For ALU mode, the insn->off value of 8 or 16 indicates sign-extension from 8- or 16-bit value to 32-bit value. For ALU64 mode, the insn->off value of 8/16/32 indicates sign-extension from 8-, 16- or 32-bit value to 64-bit value. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: Support new sign-extension load insnsYonghong Song2023-07-272-24/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interpreter/jit support for new sign-extension load insns which adds a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In verifier, besides to deduce proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>