Commit message | Author | Age | Files | Lines
* bpf, selftests: Use single cgroup helpers for both test_sockmap/progs
  Author: John Fastabend | Date: 2020-08-01 | Files: 15 | Lines: -132/+43

  Nearly every user of the cgroup helpers performs the same sequence of API calls, so push that sequence into a single helper, cgroup_setup_and_join(). Two kinds of callers need extra logic: test_progs, which currently uses an environment variable to decide whether it must set up the cgroup environment or can reuse an existing one, and tests that exercise cgroups themselves. Those cases are skipped for now.

  Signed-off-by: John Fastabend <john.fastabend@gmail.com>
  Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  Acked-by: Andrii Nakryiko <andriin@fb.com>
  Link: https://lore.kernel.org/bpf/159623335418.30208.15807461815525100199.stgit@john-XPS-13-9370
* Documentation/bpf: Use valid and new links in index.rst
  Author: Tiezhu Yang | Date: 2020-07-31 | Files: 2 | Lines: -6/+8

  Clicking the HTML link to "Documentation/networking/filter.rst" in the BPF documentation [1] returns "404 Not Found"; fix it. Additionally, use the new links for "BPF and XDP Reference Guide" and "bpf(2)" to avoid redirects.

  [1] https://www.kernel.org/doc/html/latest/bpf/

  Fixes: d9b9170a2653 ("docs: bpf: Rename README.rst to index.rst")
  Fixes: cb3f0d56e153 ("docs: networking: convert filter.txt to ReST")
  Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Link: https://lore.kernel.org/bpf/1596184142-18476-1-git-send-email-yangtiezhu@loongson.cn
* libbpf: Fix register in PT_REGS MIPS macros
  Author: Jerry Crunchtime | Date: 2020-07-31 | Files: 1 | Lines: -2/+2

  The o32, n32 and n64 calling conventions require the return value to be stored in $v0, which maps to $2, i.e. general-purpose register 2.

  Fixes: c1932cd ("bpf: Add MIPS support to samples/bpf.")
  Signed-off-by: Jerry Crunchtime <jerry.c.t@web.de>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Andrii Nakryiko <andriin@fb.com>
  Link: https://lore.kernel.org/bpf/43707d31-0210-e8f0-9226-1af140907641@web.de
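This is a minimal sketch of what the fixed macro has to do: read the return value from register index 2. The struct mock_mips_pt_regs type and the MOCK_* macros are hypothetical stand-ins, not libbpf's definitions. The sketch assumes only that the saved general-purpose registers are stored in order, so $v0 is regs[2].

```c
#include <stdint.h>

/* Hypothetical mock of the saved MIPS register file: general-purpose
 * registers $0..$31 are stored in regs[0..31]. Not libbpf's definition. */
struct mock_mips_pt_regs {
	uint64_t regs[32];
};

/* Return values are passed in $v0, which is general-purpose register $2. */
#define MOCK_MIPS_REG_V0 2
#define MOCK_PT_REGS_RC(x) ((x)->regs[MOCK_MIPS_REG_V0])

/* Read a traced function's return value from the saved registers. */
uint64_t mock_read_return_value(const struct mock_mips_pt_regs *r)
{
	return MOCK_PT_REGS_RC(r);
}
```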
* udp, bpf: Ignore connections in reuseport group after BPF sk lookup
  Author: Jakub Sitnicki | Date: 2020-07-31 | Files: 2 | Lines: -2/+2

  When BPF sk lookup invokes reuseport handling for the selected socket, it should ignore the fact that the reuseport group can contain connected UDP sockets. With BPF sk lookup this is not relevant, because sockets are not being scored to find the best match, which might be a connected UDP socket.

  Fix it by unconditionally accepting the socket selected by reuseport.

  This fixes the following two failures reported by test_progs:

    # ./test_progs -t sk_lookup
    ...
    #73/14 UDP IPv4 redir and reuseport with conns:FAIL
    ...
    #73/20 UDP IPv6 redir and reuseport with conns:FAIL
    ...

  Fixes: a57066b1a019 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
  Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
  Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Link: https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com
* libbpf: Make destructors more robust by handling ERR_PTR(err) cases
  Author: Andrii Nakryiko | Date: 2020-07-31 | Files: 3 | Lines: -8/+7

  On failure, most libbpf "constructors" return an error encoded as a pointer, ERR_PTR(err). It is a common mistake to eventually pass such a malformed pointer to an xxx__destroy()/xxx__free() "destructor". So instead of fixing the cleanup code in selftests and user programs, handle such error pointers in the destructors themselves. Destructors already handle NULL pointers gracefully, so they might as well handle error pointers too.

  Suggested-by: Song Liu <songliubraving@fb.com>
  Signed-off-by: Andrii Nakryiko <andriin@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200729232148.896125-1-andriin@fb.com
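Here is a minimal user-space sketch of the pattern. It re-creates kernel-style ERR_PTR()/PTR_ERR()/IS_ERR_OR_NULL() helpers locally; libbpf keeps its own copies, and these are not them. struct widget, widget__open() and widget__destroy() are hypothetical names that only mimic libbpf's xxx__open()/xxx__destroy() convention.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Kernel-style error pointers: the top MAX_ERRNO addresses are never
 * valid allocations, so a negative errno can travel in a pointer. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object with a libbpf-style constructor/destructor pair. */
struct widget { int id; };

static int widgets_freed;  /* counts real frees, for demonstration */

struct widget *widget__open(int id)
{
	if (id < 0)
		return ERR_PTR(-EINVAL);
	struct widget *w = malloc(sizeof(*w));
	if (!w)
		return ERR_PTR(-ENOMEM);
	w->id = id;
	return w;
}

/* Destructor that tolerates NULL *and* error pointers. */
void widget__destroy(struct widget *w)
{
	if (IS_ERR_OR_NULL(w))
		return;
	free(w);
	widgets_freed++;
}
```

With this, a cleanup path can call the destructor unconditionally, whatever the constructor returned.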
* selftests/bpf: Omit nodad flag when adding addresses to loopback
  Author: Jakub Sitnicki | Date: 2020-07-31 | Files: 1 | Lines: -2/+2

  Setting the IFA_F_NODAD flag on IPv6 addresses added to loopback is unnecessary, because Duplicate Address Detection does not happen on the loopback device.

  Also, passing the 'nodad' flag to 'ip address' breaks libbpf CI. CI runs in an environment with the BusyBox implementation of the 'ip' command, which doesn't understand this flag.

  Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
  Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
  Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Tested-by: Andrii Nakryiko <andrii@fb.com>
  Acked-by: Andrii Nakryiko <andriin@fb.com>
  Link: https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com
* selftests/bpf: Don't destroy failed link
  Author: Andrii Nakryiko | Date: 2020-07-31 | Files: 1 | Lines: -14/+28

  Check that the link is NULL or a proper pointer before invoking bpf_link__destroy(). Not doing so causes a crash in test_progs when the cg_storage_multi selftest fails.

  Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
  Signed-off-by: Andrii Nakryiko <andriin@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200729045056.3363921-1-andriin@fb.com
* selftests/bpf: Add xdpdrv mode for test_xdp_redirect
  Author: Hangbin Liu | Date: 2020-07-31 | Files: 1 | Lines: -32/+52

  This patch adds xdpdrv mode to test_xdp_redirect.sh, since veth supports native mode. After the update, the test results are:

    # ./test_xdp_redirect.sh
    selftests: test_xdp_redirect xdpgeneric [PASS]
    selftests: test_xdp_redirect xdpdrv [PASS]

  Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Acked-by: William Tu <u9012063@gmail.com>
  Link: https://lore.kernel.org/bpf/20200729085658.403794-1-liuhangbin@gmail.com
* selftests/bpf: Verify socket storage in cgroup/sock_{create, release}
  Author: Stanislav Fomichev | Date: 2020-07-31 | Files: 1 | Lines: -0/+19

  Augment the udp_limit test to set and verify a socket storage value. That should be enough to exercise the changes from the previous patch.

  Signed-off-by: Stanislav Fomichev <sdf@google.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200729003104.1280813-2-sdf@google.com
* bpf: Expose socket storage to BPF_PROG_TYPE_CGROUP_SOCK
  Author: Stanislav Fomichev | Date: 2020-07-31 | Files: 2 | Lines: -0/+13

  This lets us use socket storage from the following hooks:

    * BPF_CGROUP_INET_SOCK_CREATE
    * BPF_CGROUP_INET_SOCK_RELEASE
    * BPF_CGROUP_INET4_POST_BIND
    * BPF_CGROUP_INET6_POST_BIND

  Using the existing 'bpf_sk_storage_get_proto' doesn't work because its second argument is ARG_PTR_TO_SOCKET. Even though BPF_PROG_TYPE_CGROUP_SOCK hooks operate on 'struct bpf_sock', the verifier still considers it a PTR_TO_CTX. That's why I'm adding a separate definition, 'bpf_sk_storage_get_cg_sock_proto', strictly for BPF_PROG_TYPE_CGROUP_SOCK. It accepts ARG_PTR_TO_CTX, which for this program type is really a 'struct sock'.

  Signed-off-by: Stanislav Fomichev <sdf@google.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200729003104.1280813-1-sdf@google.com
* selftests/bpf: Test bpf_iter buffer access with negative offset
  Author: Yonghong Song | Date: 2020-07-31 | Files: 2 | Lines: -0/+34

  Commit afbf21dce668 ("bpf: Support readonly/readwrite buffers in verifier") added readonly/readwrite buffer support, which is currently used by bpf_iter tracing programs. It had a bug with incorrect parameter ordering, which was later fixed by commit f6dfbe31e8fa ("bpf: Fix swapped arguments in calls to check_buffer_access"). This patch adds a test case with a negative offset access, which triggers the error path.

  Without commit f6dfbe31e8fa, running the test case in this patch produces the following error message:

    R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
    ; value_sum += *(__u32 *)(value - 4);
    2: (61) r1 = *(u32 *)(r1 -4)
    R1 invalid (null) buffer access: off=-4, size=4

  With that commit, the error message is:

    R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
    ; value_sum += *(__u32 *)(value - 4);
    2: (61) r1 = *(u32 *)(r1 -4)
    R1 invalid rdwr buffer access: off=-4, size=4

  Signed-off-by: Yonghong Song <yhs@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200728221801.1090406-1-yhs@fb.com
* bpf: Add missing newline characters in verifier error messages
  Author: Yonghong Song | Date: 2020-07-31 | Files: 1 | Lines: -2/+2

  Add newline characters to two verifier error messages that were refactored in commit afbf21dce668 ("bpf: Support readonly/readwrite buffers in verifier"), so that they do not run into the messages that follow.

  Signed-off-by: Yonghong Song <yhs@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200728221801.1090349-1-yhs@fb.com
* bpf, arm64: Add BPF exception tables
  Author: Jean-Philippe Brucker | Date: 2020-07-31 | Files: 3 | Lines: -9/+108

  When a tracing BPF program attempts to read memory without using the bpf_probe_read() helper, the verifier marks the load instruction with the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize this flag, it falls back to the interpreter.

  Add support for BPF_PROBE_MEM by appending an exception table to the BPF program. If the load instruction causes a data abort, the fixup infrastructure finds the exception table and fixes up the fault by clearing the destination register and jumping over the faulting instruction.

  To keep the compact exception table entry format, inspect the pc in fixup_exception(). A more generic solution would add a "handler" field to the table entry, as on x86 and s390.

  Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200728152122.1292756-2-jean-philippe@linaro.org
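The fixup idea can be illustrated with a toy model in plain C. Everything here is hypothetical. The entry layout packs the destination register into the top 5 bits and an absolute resume offset into the low 27 bits. That mirrors the spirit of a compact entry but is not the exact arm64 encoding, and struct toy_regs stands in for the real exception context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact entry: fixup = (dst_reg << 27) | resume_offset. */
#define FIXUP_REG_SHIFT   27
#define FIXUP_OFFSET_MASK ((1u << FIXUP_REG_SHIFT) - 1)

struct toy_extable_entry {
	uint32_t insn;   /* offset of the probing load that may fault */
	uint32_t fixup;  /* packed destination register and resume offset */
};

struct toy_regs {
	uint64_t x[32];  /* general-purpose registers */
	uint32_t pc;     /* program counter, as an instruction offset */
};

uint32_t toy_pack_fixup(unsigned int dst_reg, uint32_t resume)
{
	return ((uint32_t)dst_reg << FIXUP_REG_SHIFT) | (resume & FIXUP_OFFSET_MASK);
}

/* On a fault at regs->pc, find the matching entry, clear the destination
 * register and resume after the faulting load. Returns false when the
 * fault is not covered by the table (a genuine crash). */
bool toy_fixup_exception(const struct toy_extable_entry *tbl, size_t n,
			 struct toy_regs *regs)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].insn != regs->pc)
			continue;
		unsigned int dst = tbl[i].fixup >> FIXUP_REG_SHIFT;
		regs->x[dst] = 0;
		regs->pc = tbl[i].fixup & FIXUP_OFFSET_MASK;
		return true;
	}
	return false;
}
```

Packing both values into one word keeps each entry at two 32-bit fields, which is the compactness the commit message wants to preserve.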
* bpf: Fix build without CONFIG_NET when using BPF XDP link
  Author: Andrii Nakryiko | Date: 2020-07-29 | Files: 1 | Lines: -0/+2

  The entire net/core subsystem is not built without CONFIG_NET, and linux/netdevice.h simply assumes that it is always there. The easiest fix is to conditionally compile out the use of bpf_xdp_link_attach() in bpf/syscall.c.

  Fixes: aa8d3a716b59 ("bpf, xdp: Add bpf_link-based XDP attachment API")
  Reported-by: Randy Dunlap <rdunlap@infradead.org>
  Signed-off-by: Andrii Nakryiko <andriin@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200728190527.110830-1-andriin@fb.com
* bpf, selftests: use ::1 for localhost in tcp_server.py
  Author: John Fastabend | Date: 2020-07-29 | Files: 3 | Lines: -4/+4

  Using localhost requires the host to have an /etc/hosts file containing that specific line. By default my dev box did not (it used ip6-localhost), so the test was failing. To fix this, remove the need for any /etc/hosts entry and use ::1. I could just add the line, but this seems easier.

  Signed-off-by: John Fastabend <john.fastabend@gmail.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Acked-by: Andrii Nakryiko <andriin@fb.com>
  Link: https://lore.kernel.org/bpf/159594714197.21431.10113693935099326445.stgit@john-Precision-5820-Tower
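The tcp_server.py selftest itself is Python, but the point does not depend on the language: an address literal is parsed, not resolved. The small C sketch below uses POSIX inet_pton() to show that "::1" becomes the IPv6 loopback address without consulting /etc/hosts. The helper name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Returns true if text is a literal spelling of the IPv6 loopback
 * address. inet_pton() only parses; it performs no name lookup. */
bool is_ipv6_loopback_literal(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;
	/* ::1 is fifteen zero bytes followed by a single 0x01 byte. */
	for (int i = 0; i < 15; i++)
		if (addr.s6_addr[i] != 0)
			return false;
	return addr.s6_addr[15] == 1;
}
```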
* xdp: Prevent kernel-infoleak in xsk_getsockopt()
  Author: Peilin Ye | Date: 2020-07-28 | Files: 1 | Lines: -1/+1

  xsk_getsockopt() copies uninitialized stack memory to userspace when 'extra_stats' is 'false'. Fix it. Doing '= {};' is sufficient, since 'struct xdp_statistics' is currently defined as follows:

    struct xdp_statistics {
            __u64 rx_dropped;
            __u64 rx_invalid_descs;
            __u64 tx_invalid_descs;
            __u64 rx_ring_full;
            __u64 rx_fill_ring_empty_descs;
            __u64 tx_ring_empty_descs;
    };

  When it is copied to userspace, 'stats' will not contain any uninitialized 'holes' between struct fields.

  Fixes: 8aa5a33578e9 ("xsk: Add new statistics")
  Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
  Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Björn Töpel <bjorn.topel@intel.com>
  Acked-by: Song Liu <songliubraving@fb.com>
  Acked-by: Arnd Bergmann <arnd@arndb.de>
  Link: https://lore.kernel.org/bpf/20200728053604.404631-1-yepeilin.cs@gmail.com
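The following illustrates why zero-initialization closes the leak. struct xdp_statistics_mirror copies the layout quoted above using <stdint.h> types. fill_stats() is a hypothetical routine in which some counters are written only conditionally; it does not reproduce xsk_getsockopt(). Note that '= {}' is a GNU extension (standardized only in C23), while '= {0}' is the portable spelling. Because the struct has no padding, zeroing every member leaves no uninitialized byte to copy out. If the struct had padding, member-wise initialization alone would not be safe to rely on.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the UAPI struct xdp_statistics quoted above. */
struct xdp_statistics_mirror {
	uint64_t rx_dropped;
	uint64_t rx_invalid_descs;
	uint64_t tx_invalid_descs;
	uint64_t rx_ring_full;
	uint64_t rx_fill_ring_empty_descs;
	uint64_t tx_ring_empty_descs;
};

/* Six naturally aligned 64-bit fields: no holes between members. */
_Static_assert(sizeof(struct xdp_statistics_mirror) == 6 * sizeof(uint64_t),
	       "unexpected padding in xdp_statistics_mirror");

/* Hypothetical fill routine: the extra counters are written only when
 * extra_stats is true, yet the whole struct is copied out. */
size_t fill_stats(void *out, bool extra_stats)
{
	struct xdp_statistics_mirror stats = {0};  /* no stack garbage */

	stats.rx_dropped = 1;
	stats.rx_invalid_descs = 2;
	stats.tx_invalid_descs = 3;
	if (extra_stats) {
		stats.rx_ring_full = 4;
		stats.rx_fill_ring_empty_descs = 5;
		stats.tx_ring_empty_descs = 6;
	}
	memcpy(out, &stats, sizeof(stats));
	return sizeof(stats);
}
```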
* bpf: Fix swapped arguments in calls to check_buffer_access
  Author: Colin Ian King | Date: 2020-07-28 | Files: 1 | Lines: -4/+4

  In a couple of calls to check_buffer_access(), the boolean flag zero_size_allowed and the char pointer buf_info were swapped by mistake. Swap them back to correct the argument ordering.

  Fixes: afbf21dce668 ("bpf: Support readonly/readwrite buffers in verifier")
  Addresses-Coverity: ("Array compared to 0")
  Signed-off-by: Colin Ian King <colin.king@canonical.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Yonghong Song <yhs@fb.com>
  Link: https://lore.kernel.org/bpf/20200727175411.155179-1-colin.king@canonical.com
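A simplified, hypothetical model of such a bounds check suggests why the compiler did not catch the swap. In C, `false` is the integer constant 0, which converts silently to a null pointer, and a pointer converts silently to bool. The kernel's printf-style formatting, like glibc's, prints a NULL %s as "(null)". That is likely how the "R1 invalid (null) buffer access" message in the bpf_iter negative-offset test listed earlier came about. toy_check_buffer_access() is not the kernel's check_buffer_access(); only its negative-offset message format is copied from that log.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Last message written by the toy check, standing in for the verifier log. */
static char toy_verifier_log[128];

/* Hypothetical bounds check for a buffer access at [off, off + size).
 * Parameter order matters: zero_size_allowed (bool) comes before
 * buf_info (string). Swapping the two still compiles in C. */
int toy_check_buffer_access(int regno, int off, int size,
			    bool zero_size_allowed, const char *buf_info)
{
	toy_verifier_log[0] = '\0';
	if (off < 0) {
		snprintf(toy_verifier_log, sizeof(toy_verifier_log),
			 "R%d invalid %s buffer access: off=%d, size=%d\n",
			 regno, buf_info, off, size);
		return -EACCES;
	}
	if (size == 0 && !zero_size_allowed)
		return -EACCES;  /* message omitted in this sketch */
	return 0;
}
```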
* selftests/bpf: Add new bpf_iter context structs to fix build on old kernels
  Author: Andrii Nakryiko | Date: 2020-07-28 | Files: 1 | Lines: -0/+18

  Add bpf_iter__bpf_map_elem and bpf_iter__bpf_sk_storage_map to bpf_iter.h.

  Fixes: 3b1c420bd882 ("selftests/bpf: Add a test for bpf sk_storage_map iterator")
  Fixes: 2a7c2fff7dd6 ("selftests/bpf: Add test for bpf hash map iterators")
  Signed-off-by: Andrii Nakryiko <andriin@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Yonghong Song <yhs@fb.com>
  Link: https://lore.kernel.org/bpf/20200727233345.1686358-1-andriin@fb.com
* bpf: Fix bpf_ringbuf_output() signature to return long
  Author: Andrii Nakryiko | Date: 2020-07-28 | Files: 2 | Lines: -2/+2

  Due to a bpf tree fix merge, the bpf_ringbuf_output() signature ended up with int as its return type, while all other helpers were converted to return long. So fix it in bpf-next now.

  Fixes: b0659d8a950d ("bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments")
  Signed-off-by: Andrii Nakryiko <andriin@fb.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Acked-by: Song Liu <songliubraving@fb.com>
  Link: https://lore.kernel.org/bpf/20200727224715.652037-1-andriin@fb.com
* tools, bpftool: Add LSM type to array of prog names
  Author: Quentin Monnet | Date: 2020-07-27 | Files: 1 | Lines: -0/+1

  Assign "lsm" as the printed name for BPF_PROG_TYPE_LSM in bpftool, so that bpftool can use it when listing programs loaded on the system or when probing features.

  Signed-off-by: Quentin Monnet <quentin@isovalent.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Link: https://lore.kernel.org/bpf/20200724090618.16378-3-quentin@isovalent.com
* tools, bpftool: Skip type probe if name is not found
  Author: Quentin Monnet | Date: 2020-07-27 | Files: 1 | Lines: -0/+8

  For probing program and map types, bpftool loops over type values and uses the relevant type name in prog_type_name[] or map_type_name[]. To ensure the name exists, we exit from the loop if we go past the size of the array.

  However, this is not enough when the arrays have "holes" in them, i.e. program or map types that have no name but are not at the end of the list. This is currently the case for BPF_PROG_TYPE_LSM, which is not known to bpftool and whose name entry is NULL. When probing for features, bpftool attempts to strlen() that name and segfaults.

  Let's fix it by skipping probes for "unknown" program and map types, and printing an informational message with the numeric value in that case.

  Fixes: 93a3545d812a ("tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type")
  Reported-by: Paul Chaignon <paul@cilium.io>
  Signed-off-by: Quentin Monnet <quentin@isovalent.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Link: https://lore.kernel.org/bpf/20200724090618.16378-2-quentin@isovalent.com
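Here is a minimal sketch of the fix, using a hypothetical enum and name table in which one entry is missing. A designated initializer leaves the missing slot NULL, so the loop has to test each entry rather than rely on the array bound alone. The message text is made up for the sketch and is not bpftool's.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type enum and name table with a hole at TOY_TYPE_C. */
enum toy_prog_type { TOY_TYPE_A, TOY_TYPE_B, TOY_TYPE_C, TOY_TYPE_D };

static const char * const toy_prog_type_name[] = {
	[TOY_TYPE_A] = "a",
	[TOY_TYPE_B] = "b",
	/* [TOY_TYPE_C] intentionally missing: the slot is NULL */
	[TOY_TYPE_D] = "d",
};

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Probe every named type. Bounding the loop by the array size is not
 * enough; holes must be skipped instead of handed to strlen(). */
int toy_probe_all(void)
{
	int probed = 0;

	for (size_t i = 0; i < ARRAY_SIZE(toy_prog_type_name); i++) {
		if (!toy_prog_type_name[i]) {
			fprintf(stderr, "type %zu has no name, skipping probe\n", i);
			continue;
		}
		/* a real probe would try to load a program of this type */
		probed++;
	}
	return probed;
}
```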
* Merge branch 'bpf_link-XDP'
  Author: Alexei Starovoitov | Date: 2020-07-25 | Files: 39 | Lines: -383/+658

  Andrii Nakryiko says:

  ====================
  Following the cgroup and netns examples, implement bpf_link support for XDP. The semantics are described in patch #2. Program and link attachments are mutually exclusive: a link can't replace an attached program, and a program can't replace an attached link. A link can't replace an attached link either, as is the case for any other bpf_link implementation.

  Patch #1 refactors the existing BPF program-based attachment API and centralizes high-level query/attach decisions in generic kernel code. Drivers are kept simple and only receive low-level instructions to attach or detach a specific bpf_prog. This also makes the QUERY command unnecessary, and patch #8 removes support for it from all kernel drivers. If that's a bad idea, we can drop that patch altogether.

  With the refactoring in patch #1, adding bpf_xdp_link is completely transparent to drivers. They still operate at the level of the "effective" bpf_prog that should be called in the XDP data path.

  Corresponding libbpf support for BPF XDP link is added in patch #5.

  v3->v4:
  - fix a compilation warning in one of the drivers (Jakub);

  v2->v3:
  - fix build when CONFIG_BPF_SYSCALL=n (kernel test robot);

  v1->v2:
  - fix prog refcounting bug (David);
  - split dev_change_xdp_fd() changes into 2 patches (David);
  - add extack messages to all user-induced errors (David).
  ====================

  Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 28 | Lines: -218/+2

    Now that BPF program/link management is centralized in generic net_device code, the kernel never queries the program ID from drivers, so the XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all implementations of those commands in the kernel, along with xdp_attachment_query().

    This patch was compile-tested on allyesconfig.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
| * selftests/bpf: Add BPF XDP link selftests
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 2 | Lines: -0/+149

    Add a selftest validating all the attachment logic around BPF XDP link. Also test link updates and the get_obj_info() APIs.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-9-andriin@fb.com
| * libbpf: Add support for BPF XDP link
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 5 | Lines: -3/+21

    Sync the UAPI header and add support for bpf_link-based XDP attachment. Make the xdp/ prog type set the expected attach type. The kernel didn't enforce attach_type for XDP programs before, so there are no backwards compatibility issues there.

    Also fix the section_names selftest to recognize that xdp prog types now have an expected attach type.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com
| * bpf: Implement BPF XDP link-specific introspection APIs
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 2 | Lines: -0/+34

    Implement XDP link-specific show_fdinfo and link_info to emit ifindex.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com
| * bpf, xdp: Implement LINK_UPDATE for BPF XDP link
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 1 | Lines: -0/+43

    Add support for the LINK_UPDATE command for BPF XDP link to enable reliable replacement of the underlying BPF program.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com
| * bpf, xdp: Add bpf_link-based XDP attachment API
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 4 | Lines: -7/+178

    Add a bpf_link-based API (bpf_xdp_link) to attach a BPF XDP program through the BPF_LINK_CREATE command.

    bpf_xdp_link is mutually exclusive with direct BPF program attachment: the previous BPF program must be detached before attempting to create a new bpf_xdp_link attachment (for a given XDP mode). Once a BPF link is attached, it can't be replaced by another BPF program attachment or link attachment. It will be detached only when the last BPF link FD is closed.

    bpf_xdp_link will be auto-detached when the net_device is shut down, similarly to how other BPF links behave (cgroup, flow_dissector). At that point the bpf_link becomes defunct, but it won't be destroyed until the last FD is closed.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
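The attachment rules can be modeled as a tiny state machine for one XDP mode slot. This is a hypothetical sketch. The toy_* names, the use of program IDs instead of bpf_prog pointers, and the choice of -EBUSY/-EINVAL are assumptions for illustration. The sketch also includes the LINK_UPDATE operation from the commit listed just above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical XDP mode slot: holds a directly attached program or a
 * link that owns a program, never both. */
struct toy_xdp_slot {
	int prog_id;    /* 0 means nothing attached */
	bool via_link;  /* true if a link owns the attached program */
};

/* Direct attach (prog_id 0 detaches): refused while a link owns the slot. */
int toy_attach_prog(struct toy_xdp_slot *s, int prog_id)
{
	if (s->via_link)
		return -EBUSY;
	s->prog_id = prog_id;
	return 0;
}

/* Link creation: refused if a program or another link is attached. */
int toy_link_create(struct toy_xdp_slot *s, int prog_id)
{
	if (prog_id <= 0)
		return -EINVAL;
	if (s->prog_id)
		return -EBUSY;
	s->prog_id = prog_id;
	s->via_link = true;
	return 0;
}

/* LINK_UPDATE: replace the program inside an existing link. */
int toy_link_update(struct toy_xdp_slot *s, int new_prog_id)
{
	if (!s->via_link || new_prog_id <= 0)
		return -EINVAL;
	s->prog_id = new_prog_id;
	return 0;
}

/* Last link FD closed (or device shut down): detach. */
void toy_link_release(struct toy_xdp_slot *s)
{
	s->prog_id = 0;
	s->via_link = false;
}
```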
| * bpf, xdp: Extract common XDP program attachment logic
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 1 | Lines: -74/+91

    Further refactor the XDP attachment code. dev_change_xdp_fd() is split into two parts: getting bpf_progs from FDs, and the attachment logic, which works with bpf_progs. This makes the attachment logic a bit more straightforward and prepares the code for bpf_xdp_link, which will share the common logic.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com
| * bpf, xdp: Maintain info on attached XDP BPF programs in net_device
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 3 | Lines: -75/+105

    Instead of delegating to drivers, maintain information about which BPF programs are attached in which XDP modes (generic/skb, driver, or hardware) locally in net_device. This effectively obsoletes the XDP_QUERY_PROG command.

    This reorganization already simplifies existing code. It also makes it possible to add bpf_link-based XDP attachments later without drivers having to know about any of this, which seems like a good setup. XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands telling the driver to install or uninstall the active BPF program. All the higher-level concerns about prog/link interaction will be contained in generic, driver-agnostic logic.

    All the XDP_QUERY_PROG calls to drivers in dev_xdp_uninstall() were removed. It's not clear to me why dev_xdp_uninstall() was passing the previous prog_flags when resetting installed programs. That seems unnecessary, and most drivers don't populate prog_flags anyway. The distinction between XDP_SETUP_PROG and XDP_SETUP_PROG_HW should be enough to tell the driver what it must do to correctly reset the active BPF program. dev_xdp_uninstall() is also generalized into an iteration over all three supported modes.

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
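Here is a hypothetical model of the reorganization. The device records the attached program per mode, queries are answered locally, and uninstall is a single loop over the three modes. In this sketch the generic (skb) mode is handled without a driver call, and only the driver and hardware modes invoke a stand-in for the low-level setup command. All names are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical XDP modes, mirroring generic/skb, driver and hardware. */
enum toy_xdp_mode { TOY_XDP_MODE_SKB, TOY_XDP_MODE_DRV, TOY_XDP_MODE_HW, TOY_XDP_MODE_CNT };

struct toy_net_device {
	int prog_id[TOY_XDP_MODE_CNT];  /* per-mode state kept by the device; 0 = none */
	int driver_setup_calls;         /* count of low-level driver commands */
};

/* Stand-in for the driver's low-level XDP_SETUP_PROG/_HW handling: it
 * only installs or removes a program and keeps no bookkeeping. */
static void toy_driver_setup(struct toy_net_device *dev, enum toy_xdp_mode mode, int prog_id)
{
	(void)mode;
	(void)prog_id;
	dev->driver_setup_calls++;
}

/* Queries no longer need a round-trip to the driver. */
int toy_dev_xdp_query(const struct toy_net_device *dev, enum toy_xdp_mode mode)
{
	return dev->prog_id[mode];
}

/* Generalized uninstall: one loop over all three modes. */
void toy_dev_xdp_uninstall(struct toy_net_device *dev)
{
	for (int mode = 0; mode < TOY_XDP_MODE_CNT; mode++) {
		if (!dev->prog_id[mode])
			continue;
		if (mode != TOY_XDP_MODE_SKB)  /* generic mode: core code only */
			toy_driver_setup(dev, (enum toy_xdp_mode)mode, 0);
		dev->prog_id[mode] = 0;
	}
}
```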
| * bpf: Make bpf_link API available indepently of CONFIG_BPF_SYSCALL
    Author: Andrii Nakryiko | Date: 2020-07-25 | Files: 1 | Lines: -26/+55

    Similarly to bpf_prog, make bpf_link and the related generic API available unconditionally, to make it easier to add bpf_link support in various parts of the kernel. Stub out the init/prime/settle/cleanup and inc/put APIs.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
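Stubbing out an API behind a config switch is a common kernel idiom. Here is a minimal sketch using a hypothetical CONFIG_TOY_SYSCALL switch and toy_link_* functions, which are not the real bpf_link prototypes. When the option is off, static inline stubs keep every caller compiling without an #ifdef of its own.

```c
#include <errno.h>

/* Hypothetical feature switch; leave it undefined to get the stubs. */
/* #define CONFIG_TOY_SYSCALL 1 */

struct toy_link { int refcnt; };

#ifdef CONFIG_TOY_SYSCALL
/* Full implementations (trivial placeholders in this sketch). */
void toy_link_inc(struct toy_link *link) { link->refcnt++; }
void toy_link_put(struct toy_link *link) { link->refcnt--; }
int toy_link_prime(struct toy_link *link) { (void)link; return 0; }
#else
/* Stubs: callers compile unchanged; setup reports "not supported". */
static inline void toy_link_inc(struct toy_link *link) { (void)link; }
static inline void toy_link_put(struct toy_link *link) { (void)link; }
static inline int toy_link_prime(struct toy_link *link)
{
	(void)link;
	return -EOPNOTSUPP;
}
#endif
```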
* bpf: Fix build on architectures with special bpf_user_pt_regs_t
  Author: Song Liu | Date: 2020-07-25 | Files: 1 | Lines: -5/+4

  Architectures like s390, powerpc, arm64 and riscv have a special definition of bpf_user_pt_regs_t, so we need to cast the pointer before passing it to bpf_get_stack(). This is similar to bpf_get_stack_tp().

  Fixes: 03d42fd2d83f ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF")
  Reported-by: kernel test robot <lkp@intel.com>
  Signed-off-by: Song Liu <songliubraving@fb.com>
  Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  Link: https://lore.kernel.org/bpf/20200724200503.3629591-1-songliubraving@fb.com
* bpf/local_storage: Fix build without CONFIG_CGROUP
  Author: YiFei Zhu | Date: 2020-07-25 | Files: 1 | Lines: -2/+2

  local_storage.o is compiled under CONFIG_BPF_SYSCALL, which does not imply that CONFIG_CGROUP is on. Including cgroup-internal.h when CONFIG_CGROUP is off causes a compilation failure.

  Fixes: f67cfc233706 ("bpf: Make cgroup storages shared between programs on the same cgroup")
  Reported-by: kernel test robot <lkp@intel.com>
  Signed-off-by: YiFei Zhu <zhuyifei@google.com>
  Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  Link: https://lore.kernel.org/bpf/20200724211753.902969-1-zhuyifei1999@gmail.com
* Merge branch 'shared-cgroup-storage'
  Author: Alexei Starovoitov | Date: 2020-07-25 | Files: 11 | Lines: -143/+905

  YiFei Zhu says:

  ====================
  To access the storage in a CGROUP_STORAGE map, one uses the bpf_get_local_storage helper, which is extremely fast due to its use of per-CPU variables. However, its code is built on the assumption that one map can only be used by one program at any time. This prohibits sharing data between multiple programs that use these maps, which rules out many use cases, such as per-cgroup configuration storage written by a setsockopt program and read by a cg_sock_addr program.

  Why not use other map types? The great part of the CGROUP_STORAGE map is that it is isolated by the different cgroups it is attached to. When a program calls bpf_get_local_storage, even on the same map, it gets different storages depending on which cgroup attachment it was run from. The kernel manages the storages, which simplifies both the BPF program and userspace. In theory, one could use other maps like array or hash to do the same thing, but that would add major overhead and complexity: userspace would need to know when a cgroup is being freed in order to free up space in the replacement map.

  This patch set introduces a significant change to the semantics of the CGROUP_STORAGE map type. Instead of each storage being tied to one single attachment, it is shared across different attachments to the same cgroup, and persists until either the map or the cgroup it is attached to is freed.

  Users may use u64 as the key to the map. In that case the attach type is ignored during key comparison, and programs of different attach types share the same storage if the cgroups they are attached to are the same.

  How could this break existing users?
  * Users that use detach & reattach / program replacement as a shortcut to zeroing the storage. Since we need sharing between programs, we cannot zero the storage. Users that expect this behavior should either attach a program with a new map, or explicitly zero the map with a syscall.
  This case depends on undocumented implementation details, so the impact should be minimal.

  Patch 1 introduces a test on the old expected behavior of the map type.

  Patch 2 introduces a test showing how two programs cannot share one such map.

  Patch 3 implements the change of semantics to the map.

  Patch 4 amends the new test so that it yields the behavior we expect from the change.

  Patch 5 documents the map type.

  Changes since RFC:
  * Clarify the commit message in patch 3 so that it says the lifetime of the storage ends when the cgroup_bpf is freed, rather than the cgroup itself.
  * Restored an -ENOMEM check in __cgroup_bpf_attach.
  * Update selftests for a recent change in the network_helpers API.

  Changes since v1:
  * s/CHECK_FAIL/CHECK/
  * s/bpf_prog_attach/bpf_program__attach_cgroup/
  * Moved test__start_subtest to test_cg_storage_multi.
  * Removed some redundant CHECK_FAIL where they are already CHECK-ed.

  Changes since v2:
  * Lock cgroup_mutex during map_free.
  * Publish new storages only if attach is successful, by tracking exactly which storages are reused in an array of bools.
  * Mention in the patch 3 commit message that bpftool map dump shows a value of zero for attach_type.

  Changes since v3:
  * Use a much simpler lookup and allocate-if-not-exist, relying on the fact that cgroup_mutex is locked during attach.
  * Removed an unnecessary spinlock hold.

  Changes since v4:
  * Changed semantics so that if the key type is struct bpf_cgroup_storage_key, the map retains isolation between different attach types. Sharing between different attach types only occurs when the key type is u64.
  * Adapted tests and docs for the above change.

  Changes since v5:
  * Removed redundant NULL check before bpf_link__destroy.
  * After asserting that the BPF object fails to load, free it explicitly in case it unexpectedly did load.
  * Rename variable in bpf_cgroup_storage_key_cmp for clarity.
  * Added a lot of information to Documentation, more or less copied from what Martin KaFai Lau wrote.
  ====================

  Acked-by: Martin KaFai Lau <kafai@fb.com>
  Signed-off-by: Alexei Starovoitov <ast@kernel.org>
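The key-type rule from the v4 change can be sketched as a comparison function. struct toy_cgroup_storage_key mirrors the two fields of the UAPI struct bpf_cgroup_storage_key (cgroup_inode_id and attach_type). The bool flag standing in for "the map was created with a u64 key" and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the two fields of UAPI struct bpf_cgroup_storage_key. */
struct toy_cgroup_storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

/* With a u64 key, only the cgroup matters (storage shared across attach
 * types). With the struct key, both fields must match (storage isolated
 * per attach type). Returns <0, 0 or >0 like memcmp(). */
int toy_storage_key_cmp(bool u64_key,
			const struct toy_cgroup_storage_key *a,
			const struct toy_cgroup_storage_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return a->cgroup_inode_id < b->cgroup_inode_id ? -1 : 1;
	if (u64_key)
		return 0;
	if (a->attach_type != b->attach_type)
		return a->attach_type < b->attach_type ? -1 : 1;
	return 0;
}
```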
| * Documentation/bpf: Document CGROUP_STORAGE map type
    Author: YiFei Zhu | Date: 2020-07-25 | Files: 2 | Lines: -0/+178

    The mechanics and usage are not very straightforward. Given the changes, it's better to document how it works and how to use it, rather than leaving readers to infer what is going on from the examples and implementation.

    Signed-off-by: YiFei Zhu <zhuyifei@google.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/b412edfbb05cb1077c9e2a36a981a54ee23fa8b3.1595565795.git.zhuyifei@google.com
| * selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
    Author: YiFei Zhu | Date: 2020-07-25 | Files: 3 | Lines: -27/+311

    This mirrors the original egress-only test. The cgroup_storage is now extended to have two packet counters, one for egress and one for ingress. We also add a second egress program, to test that an egress program always shares storage with other egress programs in the same cgroup. The counters behave exactly as in the original egress-only test.

    The test is split into two. The "isolated" test checks that when the key type is struct bpf_cgroup_storage_key, which contains the attach type, programs of different attach types see different storages. The "shared" test checks that when the key type is u64, programs of different attach types see the same storage if they are attached to the same cgroup.

    Signed-off-by: YiFei Zhu <zhuyifei@google.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/c756f5f1521227b8e6e90a453299dda722d7324d.1595565795.git.zhuyifei@google.com
| * bpf: Make cgroup storages shared between programs on the same cgroup (YiFei Zhu, 2020-07-25, 4 files, -143/+164)
| | This change comes in several parts:
| |
| | First, the restriction that a CGROUP_STORAGE map can only be used by one program is removed. As a result, the field 'aux' is removed from struct bpf_cgroup_storage_map, along with the code associated with that field and the now no-op functions bpf_free_cgroup_storage and bpf_cgroup_storage_release.
| |
| | Second, we permit u64 as the key type of the map. Providing such a key type indicates that the map should ignore the attach type when comparing map keys. However, for simplicity, newly linked storage still records the attach type at link time in its key struct. cgroup_storage_check_btf is adapted to accept u64 as the key type.
| |
| | Third, because the storages are now shared, they cannot be unconditionally freed on program detach. There are two ways to solve this issue:
| |   * A. Reference count the usage of the storages, and free them when the last program is detached.
| |   * B. Free a storage only when it can no longer be referred to, i.e. when either the cgroup_bpf it is attached to, or the map itself, is freed.
| | Option A has the side effect that, when the user detaches and reattaches a program, whether the program gets a fresh storage depends on whether another attached program is using that storage. This could trigger races if the user is multi-threaded, and since nondeterminism in data races is evil, we go with option B.
| |
| | Both the map and the cgroup_bpf now track their associated storages, and the storage unlink and free are moved from cgroup_bpf_detach to cgroup_bpf_release and cgroup_storage_map_free. The latter now also holds the cgroup_mutex to prevent any races with the former.
| |
| | Fourth, on attach, we reuse the old storage if the key already exists in the map, via cgroup_storage_lookup. If the storage does not exist yet, we create a new one and publish it as the last step of the attach process. This does not create a race condition because the cgroup_mutex is held for the whole attach. We keep track of an array of newly allocated storages, and if the process fails, only the new storages are freed.
| |
| | Signed-off-by: YiFei Zhu <zhuyifei@google.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com
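The lookup-or-allocate attach described in this commit can be modeled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: struct storage_map, key_equal, model_attach and map_free are invented names, a fixed array stands in for the real map, and the cgroup_mutex that serializes the real attach is omitted because the model is single-threaded. A map with a u64 key compares only the cgroup id, so two attach types on one cgroup share a storage; a failed attach frees only what it allocated, and storages are otherwise freed only with their map (option B).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
#define MAX_STORAGES 16

struct storage_key {
	uint64_t cgroup_id;
	uint32_t attach_type;
};

struct storage {
	struct storage_key key;
	long value;
};

struct storage_map {
	bool key_is_u64; /* u64 key: attach type is ignored when comparing */
	struct storage *slots[MAX_STORAGES];
	int n;
};

static bool key_equal(const struct storage_map *map,
		      struct storage_key a, struct storage_key b)
{
	if (a.cgroup_id != b.cgroup_id)
		return false;
	return map->key_is_u64 || a.attach_type == b.attach_type;
}

/* Only published storages live in slots[], so lookups never see
 * storages belonging to an attach that is still in progress. */
static struct storage *storage_lookup(const struct storage_map *map,
				      struct storage_key key)
{
	for (int i = 0; i < map->n; i++)
		if (key_equal(map, map->slots[i]->key, key))
			return map->slots[i];
	return NULL;
}

/* Attach one program that uses nmaps storage maps. fail_at simulates an
 * allocation failure at that map index (-1 for none). On success out[i]
 * is the storage used with maps[i]. Returns 0 on success, -1 on failure. */
static int model_attach(struct storage_map **maps, int nmaps,
			struct storage_key key, int fail_at,
			struct storage **out)
{
	struct storage *fresh[MAX_STORAGES] = { NULL }; /* allocated by this attach */

	assert(nmaps <= MAX_STORAGES);
	for (int i = 0; i < nmaps; i++) {
		out[i] = storage_lookup(maps[i], key);
		if (out[i])
			continue; /* reuse the existing, shared storage */
		if (i == fail_at || maps[i]->n == MAX_STORAGES)
			goto rollback;
		fresh[i] = calloc(1, sizeof(*fresh[i]));
		if (!fresh[i])
			goto rollback;
		fresh[i]->key = key;
		out[i] = fresh[i];
	}
	/* Publish the new storages only as the very last step. */
	for (int i = 0; i < nmaps; i++)
		if (fresh[i])
			maps[i]->slots[maps[i]->n++] = fresh[i];
	return 0;

rollback:
	for (int i = 0; i < nmaps; i++)
		free(fresh[i]); /* reused storages are left untouched */
	return -1;
}

/* Option B: a storage is freed only when its map goes away. */
static void map_free(struct storage_map *map)
{
	for (int i = 0; i < map->n; i++)
		free(map->slots[i]);
	map->n = 0;
}
```

Deferring publication to the end is what lets the rollback path free the new storages without any other attachment having looked them up.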
| * selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs (YiFei Zhu, 2020-07-25, 4 files, -11/+99)
| | The current assumption is that the lifetime of a cgroup storage is tied to the program's attachment. The storage is created in cgroup_bpf_attach, and released upon cgroup_bpf_detach and cgroup_bpf_release.
| |
| | Because the current semantics are that each attachment gets a completely independent cgroup storage, and multiple programs can be attached to the same (cgroup, attach type) pair, which is the key of the CGROUP_STORAGE map, looking up the map with this pair could yield multiple storages, and that is not permitted. Therefore, the kernel verifier checks that two programs cannot share the same CGROUP_STORAGE map, even if they have different expected attach types, since the actual attach type does not always have to equal the expected attach type.
| |
| | The test creates a CGROUP_STORAGE map and makes it shared across two different programs, one cgroup_skb/egress and one cgroup_skb/ingress. It asserts that the two programs cannot both be loaded, because the verifier fails for the reason above.
| |
| | Signed-off-by: YiFei Zhu <zhuyifei@google.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/30a6b0da67ae6b0296c4d511bfb19c5f3d035916.1595565795.git.zhuyifei@google.com
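The rule the verifier enforced before storages became shareable can be pictured as a map that remembers its first user. The sketch below is a hypothetical user-space model (struct prog, struct cgroup_storage_map and bind_storage_map are illustrative names, not verifier code): a second program is refused even when its attach type differs, which is the case this test loads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the old one-program-per-map rule. */
struct prog {
	const char *section; /* e.g. "cgroup_skb/egress" */
};

struct cgroup_storage_map {
	const struct prog *owner; /* first program that used the map */
};

/* Returns true if prog may use the map, false if the load is rejected. */
static bool bind_storage_map(struct cgroup_storage_map *map,
			     const struct prog *prog)
{
	if (map->owner && map->owner != prog)
		return false; /* another program already owns it: reject */
	map->owner = prog;
	return true;
}
```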
| * selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches (YiFei Zhu, 2020-07-25, 2 files, -0/+191)
|/  This test creates a parent cgroup and a child of that cgroup. It attaches a cgroup_skb/egress program that simply counts packets, both in a global variable (ARRAY map) and in a CGROUP_STORAGE map. The program is first attached to the parent cgroup only, then to both the parent and the child.
|
|   The test case sends a message within the child cgroup, and because the program is inherited across parent/child cgroups, this triggers the egress program for both the parent and the child attachments, where they exist. When looking up the CGROUP_STORAGE map, the program uses the cgroup and attach type of the attachment; therefore, the two attachments use different cgroup storages. We assert that all packet counts match what we expect.
|
|   Signed-off-by: YiFei Zhu <zhuyifei@google.com>
|   Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|   Link: https://lore.kernel.org/bpf/5a20206afa4606144691c7caa0d1b997cd60dec0.1595565795.git.zhuyifei@google.com
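The expected counts follow from one rule: a packet sent from the child runs every egress program attached on the path from the child up to the parent, once each, and each run bumps both the shared ARRAY-map counter and its own attachment's CGROUP_STORAGE value. The hypothetical model below (struct attachment and send_from_child are illustrative names, and the numbers are the model's own, not the selftest's actual assertions) makes that arithmetic explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one egress attachment per cgroup, each with its own
 * CGROUP_STORAGE value keyed by (cgroup, attach type). */
struct attachment {
	const char *cgroup;
	long storage_count; /* this attachment's CGROUP_STORAGE value */
};

/* A packet sent from the child runs every attached egress program on the
 * child -> parent path (programs are inherited), once each. */
static void send_from_child(struct attachment *attached[], size_t n,
			    long *global_count)
{
	for (size_t i = 0; i < n; i++) {
		(*global_count)++;            /* shared ARRAY-map counter */
		attached[i]->storage_count++; /* per-attachment storage */
	}
}
```

With only the parent attached, one packet yields a global count of 1; after the child is attached too, the next packet adds 2 to the global count but only 1 to each storage.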
* Merge branch 'fix-bpf_get_stack-with-PEBS' (Alexei Starovoitov, 2020-07-25, 10 files, -21/+462)
|\  Song Liu says:
| |
| | ====================
| | Calling get_perf_callchain() on perf_events from PEBS entries may cause unwinder errors. To fix this issue, the perf subsystem fetches the callchain early, and such perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY. A similar issue exists when a BPF program calls get_perf_callchain() via helper functions. For more information about this issue, please refer to the discussions in [1].
| |
| | This set fixes this issue with the helper protos bpf_get_stackid_pe and bpf_get_stack_pe.
| |
| | [1] https://lore.kernel.org/lkml/ED7B9430-6489-4260-B3C5-9CFA2E3AA87A@fb.com/
| |
| | Changes v4 => v5:
| | 1. Return -EPROTO instead of -EINVAL on PERF_EVENT_IOC_SET_BPF errors. (Alexei)
| | 2. Let libbpf print a hint message when PERF_EVENT_IOC_SET_BPF returns -EPROTO. (Alexei)
| |
| | Changes v3 => v4:
| | 1. Fix error check logic in bpf_get_stackid_pe and bpf_get_stack_pe. (Alexei)
| | 2. Do not allow attaching BPF programs with bpf_get_stack|stackid to a perf_event with precise_ip > 0 but without a proper callchain. (Alexei)
| | 3. Add selftest get_stackid_cannot_attach.
| |
| | Changes v2 => v3:
| | 1. Fix handling of the stackmap skip field. (Andrii)
| | 2. Simplify the code in a few places. (Andrii)
| |
| | Changes v1 => v2:
| | 1. Simplify the design and avoid introducing a new helper function. (Andrii)
| | ====================
| |
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: Add get_stackid_cannot_attach (Song Liu, 2020-07-25, 1 file, -0/+91)
| | This test confirms that a BPF program that calls bpf_get_stackid() cannot attach to a perf_event with precise_ip > 0 but without PERF_SAMPLE_CALLCHAIN, and cannot attach if the perf_event has exclude_callchain_kernel set.
| |
| | Signed-off-by: Song Liu <songliubraving@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723180648.1429892-6-songliubraving@fb.com
| * selftests/bpf: Add callchain_stackid (Song Liu, 2020-07-25, 2 files, -0/+175)
| | This tests the new helper functions bpf_get_stackid_pe and bpf_get_stack_pe. These two helpers have a different implementation for perf_events with PEBS entries.
| |
| | Signed-off-by: Song Liu <songliubraving@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | Link: https://lore.kernel.org/bpf/20200723180648.1429892-5-songliubraving@fb.com
| * libbpf: Print hint when PERF_EVENT_IOC_SET_BPF returns -EPROTO (Song Liu, 2020-07-25, 1 file, -0/+3)
| | The kernel prevents potential unwinder warnings and crashes by blocking BPF programs that call bpf_get_[stack|stackid] on a perf_event without PERF_SAMPLE_CALLCHAIN, or with exclude_callchain_[kernel|user]. Print a hint message in libbpf to help the user debug such issues.
| |
| | Signed-off-by: Song Liu <songliubraving@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723180648.1429892-4-songliubraving@fb.com
| * bpf: Fail PERF_EVENT_IOC_SET_BPF when bpf_get_[stack|stackid] cannot work (Song Liu, 2020-07-25, 3 files, -1/+23)
| | bpf_get_[stack|stackid] on perf_events with precise_ip uses the callchain attached to perf_sample_data. If this callchain is not present, do not allow attaching a BPF program that calls bpf_get_[stack|stackid] to this event.
| |
| | In the error case, -EPROTO is returned so that libbpf can identify this error and print a proper hint message.
| |
| | Signed-off-by: Song Liu <songliubraving@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
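The attach-time rule from this commit and the libbpf hint commit (a program that calls bpf_get_[stack|stackid] on an event with precise_ip needs the callchain perf already captured, including both kernel and user parts) reduces to one predicate. The sketch below is a hypothetical user-space model: struct event_attr_model and check_set_bpf are invented names, and has_early_callchain stands in for however the kernel records that a callchain is attached to perf_sample_data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the perf_event_attr properties that matter here. */
struct event_attr_model {
	unsigned int precise_ip;       /* > 0 for PEBS-backed events */
	bool has_early_callchain;      /* callchain attached to perf_sample_data */
	bool exclude_callchain_kernel;
	bool exclude_callchain_user;
};

/* Returns 0 if attaching the program is allowed, -EPROTO otherwise. */
static int check_set_bpf(const struct event_attr_model *attr,
			 bool prog_calls_get_stack)
{
	if (attr->precise_ip && prog_calls_get_stack &&
	    (!attr->has_early_callchain || attr->exclude_callchain_kernel ||
	     attr->exclude_callchain_user))
		return -EPROTO; /* distinct error so user space can print a hint */
	return 0;
}
```

From user space, this rejection would surface as ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd) failing with errno set to EPROTO, which is the condition the libbpf change keys its hint on.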
| * bpf: Separate bpf_get_[stack|stackid] for perf events BPF (Song Liu, 2020-07-25, 3 files, -20/+170)
|/  Calling get_perf_callchain() on perf_events from PEBS entries may cause unwinder errors. To fix this issue, the callchain is fetched early. Such perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
|
|   Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may also cause unwinder errors. To fix this, add separate versions of these two helpers, bpf_get_[stack|stackid]_pe. These two helpers use the callchain in bpf_perf_event_data_kern->data->callchain.
|
|   Signed-off-by: Song Liu <songliubraving@fb.com>
|   Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|   Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
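Because the callchain is already stored with the sample, the _pe variants can copy from it instead of unwinding again. The sketch below models only that copying step, under stated assumptions: struct callchain_model and get_stack_from_early_callchain are hypothetical, the low byte of flags is treated as a skip count in the spirit of BPF_F_SKIP_FIELD_MASK, and the kernel/user split, build-id mode and exact corner-case return values of the real helpers are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 8
#define SKIP_MASK 0xffULL /* assumed: low byte of flags = frames to skip */

/* Hypothetical stand-in for the callchain perf already captured for this
 * sample (data->callchain in the commit text). */
struct callchain_model {
	uint64_t nr;              /* number of valid entries in ips[] */
	uint64_t ips[MAX_FRAMES];
};

/* Copy instruction pointers from the precomputed callchain into buf.
 * Returns the number of bytes copied, or -EINVAL if size is not a
 * multiple of 8. Returns 0 if nothing is left after skipping. */
static long get_stack_from_early_callchain(const struct callchain_model *cc,
					   uint64_t *buf, uint32_t size,
					   uint64_t flags)
{
	uint64_t skip = flags & SKIP_MASK;

	assert(cc->nr <= MAX_FRAMES);
	if (size % sizeof(uint64_t))
		return -EINVAL;
	if (skip >= cc->nr)
		return 0;

	uint64_t avail = cc->nr - skip;
	uint64_t want = size / sizeof(uint64_t);
	uint64_t n = avail < want ? avail : want;

	memcpy(buf, &cc->ips[skip], n * sizeof(uint64_t));
	return (long)(n * sizeof(uint64_t));
}
```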
* Merge branch 'bpf_iter-for-map-elems' (Alexei Starovoitov, 2020-07-25, 32 files, -62/+1687)
|\  Yonghong Song says:
| |
| | ====================
| | BPF iterators have been implemented for task, task_file, bpf_map, ipv6_route, netlink, tcp and udp so far. For map elements, there are two ways to traverse all elements from user space:
| |   1. Use the BPF_MAP_GET_NEXT_KEY bpf subcommand to get elements one by one.
| |   2. Use the BPF_MAP_LOOKUP_BATCH bpf subcommand to get a batch of elements.
| | Both approaches need to copy data from kernel to user space in order to inspect it.
| |
| | This patch set implements a bpf iterator for map elements. Users can have a bpf program in the kernel run on each map element to do checking, filtering, aggregation, value modification, etc. without copying data to user space.
| |
| | Patches #1 and #2 are refactoring. Patch #3 implements read-only/read-write buffer support in the verifier. Patches #4 - #7 implement map element support for hash, percpu hash, lru hash, lru percpu hash, array, percpu array and sock local storage maps. Patches #8 - #9 add libbpf and bpftool support. Patches #10 - #13 are selftests for the implemented map element iterators.
| |
| | Changelogs:
| |   v3 -> v4:
| |     . fix a KASAN failure triggered by a failed bpf_iter link_create: cleanup_link is needed, not just free_link. (Alexei)
| |   v2 -> v3:
| |     . rebase on top of latest bpf-next
| |   v1 -> v2:
| |     . support modifying map element values. (Alexei)
| |     . map keys/values can be used as helper arguments for those arguments with ARG_PTR_TO_MEM or ARG_PTR_TO_INIT_MEM register type. (Alexei)
| |     . remove unused variable. (kernel test robot)
| | ====================
| |
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
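Conceptually, the map element iterator inverts the user-space loop: instead of copying each element out with BPF_MAP_GET_NEXT_KEY or BPF_MAP_LOOKUP_BATCH, a callback runs next to the data and can aggregate or modify values in place. The sketch below is a plain user-space model of that idea, not the kernel or libbpf API: iterate_map, elem_cb, struct array_map_model and sum_and_double are hypothetical names. It assumes, as the bpf_iter selftest programs do, that the callback is invoked one final time with NULL key and value to signal the end of the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ELEMS 4

/* Hypothetical callback type: sees each element in place, then one
 * final call with NULL key/value marking the end of iteration. */
typedef void (*elem_cb)(const uint32_t *key, uint64_t *value, void *ctx);

struct array_map_model {
	uint64_t values[NUM_ELEMS];
};

static void iterate_map(struct array_map_model *map, elem_cb cb, void *ctx)
{
	for (uint32_t k = 0; k < NUM_ELEMS; k++)
		cb(&k, &map->values[k], ctx);
	cb(NULL, NULL, ctx); /* end-of-iteration call */
}

struct sum_ctx {
	uint64_t sum;
	int finished;
};

/* Aggregate and modify values in place; nothing is copied out per element. */
static void sum_and_double(const uint32_t *key, uint64_t *value, void *ctx)
{
	struct sum_ctx *s = ctx;

	if (!key || !value) {
		s->finished = 1;
		return;
	}
	s->sum += *value;
	*value *= 2;
}
```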
| * selftests/bpf: Add a test for out of bound rdonly buf access (Yonghong Song, 2020-07-25, 2 files, -0/+57)
| | If the bpf program contains an out-of-bound access w.r.t. a particular map key/value size, verification will still succeed, i.e., the program will be accepted by the verifier. But it will be rejected at link_create time. A test is added here to ensure that link_create fails if an out-of-bound access happens.
| |
| |   $ ./test_progs -n 4
| |   ...
| |   #4/23 rdonly-buf-out-of-bound:OK
| |   ...
| |
| | Signed-off-by: Yonghong Song <yhs@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723184124.591700-1-yhs@fb.com
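Why a program can pass the verifier yet fail at link_create: presumably the verifier only learns how many bytes the program reads from the key and value buffers, while the concrete map, and therefore its key_size and value_size, is supplied only when the iterator link is created. The model below is a hypothetical sketch of that late comparison; struct prog_access_model, struct map_sizes_model and link_create_allowed are illustrative names, and per-CPU value layouts and the exact error code are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Recorded at verification time: the largest number of bytes the program
 * reads from the key buffer and from the value buffer (hypothetical). */
struct prog_access_model {
	uint32_t max_key_access;
	uint32_t max_value_access;
};

/* Known only when the iterator is linked to a concrete map. */
struct map_sizes_model {
	uint32_t key_size;
	uint32_t value_size;
};

/* The deferred check: reject the link if any recorded access would run
 * past the end of this particular map's key or value. */
static bool link_create_allowed(const struct prog_access_model *p,
				const struct map_sizes_model *m)
{
	return p->max_key_access <= m->key_size &&
	       p->max_value_access <= m->value_size;
}
```

The same loaded program can therefore link successfully against one map and fail against another with smaller values, which is the situation the rdonly-buf-out-of-bound subtest exercises.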
| * selftests/bpf: Add a test for bpf sk_storage_map iterator (Yonghong Song, 2020-07-25, 2 files, -0/+106)
| | Added one test for bpf sk_storage_map_iterator.
| |
| |   $ ./test_progs -n 4
| |   ...
| |   #4/22 bpf_sk_storage_map:OK
| |   ...
| |
| | Signed-off-by: Yonghong Song <yhs@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723184122.591591-1-yhs@fb.com
| * selftests/bpf: Add test for bpf array map iterators (Yonghong Song, 2020-07-25, 3 files, -0/+247)
| | Two subtests are added.
| |
| |   $ ./test_progs -n 4
| |   ...
| |   #4/20 bpf_array_map:OK
| |   #4/21 bpf_percpu_array_map:OK
| |   ...
| |
| | The bpf_array_map subtest also tests a bpf program changing array element values and sending key/value pairs to user space through the bpf_seq_write() interface.
| |
| | Signed-off-by: Yonghong Song <yhs@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723184121.591367-1-yhs@fb.com
| * selftests/bpf: Add test for bpf hash map iterators (Yonghong Song, 2020-07-25, 3 files, -0/+337)
| | Two subtests are added.
| |
| |   $ ./test_progs -n 4
| |   ...
| |   #4/18 bpf_hash_map:OK
| |   #4/19 bpf_percpu_hash_map:OK
| |   ...
| |
| | Signed-off-by: Yonghong Song <yhs@fb.com>
| | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | Link: https://lore.kernel.org/bpf/20200723184120.590916-1-yhs@fb.com