Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-03-21 v2 We've added 137 non-merge commits during the last 17 day(s) which contain a total of 143 files changed, 7123 insertions(+), 1092 deletions(-). The main changes are: 1) Custom SEC() handling in libbpf, from Andrii. 2) subskeleton support, from Delyan. 3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao. 4) Fix net.core.bpf_jit_harden race, from Hou. 5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub. 6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami. The arch specific bits will come later. 7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri. 8) Enable non-atomic allocations in local storage, from Joanne. 9) Various var_off ptr_to_btf_id fixed, from Kumar. 10) bpf_ima_file_hash helper, from Roberto. 11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits) selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" bpftool: Fix a bug in subskeleton code generation bpf: Fix bpf_prog_pack when PMU_SIZE is not defined bpf: Fix bpf_prog_pack for multi-node setup bpf: Fix warning for cast from restricted gfp_t in verifier bpf, arm: Fix various typos in comments libbpf: Close fd in bpf_object__reuse_map bpftool: Fix print error when show bpf map bpf: Fix kprobe_multi return probe backtrace Revert "bpf: Add support to inline bpf_get_func_ip helper on x86" bpf: Simplify check in btf_parse_hdr() selftests/bpf/test_lirc_mode2.sh: Exit with proper code bpf: Check for NULL return from bpf_get_btf_vmlinux selftests/bpf: Test skipping stacktrace bpf: Adjust BPF stack helper functions to accommodate skip > 0 bpf: Select proper size for bpf_prog_pack ... ==================== Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
author: Jakub Kicinski <kuba@kernel.org> 2022-03-22 10:36:56 -0700
committer: Jakub Kicinski <kuba@kernel.org> 2022-03-22 11:18:49 -0700
commit: 0db8640df59512dbd423c32077919f10cf35ebc6 (patch)
tree: a8e0d6806763ce5c3c84a5a2c29a1cbdb58f5129 /Documentation
parent: 4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b (diff)
parent: 7f0059b58f0257d895fafd2f2e3afe3bbdf21e64 (diff)
download: linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.tar.gz
linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.tar.bz2
linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.zip
4 files changed, 293 insertions, 0 deletions
diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst
new file mode 100644
index 000000000000..4868c909df5c
--- /dev/null
+++ b/Documentation/bpf/bpf_prog_run.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Running BPF programs from userspace
+===================================
+
+This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
+from userspace.
+
+.. contents::
+    :local:
+    :depth: 2
+
+
+Overview
+--------
+
+The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
+execute a BPF program in the kernel and return the results to userspace. This
+can be used to unit test BPF programs against user-supplied context objects, and
+as way to explicitly execute programs in the kernel for their side effects. The
+command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
+to be defined in the UAPI header, aliased to the same value.
+
+The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
+following types:
+
+- ``BPF_PROG_TYPE_SOCKET_FILTER``
+- ``BPF_PROG_TYPE_SCHED_CLS``
+- ``BPF_PROG_TYPE_SCHED_ACT``
+- ``BPF_PROG_TYPE_XDP``
+- ``BPF_PROG_TYPE_SK_LOOKUP``
+- ``BPF_PROG_TYPE_CGROUP_SKB``
+- ``BPF_PROG_TYPE_LWT_IN``
+- ``BPF_PROG_TYPE_LWT_OUT``
+- ``BPF_PROG_TYPE_LWT_XMIT``
+- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
+- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
+- ``BPF_PROG_TYPE_STRUCT_OPS``
+- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
+- ``BPF_PROG_TYPE_SYSCALL``
+
+When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
+object and (for program types operating on network packets) a buffer containing
+the packet data that the BPF program will operate on. The kernel will then
+execute the program and return the results to userspace. Note that programs will
+not have any side effects while being run in this mode; in particular, packets
+will not actually be redirected or dropped, the program return code will just be
+returned to userspace. A separate mode for live execution of XDP programs is
+provided, documented separately below.
+
+Running XDP programs in "live frame mode"
+-----------------------------------------
+
+The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
+which can be used to execute XDP programs in a way where packets will actually
+be processed by the kernel after the execution of the XDP program as if they
+arrived on a physical interface. This mode is activated by setting the
+``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
+``BPF_PROG_RUN``.
+
+The live packet mode is optimised for high performance execution of the supplied
+XDP program many times (suitable for, e.g., running as a traffic generator),
+which means the semantics are not quite as straight-forward as the regular test
+run mode. Specifically:
+
+- When executing an XDP program in live frame mode, the result of the execution
+  will not be returned to userspace; instead, the kernel will perform the
+  operation indicated by the program's return code (drop the packet, redirect
+  it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
+  in the syscall parameters when running in this mode will be rejected. In
+  addition, not all failures will be reported back to userspace directly;
+  specifically, only fatal errors in setup or during execution (like memory
+  allocation errors) will halt execution and return an error. If an error occurs
+  in packet processing, like a failure to redirect to a given interface,
+  execution will continue with the next repetition; these errors can be detected
+  via the same trace points as for regular XDP programs.
+
+- Userspace can supply an ifindex as part of the context object, just like in
+  the regular (non-live) mode. The XDP program will be executed as though the
+  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
+  object will point to that interface. Furthermore, if the XDP program returns
+  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
+  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
+  will be transmitted *out* of that same interface. Do note, though, that
+  because the program execution is not happening in driver context, an
+  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
+  that same interface (i.e., it will only work if the driver has support for the
+  ``ndo_xdp_xmit`` driver op).
+
+- When running the program with multiple repetitions, the execution will happen
+  in batches. The batch size defaults to 64 packets (which is same as the
+  maximum NAPI receive batch size), but can be specified by userspace through
+  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
+  the kernel executes the XDP program repeatedly, each invocation getting a
+  separate copy of the packet data. For each repetition, if the program drops
+  the packet, the data page is immediately recycled (see below). Otherwise, the
+  packet is buffered until the end of the batch, at which point all packets
+  buffered this way during the batch are transmitted at once.
+
+- When setting up the test run, the kernel will initialise a pool of memory
+  pages of the same size as the batch size. Each memory page will be initialised
+  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
+  invocation. When possible, the pages will be recycled on future program
+  invocations, to improve performance. Pages will generally be recycled a full
+  batch at a time, except when a packet is dropped (by return code or because
+  of, say, a redirection error), in which case that page will be recycled
+  immediately. If a packet ends up being passed to the regular networking stack
+  (because the XDP program returns ``XDP_PASS``, or because it ends up being
+  redirected to an interface that injects it into the stack), the page will be
+  released and a new one will be allocated when the pool is empty.
+
+  When recycling, the page content is not rewritten; only the packet boundary
+  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
+  be reset to the original values. This means that if a program rewrites the
+  packet contents, it has to be prepared to see either the original content or
+  the modified version on subsequent invocations.
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index ef5c996547ec..96056a7447c7 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture.
    helpers
    programs
    maps
+   bpf_prog_run
    classic_vs_extended.rst
    bpf_licensing
    test_debug
diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
new file mode 100644
index 000000000000..b64bec1ce144
--- /dev/null
+++ b/Documentation/trace/fprobe.rst
@@ -0,0 +1,174 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Fprobe - Function entry/exit probe
+==================================
+
+.. Author: Masami Hiramatsu <mhiramat@kernel.org>
+
+Introduction
+============
+
+Fprobe is a function entry/exit probe mechanism based on ftrace.
+Instead of using ftrace full feature, if you only want to attach callbacks
+on function entry and exit, similar to the kprobes and kretprobes, you can
+use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
+instrumentation for multiple functions with single handler. This document
+describes how to use fprobe.
+
+The usage of fprobe
+===================
+
+The fprobe is a wrapper of ftrace (+ kretprobe-like return callback) to
+attach callbacks to multiple function entry and exit. User needs to set up
+the `struct fprobe` and pass it to `register_fprobe()`.
+
+Typically, `fprobe` data structure is initialized with the `entry_handler`
+and/or `exit_handler` as below.
+
+.. code-block:: c
+
+ struct fprobe fp = {
+        .entry_handler  = my_entry_callback,
+        .exit_handler   = my_exit_callback,
+ };
+
+To enable the fprobe, call one of register_fprobe(), register_fprobe_ips(), and
+register_fprobe_syms(). These functions register the fprobe with different types
+of parameters.
+
+The register_fprobe() enables a fprobe by function-name filters.
+E.g. this enables @fp on "func*()" function except "func2()".::
+
+  register_fprobe(&fp, "func*", "func2");
+
+The register_fprobe_ips() enables a fprobe by ftrace-location addresses.
+E.g.
+
+.. code-block:: c
+
+  unsigned long ips[] = { 0x.... };
+
+  register_fprobe_ips(&fp, ips, ARRAY_SIZE(ips));
+
+And the register_fprobe_syms() enables a fprobe by symbol names.
+E.g.
+
+.. code-block:: c
+
+  char syms[] = {"func1", "func2", "func3"};
+
+  register_fprobe_syms(&fp, syms, ARRAY_SIZE(syms));
+
+To disable (remove from functions) this fprobe, call::
+
+  unregister_fprobe(&fp);
+
+You can temporally (soft) disable the fprobe by::
+
+  disable_fprobe(&fp);
+
+and resume by::
+
+  enable_fprobe(&fp);
+
+The above is defined by including the header::
+
+  #include <linux/fprobe.h>
+
+Same as ftrace, the registered callbacks will start being called some time
+after the register_fprobe() is called and before it returns. See
+:file:`Documentation/trace/ftrace.rst`.
+
+Also, the unregister_fprobe() will guarantee that the both enter and exit
+handlers are no longer being called by functions after unregister_fprobe()
+returns as same as unregister_ftrace_function().
+
+The fprobe entry/exit handler
+=============================
+
+The prototype of the entry/exit callback function is as follows:
+
+.. code-block:: c
+
+ void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+
+Note that both entry and exit callbacks have same ptototype. The @entry_ip is
+saved at function entry and passed to exit handler.
+
+@fp
+        This is the address of `fprobe` data structure related to this handler.
+        You can embed the `fprobe` to your data structure and get it by
+        container_of() macro from @fp. The @fp must not be NULL.
+
+@entry_ip
+        This is the ftrace address of the traced function (both entry and exit).
+        Note that this may not be the actual entry address of the function but
+        the address where the ftrace is instrumented.
+
+@regs
+        This is the `pt_regs` data structure at the entry and exit. Note that
+        the instruction pointer of @regs may be different from the @entry_ip
+        in the entry_handler. If you need traced instruction pointer, you need
+        to use @entry_ip. On the other hand, in the exit_handler, the instruction
+        pointer of @regs is set to the currect return address.
+
+Share the callbacks with kprobes
+================================
+
+Since the recursion safeness of the fprobe (and ftrace) is a bit different
+from the kprobes, this may cause an issue if user wants to run the same
+code from the fprobe and the kprobes.
+
+Kprobes has per-cpu 'current_kprobe' variable which protects the kprobe
+handler from recursion in all cases. On the other hand, fprobe uses
+only ftrace_test_recursion_trylock(). This allows interrupt context to
+call another (or same) fprobe while the fprobe user handler is running.
+
+This is not a matter if the common callback code has its own recursion
+detection, or it can handle the recursion in the different contexts
+(normal/interrupt/NMI.)
+But if it relies on the 'current_kprobe' recursion lock, it has to check
+kprobe_running() and use kprobe_busy_*() APIs.
+
+Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback
+code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED
+*before* registering the fprobe, like:
+
+.. code-block:: c
+
+ fprobe.flags = FPROBE_FL_KPROBE_SHARED;
+
+ register_fprobe(&fprobe, "func*", NULL);
+
+This will protect your common callback from the nested call.
+
+The missed counter
+==================
+
+The `fprobe` data structure has `fprobe::nmissed` counter field as same as
+kprobes.
+This counter counts up when;
+
+ - fprobe fails to take ftrace_recursion lock. This usually means that a function
+   which is traced by other ftrace users is called from the entry_handler.
+
+ - fprobe fails to setup the function exit because of the shortage of rethook
+   (the shadow stack for hooking the function return.)
+
+The `fprobe::nmissed` field counts up in both cases. Therefore, the former
+skips both of entry and exit callback and the latter skips the exit
+callback, but in both case the counter will increase by 1.
+
+Note that if you set the FTRACE_OPS_FL_RECURSION and/or FTRACE_OPS_FL_RCU to
+`fprobe::ops::flags` (ftrace_ops::flags) when registering the fprobe, this
+counter may not work correctly, because ftrace skips the fprobe function which
+increase the counter.
+
+
+Functions and structures
+========================
+
+.. kernel-doc:: include/linux/fprobe.h
+.. kernel-doc:: kernel/trace/fprobe.c
+
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 3769b9b7aed8..b9f3757f8269 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -9,6 +9,7 @@ Linux Tracing Technologies
    tracepoint-analysis
    ftrace
    ftrace-uses
+   fprobe
    kprobes
    kprobetrace
    uprobetracer
author	Jakub Kicinski <kuba@kernel.org>	2022-03-22 10:36:56 -0700
committer	Jakub Kicinski <kuba@kernel.org>	2022-03-22 11:18:49 -0700
commit	0db8640df59512dbd423c32077919f10cf35ebc6 (patch)
tree	a8e0d6806763ce5c3c84a5a2c29a1cbdb58f5129 /Documentation
parent	4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b (diff)
parent	7f0059b58f0257d895fafd2f2e3afe3bbdf21e64 (diff)
download	linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.tar.gz linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.tar.bz2 linux-stable-0db8640df59512dbd423c32077919f10cf35ebc6.zip