From da70314917862d4da4a8d7601cd47339df8b3c23 Mon Sep 17 00:00:00 2001
From: Andrey Ignatov <rdna@fb.com>
Date: Wed, 17 Apr 2019 22:28:57 -0700
Subject: bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL

Add documentation for BPF_PROG_TYPE_CGROUP_SYSCTL, including general
info, attach type, context, return code, helpers, example and usage
considerations.

A separate file prog_cgroup_sysctl.rst is added to Documentation/bpf/.

In the future more program types can be documented in their own
prog_<name>.rst files.

Another way to place program type specific documentation would be to
group program types somehow (e.g. cgroup.rst for all cgroup-bpf
programs), but it may not scale well since some program types may belong
to different groups, e.g. BPF_PROG_TYPE_CGROUP_SKB can be documented
together with either cgroup-bpf programs or programs that access skb.

The new file is added to the index and was verified with `make htmldocs`
and sanity-checked with lynx.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 Documentation/bpf/index.rst              |   9 +++
 Documentation/bpf/prog_cgroup_sysctl.rst | 125 +++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 Documentation/bpf/prog_cgroup_sysctl.rst

diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 4e77932959cc..dadcaa9a9f5f 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -36,6 +36,15 @@ Two sets of Questions and Answers (Q&A) are maintained.
    bpf_devel_QA
 
 
+Program types
+=============
+
+.. toctree::
+   :maxdepth: 1
+
+   prog_cgroup_sysctl
+
+
 .. Links:
 .. _Documentation/networking/filter.txt: ../networking/filter.txt
 .. _man-pages: https://www.kernel.org/doc/man-pages/
diff --git a/Documentation/bpf/prog_cgroup_sysctl.rst b/Documentation/bpf/prog_cgroup_sysctl.rst
new file mode 100644
index 000000000000..677d6c637cf3
--- /dev/null
+++ b/Documentation/bpf/prog_cgroup_sysctl.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+===========================
+BPF_PROG_TYPE_CGROUP_SYSCTL
+===========================
+
+This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type that
+provides a cgroup-bpf hook for sysctl.
+
+The hook has to be attached to a cgroup and will be called every time a
+process inside that cgroup tries to read from or write to a sysctl knob in proc.
+
+1. Attach type
+**************
+
+The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
+
+2. Context
+**********
+
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
+a BPF program::
+
+    struct bpf_sysctl {
+        __u32 write;
+        __u32 file_pos;
+    };
+
+* ``write`` indicates whether the sysctl value is being read (``0``) or
+  written (``1``). This field is read-only.
+
+* ``file_pos`` indicates the file position at which the sysctl is being
+  accessed, read or written. This field is read-write. Writing to the field
+  sets the starting position in the sysctl proc file that ``read(2)`` will
+  read from or ``write(2)`` will write to. Writing zero to the field can be
+  used, e.g., to override the whole value with ``bpf_sysctl_set_new_value()``
+  on ``write(2)`` even when user space writes at ``file_pos > 0``. Writing a
+  non-zero value to the field can be used to access part of the sysctl value
+  starting from the specified ``file_pos``. Not all sysctls support access
+  with ``file_pos != 0``; e.g. writes to numeric sysctl entries must always be
+  at file position ``0``. See also the ``kernel.sysctl_writes_strict`` sysctl.
+
+See `linux/bpf.h`_ for more details on how the context fields can be accessed.
+
+3. Return code
+**************
+
+A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
+return codes:
+
+* ``0`` means "reject access to sysctl";
+* ``1`` means "proceed with access".
+
+If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
+``write(2)`` and ``errno`` will be set to ``EPERM``.
+
+4. Helpers
+**********
+
+Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
+helpers focus on providing access to these properties:
+
+* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
+  ``/proc/sys``, into a buffer provided by the BPF program;
+
+* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
+  the sysctl into a buffer provided by the BPF program. This helper is available
+  on both ``read(2)`` from and ``write(2)`` to the sysctl;
+
+* ``bpf_sysctl_get_new_value()`` to get the new string value currently being
+  written to the sysctl before the actual write happens. This helper can be used
+  only on ``ctx->write == 1``;
+
+* ``bpf_sysctl_set_new_value()`` to override the new string value currently
+  being written to the sysctl before the actual write happens. The sysctl
+  value will be overridden starting from the current ``ctx->file_pos``. If the
+  whole value has to be overridden, the BPF program can set ``file_pos`` to
+  zero before calling the helper. This helper can be used only on
+  ``ctx->write == 1``. The new string value set by the helper is treated and
+  verified by the kernel the same way as an equivalent string from user space.
+
+A BPF program sees the sysctl value the same way as user space does in the proc
+filesystem, i.e. as a string. Since many sysctl values represent an integer or a
+vector of integers, the following helpers can be used to get the numeric value
+from the string:
+
+* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
+  similar to user space `strtol(3)`_;
+* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
+  long integer, similar to user space `strtoul(3)`_.
+
+See `linux/bpf.h`_ for more details on the helpers described here.
+
+5. Examples
+***********
+
+See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses the
+sysctl name and value, parses the string value to get a vector of integers, and
+uses the result to decide whether to allow or deny access to the sysctl.
+
+6. Notes
+********
+
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
+environment, for example to monitor sysctl usage or to catch unreasonable values
+that an application running as root in a separate cgroup is trying to set.
+
+Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
+time, it may return a different result than at ``sys_open`` time: the process
+that opened the sysctl file in the proc filesystem may differ from the process
+that is trying to read from or write to it, and the two processes may run in
+different cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be
+used as a security mechanism to limit sysctl usage.
+
+As with any cgroup-bpf program, additional care should be taken if an
+application running as root in a cgroup should not be allowed to
+detach or replace the BPF program attached by the administrator.
+
+.. Links
+.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
+.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
+.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
+.. _test_sysctl_prog.c:
+   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c
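The return-code convention and the string-to-integer parsing described in prog_cgroup_sysctl.rst above can be illustrated with a small userspace C model. This is a hedged sketch, not a BPF program and not the logic of test_sysctl_prog.c: the function sysctl_policy() and its tcp_mem rule are hypothetical, and libc strtoul(3) stands in for the bpf_strtoul() helper. It returns 1 to let the access proceed and 0 to reject it, mirroring what a program of this type returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of a BPF_PROG_TYPE_CGROUP_SYSCTL decision (not a BPF
 * program). Return-code convention from the documentation:
 *   1 = proceed with access, 0 = reject (user space then sees EPERM).
 *
 * Hypothetical policy: allow every read, allow writes to any sysctl other
 * than "net/ipv4/tcp_mem", and allow a tcp_mem write only if the new value
 * is exactly three unsigned integers in strictly increasing order.
 */
static int sysctl_policy(int write, const char *name, const char *new_value)
{
	unsigned long vals[3];
	const char *p;
	char *end;
	int i;

	assert(name != NULL);
	if (!write || strcmp(name, "net/ipv4/tcp_mem") != 0)
		return 1;
	assert(new_value != NULL);

	/* Parse the vector of integers one element at a time. */
	p = new_value;
	for (i = 0; i < 3; i++) {
		errno = 0;
		vals[i] = strtoul(p, &end, 10);	/* skips leading whitespace */
		if (end == p || errno == ERANGE)
			return 0;		/* no digits or overflow */
		p = end;
	}

	/* Only trailing whitespace (e.g. the '\n' from echo) may follow. */
	while (*p == ' ' || *p == '\t' || *p == '\n')
		p++;
	if (*p != '\0')
		return 0;

	return vals[0] < vals[1] && vals[1] < vals[2];
}
```

For example, a write of "100 200 300\n" to net/ipv4/tcp_mem is allowed, while "300 200 100\n" or a two-element value is rejected. In a real program the name and new value would instead be copied into fixed-size buffers with bpf_sysctl_get_name() and bpf_sysctl_get_new_value(), and, as the Notes section warns, such a check is a monitoring aid rather than a security boundary.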
-- 
cgit v1.2.3


From 80695946737dff4cfc1ecdefd4ebf300f132d8ee Mon Sep 17 00:00:00 2001
From: Stanislav Fomichev <sdf@google.com>
Date: Thu, 18 Apr 2019 16:47:52 -0700
Subject: bpf: move BPF_PROG_TYPE_FLOW_DISSECTOR documentation to a new common
 place

In commit da7031491786 ("bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL")
Andrey proposes to put per-program-type docs under Documentation/bpf/.

Let's move the flow dissector documentation there as well.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 Documentation/bpf/index.rst                     |   1 +
 Documentation/bpf/prog_flow_dissector.rst       | 126 ++++++++++++++++++++++++
 Documentation/networking/bpf_flow_dissector.rst | 126 ------------------------
 Documentation/networking/index.rst              |   1 -
 4 files changed, 127 insertions(+), 127 deletions(-)
 create mode 100644 Documentation/bpf/prog_flow_dissector.rst
 delete mode 100644 Documentation/networking/bpf_flow_dissector.rst

diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index dadcaa9a9f5f..d3fe4cac0c90 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -43,6 +43,7 @@ Program types
    :maxdepth: 1
 
    prog_cgroup_sysctl
+   prog_flow_dissector
 
 
 .. Links:
diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
new file mode 100644
index 000000000000..ed343abe541e
--- /dev/null
+++ b/Documentation/bpf/prog_flow_dissector.rst
@@ -0,0 +1,126 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+BPF_PROG_TYPE_FLOW_DISSECTOR
+============================
+
+Overview
+========
+
+The flow dissector is a routine that parses metadata out of packets. It's
+used in various places in the networking subsystem (RFS, flow hash, etc.).
+
+The BPF flow dissector is an attempt to reimplement the C-based flow dissector
+logic in BPF to gain all the benefits of the BPF verifier (namely, limits on the
+number of instructions and tail calls).
+
+API
+===
+
+BPF flow dissector programs operate on an ``__sk_buff``. However, only a
+limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
+``flow_keys`` is a ``struct bpf_flow_keys`` and contains the flow dissector
+input and output arguments.
+
+The inputs are:
+  * ``nhoff`` - initial offset of the networking header
+  * ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
+  * ``n_proto`` - L3 protocol type, parsed out of L2 header
+
+The flow dissector BPF program should fill out the rest of the
+``struct bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto``
+should also be adjusted accordingly.
+
+The return code of the BPF program is either ``BPF_OK`` to indicate successful
+dissection, or ``BPF_DROP`` to indicate a parsing error.
+
+__sk_buff->data
+===============
+
+In the VLAN-less case, this is what the initial state of the BPF flow
+dissector looks like::
+
+  +------+------+------------+-----------+
+  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
+  +------+------+------------+-----------+
+                              ^
+                              |
+                              +-- flow dissector starts here
+
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = ETHER_TYPE
+
+In the VLAN case, the flow dissector can be called in two different states.
+
+Pre-VLAN parsing::
+
+  +------+------+------+-----+------------+-----------+
+  | DMAC | SMAC | TPID | TCI | ETHER_TYPE | L3_HEADER |
+  +------+------+------+-----+------------+-----------+
+                        ^
+                        |
+                        +-- flow dissector starts here
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff points to the first byte of TCI
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = TPID
+
+Please note that TPID can be 802.1AD and, hence, the BPF program would
+have to parse VLAN information twice for double-tagged packets.
+
+
+Post-VLAN parsing::
+
+  +------+------+------+-----+------------+-----------+
+  | DMAC | SMAC | TPID | TCI | ETHER_TYPE | L3_HEADER |
+  +------+------+------+-----+------------+-----------+
+                                           ^
+                                           |
+                                           +-- flow dissector starts here
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = ETHER_TYPE
+
+In this case, VLAN information has been processed before the flow dissector
+and the BPF flow dissector is not required to handle it.
+
+
+The takeaway here is as follows: a BPF flow dissector program can be called with
+or without the optional VLAN header and should gracefully handle both cases:
+when a single or double VLAN tag is present and when no tag is present. Since
+the same program can be called in either case, it has to be written carefully
+to handle both.
+
+
+Reference Implementation
+========================
+
+See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
+implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
+for the loader. bpftool can also be used to load a BPF flow dissector program.
+
+The reference implementation is organized as follows:
+  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
+  * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
+    does a ``bpf_tail_call`` to the appropriate L3 handler
+
+Since BPF at this point doesn't support looping (or any jumping back),
+``jmp_table`` is used instead to handle multiple levels of encapsulation (and
+IPv6 options).
+
+
+Current Limitations
+===================
+The BPF flow dissector doesn't support exporting all the metadata that the
+in-kernel C-based implementation can. Notable examples are single VLAN (802.1Q)
+and double VLAN (802.1AD) tags. Refer to ``struct bpf_flow_keys`` for the set
+of information that can currently be exported from the BPF context.
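The VLAN cases described in prog_flow_dissector.rst above (no tag, single tag, double tag) can be modeled in userspace C to make the nhoff/n_proto bookkeeping concrete. This is a hedged sketch, not the reference bpf_flow.c and not a loadable BPF program: struct flow_keys_model, skip_vlan_tags() and the constant names are hypothetical, n_proto is kept in host byte order (the real struct bpf_flow_keys field is __be16), and -1 stands in for BPF_DROP. Starting from the pre-VLAN state (nhoff at the TCI, n_proto equal to the TPID), it advances to the post-VLAN state (nhoff at the L3 header, n_proto equal to the real EtherType).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_TPID_8021Q  0x8100	/* 802.1Q tag */
#define VLAN_TPID_8021AD 0x88A8	/* 802.1AD (QinQ) outer tag */
#define VLAN_HDR_LEN     4	/* 2-byte TCI + 2-byte encapsulated EtherType */
#define MAX_VLAN_TAGS    2	/* single or double tagging */

/* Hypothetical userspace stand-in for the input fields of
 * struct bpf_flow_keys; n_proto is kept in host byte order here. */
struct flow_keys_model {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;
};

/*
 * Walk zero, one or two VLAN tags starting at keys->nhoff so that, on
 * success, nhoff/thoff point at the L3 header and n_proto holds the real
 * EtherType. Returns 0 on success or -1 (the model's stand-in for
 * BPF_DROP) on a truncated packet or more than two tags.
 */
static int skip_vlan_tags(const uint8_t *data, size_t len,
			  struct flow_keys_model *keys)
{
	int tags = 0;

	assert(data != NULL && keys != NULL);
	while (keys->n_proto == VLAN_TPID_8021Q ||
	       keys->n_proto == VLAN_TPID_8021AD) {
		if (tags == MAX_VLAN_TAGS ||
		    (size_t)keys->nhoff + VLAN_HDR_LEN > len)
			return -1;
		/* nhoff points at the TCI; the next EtherType follows it. */
		keys->n_proto = (uint16_t)((data[keys->nhoff + 2] << 8) |
					   data[keys->nhoff + 3]);
		keys->nhoff += VLAN_HDR_LEN;
		tags++;
	}
	keys->thoff = keys->nhoff;	/* thoff starts out equal to nhoff */
	return 0;
}
```

An untagged frame (or one whose tag was already stripped, the post-VLAN case) passes through unchanged, which is why the same code handles both entry states. The loop is bounded to two iterations; since BPF at that point could not loop, a real program would unroll the two steps or use tail calls as the reference implementation does.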
diff --git a/Documentation/networking/bpf_flow_dissector.rst b/Documentation/networking/bpf_flow_dissector.rst
deleted file mode 100644
index b375ae2ec2c4..000000000000
--- a/Documentation/networking/bpf_flow_dissector.rst
+++ /dev/null
@@ -1,126 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-==================
-BPF Flow Dissector
-==================
-
-Overview
-========
-
-Flow dissector is a routine that parses metadata out of the packets. It's
-used in the various places in the networking subsystem (RFS, flow hash, etc).
-
-BPF flow dissector is an attempt to reimplement C-based flow dissector logic
-in BPF to gain all the benefits of BPF verifier (namely, limits on the
-number of instructions and tail calls).
-
-API
-===
-
-BPF flow dissector programs operate on an ``__sk_buff``. However, only the
-limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
-``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
-and output arguments.
-
-The inputs are:
-  * ``nhoff`` - initial offset of the networking header
-  * ``thoff`` - initial offset of the transport header, initialized to nhoff
-  * ``n_proto`` - L3 protocol type, parsed out of L2 header
-
-Flow dissector BPF program should fill out the rest of the ``struct
-bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
-also adjusted accordingly.
-
-The return code of the BPF program is either BPF_OK to indicate successful
-dissection, or BPF_DROP to indicate parsing error.
-
-__sk_buff->data
-===============
-
-In the VLAN-less case, this is what the initial state of the BPF flow
-dissector looks like::
-
-  +------+------+------------+-----------+
-  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
-  +------+------+------------+-----------+
-                              ^
-                              |
-                              +-- flow dissector starts here
-
-
-.. code:: c
-
-  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
-  flow_keys->thoff = nhoff
-  flow_keys->n_proto = ETHER_TYPE
-
-In case of VLAN, flow dissector can be called with the two different states.
-
-Pre-VLAN parsing::
-
-  +------+------+------+-----+-----------+-----------+
-  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
-  +------+------+------+-----+-----------+-----------+
-                        ^
-                        |
-                        +-- flow dissector starts here
-
-.. code:: c
-
-  skb->data + flow_keys->nhoff point the to first byte of TCI
-  flow_keys->thoff = nhoff
-  flow_keys->n_proto = TPID
-
-Please note that TPID can be 802.1AD and, hence, BPF program would
-have to parse VLAN information twice for double tagged packets.
-
-
-Post-VLAN parsing::
-
-  +------+------+------+-----+-----------+-----------+
-  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
-  +------+------+------+-----+-----------+-----------+
-                                          ^
-                                          |
-                                          +-- flow dissector starts here
-
-.. code:: c
-
-  skb->data + flow_keys->nhoff point the to first byte of L3_HEADER
-  flow_keys->thoff = nhoff
-  flow_keys->n_proto = ETHER_TYPE
-
-In this case VLAN information has been processed before the flow dissector
-and BPF flow dissector is not required to handle it.
-
-
-The takeaway here is as follows: BPF flow dissector program can be called with
-the optional VLAN header and should gracefully handle both cases: when single
-or double VLAN is present and when it is not present. The same program
-can be called for both cases and would have to be written carefully to
-handle both cases.
-
-
-Reference Implementation
-========================
-
-See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
-implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
-for the loader. bpftool can be used to load BPF flow dissector program as well.
-
-The reference implementation is organized as follows:
-  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
-  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
-    does ``bpf_tail_call`` to the appropriate L3 handler
-
-Since BPF at this point doesn't support looping (or any jumping back),
-jmp_table is used instead to handle multiple levels of encapsulation (and
-IPv6 options).
-
-
-Current Limitations
-===================
-BPF flow dissector doesn't support exporting all the metadata that in-kernel
-C-based implementation can export. Notable example is single VLAN (802.1Q)
-and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
-for a set of information that's currently can be exported from the BPF context.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 984e68f9e026..5449149be496 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -9,7 +9,6 @@ Contents:
    netdev-FAQ
    af_xdp
    batman-adv
-   bpf_flow_dissector
    can
    can_ucan_protocol
    device_drivers/freescale/dpaa2/index
-- 
cgit v1.2.3


From 3b8802446d27522cd6d32178ba975cc492611f31 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Wed, 17 Apr 2019 18:27:01 -0700
Subject: bpf: document the verifier limits

Document the verifier limits.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 Documentation/bpf/bpf_design_QA.rst | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index 10453c627135..cb402c59eca5 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -85,8 +85,33 @@ Q: Can loops be supported in a safe way?
 A: It's not clear yet.
 
 BPF developers are trying to find a way to
-support bounded loops where the verifier can guarantee that
-the program terminates in less than 4096 instructions.
+support bounded loops.
+
+Q: What are the verifier limits?
+--------------------------------
+A: The only limit known to user space is BPF_MAXINSNS (4096).
+It's the maximum number of instructions that an unprivileged bpf
+program can have. The verifier also has various internal limits,
+like the maximum number of instructions that can be explored during
+program analysis. Currently, that limit is set to 1 million,
+which essentially means that the largest program can consist
+of 1 million NOP instructions. There is a limit to the maximum number
+of subsequent branches, a limit to the number of nested bpf-to-bpf
+calls, a limit to the number of verifier states per instruction,
+and a limit to the number of maps used by the program.
+All these limits can be hit with a sufficiently complex program.
+There are also non-numerical limits that can cause the program
+to be rejected. The verifier used to recognize only pointer + constant
+expressions. Now it can recognize pointer + bounded_register.
+bpf_map_lookup_elem(key) had a requirement that 'key' must be
+a pointer to the stack. Now, 'key' can be a pointer to a map value.
+The verifier is steadily getting 'smarter' and the limits are
+being removed. The only way to know whether a program is going to
+be accepted by the verifier is to try to load it.
+The bpf development process guarantees that future kernel
+versions will accept all bpf programs that were accepted by
+earlier versions.
+
 
 Instruction level questions
 ---------------------------
-- 
cgit v1.2.3