Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/admin-guide/sysctl/net.rst | 4
-rw-r--r-- Documentation/bpf/instruction-set.rst | 9
-rw-r--r-- Documentation/bpf/kfuncs.rst | 23
-rw-r--r-- Documentation/bpf/llvm_reloc.rst | 18
-rw-r--r-- Documentation/bpf/map_hash.rst | 53
-rw-r--r-- Documentation/bpf/map_lru_hash_update.dot | 172
-rw-r--r-- Documentation/bpf/prog_cgroup_sockopt.rst | 57
-rw-r--r-- Documentation/devicetree/bindings/net/dsa/marvell.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml | 32
-rw-r--r-- Documentation/leds/leds-class.rst | 81
-rw-r--r-- Documentation/netlink/genetlink-legacy.yaml | 8
-rw-r--r-- Documentation/netlink/specs/ovs_flow.yaml | 831
-rw-r--r-- Documentation/networking/device_drivers/ethernet/intel/ice.rst | 18
-rw-r--r-- Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst | 45
-rw-r--r-- Documentation/networking/ip-sysctl.rst | 17
15 files changed, 1328 insertions, 42 deletions
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 466c560b0c30..4877563241f3 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -386,8 +386,8 @@ Default : 0 (for compatibility reasons)
txrehash
--------
-Controls default hash rethink behaviour on listening socket when SO_TXREHASH
-option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
+Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
+to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
If set to 1 (default), hash rethink is performed on listening socket.
If set to 0, hash rethink is not performed.
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 492980ece1ab..6644842cd3ea 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -163,13 +163,13 @@ BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst = (src != 0) ? (dst / src) : 0
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
-BPF_LSH 0x60 dst <<= src
-BPF_RSH 0x70 dst >>= src
+BPF_LSH 0x60 dst <<= (src & mask)
+BPF_RSH 0x70 dst >>= (src & mask)
BPF_NEG 0x80 dst = ~src
BPF_MOD 0x90 dst = (src != 0) ? (dst % src) : dst
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
-BPF_ARSH 0xc0 sign extending shift right
+BPF_ARSH 0xc0 sign extending dst >>= (src & mask)
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
======== ===== ==========================================================
@@ -204,6 +204,9 @@ for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
interpreted as an unsigned 64-bit value. There are no instructions for
signed division or modulo.
+Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
+for 32-bit operations.
+
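The masking rule above can be modelled in plain C. The following is a minimal sketch, not kernel or verifier code; the helper names (``alu64_lsh``, ``alu32_rsh``, and so on) are hypothetical and exist only for illustration:

```c
#include <stdint.h>

/* Hypothetical helpers modelling the documented shift semantics.
 * 64-bit operations mask the shift amount with 0x3F, 32-bit ones with 0x1F. */
static uint64_t alu64_lsh(uint64_t dst, uint64_t src)
{
	return dst << (src & 0x3F);
}

static uint64_t alu64_rsh(uint64_t dst, uint64_t src)
{
	return dst >> (src & 0x3F);
}

static uint32_t alu32_lsh(uint32_t dst, uint32_t src)
{
	return dst << (src & 0x1F);
}

static uint32_t alu32_rsh(uint32_t dst, uint32_t src)
{
	return dst >> (src & 0x1F);
}

/* Sign-extending right shift (BPF_ARSH), written without relying on the
 * implementation-defined behaviour of >> on negative signed values.
 * Assumes a two's complement target for the final conversion. */
static int64_t alu64_arsh(int64_t dst, uint64_t src)
{
	unsigned int n = src & 0x3F;

	if (dst >= 0)
		return (int64_t)((uint64_t)dst >> n);
	return (int64_t)~(~(uint64_t)dst >> n);
}
```

Because of the mask, a 64-bit shift by 65 behaves like a shift by 1, and a 32-bit shift by 33 also behaves like a shift by 1, so the C undefined behaviour for oversized shift counts never arises.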
Byte swap instructions
~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ea2516374d92..7a3d9de5f315 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
-------------------------
This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.
+2.2.4 __opt Annotation
+-------------------------
+
+This annotation is used to indicate that the buffer associated with an __sz or __szk
+argument may be null. If the function is passed a nullptr in place of the buffer,
+the verifier will not check that length is appropriate for the buffer. The kfunc is
+responsible for checking if this buffer is null before using it.
+
+An example is given below::
+
+ __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+ {
+ ...
+ }
+
+Here, the buffer may be null. If buffer is not null, it is at least of size buffer__szk.
+Either way, the returned buffer is either NULL, or of size buffer__szk. Without this
+annotation, the verifier will reject the program if a null pointer is passed in with
+a nonzero size.
+
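The requirement that the callee checks the optional buffer before using it can be sketched in user-space C. This is only an illustration of the contract, not the kernel's ``bpf_dynptr_slice()``; ``struct demo_data``, ``demo_slice()`` and the choice to return NULL when a copy is needed but no buffer was supplied are assumptions of the sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical data source: either directly addressable or requiring a copy. */
struct demo_data {
	const uint8_t *linear;
	uint32_t len;
	int contiguous;
};

/* Hypothetical function following the __opt / __szk naming convention.
 * buffer__opt may be NULL, so it is checked before every use. */
static const void *demo_slice(const struct demo_data *d, uint32_t offset,
			      void *buffer__opt, uint32_t buffer__szk)
{
	/* Reject requests that fall outside the data. */
	if (offset > d->len || buffer__szk > d->len - offset)
		return NULL;

	/* Direct access possible: the optional buffer is not needed. */
	if (d->contiguous)
		return d->linear + offset;

	/* A copy is required but the caller passed no buffer. */
	if (!buffer__opt)
		return NULL;

	memcpy(buffer__opt, d->linear + offset, buffer__szk);
	return buffer__opt;
}
```

In both successful cases the returned pointer refers to ``buffer__szk`` readable bytes, which matches the description above.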
+
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
diff --git a/Documentation/bpf/llvm_reloc.rst b/Documentation/bpf/llvm_reloc.rst
index ca8957d5b671..e4a777a6a3a2 100644
--- a/Documentation/bpf/llvm_reloc.rst
+++ b/Documentation/bpf/llvm_reloc.rst
@@ -48,7 +48,7 @@ the code with ``llvm-objdump -dr test.o``::
14: 0f 10 00 00 00 00 00 00 r0 += r1
15: 95 00 00 00 00 00 00 00 exit
-There are four relations in the above for four ``LD_imm64`` instructions.
+There are four relocations in the above for four ``LD_imm64`` instructions.
The following ``llvm-readelf -r test.o`` shows the binary values of the four
relocations::
@@ -79,14 +79,16 @@ The following is the symbol table with ``llvm-readelf -s test.o``::
The 6th entry is global variable ``g1`` with value 0.
Similarly, the second relocation is at ``.text`` offset ``0x18``, instruction 3,
-for global variable ``g2`` which has a symbol value 4, the offset
-from the start of ``.data`` section.
-
-The third and fourth relocations refers to static variables ``l1``
-and ``l2``. From ``.rel.text`` section above, it is not clear
-which symbols they really refers to as they both refers to
+has a type of ``R_BPF_64_64`` and refers to entry 7 in the symbol table.
+The second relocation resolves to global variable ``g2`` which has a symbol
+value 4. The symbol value represents the offset from the start of ``.data``
+section where the initial value of the global variable ``g2`` is stored.
+
+The third and fourth relocations refer to static variables ``l1``
+and ``l2``. From the ``.rel.text`` section above, it is not clear
+to which symbols they really refer as they both refer to
symbol table entry 4, symbol ``sec``, which has ``STT_SECTION`` type
-and represents a section. So for static variable or function,
+and represents a section. So for a static variable or function,
the section offset is written to the original insn
buffer, which is called ``A`` (addend). Looking at
above insn ``7`` and ``11``, they have section offset ``8`` and ``12``.
diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst
index 8669426264c6..d2343952f2cb 100644
--- a/Documentation/bpf/map_hash.rst
+++ b/Documentation/bpf/map_hash.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
+.. Copyright (C) 2022-2023 Isovalent, Inc.
===============================================
BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
@@ -29,7 +30,16 @@ will automatically evict the least recently used entries when the hash
table reaches capacity. An LRU hash maintains an internal LRU list that
is used to select elements for eviction. This internal LRU list is
shared across CPUs but it is possible to request a per CPU LRU list with
-the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
+the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The
+following table outlines the properties of LRU maps depending on the
+map type and the flags used to create the map.
+
+======================== ========================= ================================
+Flag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
+======================== ========================= ================================
+**BPF_F_NO_COMMON_LRU** Per-CPU LRU, global map Per-CPU LRU, per-cpu map
+**!BPF_F_NO_COMMON_LRU** Global LRU, global map Global LRU, per-cpu map
+======================== ========================= ================================
Usage
=====
@@ -206,3 +216,44 @@ Userspace walking the map elements from the map declared above:
cur_key = &next_key;
}
}
+
+Internals
+=========
+
+This section of the document is targeted at Linux developers and describes
+aspects of the map implementations that are not considered stable ABI. The
+following details are subject to change in future versions of the kernel.
+
+``BPF_MAP_TYPE_LRU_HASH`` and variants
+--------------------------------------
+
+Updating elements in LRU maps may trigger eviction behaviour when the capacity
+of the map is reached. There are various steps that the update algorithm
+attempts in order to enforce the LRU property which have increasing impacts on
+other CPUs involved in the following operation attempts:
+
+- Attempt to use CPU-local state to batch operations
+- Attempt to fetch free nodes from global lists
+- Attempt to pull any node from a global list and remove it from the hashmap
+- Attempt to pull any node from any CPU's list and remove it from the hashmap
+
+This algorithm is described visually in the following diagram. See the
+description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
+the corresponding operations:
+
+.. kernel-figure:: map_lru_hash_update.dot
+ :alt: Diagram outlining the LRU eviction steps taken during map update.
+
+ LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and
+ variants. See the dot file source for kernel function name code references.
+
+Map updates start from the oval in the top right "begin ``bpf_map_update()``"
+and progress through the graph towards the bottom where the result may be
+either a successful update or a failure with various error codes. The key in
+the top right provides indicators for which locks may be involved in specific
+operations. This is intended as a visual hint for reasoning about how map
+contention may impact update operations, though the map type and flags may
+impact the actual contention on those locks, based on the logic described in
+the table above. For instance, if the map is created with type
+``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map
+properties would be per-cpu.
diff --git a/Documentation/bpf/map_lru_hash_update.dot b/Documentation/bpf/map_lru_hash_update.dot
new file mode 100644
index 000000000000..a0fee349d29c
--- /dev/null
+++ b/Documentation/bpf/map_lru_hash_update.dot
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2022-2023 Isovalent, Inc.
+digraph {
+ node [colorscheme=accent4,style=filled] # Apply colorscheme to all nodes
+ graph [splines=ortho, nodesep=1]
+
+ subgraph cluster_key {
+ label = "Key\n(locks held during operation)";
+ rankdir = TB;
+
+ remote_lock [shape=rectangle,fillcolor=4,label="remote CPU LRU lock"]
+ hash_lock [shape=rectangle,fillcolor=3,label="hashtab lock"]
+ lru_lock [shape=rectangle,fillcolor=2,label="LRU lock"]
+ local_lock [shape=rectangle,fillcolor=1,label="local CPU LRU lock"]
+ no_lock [shape=rectangle,label="no locks held"]
+ }
+
+ begin [shape=oval,label="begin\nbpf_map_update()"]
+
+ // Nodes below with an 'fn_' prefix are roughly labeled by the C function
+ // names that initiate the corresponding logic in kernel/bpf/bpf_lru_list.c.
+ // Number suffixes and errno suffixes handle subsections of the corresponding
+ // logic in the function as of the writing of this dot.
+
+ // cf. __local_list_pop_free() / bpf_percpu_lru_pop_free()
+ local_freelist_check [shape=diamond,fillcolor=1,
+ label="Local freelist\nnode available?"];
+ use_local_node [shape=rectangle,
+ label="Use node owned\nby this CPU"]
+
+ // cf. bpf_lru_pop_free()
+ common_lru_check [shape=diamond,
+ label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"];
+
+ fn_bpf_lru_list_pop_free_to_local [shape=rectangle,fillcolor=2,
+ label="Flush local pending,
+ Rotate Global list, move
+ LOCAL_FREE_TARGET
+ from global -> local"]
+ // Also corresponds to:
+ // fn__local_list_flush()
+ // fn_bpf_lru_list_rotate()
+ fn___bpf_lru_node_move_to_free[shape=diamond,fillcolor=2,
+ label="Able to free\nLOCAL_FREE_TARGET\nnodes?"]
+
+ fn___bpf_lru_list_shrink_inactive [shape=rectangle,fillcolor=3,
+ label="Shrink inactive list
+ up to remaining
+ LOCAL_FREE_TARGET
+ (global LRU -> local)"]
+ fn___bpf_lru_list_shrink [shape=diamond,fillcolor=2,
+ label="> 0 entries in\nlocal free list?"]
+ fn___bpf_lru_list_shrink2 [shape=rectangle,fillcolor=2,
+ label="Steal one node from
+ inactive, or if empty,
+ from active global list"]
+ fn___bpf_lru_list_shrink3 [shape=rectangle,fillcolor=3,
+ label="Try to remove\nnode from hashtab"]
+
+ local_freelist_check2 [shape=diamond,label="Htab removal\nsuccessful?"]
+ common_lru_check2 [shape=diamond,
+ label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"];
+
+ subgraph cluster_remote_lock {
+ label = "Iterate through CPUs\n(start from current)";
+ style = dashed;
+ rankdir=LR;
+
+ local_freelist_check5 [shape=diamond,fillcolor=4,
+ label="Steal a node from\nper-cpu freelist?"]
+ local_freelist_check6 [shape=rectangle,fillcolor=4,
+ label="Steal a node from
+ (1) Unreferenced pending, or
+ (2) Any pending node"]
+ local_freelist_check7 [shape=rectangle,fillcolor=3,
+ label="Try to remove\nnode from hashtab"]
+ fn_htab_lru_map_update_elem [shape=diamond,
+ label="Stole node\nfrom remote\nCPU?"]
+ fn_htab_lru_map_update_elem2 [shape=diamond,label="Iterated\nall CPUs?"]
+ // Also corresponds to:
+ // use_local_node()
+ // fn__local_list_pop_pending()
+ }
+
+ fn_bpf_lru_list_pop_free_to_local2 [shape=rectangle,
+ label="Use node that was\nnot recently referenced"]
+ local_freelist_check4 [shape=rectangle,
+ label="Use node that was\nactively referenced\nin global list"]
+ fn_htab_lru_map_update_elem_ENOMEM [shape=oval,label="return -ENOMEM"]
+ fn_htab_lru_map_update_elem3 [shape=rectangle,
+ label="Use node that was\nactively referenced\nin (another?) CPU's cache"]
+ fn_htab_lru_map_update_elem4 [shape=rectangle,fillcolor=3,
+ label="Update hashmap\nwith new element"]
+ fn_htab_lru_map_update_elem5 [shape=oval,label="return 0"]
+ fn_htab_lru_map_update_elem_EBUSY [shape=oval,label="return -EBUSY"]
+ fn_htab_lru_map_update_elem_EEXIST [shape=oval,label="return -EEXIST"]
+ fn_htab_lru_map_update_elem_ENOENT [shape=oval,label="return -ENOENT"]
+
+ begin -> local_freelist_check
+ local_freelist_check -> use_local_node [xlabel="Y"]
+ local_freelist_check -> common_lru_check [xlabel="N"]
+ common_lru_check -> fn_bpf_lru_list_pop_free_to_local [xlabel="Y"]
+ common_lru_check -> fn___bpf_lru_list_shrink_inactive [xlabel="N"]
+ fn_bpf_lru_list_pop_free_to_local -> fn___bpf_lru_node_move_to_free
+ fn___bpf_lru_node_move_to_free ->
+ fn_bpf_lru_list_pop_free_to_local2 [xlabel="Y"]
+ fn___bpf_lru_node_move_to_free ->
+ fn___bpf_lru_list_shrink_inactive [xlabel="N"]
+ fn___bpf_lru_list_shrink_inactive -> fn___bpf_lru_list_shrink
+ fn___bpf_lru_list_shrink -> fn_bpf_lru_list_pop_free_to_local2 [xlabel = "Y"]
+ fn___bpf_lru_list_shrink -> fn___bpf_lru_list_shrink2 [xlabel="N"]
+ fn___bpf_lru_list_shrink2 -> fn___bpf_lru_list_shrink3
+ fn___bpf_lru_list_shrink3 -> local_freelist_check2
+ local_freelist_check2 -> local_freelist_check4 [xlabel = "Y"]
+ local_freelist_check2 -> common_lru_check2 [xlabel = "N"]
+ common_lru_check2 -> local_freelist_check5 [xlabel = "Y"]
+ common_lru_check2 -> fn_htab_lru_map_update_elem_ENOMEM [xlabel = "N"]
+ local_freelist_check5 -> fn_htab_lru_map_update_elem [xlabel = "Y"]
+ local_freelist_check5 -> local_freelist_check6 [xlabel = "N"]
+ local_freelist_check6 -> local_freelist_check7
+ local_freelist_check7 -> fn_htab_lru_map_update_elem
+
+ fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem3 [xlabel = "Y"]
+ fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem2 [xlabel = "N"]
+ fn_htab_lru_map_update_elem2 ->
+ fn_htab_lru_map_update_elem_ENOMEM [xlabel = "Y"]
+ fn_htab_lru_map_update_elem2 -> local_freelist_check5 [xlabel = "N"]
+ fn_htab_lru_map_update_elem3 -> fn_htab_lru_map_update_elem4
+
+ use_local_node -> fn_htab_lru_map_update_elem4
+ fn_bpf_lru_list_pop_free_to_local2 -> fn_htab_lru_map_update_elem4
+ local_freelist_check4 -> fn_htab_lru_map_update_elem4
+
+ fn_htab_lru_map_update_elem4 -> fn_htab_lru_map_update_elem5 [headlabel="Success"]
+ fn_htab_lru_map_update_elem4 ->
+ fn_htab_lru_map_update_elem_EBUSY [xlabel="Hashtab lock failed"]
+ fn_htab_lru_map_update_elem4 ->
+ fn_htab_lru_map_update_elem_EEXIST [xlabel="BPF_EXIST set and\nkey already exists"]
+ fn_htab_lru_map_update_elem4 ->
+ fn_htab_lru_map_update_elem_ENOENT [headlabel="BPF_NOEXIST set\nand no such entry"]
+
+ // Create invisible pad nodes to line up various nodes
+ pad0 [style=invis]
+ pad1 [style=invis]
+ pad2 [style=invis]
+ pad3 [style=invis]
+ pad4 [style=invis]
+
+ // Line up the key with the top of the graph
+ no_lock -> local_lock [style=invis]
+ local_lock -> lru_lock [style=invis]
+ lru_lock -> hash_lock [style=invis]
+ hash_lock -> remote_lock [style=invis]
+ remote_lock -> local_freelist_check5 [style=invis]
+ remote_lock -> fn___bpf_lru_list_shrink [style=invis]
+
+ // Line up return code nodes at the bottom of the graph
+ fn_htab_lru_map_update_elem -> pad0 [style=invis]
+ pad0 -> pad1 [style=invis]
+ pad1 -> pad2 [style=invis]
+ //pad2-> fn_htab_lru_map_update_elem_ENOMEM [style=invis]
+ fn_htab_lru_map_update_elem4 -> pad3 [style=invis]
+ pad3 -> fn_htab_lru_map_update_elem5 [style=invis]
+ pad3 -> fn_htab_lru_map_update_elem_EBUSY [style=invis]
+ pad3 -> fn_htab_lru_map_update_elem_EEXIST [style=invis]
+ pad3 -> fn_htab_lru_map_update_elem_ENOENT [style=invis]
+
+ // Reduce diagram width by forcing some nodes to appear above others
+ local_freelist_check4 -> fn_htab_lru_map_update_elem3 [style=invis]
+ common_lru_check2 -> pad4 [style=invis]
+ pad4 -> local_freelist_check5 [style=invis]
+}
diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index 172f957204bf..1226a94af07a 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -98,10 +98,65 @@ can access only the first ``PAGE_SIZE`` of that data. So it has to options:
indicates that the kernel should use BPF's trimmed ``optval``.
When the BPF program returns with the ``optlen`` greater than
-``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+``PAGE_SIZE``, the userspace will receive original kernel
+buffers without any modifications that the BPF program might have
+applied.
Example
=======
+Recommended way to handle BPF programs is as follows:
+
+.. code-block:: c
+
+ SEC("cgroup/getsockopt")
+ int getsockopt(struct bpf_sockopt *ctx)
+ {
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > PAGE_SIZE)
+ ctx->optlen = 0;
+
+ return 1;
+ }
+
+ SEC("cgroup/setsockopt")
+ int setsockopt(struct bpf_sockopt *ctx)
+ {
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ /* do something */
+ ctx->optlen = -1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ optval[0] = ...;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > PAGE_SIZE)
+ ctx->optlen = 0;
+
+ return 1;
+ }
+
See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of BPF program that handles socket options.
diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index 2363b412410c..33726134f5c9 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -20,7 +20,7 @@ which is at a different MDIO base address in different switch families.
6171, 6172, 6175, 6176, 6185, 6240, 6320, 6321,
6341, 6350, 6351, 6352
- "marvell,mv88e6190" : Switch has base address 0x00. Use with models:
- 6190, 6190X, 6191, 6290, 6390, 6390X
+ 6163, 6190, 6190X, 6191, 6290, 6390, 6390X
- "marvell,mv88e6250" : Switch has base address 0x08 or 0x18. Use with model:
6220, 6250
diff --git a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
index 9a64ed658745..4d5f5cc6d031 100644
--- a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
@@ -12,10 +12,6 @@ description:
cs_sck_delay of 500ns. Ensuring that this SPI timing requirement is observed
depends on the SPI bus master driver.
-allOf:
- - $ref: dsa.yaml#/$defs/ethernet-ports
- - $ref: /schemas/spi/spi-peripheral-props.yaml#
-
maintainers:
- Vladimir Oltean <vladimir.oltean@nxp.com>
@@ -36,6 +32,9 @@ properties:
reg:
maxItems: 1
+ spi-cpha: true
+ spi-cpol: true
+
# Optional container node for the 2 internal MDIO buses of the SJA1110
# (one for the internal 100base-T1 PHYs and the other for the single
# 100base-TX PHY). The "reg" property does not have physical significance.
@@ -109,6 +108,30 @@ $defs:
1860, 1880, 1900, 1920, 1940, 1960, 1980, 2000, 2020, 2040, 2060, 2080,
2100, 2120, 2140, 2160, 2180, 2200, 2220, 2240, 2260]
+allOf:
+ - $ref: dsa.yaml#/$defs/ethernet-ports
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
+ - if:
+ properties:
+ compatible:
+ enum:
+ - nxp,sja1105e
+ - nxp,sja1105p
+ - nxp,sja1105q
+ - nxp,sja1105r
+ - nxp,sja1105s
+ - nxp,sja1105t
+ then:
+ properties:
+ spi-cpol: false
+ required:
+ - spi-cpha
+ else:
+ properties:
+ spi-cpha: false
+ required:
+ - spi-cpol
+
unevaluatedProperties: false
examples:
@@ -120,6 +143,7 @@ examples:
ethernet-switch@1 {
reg = <0x1>;
compatible = "nxp,sja1105t";
+ spi-cpha;
ethernet-ports {
#address-cells = <1>;
diff --git a/Documentation/leds/leds-class.rst b/Documentation/leds/leds-class.rst
index cd155ead8703..5db620ed27aa 100644
--- a/Documentation/leds/leds-class.rst
+++ b/Documentation/leds/leds-class.rst
@@ -169,6 +169,87 @@ Setting the brightness to zero with brightness_set() callback function
should completely turn off the LED and cancel the previously programmed
hardware blinking function, if any.
+Hardware driven LEDs
+====================
+
+Some LEDs can be programmed to be driven by hardware. This is not
+limited to blinking; the hardware may also turn the LED on or off autonomously.
+To support this feature, a LED needs to implement various additional
+ops and needs to declare specific support for the supported triggers.
+
+With hw control we refer to the LED driven by hardware.
+
+LED driver must define the following value to support hw control:
+
+ - hw_control_trigger:
+ unique trigger name supported by the LED in hw control
+ mode.
+
+LED driver must implement the following API to support hw control:
+ - hw_control_is_supported:
+ check if the flags passed by the supported trigger can
+ be parsed and activate hw control on the LED.
+
+ Return 0 if the passed flags mask is supported and
+ can be set with hw_control_set().
+
+ If the passed flags mask is not supported -EOPNOTSUPP
+ must be returned, the LED trigger will use software
+ fallback in this case.
+
+ Return a negative error in case of any other error like
+ device not ready or timeouts.
+
+ - hw_control_set:
+ activate hw control. LED driver will use the provided
+ flags passed from the supported trigger, parse them to
+ a set of mode and setup the LED to be driven by hardware
+ following the requested modes.
+
+ Set LED_OFF via the brightness_set to deactivate hw control.
+
+ Return 0 on success, a negative error number on failing to
+ apply flags.
+
+ - hw_control_get:
+ get active modes from a LED already in hw control, parse
+ them and set in flags the current active flags for the
+ supported trigger.
+
+ Return 0 on success, a negative error number on failing
+ parsing the initial mode.
+ Error from this function is NOT FATAL as the device may
+ be in a not supported initial state by the attached LED
+ trigger.
+
+ - hw_control_get_device:
+ return the device associated with the LED driver in
+ hw control. A trigger might use this to match the
+ returned device from this function with a configured
+ device for the trigger as the source for blinking
+ events and correctly enable hw control.
+ (example a netdev trigger configured to blink for a
+ particular dev match the returned dev from get_device
+ to set hw control)
+
+ Returns a pointer to a struct device or NULL if nothing
+ is currently attached.
+
+LED driver can activate additional modes by default to work around the
+impossibility of supporting each different mode on the supported trigger.
+Examples are hardcoding the blink speed to a set interval, or enabling a special
+feature such as bypassing blink if some requirements are not met.
+
+A trigger should first check that the hw control API is supported by the LED
+driver and that the trigger itself is supported, to verify that hw control is
+possible, then use hw_control_is_supported to check that the flags are supported,
+and only at the end use hw_control_set to activate hw control.
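The order of checks described above can be sketched as follows. This is a simplified user-space model, not the kernel API: ``struct demo_led``, ``demo_trigger_try_hw_control()`` and the example callbacks are hypothetical, and the return convention of the helper (1 for hw control, 0 for software fallback) is an assumption of the sketch:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a LED class device. */
struct demo_led {
	const char *hw_control_trigger;
	int (*hw_control_is_supported)(struct demo_led *led, unsigned long flags);
	int (*hw_control_set)(struct demo_led *led, unsigned long flags);
	unsigned long active_flags;
};

/* Returns 1 if hw control was activated, 0 if the trigger should fall back
 * to software, or a negative errno for any other error. */
static int demo_trigger_try_hw_control(struct demo_led *led,
				       const char *trigger_name,
				       unsigned long flags)
{
	int ret;

	/* 1. The driver must provide the hw control ops. */
	if (!led->hw_control_trigger || !led->hw_control_is_supported ||
	    !led->hw_control_set)
		return 0;

	/* 2. The trigger must be the one the LED supports in hw control. */
	if (strcmp(led->hw_control_trigger, trigger_name) != 0)
		return 0;

	/* 3. The requested flags must be supported by the hardware. */
	ret = led->hw_control_is_supported(led, flags);
	if (ret == -EOPNOTSUPP)
		return 0;	/* software fallback */
	if (ret < 0)
		return ret;	/* e.g. device not ready, timeout */

	/* 4. Only now activate hw control. */
	ret = led->hw_control_set(led, flags);
	return ret < 0 ? ret : 1;
}

/* Example driver callbacks: only flag bit 0 is supported in hardware. */
static int demo_is_supported(struct demo_led *led, unsigned long flags)
{
	(void)led;
	return (flags & ~1UL) ? -EOPNOTSUPP : 0;
}

static int demo_set(struct demo_led *led, unsigned long flags)
{
	led->active_flags = flags;
	return 0;
}
```

Treating -EOPNOTSUPP separately from other negative values mirrors the description of hw_control_is_supported: unsupported flags lead to software fallback, while any other error is reported to the caller.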
+
+A trigger can use hw_control_get to check if a LED is already in hw control
+and init their flags.
+
+When the LED is in hw control, no software blink is possible and doing so
+will effectively disable hw control.
Known Issues
============
diff --git a/Documentation/netlink/genetlink-legacy.yaml b/Documentation/netlink/genetlink-legacy.yaml
index b33541a51d6b..ac4350498f5e 100644
--- a/Documentation/netlink/genetlink-legacy.yaml
+++ b/Documentation/netlink/genetlink-legacy.yaml
@@ -122,6 +122,14 @@ properties:
enum: [ u8, u16, u32, u64, s8, s16, s32, s64, string ]
len:
$ref: '#/$defs/len-or-define'
+ byte-order:
+ enum: [ little-endian, big-endian ]
+ doc:
+ description: Documentation for the struct member attribute.
+ type: string
+ enum:
+ description: Name of the enum type used for the attribute.
+ type: string
# End genetlink-legacy
attribute-sets:
diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
new file mode 100644
index 000000000000..3b0624c87074
--- /dev/null
+++ b/Documentation/netlink/specs/ovs_flow.yaml
@@ -0,0 +1,831 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+name: ovs_flow
+version: 1
+protocol: genetlink-legacy
+
+doc:
+ OVS flow configuration over generic netlink.
+
+definitions:
+ -
+ name: ovs-header
+ type: struct
+ doc: |
+ Header for OVS Generic Netlink messages.
+ members:
+ -
+ name: dp-ifindex
+ type: u32
+ doc: |
+ ifindex of local port for datapath (0 to make a request not specific
+ to a datapath).
+ -
+ name: ovs-flow-stats
+ type: struct
+ members:
+ -
+ name: n-packets
+ type: u64
+ doc: Number of matched packets.
+ -
+ name: n-bytes
+ type: u64
+ doc: Number of matched bytes.
+ -
+ name: ovs-key-mpls
+ type: struct
+ members:
+ -
+ name: mpls-lse
+ type: u32
+ byte-order: big-endian
+ -
+ name: ovs-key-ipv4
+ type: struct
+ members:
+ -
+ name: ipv4-src
+ type: u32
+ byte-order: big-endian
+ -
+ name: ipv4-dst
+ type: u32
+ byte-order: big-endian
+ -
+ name: ipv4-proto
+ type: u8
+ -
+ name: ipv4-tos
+ type: u8
+ -
+ name: ipv4-ttl
+ type: u8
+ -
+ name: ipv4-frag
+ type: u8
+ enum: ovs-frag-type
+ -
+ name: ovs-frag-type
+ type: enum
+ entries:
+ -
+ name: none
+ doc: Packet is not a fragment.
+ -
+ name: first
+ doc: Packet is a fragment with offset 0.
+ -
+ name: later
+ doc: Packet is a fragment with nonzero offset.
+ -
+ name: any
+ value: 255
+ -
+ name: ovs-key-tcp
+ type: struct
+ members:
+ -
+ name: tcp-src
+ type: u16
+ byte-order: big-endian
+ -
+ name: tcp-dst
+ type: u16
+ byte-order: big-endian
+ -
+ name: ovs-key-udp
+ type: struct
+ members:
+ -
+ name: udp-src
+ type: u16
+ byte-order: big-endian
+ -
+ name: udp-dst
+ type: u16
+ byte-order: big-endian
+ -
+ name: ovs-key-sctp
+ type: struct
+ members:
+ -
+ name: sctp-src
+ type: u16
+ byte-order: big-endian
+ -
+ name: sctp-dst
+ type: u16
+ byte-order: big-endian
+ -
+ name: ovs-key-icmp
+ type: struct
+ members:
+ -
+ name: icmp-type
+ type: u8
+ -
+ name: icmp-code
+ type: u8
+ -
+ name: ovs-key-ct-tuple-ipv4
+ type: struct
+ members:
+ -
+ name: ipv4-src
+ type: u32
+ byte-order: big-endian
+ -
+ name: ipv4-dst
+ type: u32
+ byte-order: big-endian
+ -
+ name: src-port
+ type: u16
+ byte-order: big-endian
+ -
+ name: dst-port
+ type: u16
+ byte-order: big-endian
+ -
+ name: ipv4-proto
+ type: u8
+ -
+ name: ovs-action-push-vlan
+ type: struct
+ members:
+ -
+ name: vlan_tpid
+ type: u16
+ byte-order: big-endian
+ doc: Tag protocol identifier (TPID) to push.
+ -
+ name: vlan_tci
+ type: u16
+ byte-order: big-endian
+ doc: Tag control identifier (TCI) to push.
+ -
+ name: ovs-ufid-flags
+ type: flags
+ entries:
+ - omit-key
+ - omit-mask
+ - omit-actions
+ -
+ name: ovs-action-hash
+ type: struct
+ members:
+ -
+ name: hash-algorithm
+ type: u32
+ doc: Algorithm used to compute hash prior to recirculation.
+ -
+ name: hash-basis
+ type: u32
+ doc: Basis used for computing hash.
+ -
+ name: ovs-hash-alg
+ type: enum
+ doc: |
+ Data path hash algorithm for computing Datapath hash. The algorithm type only specifies
+ which fields in a flow will be used as part of the hash. Each datapath is free to use its
+ own hash algorithm. The hash value will be opaque to the user space daemon.
+ entries:
+ - ovs-hash-alg-l4
+
+ -
+ name: ovs-action-push-mpls
+ type: struct
+ members:
+ -
+ name: lse
+ type: u32
+ byte-order: big-endian
+ doc: |
+ MPLS label stack entry to push
+ -
+ name: ethertype
+ type: u32
+ byte-order: big-endian
+ doc: |
+ Ethertype to set in the encapsulating ethernet frame. The only values
+ ethertype should ever be given are ETH_P_MPLS_UC and ETH_P_MPLS_MC,
+ indicating MPLS unicast or multicast. Other values are rejected.
+ -
+ name: ovs-action-add-mpls
+ type: struct
+ members:
+ -
+ name: lse
+ type: u32
+ byte-order: big-endian
+ doc: |
+ MPLS label stack entry to push
+ -
+ name: ethertype
+ type: u32
+ byte-order: big-endian
+ doc: |
+ Ethertype to set in the encapsulating ethernet frame. The only values
+ ethertype should ever be given are ETH_P_MPLS_UC and ETH_P_MPLS_MC,
+ indicating MPLS unicast or multicast. Other values are rejected.
+ -
+ name: tun-flags
+ type: u16
+ doc: |
+ MPLS tunnel attributes.
+ -
+ name: ct-state-flags
+ type: flags
+ entries:
+ -
+ name: new
+ doc: Beginning of a new connection.
+ -
+ name: established
+ doc: Part of an existing connection.
+ -
+ name: related
+ doc: Related to an existing connection.
+ -
+ name: reply-dir
+ doc: Flow is in the reply direction.
+ -
+ name: invalid
+ doc: Could not track the connection.
+ -
+ name: tracked
+ doc: Conntrack has occurred.
+ -
+ name: src-nat
+ doc: Packet's source address/port was mangled by NAT.
+ -
+ name: dst-nat
+ doc: Packet's destination address/port was mangled by NAT.
+
+attribute-sets:
+ -
+ name: flow-attrs
+ attributes:
+ -
+ name: key
+ type: nest
+ nested-attributes: key-attrs
+ doc: |
+ Nested attributes specifying the flow key. Always present in
+ notifications. Required for all requests (except dumps).
+ -
+ name: actions
+ type: nest
+ nested-attributes: action-attrs
+ doc: |
+ Nested attributes specifying the actions to take for packets that
+ match the key. Always present in notifications. Required for
+ OVS_FLOW_CMD_NEW requests, optional for OVS_FLOW_CMD_SET requests. An
+ OVS_FLOW_CMD_SET without OVS_FLOW_ATTR_ACTIONS will not modify the
+ actions. To clear the actions, an OVS_FLOW_ATTR_ACTIONS without any
+ nested attributes must be given.
+ -
+ name: stats
+ type: binary
+ struct: ovs-flow-stats
+ doc: |
+ Statistics for this flow. Present in notifications if the stats would
+ be nonzero. Ignored in requests.
+ -
+ name: tcp-flags
+ type: u8
+ doc: |
+ An 8-bit value giving the ORed value of all of the TCP flags seen on
+ packets in this flow. Only present in notifications for TCP flows, and
+ only if it would be nonzero. Ignored in requests.
+ -
+ name: used
+ type: u64
+ doc: |
+ A 64-bit integer giving the time, in milliseconds on the system
+ monotonic clock, at which a packet was last processed for this
+ flow. Only present in notifications if a packet has been processed for
+ this flow. Ignored in requests.
+ -
+ name: clear
+ type: flag
+ doc: |
+ If present in an OVS_FLOW_CMD_SET request, clears the last-used time,
+ accumulated TCP flags, and statistics for this flow. Otherwise
+ ignored in requests. Never present in notifications.
+ -
+ name: mask
+ type: nest
+ nested-attributes: key-attrs
+ doc: |
+ Nested attributes specifying the mask bits for wildcarded flow
+ match. Mask bit value '1' specifies exact match with corresponding
+ flow key bit, while mask bit value '0' specifies a wildcarded
+ match. Omitting an attribute is treated as wildcarding all corresponding
+ fields. Optional for all requests. If not present, all flow key bits
+ are exact match bits.
+ -
+ name: probe
+ type: binary
+ doc: |
+ Flow operation is a feature probe, error logging should be suppressed.
+ -
+ name: ufid
+ type: binary
+ doc: |
+ A value of 1 to 16 octets specifying a unique identifier for the
+ flow. Causes the flow to be indexed by this value rather than the
+ value of the OVS_FLOW_ATTR_KEY attribute. Optional for all
+ requests. Present in notifications if the flow was created with this
+ attribute.
+ -
+ name: ufid-flags
+ type: u32
+ enum: ovs-ufid-flags
+ doc: |
+ A 32-bit value of ORed flags that provide alternative semantics for
+ flow installation and retrieval. Optional for all requests.
+ -
+ name: pad
+ type: binary
+
+ -
+ name: key-attrs
+ attributes:
+ -
+ name: encap
+ type: nest
+ nested-attributes: key-attrs
+ -
+ name: priority
+ type: u32
+ -
+ name: in-port
+ type: u32
+ -
+ name: ethernet
+ type: binary
+ doc: struct ovs_key_ethernet
+ -
+ name: vlan
+ type: u16
+ byte-order: big-endian
+ -
+ name: ethertype
+ type: u16
+ byte-order: big-endian
+ -
+ name: ipv4
+ type: binary
+ struct: ovs-key-ipv4
+ -
+ name: ipv6
+ type: binary
+ doc: struct ovs_key_ipv6
+ -
+ name: tcp
+ type: binary
+ struct: ovs-key-tcp
+ -
+ name: udp
+ type: binary
+ struct: ovs-key-udp
+ -
+ name: icmp
+ type: binary
+ struct: ovs-key-icmp
+ -
+ name: icmpv6
+ type: binary
+ struct: ovs-key-icmp
+ -
+ name: arp
+ type: binary
+ doc: struct ovs_key_arp
+ -
+ name: nd
+ type: binary
+ doc: struct ovs_key_nd
+ -
+ name: skb-mark
+ type: u32
+ -
+ name: tunnel
+ type: nest
+ nested-attributes: tunnel-key-attrs
+ -
+ name: sctp
+ type: binary
+ struct: ovs-key-sctp
+ -
+ name: tcp-flags
+ type: u16
+ byte-order: big-endian
+ -
+ name: dp-hash
+ type: u32
+ doc: Value 0 indicates the hash is not computed by the datapath.
+ -
+ name: recirc-id
+ type: u32
+ -
+ name: mpls
+ type: binary
+ struct: ovs-key-mpls
+ -
+ name: ct-state
+ type: u32
+ enum: ct-state-flags
+ enum-as-flags: true
+ -
+ name: ct-zone
+ type: u16
+ doc: connection tracking zone
+ -
+ name: ct-mark
+ type: u32
+ doc: connection tracking mark
+ -
+ name: ct-labels
+ type: binary
+ doc: 16-octet connection tracking label
+ -
+ name: ct-orig-tuple-ipv4
+ type: binary
+ struct: ovs-key-ct-tuple-ipv4
+ -
+ name: ct-orig-tuple-ipv6
+ type: binary
+ doc: struct ovs_key_ct_tuple_ipv6
+ -
+ name: nsh
+ type: nest
+ nested-attributes: ovs-nsh-key-attrs
+ -
+ name: packet-type
+ type: u32
+ byte-order: big-endian
+ doc: Should not be sent to the kernel
+ -
+ name: nd-extensions
+ type: binary
+ doc: Should not be sent to the kernel
+ -
+ name: tunnel-info
+ type: binary
+ doc: struct ip_tunnel_info
+ -
+ name: ipv6-exthdrs
+ type: binary
+ doc: struct ovs_key_ipv6_exthdr
+ -
+ name: action-attrs
+ attributes:
+ -
+ name: output
+ type: u32
+ doc: ovs port number in datapath
+ -
+ name: userspace
+ type: nest
+ nested-attributes: userspace-attrs
+ -
+ name: set
+ type: nest
+ nested-attributes: key-attrs
+ doc: Replaces the contents of an existing header. The single nested attribute specifies a header to modify and its value.
+ -
+ name: push-vlan
+ type: binary
+ struct: ovs-action-push-vlan
+ doc: Push a new outermost 802.1Q or 802.1ad header onto the packet.
+ -
+ name: pop-vlan
+ type: flag
+ doc: Pop the outermost 802.1Q or 802.1ad header from the packet.
+ -
+ name: sample
+ type: nest
+ nested-attributes: sample-attrs
+ doc: |
+ Probabilistically executes actions, as specified in the nested attributes.
+ -
+ name: recirc
+ type: u32
+ doc: recirc id
+ -
+ name: hash
+ type: binary
+ struct: ovs-action-hash
+ -
+ name: push-mpls
+ type: binary
+ struct: ovs-action-push-mpls
+ doc: |
+ Push a new MPLS label stack entry onto the top of the packet's MPLS
+ label stack. Set the ethertype of the encapsulating frame to either
+ ETH_P_MPLS_UC or ETH_P_MPLS_MC to indicate the new packet contents.
+ -
+ name: pop-mpls
+ type: u16
+ byte-order: big-endian
+ doc: ethertype
+ -
+ name: set-masked
+ type: nest
+ nested-attributes: key-attrs
+ doc: |
+ Replaces the contents of an existing header. A nested attribute
+ specifies a header to modify, its value, and a mask. For every bit set
+ in the mask, the corresponding bit value is copied from the value to
+ the packet header field, rest of the bits are left unchanged. The
+ non-masked value bits must be passed in as zeroes. Masking is not
+ supported for the OVS_KEY_ATTR_TUNNEL attribute.
+ -
+ name: ct
+ type: nest
+ nested-attributes: ct-attrs
+ doc: |
+ Track the connection. Populate the conntrack-related entries
+ in the flow key.
+ -
+ name: trunc
+ type: u32
+ doc: struct ovs_action_trunc is a u32 max length
+ -
+ name: push-eth
+ type: binary
+ doc: struct ovs_action_push_eth
+ -
+ name: pop-eth
+ type: flag
+ -
+ name: ct-clear
+ type: flag
+ -
+ name: push-nsh
+ type: nest
+ nested-attributes: ovs-nsh-key-attrs
+ doc: |
+ Push NSH header to the packet.
+ -
+ name: pop-nsh
+ type: flag
+ doc: |
+ Pop the outermost NSH header off the packet.
+ -
+ name: meter
+ type: u32
+ doc: |
+ Run packet through a meter, which may drop the packet, or modify the
+ packet (e.g., change the DSCP field).
+ -
+ name: clone
+ type: nest
+ nested-attributes: action-attrs
+ doc: |
+ Make a copy of the packet and execute a list of actions without
+ affecting the original packet and key.
+ -
+ name: check-pkt-len
+ type: nest
+ nested-attributes: check-pkt-len-attrs
+ doc: |
+ Check the packet length and execute a set of actions if greater than
+ the specified packet length, else execute another set of actions.
+ -
+ name: add-mpls
+ type: binary
+ struct: ovs-action-add-mpls
+ doc: |
+ Push a new MPLS label stack entry at the start of the packet or at the
+ start of the l3 header depending on the value of l3 tunnel flag in the
+ tun_flags field of this OVS_ACTION_ATTR_ADD_MPLS argument.
+ -
+ name: dec-ttl
+ type: nest
+ nested-attributes: dec-ttl-attrs
+ -
+ name: tunnel-key-attrs
+ attributes:
+ -
+ name: id
+ type: u64
+ byte-order: big-endian
+ value: 0
+ -
+ name: ipv4-src
+ type: u32
+ byte-order: big-endian
+ -
+ name: ipv4-dst
+ type: u32
+ byte-order: big-endian
+ -
+ name: tos
+ type: u8
+ -
+ name: ttl
+ type: u8
+ -
+ name: dont-fragment
+ type: flag
+ -
+ name: csum
+ type: flag
+ -
+ name: oam
+ type: flag
+ -
+ name: geneve-opts
+ type: binary
+ sub-type: u32
+ -
+ name: tp-src
+ type: u16
+ byte-order: big-endian
+ -
+ name: tp-dst
+ type: u16
+ byte-order: big-endian
+ -
+ name: vxlan-opts
+ type: nest
+ nested-attributes: vxlan-ext-attrs
+ -
+ name: ipv6-src
+ type: binary
+ doc: |
+ struct in6_addr source IPv6 address
+ -
+ name: ipv6-dst
+ type: binary
+ doc: |
+ struct in6_addr destination IPv6 address
+ -
+ name: pad
+ type: binary
+ -
+ name: erspan-opts
+ type: binary
+ doc: |
+ struct erspan_metadata
+ -
+ name: ipv4-info-bridge
+ type: flag
+ -
+ name: check-pkt-len-attrs
+ attributes:
+ -
+ name: pkt-len
+ type: u16
+ -
+ name: actions-if-greater
+ type: nest
+ nested-attributes: action-attrs
+ -
+ name: actions-if-less-equal
+ type: nest
+ nested-attributes: action-attrs
+ -
+ name: sample-attrs
+ attributes:
+ -
+ name: probability
+ type: u32
+ -
+ name: actions
+ type: nest
+ nested-attributes: action-attrs
+ -
+ name: userspace-attrs
+ attributes:
+ -
+ name: pid
+ type: u32
+ -
+ name: userdata
+ type: binary
+ -
+ name: egress-tun-port
+ type: u32
+ -
+ name: actions
+ type: flag
+ -
+ name: ovs-nsh-key-attrs
+ attributes:
+ -
+ name: base
+ type: binary
+ -
+ name: md1
+ type: binary
+ -
+ name: md2
+ type: binary
+ -
+ name: ct-attrs
+ attributes:
+ -
+ name: commit
+ type: flag
+ -
+ name: zone
+ type: u16
+ -
+ name: mark
+ type: binary
+ -
+ name: labels
+ type: binary
+ -
+ name: helper
+ type: string
+ -
+ name: nat
+ type: nest
+ nested-attributes: nat-attrs
+ -
+ name: force-commit
+ type: flag
+ -
+ name: eventmask
+ type: u32
+ -
+ name: timeout
+ type: string
+ -
+ name: nat-attrs
+ attributes:
+ -
+ name: src
+ type: binary
+ -
+ name: dst
+ type: binary
+ -
+ name: ip-min
+ type: binary
+ -
+ name: ip-max
+ type: binary
+ -
+ name: proto-min
+ type: binary
+ -
+ name: proto-max
+ type: binary
+ -
+ name: persistent
+ type: binary
+ -
+ name: proto-hash
+ type: binary
+ -
+ name: proto-random
+ type: binary
+ -
+ name: dec-ttl-attrs
+ attributes:
+ -
+ name: action
+ type: nest
+ nested-attributes: action-attrs
+ -
+ name: vxlan-ext-attrs
+ attributes:
+ -
+ name: gbp
+ type: u32
+
+operations:
+ fixed-header: ovs-header
+ list:
+ -
+ name: flow-get
+ doc: Get / dump OVS flow configuration and state
+ value: 3
+ attribute-set: flow-attrs
+ do: &flow-get-op
+ request:
+ attributes:
+ - dp-ifindex
+ - key
+ - ufid
+ - ufid-flags
+ reply:
+ attributes:
+ - dp-ifindex
+ - key
+ - ufid
+ - mask
+ - stats
+ - actions
+ dump: *flow-get-op
+
+mcast-groups:
+ list:
+ -
+ name: ovs_flow
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
index 69695e5511f4..e4d065c55ea8 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
@@ -84,24 +84,6 @@ Once the VM shuts down, or otherwise releases the VF, the command will
complete.
-Important notes for SR-IOV and Link Aggregation
------------------------------------------------
-Link Aggregation is mutually exclusive with SR-IOV.
-
-- If Link Aggregation is active, SR-IOV VFs cannot be created on the PF.
-- If SR-IOV is active, you cannot set up Link Aggregation on the interface.
-
-Bridging and MACVLAN are also affected by this. If you wish to use bridging or
-MACVLAN with SR-IOV, you must set up bridging or MACVLAN before enabling
-SR-IOV. If you are using bridging or MACVLAN in conjunction with SR-IOV, and
-you want to remove the interface from the bridge or MACVLAN, you must follow
-these steps:
-
-1. Destroy SR-IOV VFs if they exist
-2. Remove the interface from the bridge or MACVLAN
-3. Recreate SRIOV VFs as needed
-
-
Additional Features and Configurations
======================================
diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
index 5ba9015336e2..bfd233cfac35 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
@@ -13,6 +13,7 @@ Contents
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_
+- `Quality of service`_
Overview
========
@@ -287,3 +288,47 @@ For example::
NIX_AF_ERR:
NIX Error Interrupt Reg : 64
Rx on unmapped PF_FUNC
+
+
+Quality of service
+==================
+
+
+Hardware algorithms used in scheduling
+--------------------------------------
+
+The octeontx2 silicon and CN10K transmit interface consists of five transmit levels
+starting from SMQ/MDQ, TL4 to TL1. Each packet traverses the MDQ, TL4 to TL1
+levels. Each level contains an array of queues to support scheduling and shaping.
+The hardware uses the algorithms below depending on the priority of the scheduler queues.
+Once the user creates tc classes with different priorities, the driver configures
+the schedulers allocated to each class with the specified priority along with the
+rate-limiting configuration.
+
+1. Strict Priority
+
+ - Once packets are submitted to MDQ, the hardware picks among active MDQs having
+ different priorities using strict priority.
+
+2. Round Robin
+
+ - Active MDQs having the same priority level are chosen using round robin.
+
+
+Setup HTB offload
+-----------------
+
+1. Enable HW TC offload on the interface::
+
+ # ethtool -K <interface> hw-tc-offload on
+
+2. Create htb root::
+
+ # tc qdisc add dev <interface> clsact
+ # tc qdisc replace dev <interface> root handle 1: htb offload
+
+3. Create tc classes with different priorities::
+
+ # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1
+
+ # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 6ec06a33688a..3f6d3d5f5626 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -881,9 +881,10 @@ tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
- is 6, which corresponds to 63seconds till the last retransmission
- with the current initial RTO of 1second. With this the final timeout
- for an active TCP connection attempt will happen after 127seconds.
+ is 6, which corresponds to 67 seconds (with tcp_syn_linear_timeouts = 4)
+ till the last retransmission with the current initial RTO of 1 second.
+ With this the final timeout for an active TCP connection attempt
+ will happen after 131 seconds.
tcp_timestamps - INTEGER
Enable timestamps as defined in RFC1323.
@@ -946,6 +947,16 @@ tcp_pacing_ca_ratio - INTEGER
Default: 120
+tcp_syn_linear_timeouts - INTEGER
+ The number of times for an active TCP connection to retransmit SYNs with
+ a linear backoff timeout before defaulting to an exponential backoff
+ timeout. This has no effect on SYNACK at the passive TCP side.
+
+ With an initial RTO of 1 and tcp_syn_linear_timeouts = 4 we would
+ expect SYN RTOs to be: 1, 1, 1, 1, 1, 2, 4, ... (4 linear timeouts,
+ and the first exponential backoff using 2^0 * initial_RTO).
+ Default: 4
+
tcp_tso_win_divisor - INTEGER
This allows control over what percentage of the congestion window
can be consumed by a single TSO frame.
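
The 67-second and 131-second figures in the tcp_syn_retries hunk above can be reproduced with a short calculation. The sketch below is not kernel code. It assumes that the total number of SYN retransmissions is tcp_syn_retries + tcp_syn_linear_timeouts (an inference from the documented figures, not something the text states), and it ignores any upper bound on the RTO. The function name ``syn_rto_schedule`` is hypothetical.

```python
def syn_rto_schedule(initial_rto=1, syn_retries=6, linear_timeouts=4):
    """Return the RTO (in seconds) armed after the initial SYN and after
    each retransmission.

    Assumption: the SYN is retransmitted syn_retries + linear_timeouts
    times in total. The first linear_timeouts timeouts keep the initial
    RTO; after that the RTO is initial_rto * 2^k with k starting at 0,
    matching the documented sequence 1, 1, 1, 1, 1, 2, 4, ...
    """
    total_retransmits = syn_retries + linear_timeouts
    schedule = []
    for i in range(total_retransmits + 1):
        exponent = max(0, i - linear_timeouts)
        schedule.append(initial_rto * (2 ** exponent))
    return schedule


if __name__ == "__main__":
    rtos = syn_rto_schedule()
    # Time of the last retransmission: every timeout except the final one.
    print("RTOs:", rtos)
    print("last retransmission after", sum(rtos[:-1]), "seconds")
    print("connection attempt times out after", sum(rtos), "seconds")
```

With ``linear_timeouts=0`` the same function gives the pre-patch values of 63 and 127 seconds that appear in the removed lines.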