diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-27 12:03:20 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-27 12:03:20 -0700 |
commit | 468fc7ed5537615efe671d94248446ac24679773 (patch) | |
tree | 27bc9de792e863d6ec1630927b77ac9e7dabb38a /Documentation | |
parent | 08fd8c17686c6b09fa410a26d516548dd80ff147 (diff) | |
parent | 36232012344b8db67052432742deaf17f82e70e6 (diff) | |
download | linux-stable-468fc7ed5537615efe671d94248446ac24679773.tar.gz linux-stable-468fc7ed5537615efe671d94248446ac24679773.tar.bz2 linux-stable-468fc7ed5537615efe671d94248446ac24679773.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.
2) Make DSA binding more sane, from Andrew Lunn.
3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.
4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.
5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.
6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.
7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.
8) Simplify netlink conntrack entry layout, from Florian Westphal.
9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.
10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.
11) Support qdisc packet injection in pktgen, from John Fastabend.
12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.
13) Add NV congestion control support to TCP, from Lawrence Brakmo.
14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.
15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.
16) Support MPLS over IPV4, from Simon Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...
Diffstat (limited to 'Documentation')
26 files changed, 953 insertions, 129 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-net-batman-adv b/Documentation/ABI/testing/sysfs-class-net-batman-adv index 518f6a1dbc0c..898106849e27 100644 --- a/Documentation/ABI/testing/sysfs-class-net-batman-adv +++ b/Documentation/ABI/testing/sysfs-class-net-batman-adv @@ -1,19 +1,10 @@ -What: /sys/class/net/<iface>/batman-adv/throughput_override -Date: Feb 2014 -Contact: Antonio Quartulli <antonio@meshcoding.com> -description: - Defines the throughput value to be used by B.A.T.M.A.N. V - when estimating the link throughput using this interface. - If the value is set to 0 then batman-adv will try to - estimate the throughput by itself. - What: /sys/class/net/<iface>/batman-adv/elp_interval Date: Feb 2014 Contact: Linus Lüssing <linus.luessing@web.de> Description: Defines the interval in milliseconds in which batman - sends its probing packets for link quality measurements. + emits probing packets for neighbor sensing (ELP). What: /sys/class/net/<iface>/batman-adv/iface_status Date: May 2010 @@ -28,3 +19,12 @@ Description: The /sys/class/net/<iface>/batman-adv/mesh_iface file displays the batman mesh interface this <iface> currently is associated with. + +What: /sys/class/net/<iface>/batman-adv/throughput_override +Date: Feb 2014 +Contact: Antonio Quartulli <a@unstable.cc> +description: + Defines the throughput value to be used by B.A.T.M.A.N. V + when estimating the link throughput using this interface. + If the value is set to 0 then batman-adv will try to + estimate the throughput by itself. diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl index 5f7c55999c77..800fe7a9024c 100644 --- a/Documentation/DocBook/80211.tmpl +++ b/Documentation/DocBook/80211.tmpl @@ -136,6 +136,7 @@ !Finclude/net/cfg80211.h cfg80211_ibss_joined !Finclude/net/cfg80211.h cfg80211_connect_result !Finclude/net/cfg80211.h cfg80211_connect_bss +!Finclude/net/cfg80211.h cfg80211_connect_timeout !Finclude/net/cfg80211.h cfg80211_roamed !Finclude/net/cfg80211.h cfg80211_disconnected !Finclude/net/cfg80211.h cfg80211_ready_on_channel diff --git a/Documentation/devicetree/bindings/net/apm-xgene-mdio.txt b/Documentation/devicetree/bindings/net/apm-xgene-mdio.txt new file mode 100644 index 000000000000..78722d74cea8 --- /dev/null +++ b/Documentation/devicetree/bindings/net/apm-xgene-mdio.txt @@ -0,0 +1,37 @@ +APM X-Gene SoC MDIO node + +MDIO node is defined to describe on-chip MDIO controller. + +Required properties: + - compatible: Must be "apm,xgene-mdio-rgmii" or "apm,xgene-mdio-xfi" + - #address-cells: Must be <1>. + - #size-cells: Must be <0>. + - reg: Address and length of the register set + - clocks: Reference to the clock entry + +For the phys on the mdio bus, there must be a node with the following fields: + - compatible: PHY identifier. Please refer ./phy.txt for the format. + - reg: The ID number for the phy. + +Example: + + mdio: mdio@17020000 { + compatible = "apm,xgene-mdio-rgmii"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x0 0x17020000 0x0 0xd100>; + clocks = <&menetclk 0>; + }; + + /* Board-specific peripheral configurations */ + &mdio { + menetphy: phy@3 { + reg = <0x3>; + }; + sgenet0phy: phy@4 { + reg = <0x4>; + }; + sgenet1phy: phy@5 { + reg = <0x5>; + }; + }; diff --git a/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt b/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt new file mode 100644 index 000000000000..dfe287a5d6f2 --- /dev/null +++ b/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt @@ -0,0 +1,59 @@ +Properties for an MDIO bus multiplexer found in Broadcom iProc based SoCs. + +This MDIO bus multiplexer defines buses that could be internal as well as +external to SoCs and could accept MDIO transaction compatible to C-22 or +C-45 Clause. When child bus is selected, one needs to select these two +properties as well to generate desired MDIO transaction on appropriate bus. + +Required properties in addition to the generic multiplexer properties: + +MDIO multiplexer node: +- compatible: brcm,mdio-mux-iproc. + +Every non-ethernet PHY requires a compatible so that it could be probed based +on this compatible string. + +Additional information regarding generic multiplexer properties can be found +at- Documentation/devicetree/bindings/net/mdio-mux.txt + + +for example: + mdio_mux_iproc: mdio-mux@6602023c { + compatible = "brcm,mdio-mux-iproc"; + reg = <0x6602023c 0x14>; + #address-cells = <1>; + #size-cells = <0>; + + mdio@0 { + reg = <0x0>; + #address-cells = <1>; + #size-cells = <0>; + + pci_phy0: pci-phy@0 { + compatible = "brcm,ns2-pcie-phy"; + reg = <0x0>; + #phy-cells = <0>; + }; + }; + + mdio@7 { + reg = <0x7>; + #address-cells = <1>; + #size-cells = <0>; + + pci_phy1: pci-phy@0 { + compatible = "brcm,ns2-pcie-phy"; + reg = <0x0>; + #phy-cells = <0>; + }; + }; + mdio@10 { + reg = <0x10>; + #address-cells = <1>; + #size-cells = <0>; + + gphy0: eth-phy@10 { + reg = <0x10>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/can/rcar_canfd.txt b/Documentation/devicetree/bindings/net/can/rcar_canfd.txt new file mode 100644 index 000000000000..22a6f10bab05 --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/rcar_canfd.txt @@ -0,0 +1,96 @@ +Renesas R-Car CAN FD controller Device Tree Bindings +---------------------------------------------------- + +Required properties: +- compatible: Must contain one or more of the following: + - "renesas,rcar-gen3-canfd" for R-Car Gen3 compatible controller. + - "renesas,r8a7795-canfd" for R8A7795 (R-Car H3) compatible controller. + + When compatible with the generic version, nodes must list the + SoC-specific version corresponding to the platform first, followed by the + family-specific and/or generic versions. + +- reg: physical base address and size of the R-Car CAN FD register map. +- interrupts: interrupt specifier for the Global & Channel interrupts +- clocks: phandles and clock specifiers for 3 clock inputs. +- clock-names: 3 clock input name strings: "fck", "canfd", "can_clk". +- pinctrl-0: pin control group to be used for this controller. +- pinctrl-names: must be "default". + +Required child nodes: +The controller supports two channels and each is represented as a child node. +The name of the child nodes are "channel0" and "channel1" respectively. Each +child node supports the "status" property only, which is used to +enable/disable the respective channel. + +Required properties for "renesas,r8a7795-canfd" compatible: +In R8A7795 SoC, canfd clock is a div6 clock and can be used by both CAN +and CAN FD controller at the same time. It needs to be scaled to maximum +frequency if any of these controllers use it. This is done using the +below properties. + +- assigned-clocks: phandle of canfd clock. +- assigned-clock-rates: maximum frequency of this clock. + +Optional property: +The controller can operate in either CAN FD only mode (default) or +Classical CAN only mode. The mode is global to both the channels. In order to +enable the later, define the following optional property. + - renesas,no-can-fd: puts the controller in Classical CAN only mode. + +Example +------- + +SoC common .dtsi file: + + canfd: can@e66c0000 { + compatible = "renesas,r8a7795-canfd", + "renesas,rcar-gen3-canfd"; + reg = <0 0xe66c0000 0 0x8000>; + interrupts = <GIC_SPI 29 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 30 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cpg CPG_MOD 914>, + <&cpg CPG_CORE R8A7795_CLK_CANFD>, + <&can_clk>; + clock-names = "fck", "canfd", "can_clk"; + assigned-clocks = <&cpg CPG_CORE R8A7795_CLK_CANFD>; + assigned-clock-rates = <40000000>; + power-domains = <&cpg>; + status = "disabled"; + + channel0 { + status = "disabled"; + }; + + channel1 { + status = "disabled"; + }; + }; + +Board specific .dts file: + +E.g. below enables Channel 1 alone in the board in Classical CAN only mode. + +&canfd { + pinctrl-0 = <&canfd1_pins>; + pinctrl-names = "default"; + renesas,no-can-fd; + status = "okay"; + + channel1 { + status = "okay"; + }; +}; + +E.g. below enables Channel 0 alone in the board using External clock +as fCAN clock. + +&canfd { + pinctrl-0 = <&canfd0_pins &can_clk_pins>; + pinctrl-names = "default"; + status = "okay"; + + channel0 { + status = "okay"; + }; +}; diff --git a/Documentation/devicetree/bindings/net/cirrus,cs89x0.txt b/Documentation/devicetree/bindings/net/cirrus,cs89x0.txt new file mode 100644 index 000000000000..c070076bacb9 --- /dev/null +++ b/Documentation/devicetree/bindings/net/cirrus,cs89x0.txt @@ -0,0 +1,13 @@ +* Cirrus Logic CS8900/CS8920 Network Controller + +Required properties: +- compatible : Should be "cirrus,cs8900" or "cirrus,cs8920". +- reg : Address and length of the IO space. +- interrupts : Should contain the controller interrupt line. + +Examples: + eth0: eth@10000000 { + compatible = "cirrus,cs8900"; + reg = <0x10000000 0x400>; + interrupts = <10>; + }; diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt index 0ae06491b430..5ad439f30135 100644 --- a/Documentation/devicetree/bindings/net/cpsw.txt +++ b/Documentation/devicetree/bindings/net/cpsw.txt @@ -15,7 +15,6 @@ Required properties: - cpdma_channels : Specifies number of channels in CPDMA - ale_entries : Specifies No of entries ALE can hold - bd_ram_size : Specifies internal descriptor RAM size -- rx_descs : Specifies number of Rx descriptors - mac_control : Specifies Default MAC control register content for the specific platform - slaves : Specifies number for slaves diff --git a/Documentation/devicetree/bindings/net/davinci-mdio.txt b/Documentation/devicetree/bindings/net/davinci-mdio.txt index 0369e25aabd2..621156ca4ffd 100644 --- a/Documentation/devicetree/bindings/net/davinci-mdio.txt +++ b/Documentation/devicetree/bindings/net/davinci-mdio.txt @@ -2,7 +2,10 @@ TI SoC Davinci/Keystone2 MDIO Controller Device Tree Bindings --------------------------------------------------- Required properties: -- compatible : Should be "ti,davinci_mdio" or "ti,keystone_mdio" +- compatible : Should be "ti,davinci_mdio" + and "ti,keystone_mdio" for Keystone 2 SoCs + and "ti,cpsw-mdio" for am335x, am472x, am57xx/dra7, dm814x SoCs + and "ti,am4372-mdio" for am472x SoC - reg : physical base address and size of the davinci mdio registers map - bus_freq : Mdio Bus frequency diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt new file mode 100644 index 000000000000..d6c6e41648d4 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/b53.txt @@ -0,0 +1,97 @@ +Broadcom BCM53xx Ethernet switches +================================== + +Required properties: + +- compatible: For external switch chips, compatible string must be exactly one + of: "brcm,bcm5325" + "brcm,bcm53115" + "brcm,bcm53125" + "brcm,bcm53128" + "brcm,bcm5365" + "brcm,bcm5395" + "brcm,bcm5397" + "brcm,bcm5398" + + For the BCM5310x SoCs with an integrated switch, must be one of: + "brcm,bcm53010-srab" + "brcm,bcm53011-srab" + "brcm,bcm53012-srab" + "brcm,bcm53018-srab" + "brcm,bcm53019-srab" and the mandatory "brcm,bcm5301x-srab" string + + For the BCM585xx/586XX/88312 SoCs with an integrated switch, must be one of: + "brcm,bcm58522-srab" + "brcm,bcm58523-srab" + "brcm,bcm58525-srab" + "brcm,bcm58622-srab" + "brcm,bcm58623-srab" + "brcm,bcm58625-srab" + "brcm,bcm88312-srab" and the mandatory "brcm,nsp-srab string + + For the BCM63xx/33xx SoCs with an integrated switch, must be one of: + "brcm,bcm3384-switch" + "brcm,bcm6328-switch" + "brcm,bcm6368-switch" and the mandatory "brcm,bcm63xx-switch" + +See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional +required and optional properties. + +Examples: + +Ethernet switch connected via MDIO to the host, CPU port wired to eth0: + + eth0: ethernet@10001000 { + compatible = "brcm,unimac"; + reg = <0x10001000 0x1000>; + + fixed-link { + speed = <1000>; + duplex-full; + }; + }; + + mdio0: mdio@10000000 { + compatible = "brcm,unimac-mdio"; + #address-cells = <1>; + #size-cells = <0>; + + switch0: ethernet-switch@30 { + compatible = "brcm,bcm53125"; + #address-cells = <1>; + #size-cells = <0>; + + ports { + port0@0 { + reg = <0>; + label = "lan1"; + }; + + port1@1 { + reg = <1>; + label = "lan2"; + }; + + port5@5 { + reg = <5>; + label = "cable-modem"; + fixed-link { + speed = <1000>; + duplex-full; + }; + phy-mode = "rgmii-txid"; + }; + + port8@8 { + reg = <8>; + label = "cpu"; + fixed-link { + speed = <1000>; + duplex-full; + }; + phy-mode = "rgmii-txid"; + ethernet = <ð0>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt index 9f4807f90c31..9bbbe7f87d67 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt @@ -1,5 +1,279 @@ -Marvell Distributed Switch Architecture Device Tree Bindings ------------------------------------------------------------- +Distributed Switch Architecture Device Tree Bindings +---------------------------------------------------- + +Two bindings exist, one of which has been deprecated due to +limitations. + +Current Binding +--------------- + +Switches are true Linux devices and can be probes by any means. Once +probed, they register to the DSA framework, passing a node +pointer. This node is expected to fulfil the following binding, and +may contain additional properties as required by the device it is +embedded within. + +Required properties: + +- ports : A container for child nodes representing switch ports. + +Optional properties: + +- dsa,member : A two element list indicates which DSA cluster, and position + within the cluster a switch takes. <0 0> is cluster 0, + switch 0. <0 1> is cluster 0, switch 1. <1 0> is cluster 1, + switch 0. A switch not part of any cluster (single device + hanging off a CPU port) must not specify this property + +The ports container has the following properties + +Required properties: + +- #address-cells : Must be 1 +- #size-cells : Must be 0 + +Each port children node must have the following mandatory properties: +- reg : Describes the port address in the switch +- label : Describes the label associated with this port, which + will become the netdev name. Special labels are + "cpu" to indicate a CPU port and "dsa" to + indicate an uplink/downlink port between switches in + the cluster. + +A port labelled "dsa" has the following mandatory property: + +- link : Should be a list of phandles to other switch's DSA + port. This port is used as the outgoing port + towards the phandle ports. The full routing + information must be given, not just the one hop + routes to neighbouring switches. + +A port labelled "cpu" has the following mandatory property: + +- ethernet : Should be a phandle to a valid Ethernet device node. + This host device is what the switch port is + connected to. + +Port child nodes may also contain the following optional standardised +properties, described in binding documents: + +- phy-handle : Phandle to a PHY on an MDIO bus. See + Documentation/devicetree/bindings/net/ethernet.txt + for details. + +- phy-mode : See + Documentation/devicetree/bindings/net/ethernet.txt + for details. + +- fixed-link : Fixed-link subnode describing a link to a non-MDIO + managed entity. See + Documentation/devicetree/bindings/net/fixed-link.txt + for details. + +Example + +The following example shows three switches on three MDIO busses, +linked into one DSA cluster. + +&mdio1 { + #address-cells = <1>; + #size-cells = <0>; + + switch0: switch0@0 { + compatible = "marvell,mv88e6085"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + + dsa,member = <0 0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "lan0"; + }; + + port@1 { + reg = <1>; + label = "lan1"; + }; + + port@2 { + reg = <2>; + label = "lan2"; + }; + + switch0port5: port@5 { + reg = <5>; + label = "dsa"; + phy-mode = "rgmii-txid"; + link = <&switch1port6 + &switch2port9>; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + + port@6 { + reg = <6>; + label = "cpu"; + ethernet = <&fec1>; + fixed-link { + speed = <100>; + full-duplex; + }; + }; + }; + }; +}; + +&mdio2 { + #address-cells = <1>; + #size-cells = <0>; + + switch1: switch1@0 { + compatible = "marvell,mv88e6085"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + + dsa,member = <0 1>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "lan3"; + phy-handle = <&switch1phy0>; + }; + + port@1 { + reg = <1>; + label = "lan4"; + phy-handle = <&switch1phy1>; + }; + + port@2 { + reg = <2>; + label = "lan5"; + phy-handle = <&switch1phy2>; + }; + + switch1port5: port@5 { + reg = <5>; + label = "dsa"; + link = <&switch2port9>; + phy-mode = "rgmii-txid"; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + + switch1port6: port@6 { + reg = <6>; + label = "dsa"; + phy-mode = "rgmii-txid"; + link = <&switch0port5>; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + mdio-bus { + #address-cells = <1>; + #size-cells = <0>; + switch1phy0: switch1phy0@0 { + reg = <0>; + }; + switch1phy1: switch1phy0@1 { + reg = <1>; + }; + switch1phy2: switch1phy0@2 { + reg = <2>; + }; + }; + }; +}; + +&mdio4 { + #address-cells = <1>; + #size-cells = <0>; + + switch2: switch2@0 { + compatible = "marvell,mv88e6085"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + + dsa,member = <0 2>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "lan6"; + }; + + port@1 { + reg = <1>; + label = "lan7"; + }; + + port@2 { + reg = <2>; + label = "lan8"; + }; + + port@3 { + reg = <3>; + label = "optical3"; + fixed-link { + speed = <1000>; + full-duplex; + link-gpios = <&gpio6 2 + GPIO_ACTIVE_HIGH>; + }; + }; + + port@4 { + reg = <4>; + label = "optical4"; + fixed-link { + speed = <1000>; + full-duplex; + link-gpios = <&gpio6 3 + GPIO_ACTIVE_HIGH>; + }; + }; + + switch2port9: port@9 { + reg = <9>; + label = "dsa"; + phy-mode = "rgmii-txid"; + link = <&switch1port5 + &switch0port5>; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + }; +}; + +Deprecated Binding +------------------ + +The deprecated binding makes use of a platform device to represent the +switches. The switches themselves are not Linux devices, and make use +of an MDIO bus for management. Required properties: - compatible : Should be "marvell,dsa" diff --git a/Documentation/devicetree/bindings/net/hisilicon-femac-mdio.txt b/Documentation/devicetree/bindings/net/hisilicon-femac-mdio.txt new file mode 100644 index 000000000000..23a39a309d17 --- /dev/null +++ b/Documentation/devicetree/bindings/net/hisilicon-femac-mdio.txt @@ -0,0 +1,22 @@ +Hisilicon Fast Ethernet MDIO Controller interface + +Required properties: +- compatible: should be "hisilicon,hisi-femac-mdio". +- reg: address and length of the register set for the device. +- clocks: A phandle to the reference clock for this device. + +- PHY subnode: inherits from phy binding [1] +[1] Documentation/devicetree/bindings/net/phy.txt + +Example: +mdio: mdio@10091100 { + compatible = "hisilicon,hisi-femac-mdio"; + reg = <0x10091100 0x10>; + clocks = <&crg HI3516CV300_MDIO_CLK>; + #address-cells = <1>; + #size-cells = <0>; + + phy0: phy@1 { + reg = <1>; + }; +}; diff --git a/Documentation/devicetree/bindings/net/hisilicon-femac.txt b/Documentation/devicetree/bindings/net/hisilicon-femac.txt new file mode 100644 index 000000000000..d11af5ecace8 --- /dev/null +++ b/Documentation/devicetree/bindings/net/hisilicon-femac.txt @@ -0,0 +1,39 @@ +Hisilicon Fast Ethernet MAC controller + +Required properties: +- compatible: should contain one of the following version strings: + * "hisilicon,hisi-femac-v1" + * "hisilicon,hisi-femac-v2" + and the soc string "hisilicon,hi3516cv300-femac". +- reg: specifies base physical address(s) and size of the device registers. + The first region is the MAC core register base and size. + The second region is the global MAC control register. +- interrupts: should contain the MAC interrupt. +- clocks: A phandle to the MAC main clock. +- resets: should contain the phandle to the MAC reset signal(required) and + the PHY reset signal(optional). +- reset-names: should contain the reset signal name "mac"(required) + and "phy"(optional). +- mac-address: see ethernet.txt [1]. +- phy-mode: see ethernet.txt [1]. +- phy-handle: see ethernet.txt [1]. +- hisilicon,phy-reset-delays-us: triplet of delays if PHY reset signal given. + The 1st cell is reset pre-delay in micro seconds. + The 2nd cell is reset pulse in micro seconds. + The 3rd cell is reset post-delay in micro seconds. + +[1] Documentation/devicetree/bindings/net/ethernet.txt + +Example: + hisi_femac: ethernet@10090000 { + compatible = "hisilicon,hi3516cv300-femac","hisilicon,hisi-femac-v2"; + reg = <0x10090000 0x1000>,<0x10091300 0x200>; + interrupts = <12>; + clocks = <&crg HI3518EV200_ETH_CLK>; + resets = <&crg 0xec 0>,<&crg 0xec 3>; + reset-names = "mac","phy"; + mac-address = [00 00 00 00 00 00]; + phy-mode = "mii"; + phy-handle = <&phy0>; + hisilicon,phy-reset-delays-us = <10000 20000 20000>; + }; diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt index b30ab6b5cbfa..04ba1dc34fd6 100644 --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt @@ -2,7 +2,7 @@ This document describes the device tree bindings associated with the keystone network coprocessor(NetCP) driver support. The network coprocessor (NetCP) is a hardware accelerator that processes -Ethernet packets. NetCP has a gigabit Ethernet (GbE) subsytem with a ethernet +Ethernet packets. NetCP has a gigabit Ethernet (GbE) subsystem with a ethernet switch sub-module to send and receive packets. NetCP also includes a packet accelerator (PA) module to perform packet classification operations such as header matching, and packet modification operations such as checksum diff --git a/Documentation/devicetree/bindings/net/mdio-mux.txt b/Documentation/devicetree/bindings/net/mdio-mux.txt index 491f5bd55203..f58571f36570 100644 --- a/Documentation/devicetree/bindings/net/mdio-mux.txt +++ b/Documentation/devicetree/bindings/net/mdio-mux.txt @@ -5,11 +5,12 @@ numbered uniquely in a device dependent manner. The nodes for an MDIO bus multiplexer/switch will have one child node for each child bus. Required properties: -- mdio-parent-bus : phandle to the parent MDIO bus. - #address-cells = <1>; - #size-cells = <0>; Optional properties: +- mdio-parent-bus : phandle to the parent MDIO bus. + - Other properties specific to the multiplexer/switch hardware. Required properties for child nodes: diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index 87496a8c64ab..8d157f0295a5 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -35,3 +35,13 @@ Optional properties: supported clocks: - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII reference input clock. Used to determine the XI input clock. + + - micrel,fiber-mode: If present the PHY is configured to operate in fiber mode + + Some PHYs, such as the KSZ8041FTL variant, support fiber mode, enabled + by the FXEN boot strapping pin. It can't be determined from the PHY + registers whether the PHY is in fiber mode, so this boolean device tree + property can be used to describe it. + + In fiber mode, auto-negotiation is disabled and the PHY can only work in + 100base-fx (full and half duplex) modes. diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt index 93eac7ce1446..cccd945fc45b 100644 --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt @@ -3,7 +3,8 @@ Rockchip SoC RK3288 10/100/1000 Ethernet driver(GMAC) The device node has following properties. Required properties: - - compatible: Can be one of "rockchip,rk3288-gmac", "rockchip,rk3368-gmac" + - compatible: Can be one of "rockchip,rk3228-gmac", "rockchip,rk3288-gmac", + "rockchip,rk3368-gmac" - reg: addresses and length of the register sets for the device. - interrupts: Should contain the GMAC interrupts. - interrupt-names: Should contain the interrupt names "macirq". diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt index 72d82d684342..2e68a3cd8513 100644 --- a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt +++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt @@ -17,9 +17,26 @@ Required properties: Optional properties: altr,emac-splitter: Should be the phandle to the emac splitter soft IP node if DWMAC controller is connected emac splitter. +phy-mode: The phy mode the ethernet operates in +altr,sgmii-to-sgmii-converter: phandle to the TSE SGMII converter + +This device node has additional phandle dependency, the sgmii converter: + +Required properties: + - compatible : Should be altr,gmii-to-sgmii-2.0 + - reg-names : Should be "eth_tse_control_port" Example: +gmii_to_sgmii_converter: phy@0x100000240 { + compatible = "altr,gmii-to-sgmii-2.0"; + reg = <0x00000001 0x00000240 0x00000008>, + <0x00000001 0x00000200 0x00000040>; + reg-names = "eth_tse_control_port"; + clocks = <&sgmii_1_clk_0 &emac1 1 &sgmii_clk_125 &sgmii_clk_125>; + clock-names = "tse_pcs_ref_clk_clock_connection", "tse_rx_cdr_refclk"; +}; + gmac0: ethernet@ff700000 { compatible = "altr,socfpga-stmmac", "snps,dwmac-3.70a", "snps,dwmac"; altr,sysmgr-syscon = <&sysmgr 0x60 0>; @@ -30,4 +47,6 @@ gmac0: ethernet@ff700000 { mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */ clocks = <&emac_0_clk>; clock-names = "stmmaceth"; + phy-mode = "sgmii"; + altr,gmii-to-sgmii-converter = <&gmii_to_sgmii_converter>; }; diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index 95816c5fc589..41b49e6075f5 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -47,6 +47,9 @@ Optional properties: supported by this device instance - snps,perfect-filter-entries: Number of perfect filter entries supported by this device instance +- snps,ps-speed: port selection speed that can be passed to the core when + PCS is supported. For example, this is used in case of SGMII + and MAC2MAC connection. - AXI BUS Mode parameters: below the list of all the parameters to program the AXI register inside the DMA module: - snps,lpi_en: enable Low Power Interface diff --git a/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt b/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt index 9180724e182c..8f9ced076fe1 100644 --- a/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt +++ b/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt @@ -1,19 +1,30 @@ -* Texas Instruments wl1271 wireless lan controller +* Texas Instruments wl12xx/wl18xx wireless lan controller -The wl1271 chip can be connected via SPI or via SDIO. This +The wl12xx/wl18xx chips can be connected via SPI or via SDIO. This document describes the binding for the SPI connected chip. Required properties: -- compatible : Should be "ti,wl1271" +- compatible : Should be one of the following: + * "ti,wl1271" + * "ti,wl1273" + * "ti,wl1281" + * "ti,wl1283" + * "ti,wl1801" + * "ti,wl1805" + * "ti,wl1807" + * "ti,wl1831" + * "ti,wl1835" + * "ti,wl1837" - reg : Chip select address of device - spi-max-frequency : Maximum SPI clocking speed of device in Hz -- ref-clock-frequency : Reference clock frequency - interrupt-parent, interrupts : Should contain parameters for 1 interrupt line. Interrupt parameters: parent, line number, type. -- vwlan-supply : Point the node of the regulator that powers/enable the wl1271 chip +- vwlan-supply : Point the node of the regulator that powers/enable the + wl12xx/wl18xx chip Optional properties: +- ref-clock-frequency : Reference clock frequency (should be set for wl12xx) - clock-xtal : boolean, clock is generated from XTAL - Please consult Documentation/devicetree/bindings/spi/spi-bus.txt @@ -21,16 +32,28 @@ Optional properties: Examples: +For wl12xx family: &spi1 { - wl1271@1 { + wlcore: wlcore@1 { compatible = "ti,wl1271"; - reg = <1>; spi-max-frequency = <48000000>; - clock-xtal; - ref-clock-frequency = <38400000>; interrupt-parent = <&gpio3>; interrupts = <8 IRQ_TYPE_LEVEL_HIGH>; vwlan-supply = <&vwlan_fixed>; + clock-xtal; + ref-clock-frequency = <38400000>; + }; +}; + +For wl18xx family: +&spi0 { + wlcore: wlcore@0 { + compatible = "ti,wl1835"; + reg = <0>; + spi-max-frequency = <48000000>; + interrupt-parent = <&gpio0>; + interrupts = <27 IRQ_TYPE_EDGE_RISING>; + vwlan-supply = <&vwlan_fixed>; }; }; diff --git a/Documentation/devicetree/bindings/phy/brcm,mdio-mux-bus-pci.txt b/Documentation/devicetree/bindings/phy/brcm,mdio-mux-bus-pci.txt new file mode 100644 index 000000000000..5b51007c6f24 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/brcm,mdio-mux-bus-pci.txt @@ -0,0 +1,27 @@ +* Broadcom NS2 PCIe PHY binding document + +Required bus properties: +- reg: MDIO Bus number for the MDIO interface +- #address-cells: must be 1 +- #size-cells: must be 0 + +Required PHY properties: +- compatible: should be "brcm,ns2-pcie-phy" +- reg: MDIO Phy ID for the MDIO interface +- #phy-cells: must be 0 + +This is a child bus node of "brcm,mdio-mux-iproc" node. + +Example: + +mdio@0 { + reg = <0x0>; + #address-cells = <1>; + #size-cells = <0>; + + pci_phy0: pci-phy@0 { + compatible = "brcm,ns2-pcie-phy"; + reg = <0x0>; + #phy-cells = <0>; + }; +}; diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index d58ff8467953..aa15b9ee2e70 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -31,6 +31,7 @@ This file contains 4.2.4 Broadcast Manager message sequence transmission 4.2.5 Broadcast Manager receive filter timers 4.2.6 Broadcast Manager multiplex message receive filter + 4.2.7 Broadcast Manager CAN FD support 4.3 connected transport protocols (SOCK_SEQPACKET) 4.4 unconnected transport protocols (SOCK_DGRAM) @@ -799,7 +800,7 @@ solution for a couple of reasons: } mytxmsg; (..) - mytxmsg.nframes = 4; + mytxmsg.msg_head.nframes = 4; (..) write(s, &mytxmsg, sizeof(mytxmsg)); @@ -852,6 +853,28 @@ solution for a couple of reasons: write(s, &msg, sizeof(msg)); + 4.2.7 Broadcast Manager CAN FD support + + The programming API of the CAN_BCM depends on struct can_frame which is + given as array directly behind the bcm_msg_head structure. To follow this + schema for the CAN FD frames a new flag 'CAN_FD_FRAME' in the bcm_msg_head + flags indicates that the concatenated CAN frame structures behind the + bcm_msg_head are defined as struct canfd_frame. + + struct { + struct bcm_msg_head msg_head; + struct canfd_frame frame[5]; + } msg; + + msg.msg_head.opcode = RX_SETUP; + msg.msg_head.can_id = 0x42; + msg.msg_head.flags = CAN_FD_FRAME; + msg.msg_head.nframes = 5; + (..) + + When using CAN FD frames for multiplex filtering the MUX mask is still + expected in the first 64 bit of the struct canfd_frame data section. + 4.3 connected transport protocols (SOCK_SEQPACKET) 4.4 unconnected transport protocols (SOCK_DGRAM) diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.txt index ff630a87b511..179b18ce45ff 100644 --- a/Documentation/networking/gen_stats.txt +++ b/Documentation/networking/gen_stats.txt @@ -21,7 +21,7 @@ struct mystruct { ... }; -Update statistics: +Update statistics, in dequeue() methods only, (while owning qdisc->running) mystruct->tstats.packet++; mystruct->qstats.backlog += skb->pkt_len; diff --git a/Documentation/networking/nf_conntrack-sysctl.txt b/Documentation/networking/nf_conntrack-sysctl.txt index f55599c62c9d..4fb51d32fccc 100644 --- a/Documentation/networking/nf_conntrack-sysctl.txt +++ b/Documentation/networking/nf_conntrack-sysctl.txt @@ -7,12 +7,13 @@ nf_conntrack_acct - BOOLEAN Enable connection tracking flow accounting. 64-bit byte and packet counters per flow are added. -nf_conntrack_buckets - INTEGER (read-only) +nf_conntrack_buckets - INTEGER Size of hash table. If not specified as parameter during module loading, the default size is calculated by dividing total memory by 16384 to determine the number of buckets but the hash table will never have fewer than 32 and limited to 16384 buckets. For systems with more than 4GB of memory it will be 65536 buckets. + This sysctl is only writeable in the initial net namespace. nf_conntrack_checksum - BOOLEAN 0 - disabled diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt index 9d219d856d46..0235ae69af2a 100644 --- a/Documentation/networking/rds.txt +++ b/Documentation/networking/rds.txt @@ -85,7 +85,8 @@ Socket Interface bind(fd, &sockaddr_in, ...) This binds the socket to a local IP address and port, and a - transport. + transport, if one has not already been selected via the + SO_RDS_TRANSPORT socket option sendmsg(fd, ...) Sends a message to the indicated recipient. The kernel will @@ -146,6 +147,20 @@ Socket Interface operation. In this case, it would use RDS_CANCEL_SENT_TO to nuke any pending messages. + setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..) + getsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..) + Set or read an integer defining the underlying + encapsulating transport to be used for RDS packets on the + socket. When setting the option, integer argument may be + one of RDS_TRANS_TCP or RDS_TRANS_IB. When retrieving the + value, RDS_TRANS_NONE will be returned on an unbound socket. + This socket option may only be set exactly once on the socket, + prior to binding it via the bind(2) system call. Attempts to + set SO_RDS_TRANSPORT on a socket for which the transport has + been previously attached explicitly (by SO_RDS_TRANSPORT) or + implicitly (via bind(2)) will return an error of EOPNOTSUPP. + An attempt to set SO_RDS_TRANSPPORT to RDS_TRANS_NONE will + always return EINVAL. RDMA for RDS ============ @@ -350,4 +365,59 @@ The recv path handle CMSGs return to application +Multipath RDS (mprds) +===================== + Mprds is multipathed-RDS, primarily intended for RDS-over-TCP + (though the concept can be extended to other transports). The classical + implementation of RDS-over-TCP is implemented by demultiplexing multiple + PF_RDS sockets between any 2 endpoints (where endpoint == [IP address, + port]) over a single TCP socket between the 2 IP addresses involved. This + has the limitation that it ends up funneling multiple RDS flows over a + single TCP flow, thus it is + (a) upper-bounded to the single-flow bandwidth, + (b) suffers from head-of-line blocking for all the RDS sockets. + + Better throughput (for a fixed small packet size, MTU) can be achieved + by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed + RDS (mprds). Each such TCP/IP flow constitutes a path for the rds/tcp + connection. RDS sockets will be attached to a path based on some hash + (e.g., of local address and RDS port number) and packets for that RDS + socket will be sent over the attached path using TCP to segment/reassemble + RDS datagrams on that path. + + Multipathed RDS is implemented by splitting the struct rds_connection into + a common (to all paths) part, and a per-path struct rds_conn_path. All + I/O workqs and reconnect threads are driven from the rds_conn_path. + Transports such as TCP that are multipath capable may then set up a + TPC socket per rds_conn_path, and this is managed by the transport via + the transport privatee cp_transport_data pointer. + + Transports announce themselves as multipath capable by setting the + t_mp_capable bit during registration with the rds core module. When the + transport is multipath-capable, rds_sendmsg() hashes outgoing traffic + across multiple paths. The outgoing hash is computed based on the + local address and port that the PF_RDS socket is bound to. + + Additionally, even if the transport is MP capable, we may be + peering with some node that does not support mprds, or supports + a different number of paths. As a result, the peering nodes need + to agree on the number of paths to be used for the connection. + This is done by sending out a control packet exchange before the + first data packet. The control packet exchange must have completed + prior to outgoing hash completion in rds_sendmsg() when the transport + is mutlipath capable. + + The control packet is an RDS ping packet (i.e., packet to rds dest + port 0) with the ping packet having a rds extension header option of + type RDS_EXTHDR_NPATHS, length 2 bytes, and the value is the + number of paths supported by the sender. The "probe" ping packet will + get sent from some reserved port, RDS_FLAG_PROBE_PORT (in <linux/rds.h>) + The receiver of a ping from RDS_FLAG_PROBE_PORT will thus immediately + be able to compute the min(sender_paths, rcvr_paths). The pong + sent in response to a probe-ping should contain the rcvr's npaths + when the rcvr is mprds-capable. + + If the rcvr is not mprds-capable, the exthdr in the ping will be + ignored. In this case the pong will not have any exthdrs, so the sender + of the probe-ping can default to single-path mprds. diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 671fe3dd56d3..e226f8925c9e 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -285,6 +285,7 @@ Please see the following document: o mmc_core.c/mmc.h: Management MAC Counters; o stmmac_hwtstamp.c: HW timestamp support for PTP; o stmmac_ptp.c: PTP 1588 clock; + o stmmac_pcs.h: Physical Coding Sublayer common implementation; o dwmac-<XXX>.c: these are for the platform glue-logic file; e.g. dwmac-sti.c for STMicroelectronics SoCs. diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt index 5da679c573d2..755dab856392 100644 --- a/Documentation/networking/vrf.txt +++ b/Documentation/networking/vrf.txt @@ -15,9 +15,9 @@ the use of higher priority ip rules (Policy Based Routing, PBR) to take precedence over the VRF device rules directing specific traffic as desired. In addition, VRF devices allow VRFs to be nested within namespaces. For -example network namespaces provide separation of network interfaces at L1 -(Layer 1 separation), VLANs on the interfaces within a namespace provide -L2 separation and then VRF devices provide L3 separation. +example network namespaces provide separation of network interfaces at the +device layer, VLANs on the interfaces within a namespace provide L2 separation +and then VRF devices provide L3 separation. Design ------ @@ -37,21 +37,22 @@ are then enslaved to a VRF device: +------+ +------+ Packets received on an enslaved device and are switched to the VRF device -using an rx_handler which gives the impression that packets flow through -the VRF device. Similarly on egress routing rules are used to send packets -to the VRF device driver before getting sent out the actual interface. This -allows tcpdump on a VRF device to capture all packets into and out of the -VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied -using the VRF device to specify rules that apply to the VRF domain as a whole. +in the IPv4 and IPv6 processing stacks giving the impression that packets +flow through the VRF device. Similarly on egress routing rules are used to +send packets to the VRF device driver before getting sent out the actual +interface. This allows tcpdump on a VRF device to capture all packets into +and out of the VRF as a whole.[1] Similarly, netfilter[2] and tc rules can be +applied using the VRF device to specify rules that apply to the VRF domain +as a whole. [1] Packets in the forwarded state do not flow through the device, so those packets are not seen by tcpdump. Will revisit this limitation in a future release. -[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only with skb->dev - set to real ingress device and egress is limited to NF_INET_POST_ROUTING. - Will revisit this limitation in a future release. - +[2] Iptables on ingress supports PREROUTING with skb->dev set to the real + ingress device and both INPUT and PREROUTING rules with skb->dev set to + the VRF device. For egress POSTROUTING and OUTPUT rules can be written + using either the VRF device or real egress device. Setup ----- @@ -59,23 +60,33 @@ Setup e.g, ip link add vrf-blue type vrf table 10 ip link set dev vrf-blue up -2. Rules are added that send lookups to the associated FIB table when the - iif or oif is the VRF device. e.g., +2. An l3mdev FIB rule directs lookups to the table associated with the device. + A single l3mdev rule is sufficient for all VRFs. The VRF device adds the + l3mdev rule for IPv4 and IPv6 when the first device is created with a + default preference of 1000. Users may delete the rule if desired and add + with a different priority or install per-VRF rules. + + Prior to the v4.8 kernel iif and oif rules are needed for each VRF device: ip ru add oif vrf-blue table 10 ip ru add iif vrf-blue table 10 - Set the default route for the table (and hence default route for the VRF). - e.g, ip route add table 10 prohibit default +3. Set the default route for the table (and hence default route for the VRF). + ip route add table 10 unreachable default -3. Enslave L3 interfaces to a VRF device. - e.g, ip link set dev eth1 master vrf-blue +4. Enslave L3 interfaces to a VRF device. + ip link set dev eth1 master vrf-blue Local and connected routes for enslaved devices are automatically moved to the table associated with VRF device. Any additional routes depending on - the enslaved device will need to be reinserted following the enslavement. + the enslaved device are dropped and will need to be reinserted to the VRF + FIB table following the enslavement. + + The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global + addresses as VRF enslavement changes. + sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 -4. Additional VRF routes are added to associated table. - e.g., ip route add table 10 ... +5. Additional VRF routes are added to associated table. + ip route add table 10 ... Applications @@ -87,39 +98,34 @@ VRF device: or to specify the output device using cmsg and IP_PKTINFO. +TCP services running in the default VRF context (ie., not bound to any VRF +device) can work across all VRF domains by enabling the tcp_l3mdev_accept +sysctl option: + sysctl -w net.ipv4.tcp_l3mdev_accept=1 -Limitations ------------ -Index of original ingress interface is not available via cmsg. Will address -soon. +netfilter rules on the VRF device can be used to limit access to services +running in the default VRF context as well. + +The default VRF does not have limited scope with respect to port bindings. +That is, if a process does a wildcard bind to a port in the default VRF it +owns the port across all VRF domains within the network namespace. ################################################################################ Using iproute2 for VRFs ======================= -VRF devices do *not* have to start with 'vrf-'. That is a convention used here -for emphasis of the device type, similar to use of 'br' in bridge names. +iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this +section lists both commands where appropriate -- with the vrf keyword and the +older form without it. 1. Create a VRF To instantiate a VRF device and associate it with a table: $ ip link add dev NAME type vrf table ID - Remember to add the ip rules as well: - $ ip ru add oif NAME table 10 - $ ip ru add iif NAME table 10 - $ ip -6 ru add oif NAME table 10 - $ ip -6 ru add iif NAME table 10 - - Without the rules route lookups are not directed to the table. - - For example: - $ ip link add dev vrf-blue type vrf table 10 - $ ip ru add pref 200 oif vrf-blue table 10 - $ ip ru add pref 200 iif vrf-blue table 10 - $ ip -6 ru add pref 200 oif vrf-blue table 10 - $ ip -6 ru add pref 200 iif vrf-blue table 10 - + As of v4.8 the kernel supports the l3mdev FIB rule where a single rule + covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first + device create. 2. List VRFs @@ -129,16 +135,16 @@ for emphasis of the device type, similar to use of 'br' in bridge names. For example: $ ip -d link show type vrf - 11: vrf-mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + 11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0 vrf table 1 addrgenmode eui64 - 12: vrf-red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + 12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0 vrf table 10 addrgenmode eui64 - 13: vrf-blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + 13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0 vrf table 66 addrgenmode eui64 - 14: vrf-green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + 14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 vrf table 81 addrgenmode eui64 @@ -146,43 +152,44 @@ for emphasis of the device type, similar to use of 'br' in bridge names. Or in brief output: $ ip -br link show type vrf - vrf-mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP> - vrf-red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP> - vrf-blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP> - vrf-green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP> + mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP> + red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP> + blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP> + green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP> 3. Assign a Network Interface to a VRF Network interfaces are assigned to a VRF by enslaving the netdevice to a VRF device: - $ ip link set dev NAME master VRF-NAME + $ ip link set dev NAME master NAME On enslavement connected and local routes are automatically moved to the table associated with the VRF device. For example: - $ ip link set dev eth0 master vrf-mgmt + $ ip link set dev eth0 master mgmt 4. Show Devices Assigned to a VRF To show devices that have been assigned to a specific VRF add the master option to the ip command: - $ ip link show master VRF-NAME + $ ip link show vrf NAME + $ ip link show master NAME For example: - $ ip link show master vrf-red - 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP mode DEFAULT group default qlen 1000 + $ ip link show vrf red + 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff - 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP mode DEFAULT group default qlen 1000 + 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff - 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master vrf-red state DOWN mode DEFAULT group default qlen 1000 + 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000 link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff Or using the brief output: - $ ip -br link show master vrf-red + $ ip -br link show vrf red eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP> eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST> @@ -192,26 +199,28 @@ for emphasis of the device type, similar to use of 'br' in bridge names. To list neighbor entries associated with devices enslaved to a VRF device add the master option to the ip command: - $ ip [-6] neigh show master VRF-NAME + $ ip [-6] neigh show vrf NAME + $ ip [-6] neigh show master NAME For example: - $ ip neigh show master vrf-red + $ ip neigh show vrf red 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE - $ ip -6 neigh show master vrf-red - 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE + $ ip -6 neigh show vrf red + 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE 6. Show Addresses for a VRF To show addresses for interfaces associated with a VRF add the master option to the ip command: - $ ip addr show master VRF-NAME + $ ip addr show vrf NAME + $ ip addr show master NAME For example: - $ ip addr show master vrf-red - 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP group default qlen 1000 + $ ip addr show vrf red + 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1 valid_lft forever preferred_lft forever @@ -219,7 +228,7 @@ for emphasis of the device type, similar to use of 'br' in bridge names. valid_lft forever preferred_lft forever inet6 fe80::ff:fe00:202/64 scope link valid_lft forever preferred_lft forever - 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP group default qlen 1000 + 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2 valid_lft forever preferred_lft forever @@ -227,11 +236,11 @@ for emphasis of the device type, similar to use of 'br' in bridge names. valid_lft forever preferred_lft forever inet6 fe80::ff:fe00:203/64 scope link valid_lft forever preferred_lft forever - 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master vrf-red state DOWN group default qlen 1000 + 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000 link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff Or in brief format: - $ ip -br addr show master vrf-red + $ ip -br addr show vrf red eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64 eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64 eth5 DOWN @@ -241,10 +250,11 @@ for emphasis of the device type, similar to use of 'br' in bridge names. To show routes for a VRF use the ip command to display the table associated with the VRF device: + $ ip [-6] route show vrf NAME $ ip [-6] route show table ID For example: - $ ip route show table vrf-red + $ ip route show vrf red prohibit default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 @@ -255,7 +265,7 @@ for emphasis of the device type, similar to use of 'br' in bridge names. local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2 broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2 - $ ip -6 route show table vrf-red + $ ip -6 route show vrf red local 2002:1:: dev lo proto none metric 0 pref medium local 2002:1::2 dev lo proto none metric 0 pref medium 2002:1::/120 dev eth1 proto kernel metric 256 pref medium @@ -268,23 +278,24 @@ for emphasis of the device type, similar to use of 'br' in bridge names. local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium fe80::/64 dev eth1 proto kernel metric 256 pref medium fe80::/64 dev eth2 proto kernel metric 256 pref medium - ff00::/8 dev vrf-red metric 256 pref medium + ff00::/8 dev red metric 256 pref medium ff00::/8 dev eth1 metric 256 pref medium ff00::/8 dev eth2 metric 256 pref medium 8. Route Lookup for a VRF - A test route lookup can be done for a VRF by adding the oif option to ip: - $ ip [-6] route get oif VRF-NAME ADDRESS + A test route lookup can be done for a VRF: + $ ip [-6] route get vrf NAME ADDRESS + $ ip [-6] route get oif NAME ADDRESS For example: - $ ip route get 10.2.1.40 oif vrf-red - 10.2.1.40 dev eth1 table vrf-red src 10.2.1.2 + $ ip route get 10.2.1.40 vrf red + 10.2.1.40 dev eth1 table red src 10.2.1.2 cache - $ ip -6 route get 2002:1::32 oif vrf-red - 2002:1::32 from :: dev eth1 table vrf-red proto kernel src 2002:1::2 metric 256 pref medium + $ ip -6 route get 2002:1::32 vrf red + 2002:1::32 from :: dev eth1 table red proto kernel src 2002:1::2 metric 256 pref medium 9. Removing Network Interface from a VRF @@ -303,46 +314,40 @@ for emphasis of the device type, similar to use of 'br' in bridge names. Commands used in this example: -cat >> /etc/iproute2/rt_tables <<EOF -1 vrf-mgmt -10 vrf-red -66 vrf-blue -81 vrf-green +cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF +1 mgmt +10 red +66 blue +81 green EOF function vrf_create { VRF=$1 TBID=$2 - # create VRF device - ip link add vrf-${VRF} type vrf table ${TBID} - # add rules that direct lookups to vrf table - ip ru add pref 200 oif vrf-${VRF} table ${TBID} - ip ru add pref 200 iif vrf-${VRF} table ${TBID} - ip -6 ru add pref 200 oif vrf-${VRF} table ${TBID} - ip -6 ru add pref 200 iif vrf-${VRF} table ${TBID} + # create VRF device + ip link add ${VRF} type vrf table ${TBID} if [ "${VRF}" != "mgmt" ]; then - ip route add table ${TBID} prohibit default + ip route add table ${TBID} unreachable default fi - ip link set dev vrf-${VRF} up - ip link set dev vrf-${VRF} state up + ip link set dev ${VRF} up } vrf_create mgmt 1 -ip link set dev eth0 master vrf-mgmt +ip link set dev eth0 master mgmt vrf_create red 10 -ip link set dev eth1 master vrf-red -ip link set dev eth2 master vrf-red -ip link set dev eth5 master vrf-red +ip link set dev eth1 master red +ip link set dev eth2 master red +ip link set dev eth5 master red vrf_create blue 66 -ip link set dev eth3 master vrf-blue +ip link set dev eth3 master blue vrf_create green 81 -ip link set dev eth4 master vrf-green +ip link set dev eth4 master green Interface addresses from /etc/network/interfaces: |