From a47aaa69de88913d1640c4bd28c67fad142c61a3 Mon Sep 17 00:00:00 2001
From: Raja Mani
Date: Fri, 18 Mar 2016 11:44:21 +0200
Subject: dt: bindings: add new dt entry for pre calibration in qcom,ath10k.txt

There are two things done in this patch:

1) Existing device tree entry 'qcom,ath10k-calibration-data' carries not
only calibration data, it carries board specific data too. So, make
appropriate update in doc.

2) ipq4019 wifi needs new device tree entry to carry calibration data
alone (called pre cal data, it doesn't include any other info). Using
'qcom,ath10k-calibration-data' for ipq4019 would alter the purpose of
it. Hence, add new device tree entry called
'qcom,ath10k-pre-calibration-data' to carry only pre calibration data.

Signed-off-by: Raja Mani
Acked-by: Rob Herring
Signed-off-by: Kalle Valo
---
 .../bindings/net/wireless/qcom,ath10k.txt | 23 +++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
index 96aae6b4f736..74d7f0af209c 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
@@ -5,12 +5,18 @@ Required properties:
   * "qcom,ath10k"
   * "qcom,ipq4019-wifi"

-PCI based devices uses compatible string "qcom,ath10k" and takes only
-calibration data via "qcom,ath10k-calibration-data". Rest of the properties
-are not applicable for PCI based devices.
+PCI based devices uses compatible string "qcom,ath10k" and takes calibration
+data along with board specific data via "qcom,ath10k-calibration-data".
+Rest of the properties are not applicable for PCI based devices.

 AHB based devices (i.e. ipq4019) uses compatible string "qcom,ipq4019-wifi"
-and also uses most of the properties defined in this doc.
+and also uses most of the properties defined in this doc (except
+"qcom,ath10k-calibration-data"). It uses "qcom,ath10k-pre-calibration-data"
+to carry pre calibration data.
+
+In general, entry "qcom,ath10k-pre-calibration-data" and
+"qcom,ath10k-calibration-data" conflict with each other and only one
+can be provided per device.

 Optional properties:
 - reg: Address and length of the register set for the device.
@@ -35,8 +41,11 @@ Optional properties:
 - qcom,msi_addr: MSI interrupt address.
 - qcom,msi_base: Base value to add before writing MSI data into
                 MSI address register.
-- qcom,ath10k-calibration-data : calibration data as an array, the
-                                 length can vary between hw versions
+- qcom,ath10k-calibration-data : calibration data + board specific data
+                                 as an array, the length can vary between
+                                 hw versions.
+- qcom,ath10k-pre-calibration-data : pre calibration data as an array,
+                                     the length can vary between hw versions.

 Example (to supply the calibration data alone):

@@ -105,5 +114,5 @@ wifi0: wifi@a000000 {
                           "legacy";
         qcom,msi_addr = <0x0b006040>;
         qcom,msi_base = <0x40>;
-        qcom,ath10k-calibration-data = [ 01 02 03 ... ];
+        qcom,ath10k-pre-calibration-data = [ 01 02 03 ... ];
 };
-- 
cgit v1.2.3
From ee2ae1ed46251dcbdcc2c59b5e30f664ddfbacb1 Mon Sep 17 00:00:00 2001
From: Alexandre TORGUE
Date: Fri, 1 Apr 2016 11:37:33 +0200
Subject: stmmac: add new DT platform entries for GMAC4

This is to support the snps,dwmac-4.00 and snps,dwmac-4.10a and related
features on the platform driver.
See binding doc for further details.

Signed-off-by: Giuseppe Cavallaro
Signed-off-by: Alexandre TORGUE
Signed-off-by: David S. Miller
---
 Documentation/devicetree/bindings/net/stmmac.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 6605d19601c2..4d302db657c0 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -59,6 +59,8 @@ Optional properties:
   - snps,fb: fixed-burst
   - snps,mb: mixed-burst
   - snps,rb: rebuild INCRx Burst
+  - snps,tso: this enables the TSO feature otherwise it will be managed by
+    MAC HW capability register.
 - mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.

 Examples:
-- 
cgit v1.2.3
From 0b7a43d37633614113ac54af73c193862dff4e50 Mon Sep 17 00:00:00 2001
From: Alexandre TORGUE
Date: Fri, 1 Apr 2016 11:37:35 +0200
Subject: Documentation: networking: update stmmac

Update stmmac driver documentation according to new GMAC 4.x family.

Signed-off-by: Alexandre TORGUE
Signed-off-by: David S. Miller
---
 Documentation/networking/stmmac.txt | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index d64a14714236..671fe3dd56d3 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -1,6 +1,6 @@
 STMicroelectronics 10/100/1000 Synopsys Ethernet driver

-Copyright (C) 2007-2014 STMicroelectronics Ltd
+Copyright (C) 2007-2015 STMicroelectronics Ltd
 Author: Giuseppe Cavallaro

 This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
@@ -138,6 +138,8 @@ struct plat_stmmacenet_data {
   int (*init)(struct platform_device *pdev, void *priv);
   void (*exit)(struct platform_device *pdev, void *priv);
   void *bsp_priv;
+  int has_gmac4;
+  bool tso_en;
 };

 Where:
@@ -181,6 +183,8 @@ Where:
      registers. init/exit callbacks should not use or modify
      platform data.
   o bsp_priv: another private pointer.
+  o has_gmac4: uses GMAC4 core.
+  o tso_en: Enables TSO (TCP Segmentation Offload) feature.

 For MDIO bus The we have:

@@ -278,6 +282,13 @@ Please see the following document:
   o stmmac_ethtool.c: to implement the ethtool support;
   o stmmac.h: private driver structure;
   o common.h: common definitions and VFTs;
+  o mmc_core.c/mmc.h: Management MAC Counters;
+  o stmmac_hwtstamp.c: HW timestamp support for PTP;
+  o stmmac_ptp.c: PTP 1588 clock;
+  o dwmac-.c: these are for the platform glue-logic file; e.g. dwmac-sti.c
+    for STMicroelectronics SoCs.
+
+- GMAC 3.x
   o descs.h: descriptor structure definitions;
   o dwmac1000_core.c: dwmac GiGa core functions;
   o dwmac1000_dma.c: dma functions for the GMAC chip;
@@ -289,11 +300,32 @@ Please see the following document:
   o enh_desc.c: functions for handling enhanced descriptors;
   o norm_desc.c: functions for handling normal descriptors;
   o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes;
-  o mmc_core.c/mmc.h: Management MAC Counters;
-  o stmmac_hwtstamp.c: HW timestamp support for PTP;
-  o stmmac_ptp.c: PTP 1588 clock;
-  o dwmac-.c: these are for the platform glue-logic file; e.g. dwmac-sti.c
-    for STMicroelectronics SoCs.
+
+- GMAC4.x generation
+  o dwmac4_core.c: dwmac GMAC4.x core functions;
+  o dwmac4_desc.c: functions for handling GMAC4.x descriptors;
+  o dwmac4_descs.h: descriptor definitions;
+  o dwmac4_dma.c: dma functions for the GMAC4.x chip;
+  o dwmac4_dma.h: dma definitions for the GMAC4.x chip;
+  o dwmac4.h: core definitions for the GMAC4.x chip;
+  o dwmac4_lib.c: generic GMAC4.x functions;
+
+4.12) TSO support (GMAC4.x)
+
+TSO (Tcp Segmentation Offload) feature is supported by GMAC 4.x chip family.
+When a packet is sent through TCP protocol, the TCP stack ensures that
+the SKB provided to the low level driver (stmmac in our case) matches with
+the maximum frame len (IP header + TCP header + payload <= 1500 bytes (for
+MTU set to 1500)). It means that if an application using TCP want to send a
+packet which will have a length (after adding headers) > 1514 the packet
+will be split in several TCP packets: The data payload is split and headers
+(TCP/IP ..) are added. It is done by software.
+
+When TSO is enabled, the TCP stack doesn't care about the maximum frame
+length and provide SKB packet to stmmac as it is. The GMAC IP will have to
+perform the segmentation by it self to match with maximum frame length.
+
+This feature can be enabled in device tree through "snps,tso" entry.

 5) Debug Information

-- 
cgit v1.2.3
From fd91e12f594b40fdb2dad530e8b895cc5c07db21 Mon Sep 17 00:00:00 2001
From: Soheil Hassas Yeganeh
Date: Sat, 2 Apr 2016 23:08:13 -0400
Subject: sock: document timestamping via cmsg in Documentation

Update docs and add code snippet for using cmsg for timestamping.

Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
---
 Documentation/networking/timestamping.txt | 48 +++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index a977339fbe0a..671cccf0dcd2 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -44,11 +44,17 @@ timeval of SO_TIMESTAMP (ms).
 Supports multiple types of timestamp requests. As a result, this
 socket option takes a bitmap of flags, not a boolean. In

-  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val, &val);
+  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val,
+                   sizeof(val));

 val is an integer with any of the following bits set. Setting other
 bit returns EINVAL and does not change the current state.

+The socket option configures timestamp generation for individual
+sk_buffs (1.3.1), timestamp reporting to the socket's error
+queue (1.3.2) and options (1.3.3). Timestamp generation can also
+be enabled for individual sendmsg calls using cmsg (1.3.4).
+

 1.3.1 Timestamp Generation

@@ -71,13 +77,16 @@ SOF_TIMESTAMPING_RX_SOFTWARE:
   kernel receive stack.

 SOF_TIMESTAMPING_TX_HARDWARE:
-  Request tx timestamps generated by the network adapter.
+  Request tx timestamps generated by the network adapter. This flag
+  can be enabled via both socket options and control messages.

 SOF_TIMESTAMPING_TX_SOFTWARE:
   Request tx timestamps when data leaves the kernel. These timestamps
   are generated in the device driver as close as possible, but always
   prior to, passing the packet to the network interface. Hence, they
   require driver support and may not be available for all devices.
+  This flag can be enabled via both socket options and control messages.
+

 SOF_TIMESTAMPING_TX_SCHED:
   Request tx timestamps prior to entering the packet scheduler. Kernel
@@ -90,7 +99,8 @@ SOF_TIMESTAMPING_TX_SCHED:
   machines with virtual devices where a transmitted packet travels
   through multiple devices and, hence, multiple packet schedulers,
   a timestamp is generated at each layer. This allows for fine
-  grained measurement of queuing delay.
+  grained measurement of queuing delay. This flag can be enabled
+  via both socket options and control messages.

 SOF_TIMESTAMPING_TX_ACK:
   Request tx timestamps when all data in the send buffer has been
@@ -99,6 +109,7 @@ SOF_TIMESTAMPING_TX_ACK:
   over-report measurement, because the timestamp is generated when all
   data up to and including the buffer at send() was acknowledged: the
   cumulative acknowledgment. The mechanism ignores SACK and FACK.
+  This flag can be enabled via both socket options and control messages.

 1.3.2 Timestamp Reporting

@@ -183,6 +194,37 @@ having access to the contents of the original packet, so cannot be
 combined with SOF_TIMESTAMPING_OPT_TSONLY.


+1.3.4. Enabling timestamps via control messages
+
+In addition to socket options, timestamp generation can be requested
+per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
+Using this feature, applications can sample timestamps per sendmsg()
+without paying the overhead of enabling and disabling timestamps via
+setsockopt:
+
+  struct msghdr *msg;
+  ...
+  cmsg = CMSG_FIRSTHDR(msg);
+  cmsg->cmsg_level = SOL_SOCKET;
+  cmsg->cmsg_type = SO_TIMESTAMPING;
+  cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
+  *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
+                                 SOF_TIMESTAMPING_TX_SOFTWARE |
+                                 SOF_TIMESTAMPING_TX_ACK;
+  err = sendmsg(fd, msg, 0);
+
+The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
+the SOF_TIMESTAMPING_TX_* flags set via setsockopt.
+
+Moreover, applications must still enable timestamp reporting via
+setsockopt to receive timestamps:
+
+  __u32 val = SOF_TIMESTAMPING_SOFTWARE |
+              SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
+  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val,
+                   sizeof(val));
+
+
 1.4 Bytestream Timestamps

 The SO_TIMESTAMPING interface supports timestamping of bytes in a
-- 
cgit v1.2.3
From 646e76bb5daf4ca38438c69ffb72cccb605f3466 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi
Date: Tue, 23 Feb 2016 15:43:35 +0100
Subject: mac80211: parse VHT info in injected frames

Add VHT radiotap parsing support to ieee80211_parse_tx_radiotap().
That capability has been tested using a d-link dir-860l rev b1 running
OpenWrt trunk and mt76 driver

Signed-off-by: Lorenzo Bianconi
Signed-off-by: Johannes Berg
---
 Documentation/networking/mac80211-injection.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
index ec8f934c2eb2..e0efcaf5b0ee 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.txt
@@ -45,6 +45,19 @@ radiotap headers and used to control injection:
    number of retries when either IEEE80211_RADIOTAP_RATE or
    IEEE80211_RADIOTAP_MCS was used

+ * IEEE80211_RADIOTAP_VHT
+
+   VHT mcs and number of streams used in the transmission (only for devices
+   without own rate control). Also other fields are parsed
+
+   flags field
+   IEEE80211_RADIOTAP_VHT_FLAG_SGI: use short guard interval
+
+   bandwidth field
+   1: send using 40MHz channel width
+   4: send using 80MHz channel width
+   11: send using 160MHz channel width
+
 The injection code can also skip all other currently defined radiotap fields
 facilitating replay of captured radiotap headers directly.

-- 
cgit v1.2.3
From 5c05803a3e2054257d7e8e737a6efaf2c7f6b725 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann
Date: Wed, 24 Feb 2016 16:25:48 +0100
Subject: mac80211: document only injected *_RADIOTAP_* flags

Not the internal flags but the radiotap flags are parsed when the monitor
injected frames are prepared for transmission. Thus the documentation
should only document these.
Reported-by: Lorenzo Bianconi
Reported-by: Johannes Berg
Fixes: dfdfc2beb0dd ("mac80211: Parse legacy and HT rate in injected frames")
Signed-off-by: Sven Eckelmann
Signed-off-by: Johannes Berg
---
 Documentation/networking/mac80211-injection.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
index e0efcaf5b0ee..d58d78df9ca2 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.txt
@@ -37,8 +37,8 @@ radiotap headers and used to control injection:
    HT rate for the transmission (only for devices without own rate control).
    Also some flags are parsed

-   IEEE80211_TX_RC_SHORT_GI: use short guard interval
-   IEEE80211_TX_RC_40_MHZ_WIDTH: send in HT40 mode
+   IEEE80211_RADIOTAP_MCS_SGI: use short guard interval
+   IEEE80211_RADIOTAP_MCS_BW_40: send in HT40 mode

  * IEEE80211_RADIOTAP_DATA_RETRIES

-- 
cgit v1.2.3
From 75f3a1018f0103025558caa60e24132a4cc9ce8f Mon Sep 17 00:00:00 2001
From: Ido Schimmel
Date: Tue, 5 Apr 2016 10:20:03 +0200
Subject: switchdev: Use switch ID in suggested udev rule

Since there can be multiple switch ASICs on the same system we should use
the switch ID in order to differentiate between them and set the switch
name (e.g. swX) accordingly.

Also, replace the order of the "Switch ID" and "Port Netdev Naming"
sections following the above change.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
---
 Documentation/networking/switchdev.txt | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 2f659129694b..31c39115834d 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -89,6 +89,18 @@ Typically, the management port is not participating in offloaded data plane and
 is loaded with a different driver, such as a NIC driver, on the management port
 device.

+Switch ID
+^^^^^^^^^
+
+The switchdev driver must implement the switchdev op switchdev_port_attr_get
+for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
+physical ID for each port of a switch. The ID must be unique between switches
+on the same system. The ID does not need to be unique between switches on
+different systems.
+
+The switch ID is used to locate ports on a switch and to know if aggregated
+ports belong to the same switch.
+
 Port Netdev Naming
 ^^^^^^^^^^^^^^^^^^

@@ -104,25 +116,13 @@ external configuration. For example, if a physical 40G port is split logically
 into 4 10G ports, resulting in 4 port netdevs, the device can give a unique name
 for each port using port PHYS name. The udev rule would be:

-SUBSYSTEM=="net", ACTION=="add", DRIVER="", ATTR{phys_port_name}!="", \
-        NAME="$attr{phys_port_name}"
+SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="", \
+        ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"

 Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
 is the port name or ID, and Z is the sub-port name or ID. For example, sw1p1s0
 would be sub-port 0 on port 1 on switch 1.

-Switch ID
-^^^^^^^^^
-
-The switchdev driver must implement the switchdev op switchdev_port_attr_get
-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
-physical ID for each port of a switch. The ID must be unique between switches
-on the same system. The ID does not need to be unique between switches on
-different systems.
-
-The switch ID is used to locate ports on a switch and to know if aggregated
-ports belong to the same switch.
-
 Port Features
 ^^^^^^^^^^^^^

-- 
cgit v1.2.3
From f453939c1a4a758312f799748b344bacd1db701f Mon Sep 17 00:00:00 2001
From: Vivien Didelot
Date: Wed, 6 Apr 2016 11:06:20 -0400
Subject: net: dsa: document missing functions

Add description for the missing port_vlan_prepare, port_fdb_prepare,
port_fdb_dump functions in the DSA documentation.

Signed-off-by: Vivien Didelot
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
---
 Documentation/networking/dsa/dsa.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 3b196c304b73..013b67066b82 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -542,6 +542,12 @@ Bridge layer
 Bridge VLAN filtering
 ---------------------

+- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
+  configuration of a VLAN on the given port. If the operation is not supported
+  by the hardware, this function should return -EOPNOTSUPP to inform the bridge
+  code to fallback to a software implementation. No hardware setup must be done
+  in this function. See port_vlan_add for this and details.
+
 - port_vlan_add: bridge layer function invoked when a VLAN is configured
   (tagged or untagged) for the given switch port

@@ -552,6 +558,12 @@ Bridge VLAN filtering
   function that the driver has to call for each VLAN the given port is a member
   of. A switchdev object is used to carry the VID and bridge flags.

+- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
+  installation of a Forwarding Database entry. If the operation is not
+  supported, this function should return -EOPNOTSUPP to inform the bridge code
+  to fallback to a software implementation. No hardware setup must be done in
+  this function. See port_fdb_add for this and details.
+
 - port_fdb_add: bridge layer function invoked when the bridge wants to install a
   Forwarding Database entry, the switch hardware should be programmed with the
   specified address in the specified VLAN Id in the forwarding database
@@ -565,6 +577,10 @@ of DSA, would be the its port-based VLAN, used by the associated bridge device.
   the specified MAC address from the specified VLAN ID if it was mapped into
   this port forwarding database

+- port_fdb_dump: bridge layer function invoked with a switchdev callback
+  function that the driver has to call for each MAC address known to be behind
+  the given port. A switchdev object is used to carry the VID and FDB info.
+
 TODO
 ====

-- 
cgit v1.2.3
From 43c44a9f655170fb92536167b95b1c6ae8b732cb Mon Sep 17 00:00:00 2001
From: Vivien Didelot
Date: Wed, 6 Apr 2016 11:55:03 -0400
Subject: net: dsa: make the STP state function return void

The DSA layer doesn't care about the return code of the port_stp_update
routine, so make it void in the layer and the DSA drivers.

Replace the useless dsa_slave_stp_update function with a
dsa_slave_stp_state function used to reply to the switchdev
SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute.

In the meantime, rename port_stp_update to port_stp_state_set to make
the state change explicit.

Signed-off-by: Vivien Didelot
Signed-off-by: David S. Miller
---
 Documentation/networking/dsa/dsa.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 013b67066b82..ba698c56919d 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -533,7 +533,7 @@ Bridge layer
   out at the switch hardware for the switch to (re) learn MAC addresses behind
   this port.

-- port_stp_update: bridge layer function invoked when a given switch port STP
+- port_stp_state_set: bridge layer function invoked when a given switch port STP
   state is computed by the bridge layer and should be propagated to switch
   hardware to forward/block/learn traffic. The switch driver is responsible for
   computing a STP state change based on current and asked parameters and perform
-- 
cgit v1.2.3
From a6db4494d218c2e559173661ee972e048dc04fdd Mon Sep 17 00:00:00 2001
From: David Ahern
Date: Thu, 7 Apr 2016 07:21:00 -0700
Subject: net: ipv4: Consider failed nexthops in multipath routes

Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                    [h3]   15.0.0.5
                      |                       |
                     3|                      3|
                    [SP1]                   [SP2]--+
                     1  2                    1     2
                     |  |      /-------------+     |
                     |   \   /                     |
                     |     X                       |
                     |    / \                      |
                     |   /   \---------------\     |
                     1  2                     1    2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1 and
tor2 is also down, then tor1 is effectively cut off from h1. Yet the route
lookups in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one
and ssh 15.0.0.5 gets the other.
Connections that attempt to use the 12.0.0.2 nexthop fail since that
neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern
Reviewed-by: Julian Anastasov
Signed-off-by: David S. Miller
---
 Documentation/networking/ip-sysctl.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b183e2b606c8..6c7f365b1515 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
         fwmark of the packet they are replying to.
         Default: 0

+fib_multipath_use_neigh - BOOLEAN
+        Use status of existing neighbor entry when determining nexthop for
+        multipath routes. If disabled, neighbor information is not used and
+        packets could be directed to a failed nexthop. Only valid for kernels
+        built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+        Default: 0 (disabled)
+        Possible values:
+        0 - disabled
+        1 - enabled
+
 route/max_size - INTEGER
         Maximum number of routes allowed in the kernel. Increase
         this when using large numbers of interfaces and/or routes.
-- 
cgit v1.2.3
From 57fbcce37be7c1d2622b56587c10ade00e96afa3 Mon Sep 17 00:00:00 2001
From: Johannes Berg
Date: Tue, 12 Apr 2016 15:56:15 +0200
Subject: cfg80211: remove enum ieee80211_band

This enum is already perfectly aliased to enum nl80211_band, and the only
reason for it is that we get IEEE80211_NUM_BANDS out of it.
There's no really good reason to not declare the number of bands in
nl80211 though, so do that and remove the cfg80211 one.

Signed-off-by: Johannes Berg
---
 Documentation/DocBook/80211.tmpl | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl
index f9b9ad7894f5..f2a312b35875 100644
--- a/Documentation/DocBook/80211.tmpl
+++ b/Documentation/DocBook/80211.tmpl
@@ -75,7 +75,6 @@ Device registration
 !Pinclude/net/cfg80211.h Device registration
-!Finclude/net/cfg80211.h ieee80211_band
 !Finclude/net/cfg80211.h ieee80211_channel_flags
 !Finclude/net/cfg80211.h ieee80211_channel
 !Finclude/net/cfg80211.h ieee80211_rate_flags
-- 
cgit v1.2.3
From bf91795e4a77eb75602702e4c4d9b98b155039e9 Mon Sep 17 00:00:00 2001
From: Masanari Iida
Date: Sat, 9 Apr 2016 00:00:25 +0900
Subject: Doc: networking: Fix typo in dsa

This patch fixes typos in Documentation/networking/dsa.

Signed-off-by: Masanari Iida
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
---
 Documentation/networking/dsa/bcm_sf2.txt | 2 +-
 Documentation/networking/dsa/dsa.txt     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/dsa/bcm_sf2.txt b/Documentation/networking/dsa/bcm_sf2.txt
index d999d0c1c5b8..eba3a2431e91 100644
--- a/Documentation/networking/dsa/bcm_sf2.txt
+++ b/Documentation/networking/dsa/bcm_sf2.txt
@@ -38,7 +38,7 @@ Implementation details
 ======================

 The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA
-driver; see Documentation/networking/dsa/dsa.txt for details on the subsytem
+driver; see Documentation/networking/dsa/dsa.txt for details on the subsystem
 and what it provides.

 The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index ba698c56919d..631b0f7ae16f 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -334,7 +334,7 @@ more specifically with its VLAN filtering portion when configuring VLANs on top
 of per-port slave network devices. Since DSA primarily deals with
 MDIO-connected switches, although not exclusively, SWITCHDEV's
 prepare/abort/commit phases are often simplified into a prepare phase which
-checks whether the operation is supporte by the DSA switch driver, and a commit
+checks whether the operation is supported by the DSA switch driver, and a commit
 phase which applies the changes.

 As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
-- 
cgit v1.2.3
From f7a6272bf3cbd2576165dba020e0329c9ca67c1f Mon Sep 17 00:00:00 2001
From: Alexander Duyck
Date: Sun, 10 Apr 2016 21:45:09 -0400
Subject: Documentation: Add documentation for TSO and GSO features

This document is a starting point for defining the TSO and GSO features.
The whole thing is starting to get a bit messy so I wanted to make sure we
have notes somewhere to start describing what does and doesn't work.

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
---
 Documentation/networking/segmentation-offloads.txt | 130 +++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 Documentation/networking/segmentation-offloads.txt

diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
new file mode 100644
index 000000000000..f200467ade38
--- /dev/null
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -0,0 +1,130 @@
+Segmentation Offloads in the Linux Networking Stack
+
+Introduction
+============
+
+This document describes a set of techniques in the Linux networking stack
+to take advantage of segmentation offload capabilities of various NICs.
+
+The following technologies are described:
+ * TCP Segmentation Offload - TSO
+ * UDP Fragmentation Offload - UFO
+ * IPIP, SIT, GRE, and UDP Tunnel Offloads
+ * Generic Segmentation Offload - GSO
+ * Generic Receive Offload - GRO
+ * Partial Generic Segmentation Offload - GSO_PARTIAL
+
+TCP Segmentation Offload
+========================
+
+TCP segmentation allows a device to segment a single frame into multiple
+frames with a data payload size specified in skb_shinfo()->gso_size.
+When TCP segmentation requested the bit for either SKB_GSO_TCP or
+SKB_GSO_TCP6 should be set in skb_shinfo()->gso_type and
+skb_shinfo()->gso_size should be set to a non-zero value.
+
+TCP segmentation is dependent on support for the use of partial checksum
+offload. For this reason TSO is normally disabled if the Tx checksum
+offload for a given device is disabled.
+
+In order to support TCP segmentation offload it is necessary to populate
+the network and transport header offsets of the skbuff so that the device
+drivers will be able determine the offsets of the IP or IPv6 header and the
+TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should
+also point to the TCP header of the packet.
+
+For IPv4 segmentation we support one of two types in terms of the IP ID.
+The default behavior is to increment the IP ID with every segment. If the
+GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
+ID and all segments will use the same IP ID. If a device has
+NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
+and we will either increment the IP ID for all frames, or leave it at a
+static value based on driver preference.
+
+UDP Fragmentation Offload
+=========================
+
+UDP fragmentation offload allows a device to fragment an oversized UDP
+datagram into multiple IPv4 fragments. Many of the requirements for UDP
+fragmentation offload are the same as TSO. However the IPv4 ID for
+fragments should not increment as a single IPv4 datagram is fragmented.
+
+IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
+========================================================
+
+In addition to the offloads described above it is possible for a frame to
+contain additional headers such as an outer tunnel. In order to account
+for such instances an additional set of segmentation offload types were
+introduced including SKB_GSO_IPIP, SKB_GSO_SIT, SKB_GSO_GRE, and
+SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
+cases where there are more than just 1 set of headers. For example in the
+case of IPIP and SIT we should have the network and transport headers moved
+from the standard list of headers to "inner" header offsets.
+
+Currently only two levels of headers are supported. The convention is to
+refer to the tunnel headers as the outer headers, while the encapsulated
+data is normally referred to as the inner headers. Below is the list of
+calls to access the given headers:
+
+IPIP/SIT Tunnel:
+               Outer                   Inner
+MAC            skb_mac_header
+Network        skb_network_header      skb_inner_network_header
+Transport      skb_transport_header
+
+UDP/GRE Tunnel:
+               Outer                   Inner
+MAC            skb_mac_header          skb_inner_mac_header
+Network        skb_network_header      skb_inner_network_header
+Transport      skb_transport_header    skb_inner_transport_header
+
+In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
+SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
+fact that the outer header also requests to have a non-zero checksum
+included in the outer header.
+
+Finally there is SKB_GSO_REMCSUM which indicates that a given tunnel header
+has requested a remote checksum offload. In this case the inner headers
+will be left with a partial checksum and only the outer header checksum
+will be computed.
+
+Generic Segmentation Offload
+============================
+
+Generic segmentation offload is a pure software offload that is meant to
+deal with cases where device drivers cannot perform the offloads described
+above. What occurs in GSO is that a given skbuff will have its data broken
+out over multiple skbuffs that have been resized to match the MSS provided
+via skb_shinfo()->gso_size.
+
+Before enabling any hardware segmentation offload a corresponding software
+offload is required in GSO. Otherwise it becomes possible for a frame to
+be re-routed between devices and end up being unable to be transmitted.
+
+Generic Receive Offload
+=======================
+
+Generic receive offload is the complement to GSO. Ideally any frame
+assembled by GRO should be segmented to create an identical sequence of
+frames using GSO, and any sequence of frames segmented by GSO should be
+able to be reassembled back to the original by GRO. The only exception to
+this is IPv4 ID in the case that the DF bit is set for a given IP header.
+If the value of the IPv4 ID is not sequentially incrementing it will be +altered so that it is when a frame assembled via GRO is segmented via GSO. + +Partial Generic Segmentation Offload +==================================== + +Partial generic segmentation offload is a hybrid between TSO and GSO. What +it effectively does is take advantage of certain traits of TCP and tunnels +so that instead of having to rewrite the packet headers for each segment +only the inner-most transport header and possibly the outer-most network +header need to be updated. This allows devices that do not support tunnel +offloads or tunnel offloads with checksum to still make use of segmentation. + +With the partial offload what occurs is that all headers excluding the +inner transport header are updated such that they will contain the correct +values for if the header was simply duplicated. The one exception to this +is the outer IPv4 ID field. It is up to the device drivers to guarantee +that the IPv4 ID field is incremented in the case that a given header does +not have the DF bit set. -- cgit v1.2.3 From 2fc695a1bb00141a0f5df74e7a19e125f4babaa5 Mon Sep 17 00:00:00 2001 From: "Yisen.Zhuang\\(Zhuangyuzeng\\)" Date: Sat, 23 Apr 2016 17:05:15 +0800 Subject: Documentation: Bindings: Update DT binding for separating dsaf dev support Because debug dsaf port was separated from service dsaf port, this patch updates the related information of DT binding. Signed-off-by: Yisen Zhuang Signed-off-by: David S. 
Miller --- .../devicetree/bindings/net/hisilicon-hns-dsaf.txt | 59 ++++++++++++++++++---- 1 file changed, 49 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt index ecacfa44b1eb..5ccd4f002a67 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt @@ -7,19 +7,47 @@ Required properties: - mode: dsa fabric mode string. only support one of dsaf modes like these: "2port-64vf", "6port-16rss", - "6port-16vf". + "6port-16vf", + "single-port". - interrupt-parent: the interrupt parent of this device. - interrupts: should contain the DSA Fabric and rcb interrupt. - reg: specifies base physical address(es) and size of the device registers. - The first region is external interface control register base and size. - The second region is SerDes base register and size. + The first region is external interface control register base and size(optional, + only be used when subctrl-syscon is not exists). It is recommended using + subctrl-syscon rather than this address. + The second region is SerDes base register and size(optional, only be used when + serdes-syscon in port node is not exists. It is recommended using + serdes-syscon rather than this address. The third region is the PPE register base and size. - The fourth region is dsa fabric base register and size. - The fifth region is cpld base register and size, it is not required if do not use cpld. -- phy-handle: phy handle of physicl port, 0 if not any phy device. see ethernet.txt [1]. + The fourth region is dsa fabric base register and size. It is not required for + single-port mode. +- reg-names: may be ppe-base and(or) dsaf-base. It is used to find the + corresponding reg's index. + +- phy-handle: phy handle of physicl port, 0 if not any phy device. It is optional + attribute. 
If port node is exists, phy-handle in each port node will be used. + see ethernet.txt [1]. +- subctrl-syscon: is syscon handle for external interface control register. +- reset-field-offset: is offset of reset field. Its value depends on the hardware + user manual. - buf-size: rx buffer size, should be 16-1024. - desc-num: number of description in TX and RX queue, should be 512, 1024, 2048 or 4096. +- port: subnodes of dsaf. A dsaf node may contain several port nodes(Depending + on mode of dsaf). Port node contain some attributes listed below: +- port-id: is physical port index in one dsaf. +- phy-handle: phy handle of physicl port. It is not required if there isn't + phy device. see ethernet.txt [1]. +- serdes-syscon: is syscon handle for SerDes register. +- cpld-syscon: is syscon handle for cpld register. It is not required if there + isn't cpld device. +- cpld-ctrl-reg: is cpld register offset. It is not required if there isn't + cpld-syscon. +- port-rst-offset: is offset of reset field for each port in dsaf. Its value + depends on the hardware user manual. +- port-mode-offset: is offset of port mode field for each port in dsaf. Its + value depends on the hardware user manual. 
+ [1] Documentation/devicetree/bindings/net/phy.txt Example: @@ -28,11 +56,11 @@ dsaf0: dsa@c7000000 { compatible = "hisilicon,hns-dsaf-v1"; mode = "6port-16rss"; interrupt-parent = <&mbigen_dsa>; - reg = <0x0 0xC0000000 0x0 0x420000 - 0x0 0xC2000000 0x0 0x300000 - 0x0 0xc5000000 0x0 0x890000 + reg = <0x0 0xc5000000 0x0 0x890000 0x0 0xc7000000 0x0 0x60000>; - phy-handle = <0 0 0 0 &soc0_phy4 &soc0_phy5 0 0>; + reg-names = "ppe-base", "dsaf-base"; + subctrl-syscon = <&subctrl>; + reset-field-offset = 0; interrupts = <131 4>,<132 4>, <133 4>,<134 4>, <135 4>,<136 4>, <137 4>,<138 4>, <139 4>,<140 4>, <141 4>,<142 4>, @@ -43,4 +71,15 @@ dsaf0: dsa@c7000000 { buf-size = <4096>; desc-num = <1024>; dma-coherent; + + prot@0 { + port-id = 0; + phy-handle = <&phy0>; + serdes-syscon = <&serdes>; + }; + + prot@1 { + port-id = 1; + serdes-syscon = <&serdes>; + }; }; -- cgit v1.2.3 From c132cdccb71ee000d6456ec63acdf0535b5f35da Mon Sep 17 00:00:00 2001 From: "Yisen.Zhuang\\(Zhuangyuzeng\\)" Date: Sat, 23 Apr 2016 17:05:16 +0800 Subject: Documentation: Bindings: add port-idx-in-ae for enet node This patch adds description for port-idx-in-ae attribute. Signed-off-by: Yisen Zhuang Signed-off-by: David S. Miller --- .../devicetree/bindings/net/hisilicon-hns-nic.txt | 30 +++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt index e6a9d1c30878..b9ff4ba6454e 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt @@ -36,6 +36,34 @@ Required properties: | | | | | | external port + This attribute is remained for compatible purpose. It is not recommended to + use it in new code. + +- port-idx-in-ae: is the index of port provided by AE. + In NIC mode of DSAF, all 6 PHYs of service DSAF are taken as ethernet ports + to the CPU. 
The port-idx-in-ae can be 0 to 5. Here is the diagram: + +-----+---------------+ + | CPU | + +-+-+-+---+-+-+-+-+-+-+ + | | | | | | | | + debug debug service + port port port + (0) (0) (0-5) + + In Switch mode of DSAF, all 6 PHYs of service DSAF are taken as physical + ports connected to a LAN Switch while the CPU side assume itself have one + single NIC connected to this switch. In this case, the port-idx-in-ae + will be 0 only. + +-----+-----+------+------+ + | CPU | + +-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | service| port(0) + debug debug +------------+ + port port | switch | + (0) (0) +-+-+-+-+-+-++ + | | | | | | + external port + - local-mac-address: mac addr of the ethernet interface Example: @@ -43,6 +71,6 @@ Example: ethernet@0{ compatible = "hisilicon,hns-nic-v1"; ae-handle = <&dsaf0>; - port-id = <0>; + port-idx-in-ae = <0>; local-mac-address = [a2 14 e4 4b 56 76]; }; -- cgit v1.2.3 From e705498945ad3a3b945771c5d683df064bb9819c Mon Sep 17 00:00:00 2001 From: "Kanchanapally, Vidyullatha" Date: Mon, 11 Apr 2016 15:16:01 +0530 Subject: cfg80211: Add option to report the bss entry in connect result Since cfg80211 maintains separate BSS table entries for APs if the same BSSID, SSID pair is seen on multiple channels, it is possible that it can map the current_bss to a BSS entry on the wrong channel. This current_bss will not get flushed unless disconnected and cfg80211 reports a wrong channel as the associated channel. Fix this by introducing a new cfg80211_connect_bss() function which is similar to cfg80211_connect_result(), but it includes an additional parameter: the bss the STA is connected to. This allows drivers to provide the exact bss entry that matches the BSS to which the connection was completed. 
Reviewed-by: Jouni Malinen Signed-off-by: Vidyullatha Kanchanapally Signed-off-by: Sunil Dutt Signed-off-by: Johannes Berg --- Documentation/DocBook/80211.tmpl | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl index f2a312b35875..5f7c55999c77 100644 --- a/Documentation/DocBook/80211.tmpl +++ b/Documentation/DocBook/80211.tmpl @@ -135,6 +135,7 @@ !Finclude/net/cfg80211.h cfg80211_tx_mlme_mgmt !Finclude/net/cfg80211.h cfg80211_ibss_joined !Finclude/net/cfg80211.h cfg80211_connect_result +!Finclude/net/cfg80211.h cfg80211_connect_bss !Finclude/net/cfg80211.h cfg80211_roamed !Finclude/net/cfg80211.h cfg80211_disconnected !Finclude/net/cfg80211.h cfg80211_ready_on_channel -- cgit v1.2.3 From 84039920bdff60030b2b79e50e4c9d230ae00dad Mon Sep 17 00:00:00 2001 From: Xinming Hu Date: Mon, 18 Apr 2016 05:22:22 -0700 Subject: dt: bindings: add MARVELL's sd8xxx wireless device Add device tree binding documentation for MARVELL's sd8xxx (sd8897 and sd8997) wlan chip. Signed-off-by: Xinming Hu Signed-off-by: Amitkumar Karwar Acked-by: Rob Herring Signed-off-by: Kalle Valo --- .../bindings/net/wireless/marvell-sd8xxx.txt | 63 ++++++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt b/Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt new file mode 100644 index 000000000000..c421aba0a5bc --- /dev/null +++ b/Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt @@ -0,0 +1,63 @@ +Marvell 8897/8997 (sd8897/sd8997) SDIO devices +------ + +This node provides properties for controlling the marvell sdio wireless device. +The node is expected to be specified as a child node to the SDIO controller that +connects the device to the system. 
+ +Required properties: + + - compatible : should be one of the following: + * "marvell,sd8897" + * "marvell,sd8997" + +Optional properties: + + - marvell,caldata* : A series of properties with marvell,caldata prefix, + represent calibration data downloaded to the device during + initialization. This is an array of unsigned 8-bit values. + the properties should follow below property name and + corresponding array length: + "marvell,caldata-txpwrlimit-2g" (length = 566). + "marvell,caldata-txpwrlimit-5g-sub0" (length = 502). + "marvell,caldata-txpwrlimit-5g-sub1" (length = 688). + "marvell,caldata-txpwrlimit-5g-sub2" (length = 750). + "marvell,caldata-txpwrlimit-5g-sub3" (length = 502). + - marvell,wakeup-pin : a wakeup pin number of wifi chip which will be configured + to firmware. Firmware will wakeup the host using this pin + during suspend/resume. + - interrupt-parent: phandle of the parent interrupt controller + - interrupts : interrupt pin number to the cpu. driver will request an irq based on + this interrupt number. during system suspend, the irq will be enabled + so that the wifi chip can wakeup host platform under certain condition. + during system resume, the irq will be disabled to make sure + unnecessary interrupt is not received. + +Example: + +Tx power limit calibration data is configured in below example. +The calibration data is an array of unsigned values, the length +can vary between hw versions. +IRQ pin 38 is used as system wakeup source interrupt. wakeup pin 3 is configured +so that firmware can wakeup host using this device side pin. 
+ +&mmc3 { + status = "okay"; + vmmc-supply = <&wlan_en_reg>; + bus-width = <4>; + cap-power-off-card; + keep-power-in-suspend; + + #address-cells = <1>; + #size-cells = <0>; + mwifiex: wifi@1 { + compatible = "marvell,sd8897"; + reg = <1>; + interrupt-parent = <&pio>; + interrupts = <38 IRQ_TYPE_LEVEL_LOW>; + + marvell,caldata_00_txpwrlimit_2g_cfg_set = /bits/ 8 < + 0x01 0x00 0x06 0x00 0x08 0x02 0x89 0x01>; + marvell,wakeup-pin = <3>; + }; +}; -- cgit v1.2.3 From 9854518ea04db33738602d45ebc96a200e6f5198 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Tue, 26 Apr 2016 10:06:18 +0200 Subject: sched: align nlattr properly when needed Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller --- Documentation/networking/gen_stats.txt | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.txt index 70e6275b757a..ff630a87b511 100644 --- a/Documentation/networking/gen_stats.txt +++ b/Documentation/networking/gen_stats.txt @@ -33,7 +33,8 @@ my_dumping_routine(struct sk_buff *skb, ...) { struct gnet_dump dump; - if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump) < 0) + if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump, + TCA_PAD) < 0) goto rtattr_failure; if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 || @@ -56,7 +57,8 @@ existing TLV types. my_dumping_routine(struct sk_buff *skb, ...) { if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS, - TCA_XSTATS, &mystruct->lock, &dump) < 0) + TCA_XSTATS, &mystruct->lock, &dump, + TCA_PAD) < 0) goto rtattr_failure; ... } -- cgit v1.2.3 From f0cdf76c103ffa34ca5ac87dcdef7edffc722cbf Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Sun, 24 Apr 2016 21:38:14 +0200 Subject: net: remove NETDEV_TX_LOCKED support No more users in the tree, remove NETDEV_TX_LOCKED support. 
Adds another hole in softnet_stats struct, but better than keeping the unused collision counter around. Signed-off-by: Florian Westphal Signed-off-by: David S. Miller --- Documentation/networking/netdev-features.txt | 10 ++++------ Documentation/networking/netdevices.txt | 9 +++------ 2 files changed, 7 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt index f310edec8a77..7413eb05223b 100644 --- a/Documentation/networking/netdev-features.txt +++ b/Documentation/networking/netdev-features.txt @@ -131,13 +131,11 @@ stack. Driver should not change behaviour based on them. * LLTX driver (deprecated for hardware drivers) -NETIF_F_LLTX should be set in drivers that implement their own locking in -transmit path or don't need locking at all (e.g. software tunnels). -In ndo_start_xmit, it is recommended to use a try_lock and return -NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly -protect against other callbacks (the rules you need to find out). +NETIF_F_LLTX is meant to be used by drivers that don't need locking at all, +e.g. software tunnels. -Don't use it for new drivers. +This is also used in a few legacy drivers that implement their +own locking, don't use it for new (hardware) drivers. * netns-local device diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 0b1cf6b2a592..7fec2061a334 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -69,10 +69,9 @@ ndo_start_xmit: When the driver sets NETIF_F_LLTX in dev->features this will be called without holding netif_tx_lock. In this case the driver - has to lock by itself when needed. It is recommended to use a try lock - for this and return NETDEV_TX_LOCKED when the spin lock fails. - The locking there should also properly protect against - set_rx_mode. 
Note that the use of NETIF_F_LLTX is deprecated. + has to lock by itself when needed. + The locking there should also properly protect against + set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated. Don't use it for new drivers. Context: Process with BHs disabled or BH (timer), @@ -83,8 +82,6 @@ ndo_start_xmit: o NETDEV_TX_BUSY Cannot transmit packet, try later Usually a bug, means queue start/stop flow control is broken in the driver. Note: the driver must NOT put the skb in its DMA ring. - o NETDEV_TX_LOCKED Locking failed, please retry quickly. - Only valid when NETIF_F_LLTX is set. ndo_tx_timeout: Synchronization: netif_tx_lock spinlock; all TX queues frozen. -- cgit v1.2.3 From 570d8e9398011a63590c281a36cdce311196608e Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 27 Apr 2016 17:53:08 +0200 Subject: taskstats: fix nl parsing in accounting/getdelays.c The type TASKSTATS_TYPE_NULL should always be ignored. When jumping to the next attribute, only the length of the current attribute should be added, not the length of all nested attributes. This last bug was not visible before commit 80df554275c2, because the kernel didn't put more than two nested attributes. Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation") Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit") Signed-off-by: Nicolas Dichtel Signed-off-by: David S. 
Miller --- Documentation/accounting/getdelays.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index 7785fb5eb93f..b5ca536e56a8 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -505,6 +505,8 @@ int main(int argc, char *argv[]) if (!loop) goto done; break; + case TASKSTATS_TYPE_NULL: + break; default: fprintf(stderr, "Unknown nested" " nla_type %d\n", @@ -512,7 +514,8 @@ int main(int argc, char *argv[]) break; } len2 += NLA_ALIGN(na->nla_len); - na = (struct nlattr *) ((char *) na + len2); + na = (struct nlattr *)((char *)na + + NLA_ALIGN(na->nla_len)); } break; -- cgit v1.2.3 From a1ecde2c6f00825e3a6d90dc774cddc18cb0e247 Mon Sep 17 00:00:00 2001 From: "Yisen.Zhuang\\(Zhuangyuzeng\\)" Date: Thu, 28 Apr 2016 15:09:03 +0800 Subject: Documentation: Bindings: Update DT binding for hns dsaf node This patch changes property port-id to reg in dsaf port node, removes property cpld-ctrl-reg, and fixes some typos. Signed-off-by: Yisen Zhuang Signed-off-by: David S. Miller --- .../devicetree/bindings/net/hisilicon-hns-dsaf.txt | 28 ++++++++++------------ 1 file changed, 13 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt index 5ccd4f002a67..d4b7f2e49984 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt @@ -13,10 +13,10 @@ Required properties: - interrupts: should contain the DSA Fabric and rcb interrupt. - reg: specifies base physical address(es) and size of the device registers. The first region is external interface control register base and size(optional, - only be used when subctrl-syscon is not exists). 
It is recommended using + only used when subctrl-syscon does not exist). It is recommended using subctrl-syscon rather than this address. - The second region is SerDes base register and size(optional, only be used when - serdes-syscon in port node is not exists. It is recommended using + The second region is SerDes base register and size(optional, only used when + serdes-syscon in port node does not exist). It is recommended using serdes-syscon rather than this address. The third region is the PPE register base and size. The fourth region is dsa fabric base register and size. It is not required for @@ -24,8 +24,8 @@ Required properties: - reg-names: may be ppe-base and(or) dsaf-base. It is used to find the corresponding reg's index. -- phy-handle: phy handle of physicl port, 0 if not any phy device. It is optional - attribute. If port node is exists, phy-handle in each port node will be used. +- phy-handle: phy handle of physical port, 0 if not any phy device. It is optional + attribute. If port node exists, phy-handle in each port node will be used. see ethernet.txt [1]. - subctrl-syscon: is syscon handle for external interface control register. - reset-field-offset: is offset of reset field. Its value depends on the hardware @@ -35,14 +35,12 @@ Required properties: - port: subnodes of dsaf. A dsaf node may contain several port nodes(Depending on mode of dsaf). Port node contain some attributes listed below: -- port-id: is physical port index in one dsaf. -- phy-handle: phy handle of physicl port. It is not required if there isn't +- reg: is physical port index in one dsaf. +- phy-handle: phy handle of physical port. It is not required if there isn't phy device. see ethernet.txt [1]. - serdes-syscon: is syscon handle for SerDes register. -- cpld-syscon: is syscon handle for cpld register. It is not required if there - isn't cpld device. -- cpld-ctrl-reg: is cpld register offset. It is not required if there isn't - cpld-syscon. 
+- cpld-syscon: is syscon handle + register offset pair for cpld register. It is + not required if there isn't cpld device. - port-rst-offset: is offset of reset field for each port in dsaf. Its value depends on the hardware user manual. - port-mode-offset: is offset of port mode field for each port in dsaf. Its @@ -72,14 +70,14 @@ dsaf0: dsa@c7000000 { desc-num = <1024>; dma-coherent; - prot@0 { - port-id = 0; + port@0 { + reg = 0; phy-handle = <&phy0>; serdes-syscon = <&serdes>; }; - prot@1 { - port-id = 1; + port@1 { + reg = 1; serdes-syscon = <&serdes>; }; }; -- cgit v1.2.3 From 2dd355a007e44960ec049c75920ddb6778fec9ee Mon Sep 17 00:00:00 2001 From: Michael Heimpold Date: Thu, 28 Apr 2016 22:06:15 +0200 Subject: net: ethernet: enc28j60: add device tree support The following patch adds the required match table for device tree support (and while at, fix the indent). It's also possible to specify the MAC address in the DT blob. Also add the corresponding binding documentation file. Signed-off-by: Michael Heimpold Signed-off-by: David S. Miller --- .../devicetree/bindings/net/microchip,enc28j60.txt | 59 ++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/microchip,enc28j60.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/microchip,enc28j60.txt b/Documentation/devicetree/bindings/net/microchip,enc28j60.txt new file mode 100644 index 000000000000..1dc3bc75539d --- /dev/null +++ b/Documentation/devicetree/bindings/net/microchip,enc28j60.txt @@ -0,0 +1,59 @@ +* Microchip ENC28J60 + +This is a standalone 10 MBit ethernet controller with SPI interface. + +For each device connected to a SPI bus, define a child node within +the SPI master node. 
+ +Required properties: +- compatible: Should be "microchip,enc28j60" +- reg: Specify the SPI chip select the ENC28J60 is wired to +- interrupt-parent: Specify the phandle of the source interrupt, see interrupt + binding documentation for details. Usually this is the GPIO bank + the interrupt line is wired to. +- interrupts: Specify the interrupt index within the interrupt controller (referred + to above in interrupt-parent) and interrupt type. The ENC28J60 natively + generates falling edge interrupts, however, additional board logic + might invert the signal. +- pinctrl-names: List of assigned state names, see pinctrl binding documentation. +- pinctrl-0: List of phandles to configure the GPIO pin used as interrupt line, + see also generic and your platform specific pinctrl binding + documentation. + +Optional properties: +- spi-max-frequency: Maximum frequency of the SPI bus when accessing the ENC28J60. + According to the ENC28J80 datasheet, the chip allows a maximum of 20 MHz, however, + board designs may need to limit this value. +- local-mac-address: See ethernet.txt in the same directory. 
+ + +Example (for NXP i.MX28 with pin control stuff for GPIO irq): + + ssp2: ssp@80014000 { + compatible = "fsl,imx28-spi"; + pinctrl-names = "default"; + pinctrl-0 = <&spi2_pins_b &spi2_sck_cfg>; + status = "okay"; + + enc28j60: ethernet@0 { + compatible = "microchip,enc28j60"; + pinctrl-names = "default"; + pinctrl-0 = <&enc28j60_pins>; + reg = <0>; + interrupt-parent = <&gpio3>; + interrupts = <3 IRQ_TYPE_EDGE_FALLING>; + spi-max-frequency = <12000000>; + }; + }; + + pinctrl@80018000 { + enc28j60_pins: enc28j60_pins@0 { + reg = <0>; + fsl,pinmux-ids = < + MX28_PAD_AUART0_RTS__GPIO_3_3 /* Interrupt */ + >; + fsl,drive-strength = ; + fsl,voltage = ; + fsl,pull-up = ; + }; + }; -- cgit v1.2.3 From 71c08eac2e88b01ecbfba1b1a485a748a4632727 Mon Sep 17 00:00:00 2001 From: Michael Thalmeier Date: Mon, 11 Apr 2016 16:36:02 +0200 Subject: nfc: pn533: Add device tree documentation for i2c phy Add pn533-i2c phy devicetree documentation Signed-off-by: Michael Thalmeier Signed-off-by: Samuel Ortiz --- .../devicetree/bindings/net/nfc/pn533-i2c.txt | 31 ++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/nfc/pn533-i2c.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/nfc/pn533-i2c.txt b/Documentation/devicetree/bindings/net/nfc/pn533-i2c.txt new file mode 100644 index 000000000000..1aea822d4530 --- /dev/null +++ b/Documentation/devicetree/bindings/net/nfc/pn533-i2c.txt @@ -0,0 +1,31 @@ +* NXP Semiconductors PN532 NFC Controller + +Required properties: +- compatible: Should be "nxp,pn532-i2c" or "nxp,pn533-i2c". +- clock-frequency: I²C work frequency. +- reg: address on the bus +- interrupt-parent: phandle for the interrupt gpio controller +- interrupts: GPIO interrupt to which the chip is connected + +Optional SoC Specific Properties: +- pinctrl-names: Contains only one value - "default". +- pintctrl-0: Specifies the pin control groups used for this controller. 
+ +Example (for ARM-based BeagleBone with PN532 on I2C2): + +&i2c2 { + + status = "okay"; + + pn532: pn532@24 { + + compatible = "nxp,pn532-i2c"; + + reg = <0x24>; + clock-frequency = <400000>; + + interrupt-parent = <&gpio1>; + interrupts = <17 IRQ_TYPE_EDGE_FALLING>; + + }; +}; -- cgit v1.2.3 From 0065d1c5acdb60ee2c0e54585a29243718465bb7 Mon Sep 17 00:00:00 2001 From: Xinming Hu Date: Tue, 26 Apr 2016 06:57:26 -0700 Subject: dt: bindings: add MARVELL's bt-sd8xxx wireless device Add device tree binding documentation for MARVELL's bluetooth sdio (sd8897 and sd8997) chip. Signed-off-by: Xinming Hu Signed-off-by: Amitkumar Karwar Acked-by: Rob Herring Signed-off-by: Marcel Holtmann --- Documentation/devicetree/bindings/btmrvl.txt | 29 ----------- .../devicetree/bindings/net/marvell-bt-sd8xxx.txt | 56 ++++++++++++++++++++++ 2 files changed, 56 insertions(+), 29 deletions(-) delete mode 100644 Documentation/devicetree/bindings/btmrvl.txt create mode 100644 Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/btmrvl.txt b/Documentation/devicetree/bindings/btmrvl.txt deleted file mode 100644 index 58f964bb0a52..000000000000 --- a/Documentation/devicetree/bindings/btmrvl.txt +++ /dev/null @@ -1,29 +0,0 @@ -btmrvl ------- - -Required properties: - - - compatible : must be "btmrvl,cfgdata" - -Optional properties: - - - btmrvl,cal-data : Calibration data downloaded to the device during - initialization. This is an array of 28 values(u8). - - - btmrvl,gpio-gap : gpio and gap (in msecs) combination to be - configured. - -Example: - -GPIO pin 13 is configured as a wakeup source and GAP is set to 100 msecs -in below example. 
- -btmrvl { - compatible = "btmrvl,cfgdata"; - - btmrvl,cal-data = /bits/ 8 < - 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02 - 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00 - 0x00 0x00 0xf0 0x00>; - btmrvl,gpio-gap = <0x0d64>; -}; diff --git a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt b/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt new file mode 100644 index 000000000000..14aa6cf58201 --- /dev/null +++ b/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt @@ -0,0 +1,56 @@ +Marvell 8897/8997 (sd8897/sd8997) bluetooth SDIO devices +------ + +Required properties: + + - compatible : should be one of the following: + * "marvell,sd8897-bt" + * "marvell,sd8997-bt" + +Optional properties: + + - marvell,cal-data: Calibration data downloaded to the device during + initialization. This is an array of 28 values(u8). + + - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip. + firmware will use the pin to wakeup host system. + - marvell,wakeup-gap-ms: wakeup gap represents wakeup latency of the host + platform. The value will be configured to firmware. This + is needed to work chip's sleep feature as expected. + - interrupt-parent: phandle of the parent interrupt controller + - interrupts : interrupt pin number to the cpu. Driver will request an irq based + on this interrupt number. During system suspend, the irq will be + enabled so that the bluetooth chip can wakeup host platform under + certain condition. During system resume, the irq will be disabled + to make sure unnecessary interrupt is not received. + +Example: + +IRQ pin 119 is used as system wakeup source interrupt. +wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host +using this device side pin and wakeup latency. +calibration data is also available in below example. 
+ +&mmc3 { + status = "okay"; + vmmc-supply = <&wlan_en_reg>; + bus-width = <4>; + cap-power-off-card; + keep-power-in-suspend; + + #address-cells = <1>; + #size-cells = <0>; + btmrvl: bluetooth@2 { + compatible = "marvell,sd8897-bt"; + reg = <2>; + interrupt-parent = <&pio>; + interrupts = <119 IRQ_TYPE_LEVEL_LOW>; + + marvell,cal-data = /bits/ 8 < + 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02 + 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00 + 0x00 0x00 0xf0 0x00>; + marvell,wakeup-pin = <0x0d>; + marvell,wakeup-gap-ms = <0x64>; + }; +}; -- cgit v1.2.3 From 4cac949f59a133df11d88bc3d1512786507b02bf Mon Sep 17 00:00:00 2001 From: Iyappan Subramanian Date: Fri, 29 Apr 2016 11:10:14 -0700 Subject: Documentation: dtb: xgene: Add channel property Signed-off-by: Iyappan Subramanian Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/apm-xgene-enet.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt index 078060a97f95..05f705e32a4a 100644 --- a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt +++ b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt @@ -18,6 +18,8 @@ Required properties for all the ethernet interfaces: - First is the Rx interrupt. This irq is mandatory. - Second is the Tx completion interrupt. This is supported only on SGMII based 1GbE and 10GbE interfaces. +- channel: Ethernet to CPU, start channel (prefetch buffer) number + - Must map to the first irq and irqs must be sequential - port-id: Port number (0 or 1) - clocks: Reference to the clock entry. 
- local-mac-address: MAC address assigned to this device -- cgit v1.2.3 From 5c2a9644d05e98b3c06b073351cd363ff91b22e8 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 4 May 2016 22:51:47 +0200 Subject: bonding: update documentation section after dev->trans_start removal Drivers that use LLTX need to update trans_start of the netdev_queue. (Most drivers don't use LLTX; stack does this update if .ndo_start_xmit returned TX_OK). Signed-off-by: Florian Westphal Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 334b49ef02d1..57f52cdce32e 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1880,8 +1880,8 @@ or more peers on the local network. The ARP monitor relies on the device driver itself to verify that traffic is flowing. In particular, the driver must keep up to -date the last receive time, dev->last_rx, and transmit start time, -dev->trans_start. If these are not updated by the driver, then the +date the last receive time, dev->last_rx. Drivers that use NETIF_F_LLTX +flag must also update netdev_queue->trans_start. If they do not, then the ARP monitor will immediately fail any slaves using that driver, and those slaves will stay down. If networking monitoring (tcpdump, etc) shows the ARP requests and replies on the network, then it may be that -- cgit v1.2.3 From f9c8d19d6c7c15a59963f80ec47e68808914abd4 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Thu, 5 May 2016 19:49:13 -0700 Subject: bpf: add documentation for 'direct packet access' explain how verifier checks safety of packet access and update email addresses. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann Signed-off-by: David S. 
Miller --- Documentation/networking/filter.txt | 85 ++++++++++++++++++++++++++++++++++++- 1 file changed, 83 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index 96da119a47e7..6aef0b5f3bc7 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -1095,6 +1095,87 @@ all use cases. See details of eBPF verifier in kernel/bpf/verifier.c +Direct packet access +-------------------- +In cls_bpf and act_bpf programs the verifier allows direct access to the packet +data via the skb->data and skb->data_end pointers. +Ex: +1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */ +2: r3 = *(u32 *)(r1 +76) /* load skb->data */ +3: r5 = r3 +4: r5 += 14 +5: if r5 > r4 goto pc+16 +R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp +6: r0 = *(u16 *)(r3 +12) /* access bytes 12 and 13 of the packet */ + +This 2-byte load from the packet is safe, since the program author +checked 'if (skb->data + 14 > skb->data_end) goto err' at insn #5, which +means that in the fall-through case the register R3 (which points to skb->data) +has at least 14 directly accessible bytes. The verifier marks it +as R3=pkt(id=0,off=0,r=14). +id=0 means that no additional variables were added to the register. +off=0 means that no additional constants were added. +r=14 is the range of safe access, which means that bytes [R3, R3 + 14) are ok. +Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points +to the packet data, but the constant 14 was added to the register, so +it now points to 'skb->data + 14' and its accessible range is [R5, R5 + 14 - 14), +which is zero bytes.
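The range check described above can be modelled in a few lines of C. This is a minimal sketch, not verifier code: `struct pkt_reg` and `pkt_access_ok()` are hypothetical names whose fields only mirror the `id`, `off` and `r` values printed in the verifier log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a packet-pointer register as printed in the
 * verifier log, e.g. R3=pkt(id=0,off=0,r=14). Not kernel code. */
struct pkt_reg {
	int id;  /* 0: no variable (unknown) values were added */
	int off; /* constant offset from skb->data */
	int r;   /* bytes [skb->data, skb->data + r) were proven in bounds */
};

/* A load of 'size' bytes at 'reg + insn_off' is accepted only if every
 * accessed byte lies inside the proven range, i.e. when
 * reg->off + insn_off + size <= reg->r. */
static bool pkt_access_ok(const struct pkt_reg *reg, int insn_off, int size)
{
	assert(size > 0);
	int start = reg->off + insn_off;

	return start >= 0 && start + size <= reg->r;
}
```

With R3=pkt(id=0,off=0,r=14), `pkt_access_ok(&r3, 12, 2)` accepts the 2-byte load at insn #6, a 4-byte load at the same offset is rejected because it would reach bytes 14 and 15, and for R5=pkt(id=0,off=14,r=14) even a 1-byte load at offset 0 is rejected, matching the zero-byte range described above.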
+ +More complex packet access may look like: + R0=imm1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp + 6: r0 = *(u8 *)(r3 +7) /* load the byte at offset 7 of the packet */ + 7: r4 = *(u8 *)(r3 +12) + 8: r4 *= 14 + 9: r3 = *(u32 *)(r1 +76) /* load skb->data */ +10: r3 += r4 +11: r2 = r1 +12: r2 <<= 48 +13: r2 >>= 48 +14: r3 += r2 +15: r2 = r3 +16: r2 += 8 +17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */ +18: if r2 > r1 goto pc+2 + R0=inv56 R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv52 R5=pkt(id=0,off=14,r=14) R10=fp +19: r1 = *(u8 *)(r3 +4) +The state of the register R3 is R3=pkt(id=2,off=0,r=8). +id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some +offset within the packet, and since the program author did +'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8). +The verifier only allows 'add' operations on packet registers. Any other +operation will set the register state to 'unknown_value' and it won't be +available for direct packet access. +The operation 'r3 += rX' may overflow and wrap around below the original +skb->data, so the verifier has to prevent that. To do so, it tracks the number of +upper zero bits in all 'unknown_value' registers, and when it sees an +'r3 += rX' instruction where rX may be wider than 16 bits, it errors out with: +"cannot add integer value with N upper zero bits to ptr_to_packet" +E.g. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is +R4=inv56, which means that the upper 56 bits of the register are guaranteed +to be zero. After insn 'r4 *= 14' the state becomes R4=inv52, since +multiplying an 8-bit value by the constant 14 keeps the upper 52 bits zero. +Similarly, 'r2 >>= 48' will make R2=inv48, since the shift is not sign +extending. This logic is implemented in the evaluate_reg_alu() function.
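The upper-zero-bit bookkeeping can be sketched the same way. The helpers below are hypothetical and deliberately simplified (they are not the kernel's evaluate_reg_alu()); they assume zero-extending loads, a conservative bit-width bound for multiplication by a constant, and the 16-bit limit on 'ptr += rX' stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the "invN" state (N = number of upper bits known
 * to be zero) that the verifier log shows for 'unknown_value' registers. */

/* Number of significant bits of a constant, e.g. bit_width(14) == 4. */
static int bit_width(uint64_t k)
{
	int w = 0;

	while (k) {
		w++;
		k >>= 1;
	}
	return w;
}

/* r = *(uN *)(...): a zero-extending load of size_bytes bytes. */
static int zeros_after_load(int size_bytes)
{
	assert(size_bytes == 1 || size_bytes == 2 ||
	       size_bytes == 4 || size_bytes == 8);
	return 64 - 8 * size_bytes;
}

/* r *= k: the product has at most bit_width(k) more significant bits. */
static int zeros_after_mul(int zeros, uint64_t k)
{
	int z = zeros - bit_width(k);

	return z > 0 ? z : 0;
}

/* r <<= n: up to n known-zero upper bits are shifted out. */
static int zeros_after_lsh(int zeros, int n)
{
	int z = zeros - n;

	return z > 0 ? z : 0;
}

/* r >>= n: logical shift, so n more upper bits become zero. */
static int zeros_after_rsh(int zeros, int n)
{
	int z = zeros + n;

	return z < 64 ? z : 64;
}

/* 'pkt_ptr += rX' is accepted only if rX fits in 16 bits, i.e. has at
 * least 48 upper zero bits; otherwise the verifier reports
 * "cannot add integer value with N upper zero bits to ptr_to_packet". */
static bool pkt_add_ok(int zeros)
{
	return zeros >= 48;
}
```

Replaying insns #7, #8 and #11 to #13: `zeros_after_load(1)` gives 56 (R4=inv56), `zeros_after_mul(56, 14)` gives 52 (R4=inv52), and `zeros_after_rsh(zeros_after_lsh(0, 48), 48)` gives 48 (R2=inv48), so both 'r3 += r4' and 'r3 += r2' pass `pkt_add_ok()`, while a value from a 4-byte load (32 known zero bits) would be rejected.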
+ +The end result is that a bpf program author can access the packet directly +using normal C code as: + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + struct ethhdr *eth = data; + struct iphdr *iph = data + sizeof(*eth); + struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph); + + if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end) + return 0; + if (eth->h_proto != htons(ETH_P_IP)) + return 0; + if (iph->protocol != IPPROTO_UDP || iph->ihl != 5) + return 0; + if (udp->dest == htons(53) || udp->source == htons(9)) + ...; +which makes such programs easier to write compared to the LD_ABS insn +and significantly faster. + eBPF maps --------- 'maps' is a generic storage of different types for sharing data between kernel @@ -1293,5 +1374,5 @@ to give potential BPF hackers or security auditors a better overview of the underlying architecture. Jay Schulist -Daniel Borkmann -Alexei Starovoitov +Daniel Borkmann +Alexei Starovoitov -- cgit v1.2.3 From 14c7b3c3877075e6df22e071d4619cbdeac82ffd Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Tue, 10 May 2016 23:27:21 +0200 Subject: dsa: Add mdio device support to Marvell switches Allow Marvell switches to be mdio devices. Currently the driver just allocates the private structure and detects what device is on the bus. Later patches will make them register with the DSA framework. Signed-off-by: Andrew Lunn Signed-off-by: David S.
Miller --- .../devicetree/bindings/net/dsa/marvell.txt | 27 ++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/dsa/marvell.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt new file mode 100644 index 000000000000..cdd70cebdea7 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt @@ -0,0 +1,27 @@ +Marvell DSA Switch Device Tree Bindings +--------------------------------------- + +WARNING: This binding is currently unstable. Do not program it into a +FLASH never to be changed again. Once this binding is stable, this +warning will be removed. + +If you need a stable binding, use the old dsa.txt binding. + +Marvell Switches are MDIO devices. The following properties should be +placed as a child node of an mdio device. + +Required properties: +- compatible : Should be one of "marvell,mv88e6085", +- reg : Address on the MII bus for the switch. + +Example: + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + switch0: switch@0 { + compatible = "marvell,mv88e6085"; + reg = <0>; + }; + }; -- cgit v1.2.3 From 52638f71fcff9386fe64c83a18a129b122333fdf Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Tue, 10 May 2016 23:27:22 +0200 Subject: dsa: Move gpio reset into switch driver Resetting the switch is something the driver does, not the framework. So move the parsing of this property into the driver. There are no in-kernel users of this property, so moving it does not break anything. However, a board that will make use of this property is making its way into the kernel. Signed-off-by: Andrew Lunn Signed-off-by: David S.
Miller --- Documentation/devicetree/bindings/net/dsa/dsa.txt | 2 -- Documentation/devicetree/bindings/net/dsa/marvell.txt | 8 ++++++++ 2 files changed, 8 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt index 5fdbbcdf8c4b..9f4807f90c31 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt @@ -31,8 +31,6 @@ A switch child node has the following optional property: switch. Must be set if the switch can not detect the presence and/or size of a connected EEPROM, otherwise optional. -- reset-gpios : phandle and specifier to a gpio line connected to - reset pin of the switch chip. A switch may have multiple "port" children nodes diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt index cdd70cebdea7..7629189398aa 100644 --- a/Documentation/devicetree/bindings/net/dsa/marvell.txt +++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt @@ -10,10 +10,17 @@ If you need a stable binding, use the old dsa.txt binding. Marvell Switches are MDIO devices. The following properties should be placed as a child node of an mdio device. +The properties described here are those specific to Marvell devices. +Additional required and optional properties can be found in dsa.txt. + Required properties: - compatible : Should be one of "marvell,mv88e6085", - reg : Address on the MII bus for the switch. 
+Optional properties: + +- reset-gpios : Should be a gpio specifier for a reset line + Example: mdio { @@ -23,5 +30,6 @@ Example: switch0: switch@0 { compatible = "marvell,mv88e6085"; reg = <0>; + reset-gpios = <&gpio5 1 GPIO_ACTIVE_LOW>; }; }; -- cgit v1.2.3 From da47b4572056487fd7941c26f73b3e8815ff712a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Thu, 12 May 2016 12:00:33 +0200 Subject: phy: add support for a reset-gpio specification MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The framework only asserts (for now) that the reset gpio is not active. Signed-off-by: Uwe Kleine-König Reviewed-by: Roger Quadros Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/phy.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt index bc1c3c8bf8fa..c00a9a894547 100644 --- a/Documentation/devicetree/bindings/net/phy.txt +++ b/Documentation/devicetree/bindings/net/phy.txt @@ -35,6 +35,8 @@ Optional Properties: - broken-turn-around: If set, indicates the PHY device does not correctly release the turn around line low at the end of a MDIO transaction. +- reset-gpios: Reference to a GPIO used to reset the phy. + Example: ethernet-phy@0 { @@ -42,4 +44,5 @@ ethernet-phy@0 { interrupt-parent = <40000>; interrupts = <35 1>; reg = <0>; + reset-gpios = <&gpio1 17 GPIO_ACTIVE_LOW>; }; -- cgit v1.2.3 From 4f3446bb809f20ad56cadf712e6006815ae7a8f9 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 13 May 2016 19:08:32 +0200 Subject: bpf: add generic constant blinding for use in jits This work adds a generic facility for use from eBPF JIT compilers that allows for further hardening of JIT generated images through blinding constants. 
In response to the original work on BPF JIT spraying published by Keegan McAllister [1], most BPF JITs were changed to make images read-only and start at a randomized offset in the page, with the rest filled with trap instructions. Nowadays we have this in the x86, arm, arm64 and s390 JIT compilers. Additionally, later work also made eBPF interpreter images read-only for kernels supporting DEBUG_SET_MODULE_RONX, which currently are the x86, arm, arm64 and s390 archs as well. This is done by default for the mentioned JITs when JITing is enabled. Furthermore, a generic and configurable constant blinding facility to make spraying even harder has been on our todo list for quite some time, with a first implementation in place since around netconf 2016. We found that on systems where untrusted users can load cBPF/eBPF code and the JIT is enabled, start offset randomization helps a bit to make jumps into a crafted payload harder, but when larger programs that cross a page boundary are injected, some part of the program opcodes again sits at a page start offset. With improved guessing and more reliable payload injection, the chances of jumping into such a payload can increase. Elena Reshetova recently wrote a test case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which can leave some more room for payloads. Note that for all this, additional bugs in the kernel are still required to make the jump (and of course to guess right, so as not to jump into a trap), and naturally the JIT must be enabled, which it is not by default. To help mitigate this, the general idea is to provide an option bpf_jit_harden that admins can tweak along with bpf_jit_enable, so that in cases where the JIT should be enabled for performance reasons, the generated image can be further hardened by blinding constants for unprivileged users (bpf_jit_harden == 1), trading off performance for these users but not for privileged ones.
We also added the option of blinding for all users (bpf_jit_harden == 2), which is quite helpful for testing, e.g. with test_bpf.ko. No further hardening levels of the bpf_jit_harden switch are intended; the rationale is to keep it dead simple to use as an on/off switch. Since this functionality would otherwise need to be duplicated over and over in JIT compilers, which are already complex enough, we provide a generic blinding implementation at the eBPF bytecode level, which is then just transparently JITed. JIT compilers need to make only a few changes to integrate this facility and can be migrated one by one. This option is for eBPF JITs and will be used in the x86, arm64 and s390 JITs without too much effort, and soon in the ppc64 JIT, so that native eBPF as well as cBPF-to-eBPF migrations can be blinded, with both covered by a single implementation. The rule for JITs is that bpf_jit_blind_constants() must be called from bpf_int_jit_compile(); if blinding is disabled, we proceed normally with JITing the passed program. If blinding is enabled and we fail during the blinding process itself, we must fall back to the interpreter. Similarly, if the JITing process fails after blinding, we return normally to the interpreter with the non-blinded code. This means the interpreter doesn't change in any way and operates on eBPF code as usual. For this pre-JIT blinding step, we need to make use of a helper/auxiliary register, here BPF_REG_AX. It is strictly internal to the JIT and not in any way part of the eBPF architecture. It works just like the helper registers that JITs use internally when emitting code, only that here the helper register sits one abstraction level higher, in eBPF bytecode, but still within the JIT phase. The helper register is needed since, e.g., a manually written program can issue loads to all registers of the eBPF architecture.
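The policy and fallback rules in this commit message can be sketched in C as follows. Every type and function below (`struct prog`, `blinding_enabled()`, `jit_compile()` and the `demo_*` stubs) is hypothetical and only models the stated rules; the sketch ignores bpf_jit_enable and how the discarded program copy is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical program descriptor for this sketch only. */
struct prog {
	bool jited;
	bool blinded;
};

/* bpf_jit_harden: 0 = off, 1 = unprivileged users only, 2 = all users. */
static bool blinding_enabled(int jit_harden, bool privileged)
{
	if (jit_harden == 0)
		return false;
	if (jit_harden == 1 && privileged)
		return false;
	return true;
}

/* The blinding pass returns a blinded copy or NULL on failure;
 * the JIT returns false on failure. */
typedef struct prog *(*blind_fn)(struct prog *);
typedef bool (*jit_fn)(struct prog *);

static struct prog *jit_compile(struct prog *orig, int jit_harden,
				bool privileged, blind_fn blind, jit_fn jit)
{
	struct prog *p = orig;

	if (blinding_enabled(jit_harden, privileged)) {
		p = blind(orig);
		if (p == NULL)
			return orig; /* blinding failed: interpret original code */
	}
	if (!jit(p))
		return orig; /* JIT failed: interpret the non-blinded code */
	return p;            /* JITed image, blinded if hardening applies */
}

/* Demo stubs used to exercise the flow. */
static struct prog blinded_copy;

static struct prog *demo_blind_ok(struct prog *p)
{
	blinded_copy = *p;
	blinded_copy.blinded = true;
	return &blinded_copy;
}

static struct prog *demo_blind_fail(struct prog *p)
{
	(void)p;
	return NULL;
}

static bool demo_jit_ok(struct prog *p)
{
	p->jited = true;
	return true;
}

static bool demo_jit_fail(struct prog *p)
{
	(void)p;
	return false;
}
```

For example, with bpf_jit_harden set to 1 an unprivileged caller gets a blinded, JITed copy, while a privileged caller's program is JITed as passed; in both failure cases the original, non-JITed program is returned to the interpreter.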
The core concept with the additional register is to blind out all 32 and 64 bit constants by converting BPF_K based instructions into a small sequence that turns K_VAL into ((RND ^ K_VAL) ^ RND). The instruction is thus transformed into BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND, and finally the original operation applied to REG with BPF_REG_AX as its source operand, so the actual operation on the target register is translated from a BPF_K into a BPF_X one that operates on BPF_REG_AX's content. During the rewriting phase, RND is newly generated via prandom_u32() for each processed instruction. 64 bit loads are split into two 32 bit loads to keep translation and patching reasonably simple. The only thing JITs basically need to do is to call the bpf_jit_blind_constants()/bpf_jit_prog_release_other() helper pair and to map BPF_REG_AX to an unused register. Small bpf_jit_disasm extract from [2] when applied to the x86 JIT:

  echo 0 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f5e9 + :
  [...]
  39:  mov    $0xa8909090,%eax
  3e:  mov    $0xa8909090,%eax
  43:  mov    $0xa8ff3148,%eax
  48:  mov    $0xa89081b4,%eax
  4d:  mov    $0xa8900bb0,%eax
  52:  mov    $0xa810e0c1,%eax
  57:  mov    $0xa8908eb4,%eax
  5c:  mov    $0xa89020b0,%eax
  [...]

  echo 1 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f1e5 + :
  [...]
  39:  mov    $0xe1192563,%r10d
  3f:  xor    $0x4989b5f3,%r10d
  46:  mov    %r10d,%eax
  49:  mov    $0xb8296d93,%r10d
  4f:  xor    $0x10b9fd03,%r10d
  56:  mov    %r10d,%eax
  59:  mov    $0x8c381146,%r10d
  5f:  xor    $0x24c7200e,%r10d
  66:  mov    %r10d,%eax
  69:  mov    $0xeb2a830e,%r10d
  6f:  xor    $0x43ba02ba,%r10d
  76:  mov    %r10d,%eax
  79:  mov    $0xd9730af,%r10d
  7f:  xor    $0xa5073b1f,%r10d
  86:  mov    %r10d,%eax
  89:  mov    $0x9a45662b,%r10d
  8f:  xor    $0x325586ea,%r10d
  96:  mov    %r10d,%eax
  [...]

As can be seen, the original constants that carry the payload are hidden when hardening is enabled, and the actual operations are transformed from constant-based to register-based ones, making jumps into constants ineffective. The above extract uses a single BPF load instruction over and over, but of course all instructions with constants are blinded.
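The arithmetic behind the rewrite can be checked against the listing: XORing each mov/xor immediate pair from the hardened output yields the constant that the unhardened output loads at the same position. The sketch below uses a simple xorshift32 generator purely for illustration (the commit uses prandom_u32()), and `struct blinded_imm`, `blind_imm()` and `unblind_imm()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One blinded 32-bit immediate: the mov/xor pair seen in the x86 listing. */
struct blinded_imm {
	uint32_t mov_imm; /* BPF_REG_AX := RND ^ K_VAL */
	uint32_t xor_imm; /* BPF_REG_AX ^= RND */
};

/* Stand-in PRNG for this sketch only (xorshift32); the state must be
 * non-zero, otherwise it would keep returning 0. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Blind one immediate with a fresh random value, as done per instruction. */
static struct blinded_imm blind_imm(uint32_t k_val, uint32_t *rng_state)
{
	uint32_t rnd = xorshift32(rng_state);
	struct blinded_imm b = { rnd ^ k_val, rnd };

	return b;
}

/* Value left in BPF_REG_AX after the mov/xor pair executes. */
static uint32_t unblind_imm(struct blinded_imm b)
{
	return b.mov_imm ^ b.xor_imm;
}

/* Immediate pairs from the bpf_jit_harden=1 listing above and the
 * constants loaded at the same positions with bpf_jit_harden=0. */
static const struct blinded_imm listing_pairs[] = {
	{ 0xe1192563, 0x4989b5f3 }, { 0xb8296d93, 0x10b9fd03 },
	{ 0x8c381146, 0x24c7200e }, { 0xeb2a830e, 0x43ba02ba },
	{ 0x0d9730af, 0xa5073b1f }, { 0x9a45662b, 0x325586ea },
};
static const uint32_t listing_plain[] = {
	0xa8909090, 0xa8909090, 0xa8ff3148,
	0xa89081b4, 0xa8900bb0, 0xa810e0c1,
};
```

Because RND is drawn afresh for every instruction, the two identical loads of 0xa8909090 at offsets 39 and 3e end up with different immediates in the hardened listing, and `blind_imm()` likewise returns different pairs for repeated calls with the same constant.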
Performance-wise, JIT with blinding is a bit slower than plain JIT and faster than the interpreter. This is expected, since we still get all the performance benefits of JITing, and in normal use cases not every single instruction needs to be blinded. Summing up all 296 test cases of the test_bpf.ko suite, averaged over multiple runs, the interpreter was 55% slower than JIT only, and JIT with blinding was 8% slower than JIT only. Since there are also some extremes in the test suite, I expect the performance of JIT with blinding to be even closer to JIT only for ordinary workloads; e.g. the nmap test case from the suite has averaged timings of 29 ns (JIT), 35 ns (+ blinding) and 151 ns (interpreter). The BPF test suite, the seccomp test suite, the eBPF sample code and various bigger networking eBPF programs have been tested with this and were running fine. For testing purposes, I also adapted the interpreter and redirected the blinded eBPF image to it, and all tests pass there as well. [1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html [2] https://github.com/01org/jit-spray-poc-for-ksp/ [3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5 Signed-off-by: Daniel Borkmann Reviewed-by: Elena Reshetova Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller --- Documentation/sysctl/net.txt | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index 809ab6efcc74..f0480f7ea740 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -43,6 +43,17 @@ Values : 1 - enable the JIT 2 - enable the JIT and ask the compiler to emit traces on kernel log. +bpf_jit_harden +-------------- + +This enables hardening for the Berkeley Packet Filter Just in Time compiler. +Only eBPF JIT backends are supported. Enabling hardening trades off performance, +but can mitigate JIT spraying.
+Values : + 0 - disable JIT hardening (default value) + 1 - enable JIT hardening for unprivileged users only + 2 - enable JIT hardening for all users + dev_weight -------------- -- cgit v1.2.3 From 9295c034726e025395e6eff3013fa9e3753bcb39 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 16 May 2016 23:06:53 +0200 Subject: bpf, doc: fix typo on bpf_asm descriptions Fix the description of some of the bpf_asm jump instructions and consistently express them in the form 'A <op> k'. Reported-by: Sebastian Amend Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller --- Documentation/networking/filter.txt | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index 6aef0b5f3bc7..b9a4edf21ade 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for: jmp 6 Jump to label ja 6 Jump to label - jeq 7, 8 Jump on k == A - jneq 8 Jump on k != A - jne 8 Jump on k != A - jlt 8 Jump on k < A - jle 8 Jump on k <= A - jgt 7, 8 Jump on k > A - jge 7, 8 Jump on k >= A - jset 7, 8 Jump on k & A + jeq 7, 8 Jump on A == k + jneq 8 Jump on A != k + jne 8 Jump on A != k + jlt 8 Jump on A < k + jle 8 Jump on A <= k + jgt 7, 8 Jump on A > k + jge 7, 8 Jump on A >= k + jset 7, 8 Jump on A & k add 0, 4 A + sub 0, 4 A - -- cgit v1.2.3
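The corrected table reads every conditional jump as 'A <op> k', with the accumulator on the left. A small hypothetical C helper makes that reading explicit; it assumes classic BPF's unsigned 32-bit comparisons, and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bpf_asm conditional jumps in the table. */
enum jmp_op { JEQ, JNE, JLT, JLE, JGT, JGE, JSET };

/* Returns true if the jump is taken: accumulator A on the left, k on
 * the right, both compared as unsigned 32-bit values. */
static bool jump_taken(enum jmp_op op, uint32_t A, uint32_t k)
{
	switch (op) {
	case JEQ:  return A == k;
	case JNE:  return A != k;       /* also spelled jneq */
	case JLT:  return A < k;
	case JLE:  return A <= k;
	case JGT:  return A > k;
	case JGE:  return A >= k;
	case JSET: return (A & k) != 0; /* any bit of k set in A */
	}
	return false;
}
```

With A = 5 and k = 3, `jump_taken(JGT, 5, 3)` is true under the corrected 'A > k' reading, whereas the old 'k > A' wording would have suggested the jump is not taken.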