linux-stable.git - Linux kernel stable tree

	Commit message	Author	Age	Files	Lines
*	net: sched: move tc_classify function to cls_api.c	Jiri Pirko	2017-05-17	17	-65/+72
\| \| \| \| \| \| \| \| \|	Move tc_classify function to cls_api.c where it belongs, rename it to fit the namespace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'dsa-sort'	David S. Miller	2017-05-17	7	-52/+52
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Lunn says: ==================== net: dsa: Sort various lists As we gain more DSA drivers and tagging protocols, the lists are getting a bit unruly. Do some sorting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: DSA: Sort drivers	Andrew Lunn	2017-05-17	2	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With more drivers being added, it is time to sort the drivers to impose some order. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: Sort DSA tagging protocol drivers	Andrew Lunn	2017-05-17	5	-29/+29
\|/ \| \| \| \| \| \| \| \|	With more tag protocols being added, regain some order by sorting the entries in various places. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio: fix PF falsely indicating success at setting MAC address of a nonexistent VF	Felix Manlunas	2017-05-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In the function assigned to .ndo_set_vf_mac, check the validity of the vfidx argument before proceeding to tell the firmware to set the VF MAC address. Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio: fix insmod failure when multiple NICs are plugged in	Rick Farrington	2017-05-17	3	-10/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When multiple liquidio NICs are plugged in, the first insmod of the PF driver succeeds. But after an rmmod, a subsequent insmod fails. Reason is during rmmod, the PF driver resets the Octeon of only one of the NICs; it neglects to reset the Octeons of the other NICs. Fix the insmod failure by adding the missing Octeon resets at rmmod. Keep a per-NIC refcount that indicates the number of active PFs in a given NIC. When the refcount goes to zero, then reset the Octeon of that NIC. Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
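The per-NIC refcount described in the message above can be sketched in a few lines. This is an illustration of the idea only, not the liquidio driver code: the class `NicResetTracker`, its methods, and the `reset_fn` callback are hypothetical names introduced here.

```python
from collections import defaultdict


class NicResetTracker:
    """Track active PFs per NIC and reset a NIC when its last PF goes away."""

    def __init__(self, reset_fn):
        self._active_pfs = defaultdict(int)  # nic_id -> number of active PFs
        self._reset_fn = reset_fn            # called with nic_id on the last stop

    def pf_start(self, nic_id):
        """A PF on this NIC became active (for example at driver load)."""
        self._active_pfs[nic_id] += 1

    def pf_stop(self, nic_id):
        """A PF on this NIC went away (for example at driver unload)."""
        if self._active_pfs[nic_id] == 0:
            raise RuntimeError(f"no active PF on {nic_id!r}")
        self._active_pfs[nic_id] -= 1
        if self._active_pfs[nic_id] == 0:
            # Last PF of this NIC is gone: reset this NIC, and only this one.
            self._reset_fn(nic_id)
```

With two PFs on one NIC and one PF on another, stopping the first PF triggers no reset; each NIC is reset exactly once, when its own count reaches zero.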
*	net: dsa: store CPU port pointer in the tree	Vivien Didelot	2017-05-17	11	-35/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A dsa_switch_tree instance holds a dsa_switch pointer and a port index to identify the switch port to which the CPU is attached. Now that the DSA layer has a dsa_port structure to hold this data, use it to point the switch CPU port. This patch simply substitutes s/dst->cpu_switch/dst->cpu_dp->ds/ and s/dst->cpu_port/dst->cpu_dp->index/. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mlxsw-Preparations-for-restructuring'	David S. Miller	2017-05-17	7	-516/+529
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jiri Pirko says: ==================== mlxsw: Preparations for restructuring This patchset doesn't introduce any functional changes and merely meant to make the code base more receptive for upcoming restructuring. The first six patches mainly shuffle code in order to reduce the scope of structs that shouldn't be defined in the main driver header. Most of them will be later expanded, so it makes sense to correctly place them now. The last patches mostly simplify bridge-related functions, so that they could be more easily modified later on. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum: Default ports to non-virtual mode	Ido Schimmel	2017-05-17	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In virtual mode, packets are classified to FIDs based on their ingress port and VLAN whereas in non-virtual mode only the VLAN is taken into account. Currently ports are initialized to use virtual mode due to the presence of the PVID vPort. However, we're going to transition ports between both modes based on the FIDs they use and not merely based on the presence of a VLAN upper. Therefore, during initialization, no mode will be explicitly set. Since the Programmer's Reference Manual (PRM) doesn't specify a default, explicitly set the port to non-virtual mode and later transition the port between both modes based on the FIDs it uses. In a follow-up patchset, this step will be moved to the common FID core where it logically belongs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum: Move PVID code to appropriate place	Ido Schimmel	2017-05-17	3	-58/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PVID is a port attribute and should therefore reside in the main driver file and not the switchdev specific one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Don't batch learning operations	Ido Schimmel	2017-05-17	3	-20/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We no longer batch VLAN operations, so there's no need to set the learning state for a range of VLANs. Use a common function to set the learning state for a Port-VLAN, thereby making the code saner and more receptive to upcoming changes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Don't batch STP operations	Ido Schimmel	2017-05-17	1	-42/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the code by using the common function that sets an STP state for a Port-VLAN and remove the existing one that tries to batch it for several VLANs. This will help us in a follow-up patchset to introduce a unified infrastructure for bridge ports, regardless if the bridge is VLAN-aware or not. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Don't batch VLAN operations	Ido Schimmel	2017-05-17	3	-139/+121
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	switchdev's VLAN object has the ability to describe a range of VLAN IDs, but this is only used when VLAN operations are done using the SELF flag, which is something we would like to remove as it allows one to bypass the bridge driver. Do VLAN operations on a per-VLAN basis, thereby simplifying the code and preparing it for refactoring in a follow-up patchset. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Remove redundant check	Ido Schimmel	2017-05-17	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 97c242902c20 ("switchdev: Execute bridge ndos only for bridge ports") switchdev code checks that port is bridged, so no need to perform the same check in the driver. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Initialize RIFs in a separate function	Ido Schimmel	2017-05-17	1	-18/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The router interfaces (RIFs) array is currently initialized together with the general router configuration. However, in a follow-up patchset we're going to introduce a common RIF core that will require us to initialize more RIF constructs, so move the RIF initialization to its own function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Move FIB notification block to router struct	Ido Schimmel	2017-05-17	2	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FIB notification block logically belongs inside the router specific struct, so move it there. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Move RIFs array to its rightful place	Ido Schimmel	2017-05-17	4	-24/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The router interfaces (RIFs) array is of no interest to code outside the routing realm, so declare it inside the router specific struct instead of the chip-wide one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_switchdev: Reduce scope of bridge struct	Ido Schimmel	2017-05-17	4	-37/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some attributes in the global chip struct are only relevant for bridge operation, so encapsulate them in their own struct that isn't exposed to non-bridge code. This will also help us later, when we add more bridge-specific attributes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Reduce scope of router struct	Ido Schimmel	2017-05-17	2	-114/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a similar fashion to the previous patch, the router structure ('mlxsw_sp_router') doesn't need to be accessible to anyone but the router code located at spectrum_router.c. Make this apparent and reduce its scope by defining it there. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_buffer: Reduce scope of shared buffer struct	Ido Schimmel	2017-05-17	2	-59/+68
\|/ \| \| \| \| \| \| \| \| \| \| \|	The shared buffer structure ('mlxsw_sp_sb') doesn't need to be accessible to anyone but the shared buffer code located at spectrum_buffers.c. Make this apparent and reduce its scope by defining it there. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cxgb4: add new T5 pci device id	Ganesh Goudar	2017-05-17	1	-0/+1
\| \| \| \| \|	Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cxgb4: reduce resource allocation in kdump kernel	Ganesh Goudar	2017-05-17	1	-6/+6
\| \| \| \| \| \| \| \|	When is_kdump_kernel() is true, reduce memory footprint of cxgb4 by using a single "Queue Set". Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio: use pcie_flr instead of duplicating it	Christoph Hellwig	2017-05-16	1	-16/+1
\| \| \| \| \| \|	Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: phy: Remove residual magic from PHY drivers	Andrew Lunn	2017-05-16	5	-38/+36
\| \| \| \| \| \| \| \| \|	commit fa8cddaf903c ("net phylib: Remove unnecessary condition check in phy") removed the only place where the PHY flag PHY_HAS_MAGICANEG was checked. But it left the flag being set in the drivers. Remove the flag. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bnx2x: Remove open coded carrier check	Leon Romanovsky	2017-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \|	There is an inline function to test whether the carrier is present, which makes the open-coded solution redundant. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: internal implementation for pacing	Eric Dumazet	2017-05-16	8	-5/+113
\|	BBR congestion control depends on pacing, and pacing is currently handled by sch_fq packet scheduler for performance reasons, and also because implementing pacing with FQ was convenient to truly avoid bursts.
\|
\|	However there are many cases where this packet scheduler constraint is not practical.
\|	- Many linux hosts are not focusing on handling thousands of TCP flows in the most efficient way.
\|	- Some routers use fq_codel or other AQM, but still would like to use BBR for the few TCP flows they initiate/terminate.
\|
\|	This patch implements an automatic fallback to internal pacing.
\|
\|	Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option.
\|
\|	If sch_fq happens to be in the egress path, pacing is delegated to the qdisc, otherwise pacing is done by TCP itself.
\|
\|	One advantage of pacing from TCP stack is to get more precise rtt estimations, and less work done from TX completion, since TCP Small queue limits are not generally hit. Setups with single TX queue but many cpus might even benefit from this.
\|
\|	Note that unlike sch_fq, we do not take into account header sizes. Taking care of these headers would add additional complexity for no practical differences in behavior.
\|
\|	Some performance numbers using 800 TCP_STREAM flows rate limited to ~48 Mbit per second on 40Gbit NIC.
\|
\|	If MQ+pfifo_fast is used on the NIC :
\|
\|	$ sar -n DEV 1 5 \| grep eth
\|	14:48:44  eth0  725743.00  2932134.00  46776.76  4335184.68  0.00  0.00  1.00
\|	14:48:45  eth0  725349.00  2932112.00  46751.86  4335158.90  0.00  0.00  0.00
\|	14:48:46  eth0  725101.00  2931153.00  46735.07  4333748.63  0.00  0.00  0.00
\|	14:48:47  eth0  725099.00  2931161.00  46735.11  4333760.44  0.00  0.00  1.00
\|	14:48:48  eth0  725160.00  2931731.00  46738.88  4334606.07  0.00  0.00  0.00
\|	Average:  eth0  725290.40  2931658.20  46747.54  4334491.74  0.00  0.00  0.40
\|
\|	$ vmstat 1 5
\|	procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
\|	 r  b  swpd       free   buff    cache  si  so  bi  bo       in      cs  us  sy   id  wa  st
\|	 4  0     0  259825920  45644  2708324   0   0  21   2      247      98   0   0  100   0   0
\|	 4  0     0  259823744  45644  2708356   0   0   0   0  2400825  159843   0  19   81   0   0
\|	 0  0     0  259824208  45644  2708072   0   0   0   0  2407351  159929   0  19   81   0   0
\|	 1  0     0  259824592  45644  2708128   0   0   0   0  2405183  160386   0  19   80   0   0
\|	 1  0     0  259824272  45644  2707868   0   0   0  32  2396361  158037   0  19   81   0   0
\|
\|	Now use MQ+FQ :
\|
\|	lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc
\|	lpaa23:~# tc qdisc replace dev eth0 root mq
\|
\|	$ sar -n DEV 1 5 \| grep eth
\|	14:49:57  eth0  678614.00  2727930.00  43739.13  4033279.14  0.00  0.00  0.00
\|	14:49:58  eth0  677620.00  2723971.00  43674.69  4027429.62  0.00  0.00  1.00
\|	14:49:59  eth0  676396.00  2719050.00  43596.83  4020125.02  0.00  0.00  0.00
\|	14:50:00  eth0  675197.00  2714173.00  43518.62  4012938.90  0.00  0.00  1.00
\|	14:50:01  eth0  676388.00  2719063.00  43595.47  4020171.64  0.00  0.00  0.00
\|	Average:  eth0  676843.00  2720837.40  43624.95  4022788.86  0.00  0.00  0.40
\|
\|	$ vmstat 1 5
\|	procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
\|	 r  b  swpd       free   buff    cache  si  so  bi  bo       in      cs  us  sy   id  wa  st
\|	 2  0     0  259832240  46008  2710912   0   0  21   2      223     192   0   1   99   0   0
\|	 1  0     0  259832896  46008  2710744   0   0   0   0  1702206  198078   0  17   82   0   0
\|	 0  0     0  259830272  46008  2710596   0   0   0   0  1696340  197756   1  17   83   0   0
\|	 4  0     0  259829168  46024  2710584   0   0  16   0  1688472  197158   1  17   82   0   0
\|	 3  0     0  259830224  46024  2710408   0   0   0   0  1692450  197212   0  18   82   0   0
\|
\|	As expected, number of interrupts per second is very different.
\|
\|	Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
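The fallback rule and the arithmetic behind pacing can be sketched as follows. This is a rough illustration under stated assumptions, not the kernel implementation: `select_pacer` and `pacing_delay_ns` are hypothetical names, and the delay formula is the generic length-over-rate relation. Following the note in the message, only the payload length is counted, not header sizes.

```python
NSEC_PER_SEC = 1_000_000_000


def select_pacer(fq_on_egress: bool) -> str:
    """Delegate pacing to the qdisc when sch_fq is on the egress path,
    otherwise let TCP pace by itself."""
    return "qdisc" if fq_on_egress else "tcp"


def pacing_delay_ns(payload_bytes: int, rate_bytes_per_sec: int) -> int:
    """Time to wait after sending payload_bytes so that the flow stays at
    rate_bytes_per_sec. Header sizes are deliberately ignored."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("pacing rate must be positive")
    return payload_bytes * NSEC_PER_SEC // rate_bytes_per_sec
```

For example, a flow limited to about 48 Mbit per second (6,000,000 bytes per second) would wait 250,000 ns after a 1500-byte payload, since `pacing_delay_ns(1500, 6_000_000)` evaluates to `250000`.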
*	Merge branch 'udp-scalability-improvements'	David S. Miller	2017-05-16	8	-69/+211
\|\	Paolo Abeni says:
\| \|	====================
\| \|	udp: scalability improvements
\| \|
\| \|	This patch series implements an idea suggested by Eric Dumazet to reduce the contention of the udp sk_receive_queue lock when the socket is under flood.
\| \|
\| \|	An ancillary queue is added to the udp socket, and the socket always tries first to read packets from such queue. If it's empty, we splice the content from sk_receive_queue into the ancillary queue.
\| \|
\| \|	The first patch introduces some helpers to keep the udp code small, and the following two implement the ancillary queue strategy. The code is split to hopefully help the reviewing process.
\| \|
\| \|	The measured overall gain under udp flood is up to 30% depending on the numa layout and the number of ingress queues used by the relevant nic.
\| \|
\| \|	The performance numbers have been gathered using pktgen as sender, with 64 bytes packets, random src port on a host b2b connected via a 10Gbs link with the dut.
\| \|
\| \|	The receiver used the udp_sink program by Jesper [1] and an h/w l4 rx hash on the ingress nic, so that the number of ingress nic rx queues hit by the udp traffic could be controlled via ethtool -L.
\| \|
\| \|	The udp_sink program was bound to the first idle cpu, to get more stable numbers.
\| \|
\| \|	On a single numa node receiver:
\| \|
\| \|	nic rx queues    vanilla      patched kernel
\| \|	1                1820 kpps    1900 kpps
\| \|	2                1950 kpps    2500 kpps
\| \|	16               1670 kpps    2120 kpps
\| \|
\| \|	When using a single nic rx queue, busy polling was also enabled; otherwise, in the above scenario, the bh processing becomes the bottleneck and this produces large artifacts in the measured performances (e.g. improving the udp sink run time decreases the overall tput, since more action from the scheduler comes into play).
\| \|
\| \|	[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
\| \|
\| \|	v1 -> v2: Patches 1/3 and 2/3 are unchanged, in patch 3/3 the rx_queue_lock_held param of udp_rmem_release() is now a bool.
\| \|	====================
\| \|
\| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	udp: keep the sk_receive_queue held when splicing	Paolo Abeni	2017-05-16	1	-10/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On packet reception, when we are forced to splice the sk_receive_queue, we can keep the related lock held, so that we can avoid re-acquiring it, if fwd memory scheduling is required. v1 -> v2: the rx_queue_lock_held param in udp_rmem_release() is now a bool Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	udp: use a separate rx queue for packet reception	Paolo Abeni	2017-05-16	5	-24/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under udp flood the sk_receive_queue spinlock is heavily contended. This patch tries to reduce the contention on that lock by adding a second receive queue to the udp sockets; recvmsg() looks first in that queue and, only if it is empty, tries to fetch the data from sk_receive_queue. The latter is spliced into the newly added queue every time the receive path has to acquire the sk_receive_queue lock. The accounting of forward allocated memory is still protected with the sk_receive_queue lock, so udp_rmem_release() needs to acquire both locks when the forward deficit is flushed. In specific scenarios we can end up acquiring and releasing the sk_receive_queue lock multiple times; that will be covered by the next patch. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
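The two-queue strategy in the message above can be illustrated with a small user-space sketch. It is not the kernel code: `TwoQueueReceiver` and its members are hypothetical names, and the sketch assumes a single consumer and so leaves the private queue unlocked, which simplifies the two-lock scheme the message mentions. The `lock_acquisitions` counter exists only to show how rarely the reader touches the shared lock.

```python
import threading
from collections import deque


class TwoQueueReceiver:
    """Producers append to a locked shared queue; the single reader drains a
    private queue and splices the shared one into it only when it runs dry."""

    def __init__(self):
        self.receive_queue = deque()          # shared with producers
        self.receive_lock = threading.Lock()  # protects receive_queue
        self.reader_queue = deque()           # private to the reader
        self.lock_acquisitions = 0            # reader-side acquisitions only

    def enqueue(self, packet):
        """Producer path: always takes the shared lock."""
        with self.receive_lock:
            self.receive_queue.append(packet)

    def recv(self):
        """Reader path: return the next packet, or None if nothing is queued."""
        if not self.reader_queue:
            with self.receive_lock:
                self.lock_acquisitions += 1
                # Splice everything received so far into the private queue.
                self.reader_queue.extend(self.receive_queue)
                self.receive_queue.clear()
        return self.reader_queue.popleft() if self.reader_queue else None
```

If four packets are queued before the reader runs, the reader takes the shared lock once and serves all four from its private queue, instead of contending with producers on every call.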
\| *	net/sock: factor out dequeue/peek with offset code	Paolo Abeni	2017-05-16	3	-41/+60
\|/ \| \| \| \| \| \| \| \| \|	And update __sk_queue_drop_skb() to work on the specified queue. This will help the udp protocol to use an additional private rx queue in a later patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'nfp-LSO-checksum-and-XDP-datapath-updates'	David S. Miller	2017-05-16	4	-79/+149
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Kicinski says: ==================== nfp: LSO, checksum and XDP datapath updates This series introduces a number of refinements to standard features like LSO and checksum offload. Three major features are support for CHECKSUM_COMPLETE, refinement of TSO handling and another small speed up for XDP TX. This series also switches from depending on some app FW<>driver ABI versions to heavier use of capabilities. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: eliminate an if statement in calculation of completed frames	Jakub Kicinski	2017-05-16	1	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Given that our rings are always a power of 2, we can simplify the calculation of the number of completed TX descriptors by using masking instead of an if statement based on whether the index has wrapped or not. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
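The equivalence this message relies on can be checked with a short sketch. It is not the nfp driver code: `completed_branch` and `completed_mask` are hypothetical names, both pointers are assumed to be ring indices in the range 0 to ring_size - 1, and the ring size is assumed to be a power of two, as the message states.

```python
def completed_branch(hw_ptr: int, sw_ptr: int, ring_size: int) -> int:
    """Count completed descriptors with an explicit wrap check."""
    if hw_ptr >= sw_ptr:
        return hw_ptr - sw_ptr
    # The hardware pointer has wrapped past the end of the ring.
    return hw_ptr + ring_size - sw_ptr


def completed_mask(hw_ptr: int, sw_ptr: int, ring_size: int) -> int:
    """Same count without a branch; valid only for power-of-two ring sizes."""
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring_size must be a power of two")
    return (hw_ptr - sw_ptr) & (ring_size - 1)
```

With an 8-entry ring, a hardware pointer at 1 and a software pointer at 6, both functions report 3 completed descriptors. The masked form needs no comparison because subtraction modulo a power of two is a single AND.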
\| *	nfp: add a helper for wrapping descriptor index	Jakub Kicinski	2017-05-16	2	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a number of places where we calculate the descriptor index based on a value which may have overflown. Create a macro for masking with the ring size. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: complete the XDP TX ring only when it's full	Jakub Kicinski	2017-05-16	2	-18/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since XDP TX ring holds "spare" RX buffers anyway, we don't have to rush the completion. We can wait until ring fills up completely before trying to reclaim buffers. If RX poll has ended and no buffer has been queued for XDP TX, we have no guarantee we will see another interrupt, so run the reclaim there as well, to make sure TX statistics won't become stale. This should help us reclaim more buffers per single queue controller register read. Note that the XDP completion is very trivial, it only adds up the sizes of transmitted frames for statistics so the latency spike should be acceptable. In case user sets the ring sizes to something crazy, limit the completion to 2k entries. The check if the ring is empty at the beginning of xdp_complete() is no longer needed - the callers will perform it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: add CHECKSUM_COMPLETE support	Jakub Kicinski	2017-05-16	3	-10/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce NFP_NET_CFG_CTRL_CSUM_COMPLETE capability and implement parsing of CHECKSUM_COMPLETE metadata. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: version independent support for chained RSS metadata	Edwin Peer	2017-05-16	3	-14/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ABI version 4 introduced metadata chaining. Using the ABI version to signal metadata chaining precludes firmware that advertises new capabilities which rely on prepended metadata from working on older kernels. Capability bits are thus better suited to signalling the chained metadata format. A new version of the RSS capability is introduced to distinguish between the differing metadata formats for ABI versions other than 4. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: don't assume RSS and IRQ moderation are always enabled	Jakub Kicinski	2017-05-16	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even if capability for RSS and IRQ moderation are present we may have not initialized them for control vNIC. Depend on selected features mask (ctrl) rather than capabilities (cap) to determine which features should be enabled. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: support LSO2 capability	Edwin Peer	2017-05-16	3	-16/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Firmware advertising the LSO2 capability exploits driver provided L3 and L4 offsets in order to avoid parsing packet headers in the TX path. The vlan field in struct nfp_net_tx_desc is repurposed, making TXVLAN a mutually exclusive configuration to LSO2. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: rename l4_offset in struct nfp_net_tx_desc to lso_hdrlen	Edwin Peer	2017-05-16	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The l4_offset field referred to by NFD is confusingly named. It is not the offset of the L4 transport header, but rather the L4 payload. The LSO2 capability supported by alternative device firmware requires the actual L4 offset, thus the rename seems prudent. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: don't enable TSO on the device when disabled	Jakub Kicinski	2017-05-16	1	-0/+1
\|/ \| \| \| \| \| \| \| \| \|	We advertise TSO to the stack but leave it disabled by default. Make sure it's not only disabled in the netdev features but also on the device itself. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: socket: mark socket protocol handler structs as const	linzhang	2017-05-16	4	-4/+4
\| \| \| \| \|	Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tools: hv: Add clean up for included files in Ubuntu net config	Haiyang Zhang	2017-05-16	1	-3/+18
\| \| \| \| \| \| \| \|	The clean up function is updated to cover duplicate config info in files included by "source" key word in Ubuntu network config. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bnxt: add dma mapping attributes	Shannon Nelson	2017-05-16	1	-22/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute in our Rx path dma mapping in order to get the expected performance out of the receive path. Adding it to the Tx path has little effect, so that's not a part of this patch. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'xgene-Add-ethtool-stats-and-bug-fixes'	David S. Miller	2017-05-16	10	-302/+428
\|\	Iyappan Subramanian says:
\| \|	====================
\| \|	drivers: net: xgene: Add ethtool stats and bug fixes
\| \|
\| \|	This patch set,
\| \|	- adds ethtool extended statistics support
\| \|	- addresses errata workarounds
\| \|	- fixes bugs related to statistics
\| \|
\| \|	v2: Address review comments from v1
\| \|	- Adds lock to protect mdio-xgene indirect MAC access
\| \|	- Refactors xgene-enet indirect MAC read/write functions
\| \|	- Uses mdio-xgene MAC access routines, if xgene-enet port use the same HW.
\| \|
\| \|	v1:
\| \|	- Initial version
\| \|
\| \|	Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com>
\| \|	====================
\| \|
\| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Fix redundant prefetch buffer cleanup	Iyappan Subramanian	2017-05-16	4	-62/+1
\| \|	Prefetch buffer cleanup code was called twice, causing EDAC to report errors during reboot.
\| \|
\| \|	[ 1130.972475] xgene-edac 78800000.edac: IOB bridge agent (BA) transaction error
\| \|	[ 1130.979584] xgene-edac 78800000.edac: IOB BA write response error
\| \|	[ 1130.985648] xgene-edac 78800000.edac: IOB BA write access at 0x00.00000000 ()
\| \|	[ 1130.993612] xgene-edac 78800000.edac: IOB BA requestor ID 0x00002400
\| \|	[ 1131.000242] xgene-edac 78800000.edac: IOB bridge agent (BA) transaction error
\| \|	...
\| \|
\| \|	This patch fixes the errors by,
\| \|	- removing the redundant prefetch buffer cleanup from port_ops->shutdown()
\| \|	- moving port_ops->shutdown() after delete_rings()
\| \|
\| \|	Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15	Quan Nguyen	2017-05-16	3	-9/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds workaround for HW errata 10GE_10 and ENET_15: "HW statistic counters value are duplicated". - RFCS duplicates RALN counter - RFLR duplicates RUND counter - TFCS duplicates TFRG counter - RALN should be interpreted as 0 in 10G mode Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11	Quan Nguyen	2017-05-16	3	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds statistic counter for frames recovered from HW errata 10GE_8 and ENET_11: "HW reports Length error for valid 64 byte frames with len <46 bytes". Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Workaround for HW errata 10GE_4	Quan Nguyen	2017-05-16	3	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds workaround for HW errata 10GE_4: "XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read". Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Add rx_overrun/tx_underrun statistics	Iyappan Subramanian	2017-05-16	7	-3/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds rx_overrun and tx_underrun ethtool statistic counters. Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Extend ethtool statistics	Quan Nguyen	2017-05-16	6	-1/+181
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds extended ethtool statistics support. Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>