linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	xen-netback: switch to threaded irq for control ring	Juergen Gross	2016-09-22	3	-49/+11
\| \| \| \| \| \| \| \| \|	Instead of open coding it use the threaded irq mechanism in xen-netback. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: mediatek: get out of potential invalid pointer access	Sean Wang	2016-09-22	1	-2/+4
\| \| \| \| \| \| \| \| \|	Potential dangerous invalid pointer might be accessed if the error happens when couple phy_device to net_device so cleanup the error path. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: mediatek: use [get\|set]_link_ksettings	Sean Wang	2016-09-22	1	-21/+12
\| \| \| \| \| \| \| \| \| \| \| \|	1) use new api [get\|set]_link_ksettings instead of [get\|set]_settings old ones. 2) dev->phydev is sure being ready before calling these callbacks, so removing all the sanity check if it is existing. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: mediatek: remove superfluous local variable for phy address	Sean Wang	2016-09-22	1	-9/+1
\| \| \| \| \| \| \| \| \| \| \|	remove the unused variable for parsing PHY address and the related logic for sanity test which would be all already handled done when of_mdiobus_register was called Reported-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: mediatek: use phydev from struct net_device	Sean Wang	2016-09-22	2	-39/+36
\| \| \| \| \| \| \| \|	reuse phydev already in struct net_device instead of creating another new one in private structure. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mediatek-trgmii'	David S. Miller	2016-09-22	4	-3/+73
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sean Wang says: ==================== mediatek: add support for RGMII on GMAC0 through TRGMII hardware module By default, GMAC0 is connected to built-in switch called MT7530 through the proprietary interface called Turbo RGMII (TRGMII). TRGMII also supports well for RGMII as generic external PHY uses but requires some slight changes to the setup of TRGMII and doesn't have well support on current driver. So this patchset 1) provides the slight changes of the setup for RGMII can work through TRGMII 2) adds additional setting "trgmii" as PHY_INTERFACE_MODE_TRGMII about phy-mode on device tree to make GMAC0 distinguish which mode it runs 3) changes dynamically source clock, TX/RX delay and interface mode on TRGMII for adapting various link Changes since v1: - fixed the style of comment which doesn't have a space at the beginning and end of comment lines - add support for phy-mode "trgmii" as PHY_INTERFACE_MODE_TRGMII into linux/phy.h - enhance the Documentation about device tree binding for trgmii which is applicable only for GMAC0 which uses fixed-link ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: ethernet: mediatek: add the dts property to set if TRGMII supported on ↵	Sean Wang	2016-09-22	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GMAC0 Add the dts property for the capability if TRGMII supported on GAMC0 Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: ethernet: mediatek: add support for GMAC0 connecting with external PHY ↵	Sean Wang	2016-09-22	2	-2/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	through TRGMII Changing dynamically source clock, TX/RX delay and interface mode used by TRGMII hardware module inside PHY capability polling routine for adapting to the various speed of RGMII used by external PHY for GMAC0. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: ethernet: mediatek: add extension of phy-mode for TRGMII	Sean Wang	2016-09-22	3	-0/+8
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	adds PHY-mode "trgmii" as an extension for the operation mode of the PHY interface for PHY_INTERFACE_MODE_TRGMII. and adds a variable trgmii inside mtk_mac as the indication to make the difference between the MAC connected to internal switch or connected to external PHY by the given configuration on the board and then to perform the corresponding setup on TRGMII hardware module. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge tag 'rxrpc-rewrite-20160922-v2' of ↵	David S. Miller	2016-09-22	13	-130/+390
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Preparation for slow-start algorithm [ver #2] Here are some patches that prepare for improvements in ACK generation and for the implementation of the slow-start part of the protocol: (1) Stop storing the protocol header in the Tx socket buffers, but rather generate it on the fly. This potentially saves a little space and makes it easier to alter the header just before transmission (the flags may get altered and the serial number has to be changed). (2) Mask off the Tx buffer annotations and add a flag to record which ones have already been resent. (3) Track RTT on a per-peer basis for use in future changes. Tracepoints are added to log this. (4) Send PING ACKs in response to incoming calls to elicit a PING-RESPONSE ACK from which RTT data can be calculated. The response also carries other useful information. (5) Expedite PING-RESPONSE ACK generation from sendmsg. If we're actively using sendmsg, this allows us, under some circumstances, to avoid having to rely on the background work item to run to generate this ACK. This requires ktime_sub_ms() to be added. (6) Set the REQUEST-ACK flag on some DATA packets to elicit ACK-REQUESTED ACKs from which RTT data can be calculated. (7) Limit the use of pings and ACK requests for RTT determination. Changes: (V2) Don't use the C division operator for 64-bit division. One instance should use do_div() and the other should be using nsecs_to_jiffies(). The last two patches got transposed, leading to an undefined symbol in one of them. Reported-by: kbuild test robot <lkp@intel.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	rxrpc: Reduce the number of PING ACKs sent	David Howells	2016-09-22	2	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't want to send a PING ACK for every new incoming call as that just adds to the network traffic. Instead, we send a PING ACK to the first three that we receive and then once per second thereafter. This could probably be made adjustable in future. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Reduce the number of ACK-Requests sent	David Howells	2016-09-22	4	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reduce the number of ACK-Requests we set on DATA packets that we're sending to reduce network traffic. We set the flag on odd-numbered DATA packets to start off the RTT cache until we have at least three entries in it and then probe once per second thereafter to keep it topped up. This could be made tunable in future. Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA packets as we transmit them and not stored statically in the sk_buff. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Obtain RTT data by requesting ACKs on DATA packets	David Howells	2016-09-22	7	-20/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to sending a PING ACK to gain RTT data, we can set the RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The ACK packet contains the serial number of the packet it is in response to, so we can look through the Tx buffer for a matching DATA packet. This requires that the data packets be stamped with the time of transmission as a ktime rather than having the resend_at time in jiffies. This further requires the resend code to do the resend determination in ktimes and convert to jiffies to set the timer. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Add ktime_sub_ms()	David Howells	2016-09-22	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a ktime_sub_ms() to go with ktime_add_ms() and co. for use in AF_RXRPC RTT determination. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Expedite ping response transmission	David Howells	2016-09-22	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expedite the transmission of a response to a PING ACK by sending it from sendmsg if one is pending. We're most likely to see a PING ACK during the client call Tx phase as the other side may use it to determine a number of parameters, such as the client's receive window size, the RTT and whether the client is doing slow start (similar to RFC5681). If we don't expedite it, it's left to the background processing thread to transmit. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Send pings to get RTT data	David Howells	2016-09-22	4	-8/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Send a PING ACK packet to the peer when we get a new incoming call from a peer we don't have a record for. The PING RESPONSE ACK packet will tell us the following about the peer: (1) its receive window size (2) its MTU sizes (3) its support for jumbo DATA packets (4) if it supports slow start (similar to RFC 5681) (5) an estimate of the RTT This is necessary because the peer won't normally send us an ACK until it gets to the Rx phase and we send it a packet, but we would like to know some of this information before we start sending packets. A pair of tracepoints are added so that RTT determination can be observed. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Add per-peer RTT tracker	David Howells	2016-09-22	4	-4/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a function to track the average RTT for a peer. Sources of RTT data will be added in subsequent patches. The RTT data will be useful in the future for determining resend timeouts and for handling the slow-start part of the Rx protocol. Also add a pair of tracepoints, one to log transmissions to elicit a response for RTT purposes and one to log responses that contribute RTT data. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Add re-sent Tx annotation	David Howells	2016-09-22	3	-12/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a Tx-phase annotation for packet buffers to indicate that a buffer has already been retransmitted. This will be used by future congestion management. Re-retransmissions of a packet don't affect the congestion window managment in the same way as initial retransmissions. Signed-off-by: David Howells <dhowells@redhat.com>
\| *	rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs	David Howells	2016-09-22	6	-88/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't store the rxrpc protocol header in sk_buffs on the transmit queue, but rather generate it on the fly and pass it to kernel_sendmsg() as a separate iov. This reduces the amount of storage required. Note that the security header is still stored in the sk_buff as it may get encrypted along with the data (and doesn't change with each transmission). Signed-off-by: David Howells <dhowells@redhat.com>
* \|	Merge branch 'ftgmac100-ast2500-support'	David S. Miller	2016-09-22	2	-30/+75
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Joel Stanley says: ==================== ftgmac100 support for ast2500 This series adds support to the ftgmac100 driver for the Aspeed ast2400 and ast2500 SoCs. In particular, they ensure the driver works correctly on the ast2500 where the MAC block has seen some changes in register layout. They have been tested on ast2400 and ast2500 systems with the NCSI stack and with a directly attached PHY. V2 reworks the two patches relating to PHYSTS_CHG into the one patch that disables the interrupt instead of playing with interrupt sensitivity. I kept patch 4 'net/faraday: Clear stale interrupts' which was first introduced to clear the stale PHYSTS_CHG interrupt, as it helps keep us safe from unhygienic (vendor) bootloaders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Mask out PHYSTS_CHG interrupt	Joel Stanley	2016-09-22	2	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PHYSTS_CHG (the ftgmac100's PHY IRQ) is telling the system to go look at the PHY registers for a link status change. The interrupt was causing issues on Aspeed SoC where some board designs had an active high configuration, some active low, and in some cases repurposed for other functions. When misconfigured Linux would chew 100% of CPU cycles servicing interrupts: [ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG [ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG [ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG [ 20.300000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG While in the ftgmac100 IP can be configured for high, low and edge sensitivity the current driver always polls the PHY, so we chose to mask out the interrupt. See https://patchwork.ozlabs.org/patch/672099/ for more discussion. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Configure old MDIO interface on Aspeed SoCs	Joel Stanley	2016-09-22	2	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Aspeed SoCs have a new MDIO interface as an option in the G4 and G5 SoCs. The old one is still available, so select it in order to remain compatible with the ftgmac100 driver. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Clear stale interrupts	Gavin Shan	2016-09-22	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is stale interrupt (PHYSTS_CHG in ISR, bit#6 in 0x0) from the bootloader (uboot) when enabling the MAC. The stale interrupts aren't part of kernel and should be cleared. This clears the stale interrupts in ISR (0x0) when enabling the MAC. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Adapt for Aspeed SoCs	Joel Stanley	2016-09-22	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RXDES and TXDES registers bits in the ftgmac100 indicates EDO{R,T}R at bit position 15 for the Faraday Tech IP. However, the version of this IP present in the Aspeed SoCs has these bits at position 30 in the registers. It appers that ast2400 SoCs support both positions, with the 15th bit marked as reserved but still functional. In the ast2500 this bit is reused for another function, so we need a work around. This was confirmed with engineers from Aspeed that using bit 30 is correct for both the ast2400 and ast2500 SoCs. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Make EDO{R,T}R bits configurable	Andrew Jeffery	2016-09-22	2	-16/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These bits are #defined at a fixed location. In order to support future hardware that has chosen to move these bits around move the bits into a member of the struct ftgmac100. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/faraday: Separate rx page storage from rxdesc	Andrew Jeffery	2016-09-22	1	-7/+18
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ftgmac100 hardware revision in e.g. the Aspeed AST2500 no longer reserves all bits in RXDES#2 but instead uses the bottom 16 bits to store MAC frame metadata. Avoid corruption by shifting struct page pointers out to their own member in struct ftgmac100. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	cxgb4: Convert to use simple_open()	Wei Yongjun	2016-09-22	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove an open coded simple_open() function and replace file operations references to the function with simple_open() instead. Generated by: scripts/coccinelle/api/simple_open.cocci Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: dsa: qca8k: use mdio_module_driver to simplify the code	Wei Yongjun	2016-09-22	1	-13/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mdio_module_driver() makes the code simpler by eliminating boilerplate code. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: dsa: qca8k: fix non static symbol warning	Wei Yongjun	2016-09-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes the following sparse warning: drivers/net/dsa/qca8k.c:259:22: warning: symbol 'qca8k_regmap_config' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'sctp-align'	David S. Miller	2016-09-22	11	-45/+46
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Marcelo Ricardo Leitner says: ==================== Rename WORD_TRUNC/ROUND macros and use them This patchset aims to rename these macros to a non-confusing name, as reported by David Laight and David Miller, and to update all remaining places to make use of it, which was 1 last remaining spot. v3: - Name it SCTP_PAD4 instead of SCTP_ALIGN4, as suggested by David Laight v2: - fixed 2nd patch summary Details on the specific changelogs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	sctp: make use of SCTP_TRUNC4 macro	Marcelo Ricardo Leitner	2016-09-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And avoid the usage of '&~3'. This is the last place still not using the macro. Also break the line to make it easier to read. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	sctp: rename WORD_TRUNC/ROUND macros	Marcelo Ricardo Leitner	2016-09-22	11	-42/+42
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To something more meaningful these days, specially because this is working on packet headers or lengths and which are not tied to any CPU arch but to the protocol itself. So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4. Reported-by: David Laight <David.Laight@ACULAB.COM> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'mlx5e-xdp'	David S. Miller	2016-09-22	6	-263/+757
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tariq Toukan says: ==================== mlx5e XDP support This series adds XDP support in mlx5e driver. This includes the use cases: XDP_DROP, XDP_PASS, and XDP_TX. Single stream performance tests show 16.5 Mpps for XDP_DROP, and 12.4 Mpps for XDP_TX, with nice scalability for multiple streams/rings. This rate of XDP_DROP is lower than the 32 Mpps we got in previous implementation, when Striding RQ was used. We moved to non-Striding RQ, as some XDP_TX requirements (like headroom, packet-per-page) cannot be satisfied with the current Striding RQ HW, and we decided to fully support both DROP/TX. Few directions are considered in order to enable the faster rate for XDP_DROP, e.g a possibility for users to enable Striding RQ so they choose optimized XDP_DROP on the price of partial XDP_TX functionality, or some HW changes. Series generated against net-next commit: cf714ac147e0 'ipvlan: Fix dependency issue' Thanks, Tariq V2: * patch 8: - when XDP_TX fails, call mlx5e_page_release and drop the packet. - update xdp_tx counter within mlx5e_xmit_xdp_frame. (mlx5e_xmit_xdp_frame return value becomes obsolete, change it to void) - drop the packet for unknown XDP return code. * patch 9: - use a boolean for xdp_doorbell in SQ struct, instead of dragging it throughout the functions calls. - handle doorbell and counters within mlx5e_xmit_xdp_frame. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: XDP TX xmit more	Saeed Mahameed	2016-09-22	2	-8/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we rang XDP SQ doorbell on every forwarded XDP packet. Here we introduce a xmit more like mechanism that will queue up more than one packet into SQ (up to RX napi budget) w/o notifying the hardware. Once RX napi budget is consumed and we exit napi RX loop, we will flush (doorbell) all XDP looped packets in case there are such. XDP forward packet rate: Comparing XDP with and w/o xmit more (bulk transmit): RX Cores XDP TX XDP TX (xmit more) --------------------------------------------------- 1 6.5Mpps 12.4Mpps 2 13.2Mpps 24.2Mpps 4 25.2Mpps 36.3Mpps* 8 36.3Mpps* 36.3Mpps* *My xmitter was limited to 36.3Mpps, so it is the bottleneck. It seems that receive side can handle more. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: XDP TX forwarding support	Saeed Mahameed	2016-09-22	6	-36/+304
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding support for XDP_TX forwarding from xdp program. Using XDP, now user can loop packets out of the same port. We create a dedicated TX SQ for each channel that will serve XDP programs that return XDP_TX action to loop packets back to the wire directly from the channel RQ RX path. For that RX pages will now need to be mapped bi-directionally, and on XDP_TX action we will sync the page back to device then queue it into SQ for transmission. The XDP xmit frame function will report back to the RX path if the page was consumed (transmitted), if so, RX path will forget about that page as if it were released to the stack. Later on, on XDP TX completion, the page will be released back to the page cache. For simplicity this patch will hit a doorbell on every XDP TX packet. Next patch will introduce a xmit more like mechanism that will queue up more than one packet into SQ w/o notifying the hardware, once RX napi loop is done we will hit doorbell once for all XDP TX packets form the previous loop. This should drastically improve XDP TX performance. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: Have a clear separation between different SQ types	Saeed Mahameed	2016-09-22	5	-64/+120
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make a clear separate between Regular SQ (TXQ) and ICO SQ creation, destruction and union their mutual information structures. Don't allocate redundant TXQ skb/wqe_info/dma_fifo arrays for ICO SQ. And have a different SQ edge for ICO SQ than TXQ SQ, to be more accurate. In preparation for XDP TX support. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: XDP fast RX drop bpf programs support	Rana Shahout	2016-09-22	4	-2/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver. When XDP is on we make sure to change channels RQs type to MLX5_WQ_TYPE_LINKED_LIST rather than "striding RQ" type to ensure "page per packet". On XDP set, we fail if HW LRO is set and request from user to turn it off. Since on ConnectX4-LX HW LRO is always on by default, this will be annoying, but we prefer not to enforce LRO off from XDP set function. Full channels reset (close/open) is required only when setting XDP on/off. When XDP set is called just to exchange programs, we will update each RQ xdp program on the fly and for synchronization with current data path RX activity of that RQ, we temporally disable that RQ and ensure RX path is not running, quickly update and re-enable that RQ, for that we do: - rq.state = disabled - napi_synnchronize - xchg(rq->xdp_prg) - rq.state = enabled - napi_schedule // Just in case we've missed an IRQ Packet rate performance testing was done with pktgen 64B packets and on TX side and, TC drop action on RX side compared to XDP fast drop. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Comparison is done between: 1. Baseline, Before this patch with TC drop action 2. This patch with TC drop action 3. This patch with XDP RX fast drop RX Cores Baseline(TC drop) TC drop XDP fast Drop -------------------------------------------------------------- 1 5.3Mpps 5.3Mpps 16.5Mpps 2 10.2Mpps 10.2Mpps 31.3Mpps 4 20.5Mpps 19.9Mpps 36.3Mpps* *My xmitter was limited to 36.3Mpps, so it is the bottleneck. It seems that receive side can handle more. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: Dynamic RQ type infrastructure	Saeed Mahameed	2016-09-22	1	-42/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add two helper functions to allow dynamic changes of RQ type. mlx5e_set_rq_priv_params and mlx5e_set_rq_type_params will be used on netdev creation to determine the default RQ type. This will be needed later for downstream patches of XDP support. When enabling XDP we will dynamically move from striding RQ to linked list RQ type. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: Slightly reduce hardware LRO size	Saeed Mahameed	2016-09-22	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch LRO size was 64K, now with build_skb requires extra room, headroom + sizeof(skb_shared_info) added to the data buffer will make wqe size or page_frag_size slightly larger than 64K which will demand order 5 page instead of order 4 in 4K page systems. We take those extra bytes from hardware LRO data size in order to not increase the required page order for when hardware LRO is enabled. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: Union RQ RX info per RQ type	Saeed Mahameed	2016-09-22	3	-26/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have two types of RX RQs, and they use two separate sets of info arrays and structures in RX data path function. Today those structures are mutually exclusive per RQ type, hence one kind is allocated on RQ creation according to the RQ type. For better cache locality and to minimalize the sizeof(struct mlx5e_rq), in this patch we define them as a union. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/mlx5e: Build RX SKB on demand	Saeed Mahameed	2016-09-22	3	-123/+133
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For non-striding RQ configuration before this patch we had a ring with pre-allocated SKBs and mapped the SKB->data buffers for device. For robustness and better RX data buffers management, we allocate a page per packet and build_skb around it. This patch (which is a prerequisite for XDP) will actually reduce performance for normal stack usage, because we are now hitting a bottleneck in the page allocator. We use the page-cache to restore or even improve performance in comparison to the old RX scheme. Packet rate performance testing was done with pktgen 64B packets on xmit side and TC ingress dropping action on RX side. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Comparison is done between: 1.Baseline, before 'net/mlx5e: Build RX SKB on demand' 2.Build SKB with RX page cache (This patch) RX Cores Baseline Build SKB+page-cache Improvement ----------------------------------------------------------- 1 4.16Mpps 5.33Mpps 28% 2 7.16Mpps 10.24Mpps 43% 4 13.61Mpps 20.51Mpps 51% 8 25.32Mpps 32.00Mpps 26% All respective cores were 100% utilized. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: implement TSQ for retransmits	Eric Dumazet	2016-09-22	1	-25/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We saw sch_fq drops caused by the per flow limit of 100 packets and TCP when dealing with large cwnd and bursts of retransmits. Even after increasing the limit to 1000, and even after commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time"), we can still have these drops. Under certain conditions, TCP can spend a considerable amount of time queuing thousands of skbs in a single tcp_xmit_retransmit_queue() invocation, incurring latency spikes and stalls of other softirq handlers. This patch implements TSQ for retransmits, limiting number of packets and giving more chance for scheduling packets in both ways. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'mv88e6390-prep'	David S. Miller	2016-09-22	2	-220/+213
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Lunn says: ==================== Preparation for mv88e6390 These two patches are a couple of preparation steps for supporting the the MV88E6390 family of chips. This is a new generation from Marvell, and will need more feature flags than are currently available in an unsigned long. Expand to an unsigned long long. The MV88E6390 also places its port registers somewhere else, so add a wrapper around port register access. v2: Rework wrappers to use mv88e6xxx_{read\|write} Simpliy some (err < ) to (err) Add Reviewed by tag. v3:: reg = reg & foo -> reg &= foo Fix over zealous s/ret/err ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: mv88e6xxx: Convert flag bits to unsigned long long	Andrew Lunn	2016-09-22	1	-31/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We are soon going to run out of flag bits on 32bit systems. Convert to unsigned long long. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: mv88e6xxx: Add helper for accessing port registers	Andrew Lunn	2016-09-22	2	-189/+182
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a device coming soon which places its port registers somewhere different to all other Marvell switches supported so far. Add helper functions for reading/writing port registers, making it easier to handle this new device. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ptp_clock: future-proofing drivers against PTP subsystem becoming optional	Nicolas Pitre	2016-09-22	9	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Drivers must be ready to accept NULL from ptp_clock_register() if the PTP clock subsystem is configured out. This patch documents that and ensures that all drivers cope well with a NULL return. Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Eugenia Emantayev <eugenia@mellanox.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: ethernet: hisilicon: hns: use new api ethtool_{get\|set}_link_ksettings	Philippe Reynes	2016-09-22	1	-47/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: ethernet: hisilicon: hns: use phydev from struct net_device	Philippe Reynes	2016-09-22	3	-33/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phydev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: ethernet: mediatek: fix missing changes merged for conflicts ↵	Sean Wang	2016-09-22	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	overlapping commits add the missing commits about 1) Commit d3bd1ce4db8e843dce421e2f8f123e5251a9c7d3 ("remove redundant free_irq for devm_request_ir allocated irq") 2) Commit 7c6b0d76fa02213393815e3b6d5e4a415bf3f0e2 ("fix logic unbalance between probe and remove") during merge for conflicts overlapping commits by Commit b20b378d49926b82c0a131492fa8842156e0e8a9 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'cxgb4-tc-offload'	David S. Miller	2016-09-22	9	-284/+1720
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rahul Lakkireddy says: ==================== cxgb4: add support for offloading TC u32 filters This series of patches add support to offload TC u32 filters onto Chelsio NICs. Patch 1 moves current common filter code to separate files in order to provide a common api for performing packet classification and filtering in Chelsio NICs. Patch 2 enables filters for normal NIC configuration and implements common api for setting and deleting filters. Patches 3-5 add support for TC u32 offload via ndo_setup_tc. --- v3: Based on review and suggestion from David Miller <davem@davemloft.net> - Fixed all local variable declarations by placing them in longest line first and shortest line last order. v2: Based on review and suggestions from Jiri Pirko <jiri@resnulli.us>: - Replaced macros S and U with appropriate static helper functions. - Moved completion code for set and delete filters to respective functions cxgb4_set_filter() and cxgb4_del_filter(). Renamed the original functions to __cxgb4_set_filter() and __cxgb4_del_filter() in case synchronization is not required. - Dropped debugfs patch. - Merged code for inserting and deleting u32 filters into a single patch. - Reworked and fixed bugs with traversing the actions list. - Removed all unnecessary extra (). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>