linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	batman-adv: Make TT capability changes atomic	Linus Lüssing	2015-08-14	2	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: e17931d1a61d ("batman-adv: introduce capability initialization bitfield") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
*	batman-adv: Make NC capability changes atomic	Linus Lüssing	2015-08-14	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 3f4841ffb336 ("batman-adv: tvlv - add network coding container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
*	batman-adv: Make DAT capability changes atomic	Linus Lüssing	2015-08-14	2	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 17cf0ea455f1 ("batman-adv: tvlv - add distributed arp table container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
*	batman-adv: Avoid u32 overflow during gateway select	Ruben Wisniewski	2015-08-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The gateway selection based on fast connections is using a single value calculated from the average tq (0-255) and the download bandwidth (in 100Kibit). The formula for the first step (tq ** 2 * 10000 * bandwidth) tends to overflow a u32 with low bandwidth settings like 50 [100KiBit] and a tq value of over 92. Changing this to a 64 bit unsigned integer allows to support a bandwidth_down with up to ~2.8e10 [100KiBit] and a perfect tq of 255. This is ~6.6 times higher than the maximum possible value of the gateway announcement TVLV. This problem only affects the non-default gw_sel_class 1. Signed-off-by: Ruben Wisniewsi <ruben@vfn-nrw.de> [sven@narfation.org: rewritten commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
*	batman-adv: Replace gw_reselect divisor with simple shift	Sven Eckelmann	2015-08-11	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The gw_factor is divided by BATADV_TQ_LOCAL_WINDOW_SIZE ** 2 * 64. But the rest of the calculation has nothing to do with the tq window size and therefore the calculation is just (tmp_gw_factor / (64 ** 3)). Replace it with a simple shift to avoid a costly 64-bit divide when the max_gw_factor is changed from u32 to u64. This type change is necessary to avoid an overflow bug. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
*	vxlan: fix fdb_dump index calculation	Atzm Watanabe	2015-08-10	1	-5/+5
\| \| \| \| \| \| \| \| \|	When too many remotes are bound to an FDB entry, index may not be increased. This problem will be caused on the large scale environment that is based on the unicast default destination, for instance. Signed-off-by: Atzm Watanabe <atzm@iij.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mellanox: mlxsw: Use '%zx' to print size_t format	Fabio Estevam	2015-08-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Use '%zx' to print size_t format in order to fix the following build warning: drivers/net/ethernet/mellanox/mlxsw/item.h:65:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Wformat=] Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	isdn: Remove reverse_bits(), use revbit8()	yalin wang	2015-08-10	1	-17/+5
\| \| \| \| \| \| \| \|	This change isdn driver, remove reverse_bits() function, use the generic revbit8() function instead. Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cxgb4: cleanup some indenting	Dan Carpenter	2015-08-10	1	-3/+3
\| \| \| \| \| \| \|	Add or remove some tabs so that statements line up correctly. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	dsa: Support multiple MDIO busses	Andrew Lunn	2015-08-10	2	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	When using a cluster of switches, some topologies will have an MDIO bus per switch, not one for the whole cluster. Allow this to be represented in the device tree, by adding an optional mii-bus property at the switch level. The old platform_device method of instantiation supports this already, so only the device tree binding needs extending with an additional optional phandle. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: mv88e6352: Use mnemonics for EEPROM registers and bits	Andrew Lunn	2015-08-10	2	-10/+16
\| \| \| \| \| \| \| \|	Add register definitions #defines for accessing the EEPROM. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'ovs-gre'	David S. Miller	2015-08-10	13	-598/+503
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pravin B Shelar says: ==================== GRE: Use flow based tunneling for OVS GRE vport. Following patches make use of new Using GRE tunnel meta data collection feature. This allows us to directly use netdev based GRE tunnel implementation. While doing so I have removed GRE demux API which were targeted for OVS. Most of GRE protocol code is now consolidated in ip_gre module. v5-v4: Fixed Kconfig dependency for vport-gre module. v3-v4: Added interface to ip-gre device to enable meta data collection. While doing this I split second patch into two patches. v2-v3: Add API to create GRE flow based device. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	gre: Remove support for sharing GRE protocol hook.	Pravin B Shelar	2015-08-10	3	-290/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for sharing GREPROTO_CISCO port was added so that OVS gre port and kernel GRE devices can co-exist. After flow-based tunneling patches OVS GRE protocol processing is completely moved to ip_gre module. so there is no need for GRE protocol hook. Following patch consolidates GRE protocol related functions into ip_gre module. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	openvswitch: Use regular GRE net_device instead of vport	Pravin B Shelar	2015-08-10	5	-260/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using GRE tunnel meta data collection feature, we can implement OVS GRE vport. This patch removes all of the OVS specific GRE code and make OVS use a ip_gre net_device. Minimal GRE vport is kept to handle compatibility with current userspace application. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ip_gre: Add support to collect tunnel metadata.	Pravin B Shelar	2015-08-10	6	-28/+216
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following patch create new tunnel flag which enable tunnel metadata collection on given device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	openvswitch: Move tunnel destroy function to oppenvswitch module.	Pravin B Shelar	2015-08-10	3	-20/+20
\|/ \| \| \| \| \| \| \|	This function will be used in gre and geneve vport implementations. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add explicit logging and stat for neighbour table overflow	Rick Jones	2015-08-10	3	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an explicit neighbour table overflow message (ratelimited) and statistic to make diagnosing neighbour table overflows tractable in the wild. Diagnosing a neighbour table overflow can be quite difficult in the wild because there is no explicit dmesg logged. Callers to neighbour code seem to use net_dbg_ratelimit when the neighbour call fails which means the "base message" is not emitted and the callback suppressed messages from the ratelimiting can end-up juxtaposed with unrelated messages. Further, a forced garbage collection will increment a stat on each call whether it was successful in freeing-up a table entry or not, so that statistic is only a hint. So, add a net_info_ratelimited message and explicit statistic to the neighbour code. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: netlink: add support for vlan_filtering attribute	Nikolay Aleksandrov	2015-08-10	4	-7/+33
\| \| \| \| \| \| \| \| \|	This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'qlcnic-enhancements'	David S. Miller	2015-08-10	12	-32/+102
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shahed Shaikh says: ==================== qlcnic: enhancements This series adds few enhancements. o Patch from Harish reorders the sequence of header files inclusion, keeping kernel's header files on top. o Firmware introduced a new feature which allows driver to increases the size of firmware dump of iSCSI function which is being collected by NIC driver. o Print buffer address which is holding a firmware dump. o Use vzalloc() instead kzalloc() for allocating large chunk of memory which will avoid potential memory allocation failure. o Add new device ID for 0x8C30 which is a 83xx series based VF function. Please apply this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Update version to 5.3.63	Shahed Shaikh	2015-08-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory	Shahed Shaikh	2015-08-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Driver allocates a large chunk of temporary buffer using kzalloc to copy FW image. As there is no real need of this memory to be physically contiguous, use vzalloc instead. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Add new VF device ID 0x8C30	Shahed Shaikh	2015-08-10	2	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a 83xx series based VF device Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Print firmware minidump buffer and template header addresses	Shahed Shaikh	2015-08-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Add support to enable capability to extend minidump for iSCSI	Shahed Shaikh	2015-08-10	5	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases it is required to capture minidump for iSCSI functions as part of default minidump collection process. To enable this, firmware exports it's capability and driver need to enable that capability by issuing a mailbox command. With this feature, firmware can provide additional iSCSI function's minidump with smaller minidump capture mask (0x1f). Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	qlcnic: Rearrange ordering of header files inclusion	Harish Patil	2015-08-10	10	-21/+22
\|/ \| \| \| \| \| \| \|	Include local headers files after kernel's header files. Signed-off-by: Harish Patil <harish.patil@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: phy: select copper mode when Marvel 88e1111 in SGMII	Madalin Bucur	2015-08-10	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \|	For the Marvel 88e1111 PHY only two SGMII modes are available, both allowing only SGMII to copper mode (with or without clock). SGMII to fiber mode is not supported. Make sure the fiber/copper registers selector bits are cleared for selecting copper mode. Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: fec: fix the race between xmit and bdp reclaiming path	Kevin Hao	2015-08-10	1	-15/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we transmit a fragmented skb, we may run into a race like the following scenario (assume txq->cur_tx is next to txq->dirty_tx): cpu 0 cpu 1 fec_enet_txq_submit_skb reserve a bdp for the first fragment fec_enet_txq_submit_frag_skb update the bdp for the other fragment update txq->cur_tx fec_enet_tx_queue bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id); This bdp is the bdp reserved for the first segment. Given that this bdp BD_ENET_TX_READY bit is not set and txq->cur_tx is already pointed to a bdp beyond this one. We think this is a completed bdp and try to reclaim it. update the bdp for the first segment update txq->cur_tx So we shouldn't update the txq->cur_tx until all the update to the bdps used for fragments are performed. Also add the corresponding memory barrier to guarantee that the update to the bdps, dirty_tx and cur_tx performed in the proper order. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mlxsw-fixes'	David S. Miller	2015-08-09	7	-24/+135
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jiri Pirko says: ==================== mlxsw: Couple of fixes/adjustments Ido Schimmel (5): mlxsw: Call free_netdev when removing port mlxsw: Make system port to local port mapping explicit mlxsw: Simplify mlxsw_sx_port_xmit function mlxsw: Use correct skb length when dumping payload mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit Jiri Pirko (2): mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM mlxsw: Strip FCS from incoming packets ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit	Ido Schimmel	2015-08-09	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Store the length of the skb before transmitting it and use it for stats instead of skb->len, since skb might have been freed already. This issue was discovered using the Kernel Address sanitizer (KASan). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Use correct skb length when dumping payload	Ido Schimmel	2015-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not use the length of the transmitted skb (which was freed), but that of the response skb. This issue was discovered using the Kernel Address sanitizer (KASan). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Simplify mlxsw_sx_port_xmit function	Ido Schimmel	2015-08-09	4	-20/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we only checked if the transmission queue is not full in the middle of the xmit function. This lead to complex logic due to the fact that sometimes we need to reallocate the headroom for our Tx header. Allow the switch driver to know if the transmission queue is not full before sending the packet and remove this complex logic. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Strip FCS from incoming packets	Jiri Pirko	2015-08-09	2	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FCS of incoming packets is already checked by HW. Just strip it out. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM	Jiri Pirko	2015-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This resolves compile errors on um-allyesconfig. Note that there are many other drivers which have the same issue. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Make system port to local port mapping explicit	Ido Schimmel	2015-08-09	2	-0/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	System ports are unique identifiers in a multi-ASIC environment that represent all the available ports in the system. Local ports on the other hand, are unique only within the local ASIC. Since system port to local port mapping is not part of the HW-SW contract and since only single-ASIC configurations are currently supported, set an explicit 1:1 mapping by configuring the Switch System Port Record (SSPR) register. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: Call free_netdev when removing port	Ido Schimmel	2015-08-09	1	-0/+1
\|/ \| \| \| \| \| \| \| \| \| \|	When removing a port's netdevice we should also free the memory allocated by alloc_etherdev(). Do this by calling free_netdev() at the end of the teardown sequence. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: Fix double word "the the" in eth.c	Masanari Iida	2015-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fix double word "the the" in Documentation/DocBook/networking/API-eth-get-headlen.html Documentation/DocBook/networking/netdev.html Documentation/DocBook/networking.xml These files are generated from comment in source, so I have to fix comment in net/ethernet/eth.c. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: phy: add RealTek RTL8211DN phy id	Shaohui Xie	2015-08-09	1	-0/+14
\| \| \| \| \| \| \|	RTL8211DN is compatible with RTL8211E. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mpls: Enforce payload type of traffic sent using explicit NULL	Robert Shearman	2015-08-09	1	-27/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only label on the stack, then after popping the resulting packet must be treated as a IPv4 packet and forwarded based on the IPv4 header. The same is true for IPv6 Explicit NULL with an IPv6 packet following. Therefore, when installing the IPv4/IPv6 Explicit NULL label routes, add an attribute that specifies the expected payload type for use at forwarding time for determining the type of the encapsulated packet instead of inspecting the first nibble of the packet. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'bpf-perf'	David S. Miller	2015-08-09	14	-53/+373
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kaixu Xia says: ==================== bpf: Introduce the new ability of eBPF programs to access hardware PMU counter This patchset is base on the net-next: git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git commit 9dc20a649609c95ce7c5ac4282656ba627b67d49. Previous patch v6 url: https://lkml.org/lkml/2015/8/4/188 changes in V7: - rebase the whole patch set to net-next tree(9dc20a64); - split out the core perf APIs into Patch 1/5; - change the return value of function perf_event_attrs() from struct perf_event * to const struct perf_event * in Patch 1/5; - rename the function perf_event_read_internal() to perf_event_ read_local() and rewrite it in Patch 1/5; - rename the function check_func_limit() to check_map_func _compatibility() and remove the unnecessary pass pointer to a pointer in Patch 4/5; changes in V6: - make the Patch 1/4 commit message more meaning and readable; - remove the unnecessary comment in Patch 2/4 and make it clean; - declare the function perf_event_release_kernel() in include/ linux/perf_event.h to fix the build error when CONFIG_PERF_EVENTS isn't configured in Patch 2/4; - add function perf_event_attrs() to get the struct perf_event_attr in Patch 2/4. - move the related code from kernel/trace/bpf_trace.c to kernel/ events/core.c and add function perf_event_read_internal() to avoid poking inside of the event outside of perf code in Patch 3/4; - generial the func & map match-pair with an array in Patch 3/4; changes in V5: - move struct fd_array_map_ops* fd_ops to bpf_map; - move array perf event decrement refcnt function to map_free; - fix the NULL ptr of perf_event_get(); - move bpf_perf_event_read() to kernel/bpf/bpf_trace.c; - get rid of the remaining struct bpf_prog; - move the unnecessay cast on void *; changes in V4: - make the bpf_prog_array_map more generic; - fix the bug of event refcnt leak; - use more useful errno in bpf_perf_event_read(); changes in V3: - collapse V2 patches 1-3 into one; - drop the function map->ops->map_traverse_elem() and release the struct perf_event in map_free; - only allow to access bpf_perf_event_read() from programs; - update the perf_event_array_map elem via xchg(); - pass index directly to bpf_perf_event_read() instead of MAP_KEY; changes in V2: - put atomic_long_inc_not_zero() between fdget() and fdput(); - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE; - Only read the event counter on current CPU or on current process; - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the pointer to the struct perf_event; - according to the perf_event_map_fd and key, the function bpf_perf_event_read() can get the Hardware PMU counter value; Patch 5/5 is a simple example and shows how to use this new eBPF programs ability. The PMU counter data can be found in /sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU value when 'kprobe/sys_write' sampling) $ cat /sys/kernel/debug/tracing/trace_pipe $ ./tracex6 ... syslog-ng-548 [000] d..1 76.905673: : CPU-0 681765271 syslog-ng-548 [000] d..1 76.905690: : CPU-0 681787855 syslog-ng-548 [000] d..1 76.905707: : CPU-0 681810504 syslog-ng-548 [000] d..1 76.905725: : CPU-0 681834771 syslog-ng-548 [000] d..1 76.905745: : CPU-0 681859519 syslog-ng-548 [000] d..1 76.905766: : CPU-0 681890419 syslog-ng-548 [000] d..1 76.905783: : CPU-0 681914045 syslog-ng-548 [000] d..1 76.905800: : CPU-0 681935950 syslog-ng-548 [000] d..1 76.905816: : CPU-0 681958299 ls-690 [005] d..1 82.241308: : CPU-5 3138451 sh-691 [004] d..1 82.244570: : CPU-4 7324988 <...>-699 [007] d..1 99.961387: : CPU-7 3194027 <...>-695 [003] d..1 99.961474: : CPU-3 288901 <...>-695 [003] d..1 99.961541: : CPU-3 383145 <...>-695 [003] d..1 99.961591: : CPU-3 450365 <...>-695 [003] d..1 99.961639: : CPU-3 515751 <...>-695 [003] d..1 99.961686: : CPU-3 579047 ... The detail of patches is as follow: Patch 1/5 add the necessary core perf APIs perf_event_attrs(), perf_event_get(),perf_event_read_local() when accessing events counters in eBPF programs Patch 2/5 rewrites part of the bpf_prog_array map code and make it more generic; Patch 3/5 introduces a new bpf map type. This map only stores the pointer to struct perf_event; Patch 4/5 implements function bpf_perf_event_read() that get the selected hardware PMU conuter; Patch 5/5 gives a simple example. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	samples/bpf: example of get selected PMU counter value	Kaixu Xia	2015-08-09	4	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a simple example and shows how to use the new ability to get the selected Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Implement function bpf_perf_event_read() that get the selected hardware ↵	Kaixu Xia	2015-08-09	4	-15/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PMU conuter According to the perf_event_map_fd and index, the function bpf_perf_event_read() can convert the corresponding map value to the pointer to struct perf_event and return the Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Add new bpf map type to store the pointer to struct perf_event	Kaixu Xia	2015-08-09	3	-0/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'. This map only stores the pointer to struct perf_event. The user space event FDs from perf_event_open() syscall are converted to the pointer to struct perf_event and stored in map. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Make the bpf_prog_array_map more generic	Wang Nan	2015-08-09	5	-38/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All the map backends are of generic nature. In order to avoid adding much special code into the eBPF core, rewrite part of the bpf_prog_array map code and make it more generic. So the new perf_event_array map type can reuse most of code with bpf_prog_array map and add fewer lines of special code. Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	perf: add the necessary core perf APIs when accessing events counters in ↵	Kaixu Xia	2015-08-09	2	-0/+88
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	eBPF programs This patch add three core perf APIs: - perf_event_attrs(): export the struct perf_event_attr from struct perf_event; - perf_event_get(): get the struct perf_event from the given fd; - perf_event_read_local(): read the events counters active on the current CPU; These APIs are needed when accessing events counters in eBPF programs. The API perf_event_read_local() comes from Peter and I add the corresponding SOB. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mv88e6xxx-switchdev-fdb'	David S. Miller	2015-08-09	10	-197/+317
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Vivien Didelot says: ==================== net: dsa: mv88e6xxx: support switchdev FDB objects This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB objects. The first two patches add minor but necessary changes to switchdev, the third one implements the switchdev glue in DSA for FDB routines, and the remaining ones refactor the FDB access functions in the mv88e6xxx code. Below is an usage example (ports 0-2 belongs to br0, ports 3-4 belongs to br1): # bridge fdb add 3c:97:0e:11:30:6e dev swp2 # bridge fdb add 3c:97:0e:11:40:78 dev swp3 # bridge fdb add 3c:97:0e:11:50:86 dev swp4 # bridge fdb del 3c:97:0e:11:40:78 dev swp3 # bridge fdb 01:00:5e:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth1 self permanent 00:50:d2:10:78:15 dev swp0 master br0 permanent 3c:97:0e:11:30:6e dev swp2 self static 00:50:d2:10:78:15 dev swp3 master br1 permanent 3c:97:0e:11:50:86 dev swp4 self static # cat /sys/kernel/debug/dsa0/atu # DB T/P Vec State Addr # 001 Port 004 e 3c:97:0e:11:30:6e # 004 Port 010 e 3c:97:0e:11:50:86 For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged ports and bridge groups, and the remaining will be later used by VLANs. This change is necessary to welcome the support for hardware VLANs (which will follow soon). Changes in v2: - remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code - use ether_addr_copy instead of memcpy for MAC addresses - constify MAC address in port_fdb_{add,del} - split the mv88e6xxx code refactoring into several patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: rework FDB add/del operations	Vivien Didelot	2015-08-09	4	-42/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a low level function for the ATU Load operation, and provide FDB add and delete wrappers functions. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: rework FDB getnext operation	Vivien Didelot	2015-08-09	4	-27/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds a low level _mv88e6xxx_atu_getnext function and helpers to rewrite the mv88e6xxx_port_fdb_getnext operation. A mv88e6xxx_atu_entry structure is added for convenient access to the hardware, and GLOBAL_ATU_FID is defined instead of the raw 0x01 value. The previous implementation did not handle the eventual trunk mapping. If the related bit is set, then the ATU data register would contain the trunk ID, and not the port vector. Check this in the FDB getnext operation and do not handle it (yet). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: rename ATU MAC accessors	Vivien Didelot	2015-08-09	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename the __mv88e6xxx_{read,write}_addr functions to more explicit _mv88e6xxx_atu_mac_{read,write} functions, which also respect the single underscore convention used in the file (meaning SMI lock must be held). In the meantime, define their MAC address parameters as an array of ETH_ALEN bytes instead of a char pointer. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: extend fid mask	Vivien Didelot	2015-08-09	2	-10/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The driver currently manages one FID per port (or bridge group), with a mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use. The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff; FID 0 means that multiple address databases are not being used). This patch changes the fid_mask for an fid_bitmap of 4096 bits. >From now on, FIDs 1 to num_ports are reserved for non-bridged ports and bridge groups (a bridge group gets the FID of its first member). The remaining bits will be reserved for VLAN entries. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: add support for switchdev FDB objects	Vivien Didelot	2015-08-09	4	-114/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the fdb_{add,del,getnext} function pointer in favor of new port_fdb_{add,del,getnext}. Implement the switchdev_port_obj_{add,del,dump} functions in DSA to support the SWITCHDEV_OBJ_PORT_FDB objects. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>