summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mac80211: directly return ieee80211_vif_use_reserved_context()Fabian Frederick2014-10-091-5/+1
| | | | | | | | No need to store ieee80211_vif_use_reserved_context() result and test it before returning. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211-hwsim: Add support for HWSIM_ATTR_DESTROY_RADIO_ON_CLOSEJukka Rissanen2014-10-092-3/+40
| | | | | | | | | | | Add support for new attribute HWSIM_ATTR_DESTROY_RADIO_ON_CLOSE which can be set by the user space component. The attribute will cause the kernel to destroy all those radios that were created by a process if that process dies. The old behaviour i.e., radios are persistent is still the default. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: allow channel switch with multiple channel contextsLuciano Coelho2014-10-098-40/+34
| | | | | | | | | Channel switch with multiple channel contexts should now work fine. Remove check that disallows switches when multiple contexts are in use. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: wait for the first beacon on the new channel after CSALuciano Coelho2014-10-093-13/+36
| | | | | | | | | | | Instead of immediately reopening the queues (in case of block_tx), calling the post_channel_switch operation and sending the notification, wait for the first beacon on the new channel. This makes sure that we don't lose packets if the AP/GO is not on the new channel yet. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: add post_channel_switch driver operationLuciano Coelho2014-10-095-1/+43
| | | | | | | | | As a counterpart to the pre_channel_switch operation, add a post_channel_switch operation. This allows the drivers to go back to a normal configuration after the channel switch is completed. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: add pre_channel_switch driver operationLuciano Coelho2014-10-095-8/+86
| | | | | | | | | | Some drivers may need to prepare for a channel switch also when it is initiated from the remote side (eg. station, P2P client). To make this possible, add a generic callback that can be called for all interface types. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: add extended channel switching capability if the driver supports CSALuciano Coelho2014-10-093-9/+19
| | | | | | | | | | | | | The Extended Channel Switching capability bit in the extended capabilities element must be set if the driver supports CSA on non-beaconing interfaces. Since this capability needs to be set during driver registration, the extended_capabiliities global variable needs to be moved to the local structure so that it can be modified. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: add device_timestamp to the ieee80211_channel_switch structLuciano Coelho2014-10-093-5/+15
| | | | | | | | | | Some devices may need the device timestamp in order to synchronize the channel switch. To pass this value back to the driver, add it to the channel switch structure and copy the device_timestamp value received in the rx info structure into it. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: implement cfg80211_ops to query mesh proxy path tableHenning Rogge2014-10-093-0/+87
| | | | | | | | Implement get_mpp and dump_mpp cfg80211_ops to export the content of the 802.11s mesh proxy path table to userspace. Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* cfg80211: add ops to query mesh proxy path tableHenning Rogge2014-10-095-1/+183
| | | | | | | | Add two new cfg80211 operations for querying a table with proxied mesh paths. Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211: minstrel_ht: fix MCS_GROUP_RATES usageKarl Beldan2014-10-091-1/+1
| | | | | | | | | | | Commit 5935839ad735 ("mac80211: improve minstrel_ht rate sorting by throughput & probability") replaced the constant 8 with MCS_GROUP_RATES when getting the number of streams of an HT MCS. See commit 7a5e3fa2c81c ("mac80211: minstrel_ht: replace some occurences of MCS_GROUP_RATES"). Fixes: 5935839ad735 ("mac80211: improve minstrel_ht rate sorting by throughput & probability") Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211_hwsim: fix typo, remove unnecessary gotoBen Greear2014-10-091-2/+1
| | | | | | | Trivial cleanups. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* mac80211_hwsim: fix memory leak on netlink TX failureBen Greear2014-10-091-1/+2
| | | | | | | | | If the packet can't be delivered to userspace (at all or quickly enough) then it can leak - fix that. Signed-off-by: Ben Greear <greearb@candelatech.com> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-10-081397-37800/+153007
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Most notable changes in here: 1) By far the biggest accomplishment, thanks to a large range of contributors, is the addition of multi-send for transmit. This is the result of discussions back in Chicago, and the hard work of several individuals. Now, when the ->ndo_start_xmit() method of a driver sees skb->xmit_more as true, it can choose to defer the doorbell telling the driver to start processing the new TX queue entires. skb->xmit_more means that the generic networking is guaranteed to call the driver immediately with another SKB to send. There is logic added to the qdisc layer to dequeue multiple packets at a time, and the handling mis-predicted offloads in software is now done with no locks held. Finally, pktgen is extended to have a "burst" parameter that can be used to test a multi-send implementation. Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4, virtio_net Adding support is almost trivial, so export more drivers to support this optimization soon. I want to thank, in no particular or implied order, Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann, David Tat, Hannes Frederic Sowa, and Rusty Russell. 2) PTP and timestamping support in bnx2x, from Michal Kalderon. 3) Allow adjusting the rx_copybreak threshold for a driver via ethtool, and add rx_copybreak support to enic driver. From Govindarajulu Varadarajan. 4) Significant enhancements to the generic PHY layer and the bcm7xxx driver in particular (EEE support, auto power down, etc.) from Florian Fainelli. 5) Allow raw buffers to be used for flow dissection, allowing drivers to determine the optimal "linear pull" size for devices that DMA into pools of pages. The objective is to get exactly the necessary amount of headers into the linear SKB area pre-pulled, but no more. The new interface drivers use is eth_get_headlen(). From WANG Cong, with driver conversions (several had their own by-hand duplicated implementations) by Alexander Duyck and Eric Dumazet. 6) Support checksumming more smoothly and efficiently for encapsulations, and add "foo over UDP" facility. From Tom Herbert. 7) Add Broadcom SF2 switch driver to DSA layer, from Florian Fainelli. 8) eBPF now can load programs via a system call and has an extensive testsuite. Alexei Starovoitov and Daniel Borkmann. 9) Major overhaul of the packet scheduler to use RCU in several major areas such as the classifiers and rate estimators. From John Fastabend. 10) Add driver for Intel FM10000 Ethernet Switch, from Alexander Duyck. 11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric Dumazet. 12) Add Datacenter TCP congestion control algorithm support, From Florian Westphal. 13) Reorganize sk_buff so that __copy_skb_header() is significantly faster. From Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits) netlabel: directly return netlbl_unlabel_genl_init() net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers net: description of dma_cookie cause make xmldocs warning cxgb4: clean up a type issue cxgb4: potential shift wrapping bug i40e: skb->xmit_more support net: fs_enet: Add NAPI TX net: fs_enet: Remove non NAPI RX r8169:add support for RTL8168EP net_sched: copy exts->type in tcf_exts_change() wimax: convert printk to pr_foo() af_unix: remove 0 assignment on static ipv6: Do not warn for informational ICMP messages, regardless of type. Update Intel Ethernet Driver maintainers list bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING tipc: fix bug in multicast congestion handling net: better IFF_XMIT_DST_RELEASE support net/mlx4_en: remove NETDEV_TX_BUSY 3c59x: fix bad split of cpu_to_le32(pci_map_single()) net: bcmgenet: fix Tx ring priority programming ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-10-0847-181/+264
| |\
| | * net_sched: copy exts->type in tcf_exts_change()WANG Cong2014-10-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to copy exts->type when committing the change, otherwise it would be always 0. This is a quick fix for -net and -stable, for net-next tcf_exts will be removed. Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Update Intel Ethernet Driver maintainers listAlexander Duyck2014-10-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I will no longer be working for Intel as of today. As such I am removing myself from the maintainers list and adding my replacement, Matthew Vick as he will be taking over maintenance of the fm10k driver. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTINGHerbert Xu2014-10-072-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we may defragment the packet in IPv4 PRE_ROUTING and refragment it after POST_ROUTING we should save the value of frag_max_size. This is still very wrong as the bridge is supposed to leave the packets intact, meaning that the right thing to do is to use the original frag_list for fragmentation. Unfortunately we don't currently guarantee that the frag_list is left untouched throughout netfilter so until this changes this is the best we can do. There is also a spot in FORWARD where it appears that we can forward a packet without going through fragmentation, mark it so that we can fix it later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * 3c59x: fix bad split of cpu_to_le32(pci_map_single())Sylvain "ythier" Hitier2014-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 6f2b6a3005b2c34c39f207a87667564f64f2f91a, # 3c59x: Add dma error checking and recovery the intent is to split out the mapping from the byte-swapping in order to insert a dma_mapping_error() check. Kinda this semantic patch: // See http://coccinelle.lip6.fr/ // // Beware, grouik-and-dirty! @@ expression DEV, X, Y, Z; @@ - cpu_to_le32(pci_map_single(DEV, X, Y, Z)) + dma_addr_t addr = pci_map_single(DEV, X, Y, Z); + if (dma_mapping_error(&DEV->dev, addr)) + /* snip */; + cpu_to_le32(addr) However, the #else part (of the #if DO_ZEROCOPY test) is changed this way: - cpu_to_le32(pci_map_single(DEV, X, Y, Z)) + dma_addr_t addr = cpu_to_le32(pci_map_single(DEV, X, Y, Z)); // ^^^^^^^^^^^ // That mismatches the 3 other changes! + if (dma_mapping_error(&DEV->dev, addr)) + /* snip */; + cpu_to_le32(addr) Let's remove the leftover cpu_to_le32() for coherency. v2: Better changelog. v3: Add Acked-by Fixes: 6f2b6a3005b2c34c39f207a87667564f64f2f91a # 3c59x: Add dma error checking and recovery Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bna: allow transmit tagged framesIvan Vecera2014-10-072-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Tx VLAN offloading is disabled frames with size ~ MTU are not transmitted as the driver does not account 4 bytes of VLAN header added by stack. It should use VLAN_ETH_HLEN instead of ETH_HLEN. The second problem is with newer BNA chips (BNA 1860). These chips filter out any VLAN tagged frames in Tx path. This is a problem when Tx VLAN offloading is disabled and frames are tagged by stack. Older chips like 1010/1020 are not affected as they probably don't do such filtering. Cc: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * sctp: handle association restarts when the socket is closed.Vlad Yasevich2014-10-062-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current assocation simply transitions into established state. This creates a condition where a remote system, through a the restart procedure, may create a local association that is no way reachable by user. The conditions to trigger this are as follows: 1) Remote does not acknoledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted associate immidiately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge branch 'spider_net'David S. Miller2014-10-051-37/+5
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Antoine Tenart says: ==================== net: spider_net: fix possible bitops errors Dan reported a possible signedness issue on the pxa168_eth driver. While having a look at it, I came across a similar problem in the spider_net driver. Here is one proposal to fix it. The first patch rework the spider_net_set_mac() function by removing the spider_net_get_mac_address() call and using memcpy() to set netdev->dev_addr (which is what's done in lots of Ethernet drivers) and the second one fix the actual signedness issue. If for any reason you really want to keep a call to spider_net_get_mac_address() because the memcpy() is somehow not good enough here, we can also come up with a solution involving a temporary unsigned char variable. I couldn't test these changes, so please do. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * net: spider_net: avoid using signed char for bitopsAntoine Ténart2014-10-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signedness bugs may occur when using signed char for bitops, depending on if the highest bit is ever used. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * net: spider_net: do not read mac address again after setting itAntoine Ténart2014-10-051-34/+2
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the spider_net_get_mac_address() call at the end of the spider_net_set_mac() function. The dev->dev_addr is instead updated with a memcpy() from sa->sa_data. Since spider_net_get_mac_address() is not used anywhere else, this patch also removes the function. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * hyperv: Fix a bug in netvsc_send()KY Srinivasan2014-10-051-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the packet is successfully sent, we should not touch the packet as it may have been freed. This patch is based on the work done by Long Li <longli@microsoft.com>. David, please queue this up for stable. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * team: avoid race condition in scheduling delayed workJoe Lawrence2014-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When team_notify_peers and team_mcast_rejoin are called, they both reset their respective .count_pending atomic variable. Then when the actual worker function is executed, the variable is atomically decremented. This pattern introduces a potential race condition where the .count_pending rolls over and the worker function keeps rescheduling until .count_pending decrements to zero again: THREAD 1 THREAD 2 ======== ======== team_notify_peers(teamX) atomic_set count_pending = 1 schedule_delayed_work team_notify_peers(teamX) atomic_set count_pending = 1 team_notify_peers_work atomic_dec_and_test count_pending = 0 (return) schedule_delayed_work team_notify_peers_work atomic_dec_and_test count_pending = -1 schedule_delayed_work (repeat until count_pending = 0) Instead of assigning a new value to .count_pending, use atomic_add to tack-on the additional desired worker function invocations. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Fixes: fc423ff00df3a19554414ee ("team: add peer notification") Fixes: 492b200efdd20b8fcfdac87 ("team: add support for sending multicast rejoins") Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ematch: Fix early ending of inverted containers.Ignacy Gawędzki2014-10-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The result of a negated container has to be inverted before checking for early ending. This fixes my previous attempt (17c9c8232663a47f074b7452b9b034efda868ca7) to make inverted containers work correctly. Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: systemport: fix bcm_sysport_insert_tsb()Florian Fainelli2014-10-041-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit bc23333ba11fb7f959b7e87e121122f5a0fbbca8 ("net: bcmgenet: fix bcmgenet_put_tx_csum()"), we need to return the skb pointer in case we had to reallocate the SKB headroom. Fixes: 80105befdb4b8 ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ip6_gre: fix flowi6_proto value in xmit pathNicolas Dichtel2014-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xmit path, we build a flowi6 which will be used for the output route lookup. We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the protocol should be IPPROTO_GRE. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * r8152: autoresume before setting MAC addresshayeswang2014-10-031-2/+9
| | | | | | | | | | | | | | | | | | | | | Resume the device before setting the MAC address. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * asix: Don't reset PHY on if_up for ASIX 88772Michel Stam2014-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've noticed every time the interface is set to 'up,', the kernel reports that the link speed is set to 100 Mbps/Full Duplex, even when ethtool is used to set autonegotiation to 'off', half duplex, 10 Mbps. It can be tested by: ifconfig eth0 down ethtool -s eth0 autoneg off speed 10 duplex half ifconfig eth0 up Then checking 'dmesg' for the link speed. Signed-off-by: Michel Stam <m.stam@fugro.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge branch 'rds-net'David S. Miller2014-10-033-7/+12
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Herton R. Krzesinski says: ==================== Small fixes/changes for RDS I got a report of one issue within RDS (after investigation it was a double free), and I'm sending the fix (patch 3/3) which reporter said it works (no more WARNING triggered on a specially instrumented kernel). The report/test was done on a very old kernel (RHEL 5, 2.6.18 based with backports), but the problem the patch handles still exists and should not change. Besides that, while reviewing some of the code but being unable to reproduce with rds_tcp, I noticed two small improvements/fixes which are in patches 1 and 2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * net/rds: fix possible double free on sock tear downHerton R. Krzesinski2014-10-031-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got a report of a double free happening at RDS slab cache. One suspicion was that may be somewhere we were doing a sock_hold/sock_put on an already freed sock. Thus after providing a kernel with the following change: static inline void sock_hold(struct sock *sk) { - atomic_inc(&sk->sk_refcnt); + if (!atomic_inc_not_zero(&sk->sk_refcnt)) + WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n", + sk, sk->sk_family); } The warning successfuly triggered: Trying to hold sock already gone: ffff81f6dda61280 (family: 21) WARNING: at include/net/sock.h:350 sock_hold() Call Trace: <IRQ> [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b [<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf [<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc [<ffffffff8009899a>] tasklet_action+0x8f/0x12b [<ffffffff800125a2>] __do_softirq+0x89/0x133 [<ffffffff8005f30c>] call_softirq+0x1c/0x28 [<ffffffff8006e644>] do_softirq+0x2c/0x7d [<ffffffff8006e4d4>] do_IRQ+0xee/0xf7 [<ffffffff8005e625>] ret_from_intr+0x0/0xa <EOI> Looking at the call chain above, the only way I think this would be possible is if somewhere we already released the same socket->sock which is assigned to the rds_message at rds_send_remove_from_sock. Which seems only possible to happen after the tear down done on rds_release. rds_release properly calls rds_send_drop_to to drop the socket from any rds_message, and some proper synchronization is in place to avoid race with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see a very narrow window where it may be possible we touch a sock already released: when rds_release races with rds_send_drop_acked, we check RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this specific case we don't clear rm->m_rs. In this case, it seems we could then go on at rds_send_drop_to and after it returns, the sock is freed by last sock_put on rds_release, with concurrently we being at rds_send_remove_from_sock; then at some point in the loop at rds_send_remove_from_sock we process an rds_message which didn't have rm->m_rs unset for a freed sock, and a possible sock_hold on an sock already gone at rds_release happens. This hopefully address the described condition above and avoids a double free on "second last" sock_put. In addition, I removed the comment about socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to in rds_release and we should have things properly serialized there, thus I can't see the comment being accurate there. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * net/rds: do proper house keeping if connection fails in rds_tcp_conn_connectHerton R. Krzesinski2014-10-031-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I see two problems if we consider the sock->ops->connect attempt to fail in rds_tcp_conn_connect. The first issue is that for example we don't remove the previously added rds_tcp_connection item to rds_tcp_tc_list at rds_tcp_set_callbacks, which means that on a next reconnect attempt for the same rds_connection, when rds_tcp_conn_connect is called we can again call rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list, leading to list corruption: to avoid this just make sure we call properly rds_tcp_restore_callbacks before we exit. The second issue is that we should also release the sock properly, by setting sock = NULL only if we are returning without error. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * net/rds: call rds_conn_drop instead of open code it at rds_connect_completeHerton R. Krzesinski2014-10-031-2/+1
| | |/ | | | | | | | | | | | | Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netlabel: directly return netlbl_unlabel_genl_init()Fabian Frederick2014-10-081-5/+1
| | | | | | | | | | | | | | | | | | | | | No need to store netlbl_unlabel_genl_init result and test it before returning. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpersEric Dumazet2014-10-082-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add two helpers so that drivers do not have to care of BQL being available or not. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jim Davis <jim.epost@gmail.com> Fixes: 29d40c903247 ("net/mlx4_en: Use prefetch in tx path") Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: description of dma_cookie cause make xmldocs warningMasanari Iida2014-10-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 7bced397510ab569d31de4c70b39e13355046387, dma_cookie was removed from struct skbuff. But the description of dma_cookie still exist. So the "make xmldocs" output following warning. Warning(.//include/linux/skbuff.h:609): Excess struct/union /enum/typedef member 'dma_cookie' description in 'sk_buff' Remove description of dma_cookie fix the symptom. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4: clean up a type issueDan Carpenter2014-10-081-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tx_desc struct holds 8 __be64 values. The original code in ring_tx_db() took a tx_desc pointer then casted it to an int pointer and then casted it to a u64 pointer. It was confusing and triggered some static checker warnings. I have changed the cxgb_pio_copy() function to only take tx_desc pointers. This isn't really a loss of flexibility because anything else was buggy to begin with. I also removed the casting on the destination pointer since that was unnecessary and a bit messy. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4: potential shift wrapping bugDan Carpenter2014-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | "cntxt_id" is an unsigned int but "udb" is a u64 so there is a potential shift wrapping bug here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | i40e: skb->xmit_more supportEric Dumazet2014-10-081-44/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support skb->xmit_more in i40e is straightforward : we need to move around i40e_maybe_stop_tx() call to correctly test netif_xmit_stopped() before taking the decision to not kick the NIC. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'fs_enet_napi'David S. Miller2014-10-086-161/+147
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christophe Leroy says: ==================== net: fs_enet: Remove non NAPI RX and add NAPI for TX When using a MPC8xx as a router, 'perf' shows a significant time spent in fs_enet_interrupt() and fs_enet_start_xmit(). 'perf annotate' shows that the time spent in fs_enet_start_xmit is indeed spent between spin_unlock_irqrestore() and the following instruction, hence in interrupt handling. This is due to the TX complete interrupt that fires after each transmitted packet. This patchset first remove all non NAPI handling as NAPI has become the only mode for RX, then adds NAPI for handling TX complete. This improves NAT TCP throughput by 21% on MPC885 with FEC. Tested on MPC885 with FEC. [PATCH 1/2] net: fs_enet: Remove non NAPI RX [PATCH 2/2] net: fs_enet: Add NAPI TX Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: fs_enet: Add NAPI TXLEROY Christophe2014-10-085-11/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using a MPC8xx as a router, 'perf' shows a significant time spent in fs_enet_interrupt() and fs_enet_start_xmit(). 'perf annotate' shows that the time spent in fs_enet_start_xmit is indeed spent between spin_unlock_irqrestore() and the following instruction, hence in interrupt handling. This is due to the TX complete interrupt that fires after each transmitted packet. This patch modifies the handling of TX complete to use NAPI. With this patch, my NAT router offers a throughput improved by 21% Original performance: [root@localhost tmp]# scp toto pgs:/tmp toto 100% 256MB 2.8MB/s 01:31 Performance with the patch: [root@localhost tmp]# scp toto pgs:/tmp toto 100% 256MB 3.4MB/s 01:16 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: fs_enet: Remove non NAPI RXLEROY Christophe2014-10-082-150/+15
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | In the probe function, use_napi is inconditionnaly set to 1. This patch removes all the code which is conditional to !use_napi, and removes use_napi which has then become useless. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8169:add support for RTL8168EPChun-Hao Lin2014-10-081-44/+517
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RTL8168EP is Realtek PCIe Gigabit Ethernet controller with DASH support. It is a successor chip of RTL8168DP. For RTL8168EP, the read/write ocp register is via eri channel type 2, so I move ocp_xxx() related functions under rtl_eri_xxx. And use r8168dp_ocp_xxx() for RTL8168DP ocp read/write, r8168ep_ocp_xxx() for RTL8168EP ocp read/write. The way of checking dash enable is different with RTL8168DP. I use r8168dp_check_dash()for RTL8168DP and r8168ep_check_dash() for RTL8168EP, to check if dash is enabled. The driver_start() and driver_stop() of RTL8168EP is also different with RTL8168DP. I use rtl8168dp_driver_xxx() for RTL8168DP and rtl8168ep_driver_xxx for RTL8168EP. Right now, RTL8168EP phy mcu did not need firmware code patch, so I did not add firmware code for it. so I did not add firmware code for it. Signed-off-by: Chun-Hao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | wimax: convert printk to pr_foo()Fabian Frederick2014-10-077-16/+17
| | | | | | | | | | | | | | | | | | | | | Use current logging functions and add module name prefix. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | af_unix: remove 0 assignment on staticFabian Frederick2014-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | static values are automatically initialized to 0 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Do not warn for informational ICMP messages, regardless of type.David S. Miller2014-10-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason to emit a log message for these. Based upon a suggestion from Hannes Frederic Sowa. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| * | tipc: fix bug in multicast congestion handlingJon Maloy2014-10-074-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: better IFF_XMIT_DST_RELEASE supportEric Dumazet2014-10-0727-39/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>