summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* 6lowpan: use netdev_alloc_skb instead dev_alloc_skbAlexander Aring2013-10-281-2/+2
| | | | | | | | | This patch uses the netdev_alloc_skb instead dev_alloc_skb function and drops the seperate assignment to skb->dev. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* 6lowpan: remove unnecessary check on err >= 0Alexander Aring2013-10-281-1/+1
| | | | | | | | The err variable can only be zero in this case. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* 6lowpan: remove unnecessary ret variableAlexander Aring2013-10-281-4/+2
| | | | | | Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: merge two if statements to onewangweidong2013-10-281-9/+5
| | | | | | | | | Two if statements do the same work, we can merge them to one. And fix some typos. There is just code simplification, no functional changes. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: remove the repeat initialize with 0wangweidong2013-10-281-21/+8
| | | | | | | | | kmem_cache_zalloc had set the allocated memory to zero. I think no need to initialize with 0. And move the comments to the function begin. Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix some comments in chunk.c and associola.cwangweidong2013-10-282-3/+3
| | | | | | | | fix some typos Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: restore gso for vxlanEric Dumazet2013-10-281-8/+7
| | | | | | | | | | | | | | | | | | | | | Alexei reported a performance regression on vxlan, caused by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable" GSO vxlan packets were not properly segmented, adding IP fragments while they were not expected. Rename 'bool tunnel' to 'bool encap', and add a new boolean to express the fact that UDP should be fragmented. This fragmentation is triggered by skb->encapsulation being set. Remove a "skb->encapsulation = 1" added in above commit, as its not needed, as frags inherit skb->frag from original GSO skb. Reported-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix rtnl notification in atomic contextAlexei Starovoitov2013-10-252-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 991fb3f74c "dev: always advertise rx_flags changes via netlink" introduced rtnl notification from __dev_set_promiscuity(), which can be called in atomic context. Steps to reproduce: ip tuntap add dev tap1 mode tap ifconfig tap1 up tcpdump -nei tap1 & ip tuntap del dev tap1 mode tap [ 271.627994] device tap1 left promiscuous mode [ 271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940 [ 271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip [ 271.677525] INFO: lockdep is turned off. [ 271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G W 3.12.0-rc3+ #73 [ 271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 271.731254] ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428 [ 271.760261] ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8 [ 271.790683] 0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8 [ 271.822332] Call Trace: [ 271.838234] [<ffffffff817544e5>] dump_stack+0x55/0x76 [ 271.854446] [<ffffffff8108bad1>] __might_sleep+0x181/0x240 [ 271.870836] [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0 [ 271.887076] [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0 [ 271.903368] [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0 [ 271.919716] [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0 [ 271.936088] [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0 [ 271.952504] [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0 [ 271.968902] [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100 [ 271.985302] [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0 [ 272.001642] [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0 [ 272.017917] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.033961] [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50 [ 272.049855] [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0 [ 272.065494] [<ffffffff81732052>] packet_notifier+0x1b2/0x380 [ 272.080915] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.096009] [<ffffffff81761c66>] notifier_call_chain+0x66/0x150 [ 272.110803] [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10 [ 272.125468] [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20 [ 272.139984] [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70 [ 272.154523] [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20 [ 272.168552] [<ffffffff816224c5>] rollback_registered_many+0x145/0x240 [ 272.182263] [<ffffffff81622641>] rollback_registered+0x31/0x40 [ 272.195369] [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90 [ 272.208230] [<ffffffff81547ca0>] __tun_detach+0x140/0x340 [ 272.220686] [<ffffffff81547ed6>] tun_chr_close+0x36/0x60 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: initialize hashrnd in flow_dissector with net_get_random_onceHannes Frederic Sowa2013-10-251-13/+21
| | | | | | | | | | | | We also can defer the initialization of hashrnd in flow_dissector to its first use. Since net_get_random_once is irq safe now we don't have to audit the call paths if one of this functions get called by an interrupt handler. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: make net_get_random_once irq safeHannes Frederic Sowa2013-10-251-3/+4
| | | | | | | | | | | | | | | I initial build non irq safe version of net_get_random_once because I would liked to have the freedom to defer even the extraction process of get_random_bytes until the nonblocking pool is fully seeded. I don't think this is a good idea anymore and thus this patch makes net_get_random_once irq safe. Now someone using net_get_random_once does not need to care from where it is called. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add missing dev_put() in __netdev_adjacent_dev_insertNikolay Aleksandrov2013-10-251-0/+1
| | | | | | | | | | I think that a dev_put() is needed in the error path to preserve the proper dev refcount. CC: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netem: markov loss model transition fixHagen Paul Pfeifer2013-10-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The transition from markov state "3 => lost packets within a burst period" to "1 => successfully transmitted packets within a gap period" has no *additional* loss event. The loss already happen for transition from 1 -> 3, this additional loss will make things go wild. E.g. transition probabilities: p13: 10% p31: 100% Expected: Ploss = p13 / (p13 + p31) Ploss = ~9.09% ... but it isn't. Even worse: we get a double loss - each time. So simple don't return true to indicate loss, rather break and return false. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Stefano Salsano <stefano.salsano@uniroma2.it> Cc: Fabio Ludovici <fabio.ludovici@yahoo.it> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller2013-10-2318-477/+956
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Antonio Quartulli says: ==================== this is another set of changes intended for net-next/linux-3.13. (probably our last pull request for this cycle) Patches 1 and 2 reshape two of our main data structures in a way that they can easily be extended in the future to accommodate new routing protocols. Patches from 3 to 9 improve our routing protocol API and its users so that all the protocol-related code is not mixed up with the other components anymore. Patch 10 limits the local Translation Table maximum size to a value such that it can be fully transfered over the air if needed. This value depends on fragmentation being enabled or not and on the mtu values. Patch 11 makes batman-adv send a uevent in case of soft-interface destruction while a "bat-Gateway" was configured (this informs userspace about the GW not being available anymore). Patches 13 and 14 enable the TT component to detect non-mesh client flag changes at runtime (till now those flags where set upon client detection and were not changed anymore). Patch 16 is a generalisation of our user-to-kernel space communication (and viceversa) used to exchange ICMP packets to send/received to/from the mesh network. Now it can easily accommodate new ICMP packet types without breaking the existing userspace API anymore. Remaining patches are minor changes and cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * batman-adv: generalize batman-adv icmp packet handlingSimon Wunderlich2013-10-235-97/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of handling icmp packets only up to length of icmp_packet_rr, the code should handle any icmp length size. Therefore the length truncating is moved to when the packet is actually sent to userspace (this does not support lengths longer than icmp_packet_rr yet). Longer packets are forwarded without truncating. This patch also cleans up some parts where the icmp header struct could be used instead of other icmp_packet(_rr) structs to make the code more readable. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: Start new development cycleSimon Wunderlich2013-10-231-1/+1
| | | | | | | | | | Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: include the sync-flags when compute the global/local table CRCAntonio Quartulli2013-10-233-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flags covered by TT_SYNC_MASK are kept in sync among the nodes in the network and therefore they have to be considered while computing the global/local table CRC. In this way a generic originator is able to understand if its table contains the correct flags or not. Bits from 4 to 7 in the TT flags fields are now reserved for "synchronized" flags only. This allows future developers to add more flags of this type without breaking compatibility. It's important to note that not all the remote TT flags are synchronised. This comes from the fact that some flags are used to inject an information once only. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: improve the TT component to support runtime flag changesAntonio Quartulli2013-10-232-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some flags (i.e. the WIFI flag) may change after that the related client has already been announced. However it is useful to informa the rest of the network about this change. Add a runtime-flag-switch detection mechanism and re-announce the related TT entry to advertise the new flag value. This mechanism can be easily exploited by future flags that may need the same treatment. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: invoke dev_get_by_index() outside of is_wifi_iface()Antonio Quartulli2013-10-233-31/+12
| | | | | | | | | | | | | | | | | | | | | | | | Upcoming changes need to perform other checks on the incoming net_device struct. To avoid performing dev_get_by_index() for each and every check, it is better to move it outside of is_wifi_iface() and search the netdev object once only. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: send GW_DEL event in case of soft-iface destructionAntonio Quartulli2013-10-231-2/+7
| | | | | | | | | | | | | | | | | | | | In case of soft_iface destruction send a GW DEL event to userspace so that applications which are listening for GW events are informed about the lost of connectivity and can react accordingly. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: limit local translation table max sizeMarek Lindner2013-10-235-36/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The local translation table size is limited by what can be transferred from one node to another via a full table request. The number of entries fitting into a full table request depend on whether the fragmentation is enabled or not. Therefore this patch introduces a max table size check and refuses to add more local clients when that size is reached. Moreover, if the max full table packet size changes (MTU change or fragmentation is disabled) the local table is downsized instantaneously. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org>
| * batman-adv: adapt the TT component to use the new API functionsAntonio Quartulli2013-10-231-11/+24
| | | | | | | | | | Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: provide orig_node routing APIAntonio Quartulli2013-10-233-88/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some operations executed on an orig_node depends on the current routing algorithm being used. To easily make this mechanism routing algorithm agnostic add a orig_node specific API that each algorithm can populate with its own routines. Such routines are then invoked by the code when needed, without knowing which routing algorithm is currently in use With this patch 3 API functions are added: - orig_free (to free routing depending internal structs) - orig_add_if (to change the inner state of an orig_node when a new hard interface is added) - orig_del_if (to change the inner state of an orig_node when an hard interface is removed) Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: adapt the neighbor purging routine to use the new API functionsAntonio Quartulli2013-10-231-8/+9
| | | | | | | | | | Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: adapt bonding to use the new API functionsAntonio Quartulli2013-10-233-15/+32
| | | | | | | | | | Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: add bat_neigh_is_equiv_or_better API functionAntonio Quartulli2013-10-234-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | Each routing protocol has its own metric semantic and therefore is the protocol itself the only component able to compare two metrics to check their "similarity". This new API allows each routing protocol to implement its own logic and make the external code protocol agnostic. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: add bat_neigh_cmp API functionAntonio Quartulli2013-10-233-1/+25
| | | | | | | | | | | | | | | | | | This new API allows to compare the two neighbours based on the metric avoiding the user to deal with any routing algorithm specific detail Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: add bat_orig_print API functionAntonio Quartulli2013-10-233-56/+78
| | | | | | | | | | | | | | | | | | | | | | | | Each routing protocol has its own metric and private variables, therefore it is useful to introduce a new API for originator information printing. This API needs to be implemented by each protocol in order to provide its specific originator table output. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: make struct batadv_orig_node algorithm agnosticAntonio Quartulli2013-10-234-95/+135
| | | | | | | | | | | | | | | | | | | | some of the struct batadv_orig_node members are B.A.T.M.A.N. IV specific and therefore they are moved in a algorithm specific substruct in order to make batadv_orig_node routing algorithm agnostic Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: make struct batadv_neigh_node algorithm agnosticAntonio Quartulli2013-10-238-71/+103
| | | | | | | | | | | | | | | | | | | | some of the fields in struct batadv_neigh_node are strictly related to the B.A.T.M.A.N. IV algorithm. In order to make the struct usable by any routing algorithm it has to be split and made more generic Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | inet: remove old fragmentation hash initializingHannes Frederic Sowa2013-10-231-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All fragmentation hash secrets now get initialized by their corresponding hash function with net_get_random_once. Thus we can eliminate the initial seeding. Also provide a comment that hash secret seeding happens at the first call to the corresponding hashing function. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: split inet6_hash_frag for netfilter and initialize secrets with ↵Hannes Frederic Sowa2013-10-232-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net_get_random_once Defer the fragmentation hash secret initialization for IPv6 like the previous patch did for IPv4. Because the netfilter logic reuses the hash secret we have to split it first. Thus introduce a new nf_hash_frag function which takes care to seed the hash secret. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: initialize ip4_frags hash secret as late as possibleHannes Frederic Sowa2013-10-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Defer the generation of the first hash secret for the ipv4 fragmentation cache as late as possible. ip4_frags.rnd gets initial seeded by inet_frags_init and regulary reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly from ip_fragment.c in which case we initialize the secret directly. If we first get called by inet_frag_secret_rebuild we install a new secret by a manual call to get_random_bytes. This secret will be overwritten as soon as the first call to ipqhashfn happens. This is safe because we won't race while publishing the new secrets with anyone else. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-10-2340-169/+324
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Revert "bridge: only expire the mdb entry when query is received"Linus Lüssing2013-10-223-22/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: initialize passive-side sk_pacing_rate after 3WHSNeal Cardwell2013-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For passive TCP connections, upon receiving the ACK that completes the 3WHS, make sure we set our pacing rate after we get our first RTT sample. On passive TCP connections, when we receive the ACK completing the 3WHS we do not take an RTT sample in tcp_ack(), but rather in tcp_synack_rtt_meas(). So upon receiving the ACK that completes the 3WHS, tcp_ack() leaves sk_pacing_rate at its initial value. Originally the initial sk_pacing_rate value was 0, so passive-side connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs made in the first RTT. With a default initial cwnd of 10 packets, this happened to be correct for RTTs 5ms or bigger, so it was hard to see problems in WAN or emulated WAN testing. Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the initial sk_pacing_rate is 0xffffffff. So after that change, passive TCP connections were keeping this value (and using large numbers of segments per skbuff) until receiving an ACK for data. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: probe routes asynchronous in rt6_probeHannes Frederic Sowa2013-10-211-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helperJulian Anastasov2013-10-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now when rt6_nexthop() can return nexthop address we can use it for proper nexthop comparison of directly connected destinations. For more information refer to commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper"). Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: fill rt6i_gateway with nexthop addressJulian Anastasov2013-10-212-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko2013-10-191-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip6_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko2013-10-191-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | udp6: respect IPV6_DONTFRAG sockopt in case there are pending framesJiri Pirko2013-10-191-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if up->pending != 0 dontfrag is left with default value -1. That causes that application that do: sendto len>mtu flag MSG_MORE sendto len>mtu flag 0 will receive EMSGSIZE errno as the result of the second sendto. This patch fixes it by respecting IPV6_DONTFRAG socket option. introduced by: commit 4b340ae20d0e2366792abe70f46629e576adaf5e "IPv6: Complete IPV6_DONTFRAG support" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix raceDaniel Borkmann2013-10-191-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e5726 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: Fix updating FDB entries when the PVID is appliedToshiaki Makita2013-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently set the value that variable vid is pointing, which will be used in FDB later, to 0 at br_allowed_ingress() when we receive untagged or priority-tagged frames, even though the PVID is valid. This leads to FDB updates in such a wrong way that they are learned with VID 0. Update the value to that of PVID if the PVID is applied. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: Fix the way the PVID is referencedToshiaki Makita2013-10-181-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: Apply the PVID to priority-tagged framesToshiaki Makita2013-10-181-7/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames use the PVID for the port as its VID. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Apply the PVID to not only untagged frames but also priority-tagged frames. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: Don't use VID 0 and 4095 in vlan filteringToshiaki Makita2013-10-183-54/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IEEE 802.1Q says that: - VID 0 shall not be configured as a PVID, or configured in any Filtering Database entry. - VID 4095 shall not be configured as a PVID, or transmitted in a tag header. This VID value may be used to indicate a wildcard match for the VID in management operations or Filtering Database entries. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Don't accept adding these VIDs in the vlan_filtering implementation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: Correctly clamp MAX forward_delay when enabling STPVlad Yasevich2013-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit be4f154d5ef0ca147ab6bcd38857a774133f5450 bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()Eric Dumazet2013-10-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO packets in the first place. (As a performance hint) Once we have GSO packets in write queue, we can not decide they are no longer GSO only because flow now uses a route which doesn't handle TSO/GSO. Core networking stack handles the case very well for us, all we need is keeping track of packet counts in MSS terms, regardless of segmentation done later (in GSO or hardware) Right now, if tcp_fragment() splits a GSO packet in two parts, @left and @right, and route changed through a non GSO device, both @left and @right have pcount set to 1, which is wrong, and leads to incorrect packet_count tracking. This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU discovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: must unclone packets before mangling themEric Dumazet2013-10-171-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP stack should make sure it owns skbs before mangling them. We had various crashes using bnx2x, and it turned out gso_size was cleared right before bnx2x driver was populating TC descriptor of the _previous_ packet send. TCP stack can sometime retransmit packets that are still in Qdisc. Of course we could make bnx2x driver more robust (using ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack. We have identified two points where skb_unclone() was needed. This patch adds a WARN_ON_ONCE() to warn us if we missed another fix of this kind. Kudos to Neal for finding the root cause of this bug. Its visible using small MSS. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'for-davem' of ↵David S. Miller2013-10-1710-5/+43
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.12 stream! For the mac80211 bits, Johannes says: "Jouni fixes a remain-on-channel vs. scan bug, and Felix fixes client TX probing on VLANs." And also: "This time I have two fixes from Emmanuel for RF-kill issues, and fixed two issues reported by Evan Huus and Thomas Lindroth respectively." On top of those... Avinash Patil adds a couple of mwifiex fixes to properly inform cfg80211 about some different types of disconnects, avoiding WARNINGs. Mark Cave-Ayland corrects a pointer arithmetic problem in rtlwifi, avoiding incorrect automatic gain calculations. Solomon Peachy sends a cw1200 fix for locking around calls to cw1200_irq_handler, addressing "lost interrupt" problems. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>