summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* vti6: fix input pathNicolas Dichtel2016-09-213-9/+13
| | | | | | | | | | | | | | | | | | Since commit 1625f4529957, vti6 is broken, all input packets are dropped (LINUX_MIB_XFRMINNOSTATES is incremented). XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in xfrm6_rcv_spi(). A new function xfrm6_rcv_tnl() that enables to pass a value to xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function is used in several handlers). CC: Alexey Kodanev <alexey.kodanev@oracle.com> Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* xfrm: Fix memory leak of aead algorithm nameIlan Tayari2016-09-191-0/+1
| | | | | | | | | | | | | | | | commit 1a6509d99122 ("[IPSEC]: Add support for combined mode algorithms") introduced aead. The function attach_aead kmemdup()s the algorithm name during xfrm_state_construct(). However this memory is never freed. Implementation has since been slightly modified in commit ee5c23176fcc ("xfrm: Clone states properly on migration") without resolving this leak. This patch adds a kfree() call for the aead algorithm name. Fixes: 1a6509d99122 ("[IPSEC]: Add support for combined mode algorithms") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* vti: use right inner_mode for inbound inter address family policy checksthomas.zeitlhofer+lkml@ze-it.at2016-09-092-2/+28
| | | | | | | | | | | | | | | In case of inter address family tunneling (IPv6 over vti4 or IPv4 over vti6), the inbound policy checks in vti_rcv_cb() and vti6_rcv_cb() are using the wrong address family. As a result, all inbound inter address family traffic is dropped. Use the xfrm_ip2inner_mode() helper, as done in xfrm_input() (i.e., also increment LINUX_MIB_XFRMINSTATEMODEERROR in case of error), to select the inner_mode that contains the right address family for the inbound policy checks. Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* xfrm_user: propagate sec ctx allocation errorsMathias Krause2016-09-091-3/+6
| | | | | | | | | | | | | | | | | When we fail to attach the security context in xfrm_state_construct() we'll return 0 as error value which, in turn, will wrongly claim success to userland when, in fact, we won't be adding / updating the XFRM state. This is a regression introduced by commit fd21150a0fe1 ("[XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()"). Fix it by propagating the error returned by security_xfrm_state_alloc() in this case. Fixes: fd21150a0fe1 ("[XFRM] netlink: Inline attach_encap_tmpl()...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* tcp: cwnd does not increase in TCP YeAHArtem Germanov2016-09-081-1/+1
| | | | | | | | | | | | | | | Commit 76174004a0f19785a328f40388e87e982bbf69b9 (tcp: do not slow start when cwnd equals ssthresh ) introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP connection and kernel 4.3 and above (including 4.8-rc4) shows bandwidth ~100KB/s. That is caused by stalled cwnd when cwnd equals ssthresh. This patch fixes it by proper increasing cwnd in this case. Signed-off-by: Artem Germanov <agermanov@anchorfree.com> Acked-by: Dmitry Adamushko <d.adamushko@anchorfree.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fastopen: avoid negative sk_forward_allocEric Dumazet2016-09-081-0/+1
| | | | | | | | | | | | | | | | | When DATA and/or FIN are carried in a SYN/ACK message or SYN message, we append an skb in socket receive queue, but we forget to call sk_forced_mem_schedule(). Effect is that the socket has a negative sk->sk_forward_alloc as long as the message is not read by the application. Josh Hunt fixed a similar issue in commit d22e15371811 ("tcp: fix tcp fin memory accounting") Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2016-09-086-18/+18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2016-09-08 1) Fix a crash when xfrm_dump_sa returns an error. From Vegard Nossum. 2) Remove some incorrect WARN() on normal error handling. From Vegard Nossum. 3) Ignore socket policies when rebuilding hash tables, socket policies are not inserted into the hash tables. From Tobias Brunner. 4) Initialize and check tunnel pointers properly before we use it. From Alexey Kodanev. 5) Fix l3mdev oif setting on xfrm dst lookups. From David Ahern. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm: Only add l3mdev oif to dst lookupsDavid Ahern2016-08-222-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Subash reported that commit 42a7b32b73d6 ("xfrm: Add oif to dst lookups") broke a wifi use case that uses fib rules and xfrms. The intent of 42a7b32b73d6 was driven by VRFs with IPsec. As a compromise relax the use of oif in xfrm lookups to L3 master devices only (ie., oif is either an L3 master device or is enslaved to a master device). Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_keyAlexey Kodanev2016-08-112-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running LTP 'icmp-uni-basic.sh -6 -p ipcomp -m tunnel' test over openvswitch + veth can trigger kernel panic: BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0 IP: [<ffffffff8169d1d2>] xfrm_input+0x82/0x750 ... [<ffffffff816d472e>] xfrm6_rcv_spi+0x1e/0x20 [<ffffffffa082c3c2>] xfrm6_tunnel_rcv+0x42/0x50 [xfrm6_tunnel] [<ffffffffa082727e>] tunnel6_rcv+0x3e/0x8c [tunnel6] [<ffffffff8169f365>] ip6_input_finish+0xd5/0x430 [<ffffffff8169fc53>] ip6_input+0x33/0x90 [<ffffffff8169f1d5>] ip6_rcv_finish+0xa5/0xb0 ... It seems that tunnel.ip6 can have garbage values and also dereferenced without a proper check, only tunnel.ip4 is being verified. Fix it by adding one more if block for AF_INET6 and initialize tunnel.ip6 with NULL inside xfrm6_rcv_spi() (which is similar to xfrm4_rcv_spi()). Fixes: 049f8e2 ("xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Ignore socket policies when rebuilding hash tablesTobias Brunner2016-07-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever thresholds are changed the hash tables are rebuilt. This is done by enumerating all policies and hashing and inserting them into the right table according to the thresholds and direction. Because socket policies are also contained in net->xfrm.policy_all but no hash tables are defined for their direction (dir + XFRM_POLICY_MAX) this causes a NULL or invalid pointer dereference after returning from policy_hash_bysel() if the rebuild is done while any socket policies are installed. Since the rebuild after changing thresholds is scheduled this crash could even occur if the userland sets thresholds seemingly before installing any socket policies. Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: get rid of another incorrect WARNVegard Nossum2016-07-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | During fuzzing I regularly run into this WARN(). According to Herbert Xu, this "certainly shouldn't be a WARN, it probably shouldn't print anything either". Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: get rid of incorrect WARNVegard Nossum2016-07-271-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AFAICT this message is just printed whenever input validation fails. This is a normal failure and we shouldn't be dumping the stack over it. Looks like it was originally a printk that was maybe incorrectly upgraded to a WARN: commit 62db5cfd70b1ef53aa21f144a806fe3b78c84fab Author: stephen hemminger <shemminger@vyatta.com> Date: Wed May 12 06:37:06 2010 +0000 xfrm: add severity to printk Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: fix crash in XFRM_MSG_GETSA netlink handlerVegard Nossum2016-07-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hit any of the error conditions inside xfrm_dump_sa(), then xfrm_state_walk_init() never gets called. However, we still call xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash because the state walk was never initialized properly. We can fix this by setting cb->args[0] only after we've processed the first element and checking this before calling xfrm_state_walk_done(). Fixes: d3623099d3 ("ipsec: add support of limited SA dump") Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | ipv6: addrconf: fix dev refcont leak when DAD failedWei Yongjun2016-09-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In general, when DAD detected IPv6 duplicate address, ifp->state will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a delayed work, the call tree should be like this: ndisc_recv_ns -> addrconf_dad_failure <- missing ifp put -> addrconf_mod_dad_work -> schedule addrconf_dad_work() -> addrconf_dad_stop() <- missing ifp hold before call it addrconf_dad_failure() called with ifp refcont holding but not put. addrconf_dad_work() call addrconf_dad_stop() without extra holding refcount. This will not cause any issue normally. But the race between addrconf_dad_failure() and addrconf_dad_work() may cause ifp refcount leak and netdevice can not be unregister, dmesg show the following messages: IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected! ... unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Cc: stable@vger.kernel.org Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Don't delete routes in different VRFsMark Tomlinson2016-09-062-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When deleting an IP address from an interface, there is a clean-up of routes which refer to this local address. However, there was no check to see that the VRF matched. This meant that deletion wasn't confined to the VRF it should have been. To solve this, a new field has been added to fib_info to hold a table id. When removing fib entries corresponding to a local ip address, this table id is also used in the comparison. The table id is populated when the fib_info is created. This was already done in some places, but not in ip_rt_ioctl(). This has now been fixed. Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device") Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: release dst in ping_v6_sendmsgDave Jones2016-09-061-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neither the failure or success paths of ping_v6_sendmsg release the dst it acquires. This leads to a flood of warnings from "net/core/dst.c:288 dst_release" on older kernels that don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported. That patch optimistically hoped this had been fixed post 3.10, but it seems at least one case wasn't, where I've seen this triggered a lot from machines doing unprivileged icmp sockets. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'Linus Torvalds2016-09-041-22/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we use the 'readlock' both for protecting some of the af_unix IO path and for making the bind be single-threaded. The two are independent, but using the same lock makes for a nasty deadlock due to ordering with regards to filesystem locking. The bind locking would want to nest outside the VSF pathname locking, but the IO locking wants to nest inside some of those same locks. We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix splice-bind deadlock") which moved the readlock inside the vfs locks, but that caused problems with overlayfs that will then call back into filesystem routines that take the lock in the wrong order anyway. Splitting the locks means that we can go back to having the bind lock be the outermost lock, and we don't have any deadlocks with lock ordering. Acked-by: Rainer Weikusat <rweikusat@cyberadapt.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "af_unix: Fix splice-bind deadlock"Linus Torvalds2016-09-041-40/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1. It turns out that it just replaces one deadlock with another one: we can still get the wrong lock ordering with the readlock due to overlayfs calling back into the filesystem layer and still taking the vfs locks after the readlock. The proper solution ends up being to just split the readlock into two pieces: the bind lock (taken *outside* the vfs locks) and the IO lock (taken *inside* the filesystem locks). The two locks are independent anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bonding: Fix bonding crashMahesh Bandewar2016-09-041-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | l2tp: fix use-after-free during module unloadSabrina Dubroca2016-09-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU -> l2tp_tunnel_destruct). By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish destroying the socket, the private data reserved via the net_generic mechanism has already been freed, but l2tp_tunnel_destruct() actually uses this data. Make sure tunnel deletion for the netns has completed before returning from l2tp_exit_net() by first flushing the tunnel removal workqueue, and then waiting for RCU callbacks to complete. Fixes: 167eb17e0b17 ("l2tp: create tunnel sockets in the right namespace") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit()Eli Cooper2016-09-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8eb30be0352d0916 ("ipv6: Create ip6_tnl_xmit") unsets flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit(). Since xfrm_selector_match() relies on this info, IPv6 packets sent by an ip6tunnel cannot be properly selected by their protocols after removing it. This patch puts flowi6_proto back. Cc: stable@vger.kernel.org Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit") Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possiblyGao Feng2016-09-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original codes depend on that the function parameters are evaluated from left to right. But the parameter's evaluation order is not defined in C standard actually. When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys, hashrnd) with some compilers or environment, the keys passed to flow_keys_have_l4 is not initialized. Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash") Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: fastopen: fix rcv_wup initialization for TFO server on SYN/dataNeal Cardwell2016-09-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yuchung noticed that on the first TFO server data packet sent after the (TFO) handshake, the server echoed the TCP timestamp value in the SYN/data instead of the timestamp value in the final ACK of the handshake. This problem did not happen on regular opens. The tcp_replace_ts_recent() logic that decides whether to remember an incoming TS value needs tp->rcv_wup to hold the latest receive sequence number that we have ACKed (latest tp->rcv_nxt we have ACKed). This commit fixes this issue by ensuring that a TFO server properly updates tp->rcv_wup to match tp->rcv_nxt at the time it sends a SYN/ACK for the SYN/data. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: don't increment tx_dropped in br_do_proxy_arpNikolay Aleksandrov2016-09-011-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | pskb_may_pull may fail due to various reasons (e.g. alloc failure), but the skb isn't changed/dropped and processing continues so we shouldn't increment tx_dropped. CC: Kyeyoon Park <kyeyoonp@codeaurora.org> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netconf: add a notif when settings are createdNicolas Dichtel2016-09-012-5/+15
| | | | | | | | | | | | | | All changes are notified, but the initial state was missing. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: add missing netconf notif when 'all' is updatedNicolas Dichtel2016-09-011-0/+7
| | | | | | | | | | | | | | | | The 'default' value was not advertised. Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: fix random link resets while adding a second bearerParthasarathy Bhuvaragan2016-09-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a dual bearer configuration, if the second tipc link becomes active while the first link still has pending nametable "bulk" updates, it randomly leads to reset of the second link. When a link is established, the function named_distribute(), fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL) with NAME_DISTRIBUTOR message for each PUBLICATION. However, the function named_distribute() allocates the buffer by increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR). This consumes the space allocated for TUNNEL_PROTOCOL. When establishing the second link, the link shall tunnel all the messages in the first link queue including the "bulk" update. As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds the link mtu the transmission fails (-EMSGSIZE). Thus, the synch point based on the message count of the tunnel packets is never reached leading to link timeout. In this commit, we adjust the size of name distributor message so that they can be tunnelled. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | kcm: fix a socket double freeWANG Cong2016-08-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry reported a double free on kcm socket, which could be easily reproduced by: #include <unistd.h> #include <sys/syscall.h> int main() { int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0); syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0); return 0; } This is because on the error path, after we install the new socket file, we call sock_release() to clean up the socket, which leaves the fd pointing to a freed socket. Fix this by calling sys_close() on that fd directly. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: re-introduce 'fix parsing of MLDv2 reports'Davide Caratti2016-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") seems to have accidentally reverted commit 47cc84ce0c2f ("bridge: fix parsing of MLDv2 reports"). This commit brings back a change to br_ip6_multicast_mld2_report() where parsing of MLDv2 reports stops when the first group is successfully added to the MDB cache. Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2016-08-3010-30/+71
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Allow nf_tables reject expression from input, forward and output hooks, since only there the routing information is available, otherwise we crash. 2) Fix unsafe list iteration when flushing timeout and accouting objects. 3) Fix refcount leak on timeout policy parsing failure. 4) Unlink timeout object for unconfirmed conntracks too 5) Missing validation of pkttype mangling from bridge family. 6) Fix refcount leak on ebtables on second lookup for the specific bridge match extension, this patch from Sabrina Dubroca. 7) Remove unnecessary ip_hdr() in nf_tables_netdev family. Patches from 1-5 and 7 from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_tables_netdev: remove redundant ip_hdr assignmentLiping Zhang2016-08-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have already use skb_header_pointer to get the ip header pointer, so there's no need to use ip_hdr again. Moreover, in NETDEV INGRESS hook, ip header maybe not linear, so use ip_hdr is not appropriate, remove it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: ebtables: put module reference when an incorrect extension is foundSabrina Dubroca2016-08-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bcf493428840 ("netfilter: ebtables: Fix extension lookup with identical name") added a second lookup in case the extension that was found during the first lookup matched another extension with the same name, but didn't release the reference on the incorrect module. Fixes: bcf493428840 ("netfilter: ebtables: Fix extension lookup with identical name") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_meta: improve the validity check of pkttype set exprLiping Zhang2016-08-252-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "meta pkttype set" is only supported on prerouting chain with bridge family and ingress chain with netdev family. But the validate check is incomplete, and the user can add the nft rules on input chain with bridge family, for example: # nft add table bridge filter # nft add chain bridge filter input {type filter hook input \ priority 0 \;} # nft add chain bridge filter test # nft add rule bridge filter test meta pkttype set unicast # nft add rule bridge filter input jump test This patch fixes the problem. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: cttimeout: unlink timeout objs in the unconfirmed ct listsLiping Zhang2016-08-251-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN reported this bug: BUG: KASAN: use-after-free in icmp_packet+0x25/0x50 [nf_conntrack_ipv4] at addr ffff880002db08c8 Read of size 4 by task lt-nf-queue/19041 Call Trace: <IRQ> [<ffffffff815eeebb>] dump_stack+0x63/0x88 [<ffffffff813386f8>] kasan_report_error+0x528/0x560 [<ffffffff81338cc8>] kasan_report+0x58/0x60 [<ffffffffa07393f5>] ? icmp_packet+0x25/0x50 [nf_conntrack_ipv4] [<ffffffff81337551>] __asan_load4+0x61/0x80 [<ffffffffa07393f5>] icmp_packet+0x25/0x50 [nf_conntrack_ipv4] [<ffffffffa06ecaa0>] nf_conntrack_in+0x550/0x980 [nf_conntrack] [<ffffffffa06ec550>] ? __nf_conntrack_confirm+0xb10/0xb10 [nf_conntrack] [ ... ] The main reason is that we missed to unlink the timeout objects in the unconfirmed ct lists, so we will access the timeout objects that have already been freed. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: cttimeout: put back l4proto when replacing timeout policyLiping Zhang2016-08-251-18/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | We forget to call nf_ct_l4proto_put when replacing the existing timeout policy. Acctually, there's no need to get ct l4proto before doing replace, so we can move it to a later position. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nfnetlink: use list_for_each_entry_safe to delete all objectsLiping Zhang2016-08-252-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | cttimeout and acct objects are deleted from the list while traversing it, so use list_for_each_entry is unsafe here. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_reject: restrict to INPUT/FORWARD/OUTPUTLiping Zhang2016-08-254-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After I add the nft rule "nft add rule filter prerouting reject with tcp reset", kernel panic happened on my system: NULL pointer dereference at ... IP: [<ffffffff81b9db2f>] nf_send_reset+0xaf/0x400 Call Trace: [<ffffffff81b9da80>] ? nf_reject_ip_tcphdr_get+0x160/0x160 [<ffffffffa0928061>] nft_reject_ipv4_eval+0x61/0xb0 [nft_reject_ipv4] [<ffffffffa08e836a>] nft_do_chain+0x1fa/0x890 [nf_tables] [<ffffffffa08e8170>] ? __nft_trace_packet+0x170/0x170 [nf_tables] [<ffffffffa06e0900>] ? nf_ct_invert_tuple+0xb0/0xc0 [nf_conntrack] [<ffffffffa07224d4>] ? nf_nat_setup_info+0x5d4/0x650 [nf_nat] [...] Because in the PREROUTING chain, routing information is not exist, then we will dereference the NULL pointer and oops happen. So we restrict reject expression to INPUT, FORWARD and OUTPUT chain. This is consistent with iptables REJECT target. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | Merge tag 'mac80211-for-davem-2016-08-30' of ↵David S. Miller2016-08-302-26/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Three little fixes: * revert a recent wext patch, which Ben Hutchings noticed was wrong, and it turns out not to be necessary for any driver * fix an infinite loop that can occur under certain conditions in mac80211's TDLS code (depending on regulatory information) * add a cfg80211_get_station() static inline when cfg80211 isn't built, to allow other modules to not have to depend on it for it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mac80211: TDLS: don't require beaconing for AP BWArik Nemtsov2016-08-301-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop downgrading TDLS chandef when reaching the AP BW. The AP provides the necessary regulatory protection in this case. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=153961, which reported an infinite loop here. Reported-by: Kamil Toman <kamil.toman@gmail.com> Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"Johannes Berg2016-08-081-23/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3d5fdff46c4b2b9534fa2f9fc78e90a48e0ff724. Ben Hutchings pointed out that the commit isn't safe since it assumes that the structure used by the driver is iw_point, when in fact there's no way to know about that. Fortunately, the only driver in the tree that ever runs this code path is the wilc1000 staging driver, so it doesn't really matter. Clearly I should have investigated this better before applying, sorry. Reported-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org [though I guess it doesn't matter much] Fixes: 3d5fdff46c4b ("wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | Merge branch 'for-upstream' of ↵David S. Miller2016-08-265-4/+24
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2016-08-25 Here are a couple of important Bluetooth fixes for the 4.8 kernel: - Memory leak fix for HCI requests - Fix sk_filter handling with L2CAP - Fix sock_recvmsg behavior when MSG_TRUNC is not set Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not setLuiz Augusto von Dentz2016-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to bt_sock_recvmsg MSG_TRUNC shall be checked using the original flags not msg_flags. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not setLuiz Augusto von Dentz2016-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b5f34f9420b50c9b5876b9a2b68e96be6d629054 attempt to introduce proper handling for MSG_TRUNC but recv and variants should still work as read if no flag is passed, but because the code may set MSG_TRUNC to msg->msg_flags that shall not be used as it may cause it to be behave as if MSG_TRUNC is always, so instead of using it this changes the code to use the flags parameter which shall contain the original flags. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | Bluetooth: split sk_filter in l2cap_sock_recv_cbDaniel Borkmann2016-08-242-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During an audit for sk_filter(), we found that rx_busy_skb handling in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as intended. The assumption from commit e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer") is that errors returned from sock_queue_rcv_skb() are due to receive buffer shortage. However, nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on the socket, that could drop some of the incoming skbs when handled in sock_queue_rcv_skb(). In that case sock_queue_rcv_skb() will return with -EPERM, propagated from sk_filter() and if in L2CAP_MODE_ERTM mode, wrong assumption was that we failed due to receive buffer being full. From that point onwards, due to the to-be-dropped skb being held in rx_busy_skb, we cannot make any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(), due to the filter drop verdict over and over coming from sk_filter(). Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being dropped due to rx_busy_skb being occupied. Instead, just use __sock_queue_rcv_skb() where an error really tells that there's a receive buffer issue. Split the sk_filter() and enable it for non-segmented modes at queuing time since at this point in time the skb has already been through the ERTM state machine and it has been acked, so dropping is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in l2cap_data_rcv() so the packet can be dropped before the state machine sees it. Fixes: e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | Bluetooth: Fix memory leak at end of hci requestsFrederic Dalleau2016-08-241-0/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In hci_req_sync_complete the event skb is referenced in hdev->req_skb. It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will pass the skb to the caller, or __hci_req_sync which leaks. unreferenced object 0xffff880005339a00 (size 256): comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s) backtrace: [<ffffffff818d89d9>] kmemleak_alloc+0x49/0xa0 [<ffffffff8116bba8>] kmem_cache_alloc+0x128/0x180 [<ffffffff8167c1df>] skb_clone+0x4f/0xa0 [<ffffffff817aa351>] hci_event_packet+0xc1/0x3290 [<ffffffff8179a57b>] hci_rx_work+0x18b/0x360 [<ffffffff810692ea>] process_one_work+0x14a/0x440 [<ffffffff81069623>] worker_thread+0x43/0x4d0 [<ffffffff8106ead4>] kthread+0xc4/0xe0 [<ffffffff818dd38f>] ret_from_fork+0x1f/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* | | | qdisc: fix a module refcount leak in qdisc_create_dflt()Eric Dumazet2016-08-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Should qdisc_alloc() fail, we must release the module refcount we got right before. Fixes: 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tipc: fix the error handling in tipc_udp_enable()Wei Yongjun2016-08-251-1/+4
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | Fix to return a negative error code in enable_mcast() error handling case, and release udp socket when necessary. Fixes: d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: diag: Fix refcnt leak in error path destroying socketDavid Ahern2016-08-232-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inet_diag_find_one_icsk takes a reference to a socket that is not released if sock_diag_destroy returns an error. Fix by changing tcp_diag_destroy to manage the refcnt for all cases and remove the sock_put calls from tcp_abort. Fixes: c1e64e298b8ca ("net: diag: Support destroying TCP sockets") Reported-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | udp: get rid of SLAB_DESTROY_BY_RCU allocationsEric Dumazet2016-08-234-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | After commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") we do not need this special allocation mode anymore, even if it is harmless. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sctp: fix overrun in sctp_diag_dump_one()Lance Richardson2016-08-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function sctp_diag_dump_one() currently performs a memcpy() of 64 bytes from a 16 byte field into another 16 byte field. Fix by using correct size, use sizeof to obtain correct size instead of using a hard-coded constant. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Lance Richardson <lrichard@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>