summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* xfrm: Copy policy family in clone_policyHerbert Xu2017-12-141-0/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 0e74aa1d79a5bbc663e03a2804399cae418a0321 ] The syzbot found an ancient bug in the IPsec code. When we cloned a socket policy (for example, for a child TCP socket derived from a listening socket), we did not copy the family field. This results in a live policy with a zero family field. This triggers a BUG_ON check in the af_key code when the cloned policy is retrieved. This patch fixes it by copying the family field over. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: use the right sk after waking up from wait_buf sleepXin Long2017-12-141-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cea0cc80a6777beb6eb643d4ad53690e1ad1d4ff ] Commit dfcb9f4f99f1 ("sctp: deny peeloff operation on asocs with threads sleeping on it") fixed the race between peeloff and wait sndbuf by checking waitqueue_active(&asoc->wait) in sctp_do_peeloff(). But it actually doesn't work, as even if waitqueue_active returns false the waiting sndbuf thread may still not yet hold sk lock. After asoc is peeled off, sk is not asoc->base.sk any more, then to hold the old sk lock couldn't make assoc safe to access. This patch is to fix this by changing to hold the new sk lock if sk is not asoc->base.sk, meanwhile, also set the sk in sctp_sendmsg with the new sk. With this fix, there is no more race between peeloff and waitbuf, the check 'waitqueue_active' in sctp_do_peeloff can be removed. Thanks Marcelo and Neil for making this clear. v1->v2: fix it by changing to lock the new sock instead of adding a flag in asoc. Suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: do not free asoc when it is already dead in sctp_sendmsgXin Long2017-12-141-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ca3af4dd28cff4e7216e213ba3b671fbf9f84758 ] Now in sctp_sendmsg sctp_wait_for_sndbuf could schedule out without holding sock sk. It means the current asoc can be freed elsewhere, like when receiving an abort packet. If the asoc is just created in sctp_sendmsg and sctp_wait_for_sndbuf returns err, the asoc will be freed again due to new_asoc is not nil. An use-after-free issue would be triggered by this. This patch is to fix it by setting new_asoc with nil if the asoc is already dead when cpu schedules back, so that it will not be freed again in sctp_sendmsg. v1->v2: set new_asoc as nil in sctp_sendmsg instead of sctp_wait_for_sndbuf. Suggested-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sunrpc: Fix rpc_task_begin trace pointChuck Lever2017-12-141-2/+1
| | | | | | | | | | | | [ Upstream commit b2bfe5915d5fe7577221031a39ac722a0a2a1199 ] The rpc_task_begin trace point always display a task ID of zero. Move the trace point call site so that it picks up the new task ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* route: update fnhe_expires for redirect when the fnhe existsXin Long2017-12-141-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e39d5246111399dbc6e11cd39fd8580191b86c47 ] Now when creating fnhe for redirect, it sets fnhe_expires for this new route cache. But when updating the exist one, it doesn't do it. It will cause this fnhe never to be expired. Paolo already noticed it before, in Jianlin's test case, it became even worse: When ip route flush cache, the old fnhe is not to be removed, but only clean it's members. When redirect comes again, this fnhe will be found and updated, but never be expired due to fnhe_expires not being set. So fix it by simply updating fnhe_expires even it's for redirect. Fixes: aee06da6726d ("ipv4: use seqlock for nh_exceptions") Reported-by: Jianlin Shi <jishi@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* route: also update fnhe_genid when updating a route cacheXin Long2017-12-141-2/+7
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit cebe84c6190d741045a322f5343f717139993c08 ] Now when ip route flush cache and it turn out all fnhe_genid != genid. If a redirect/pmtu icmp packet comes and the old fnhe is found and all it's members but fnhe_genid will be updated. Then next time when it looks up route and tries to rebind this fnhe to the new dst, the fnhe will be flushed due to fnhe_genid != genid. It causes this redirect/pmtu icmp packet acutally not to be applied. This patch is to also reset fnhe_genid when updating a route cache. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* gre6: use log_ecn_error module parameter in ip6_tnl_rcv()Alexey Kodanev2017-12-141-1/+1
| | | | | | | | | | | | | | [ Upstream commit 981542c526ecd846920bc500e9989da906ee9fb9 ] After commit 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") it's not used anywhere in the module, but previously was used in ip6gre_rcv(). Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: don't track fragmented packetsFlorian Westphal2017-12-142-5/+4
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7b4fdf77a450ec0fdcb2f677b080ddbf2c186544 ] Andrey reports syzkaller splat caused by NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb))); in ipv4 nat. But this assertion (and the comment) are wrong, this function does see fragments when IP_NODEFRAG setsockopt is used. As conntrack doesn't track packets without complete l4 header, only the first fragment is tracked. Because applying nat to first packet but not the rest makes no sense this also turns off tracking of all fragments. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ipv6: reorder icmpv6_init() and ip6_mr_init()WANG Cong2017-12-141-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 15e668070a64bb97f102ad9cf3bccbca0545cda8 ] Andrey reported the following kernel crash: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88001f311700 task.stack: ffff88001f6e8000 RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618 RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000 RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020 RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000 R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040 FS: 00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0 DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432 sock_release+0x8d/0x1e0 net/socket.c:597 __sock_create+0x39d/0x880 net/socket.c:1226 sock_create_kern+0x3f/0x50 net/socket.c:1243 inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526 icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954 ops_init+0x10a/0x550 net/core/net_namespace.c:115 setup_net+0x261/0x660 net/core/net_namespace.c:291 copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396 9pnet_virtio: no channels available for device ./file1 create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205 SYSC_unshare kernel/fork.c:2281 [inline] SyS_unshare+0x64e/0x1000 kernel/fork.c:2231 entry_SYSCALL_64_fastpath+0x1f/0xc2 This is because net->ipv6.mr6_tables is not initialized at that point, ip6mr_rules_init() is not called yet, therefore on the error path when we iterator the list, we trigger this oops. Fix this by reordering ip6mr_rules_init() before icmpv6_sk_init(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* rds: tcp: Sequence teardown of listen and acceptor sockets to avoid racesSowmini Varadhan2017-12-143-8/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b21dd4506b71bdb9c5a20e759255cd2513ea7ebe ] Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready") added the function rds_tcp_listen_sock_def_readable() to handle the case when a partially set-up acceptor socket drops into rds_tcp_listen_data_ready(). However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be null and we would hit a panic of the form BUG: unable to handle kernel NULL pointer dereference at (null) IP: (null) : ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp] tcp_data_queue+0x39d/0x5b0 tcp_rcv_established+0x2e5/0x660 tcp_v4_do_rcv+0x122/0x220 tcp_v4_rcv+0x8b7/0x980 : In the above case, it is not fatal to encounter a NULL value for ready- we should just drop the packet and let the flush of the acceptor thread finish gracefully. In general, the tear-down sequence for listen() and accept() socket that is ensured by this commit is: rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */ In rds_tcp_listen_stop(): serialize with, and prevent, further callbacks using lock_sock() flush rds_wq flush acceptor workq sock_release(listen socket) Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vti6: Don't report path MTU below IPV6_MIN_MTU.Steffen Klassert2017-12-141-2/+6
| | | | | | | | | | | | | | [ Upstream commit e3dc847a5f85b43ee2bfc8eae407a7e383483228 ] In vti6_xmit(), the check for IPV6_MIN_MTU before we send a ICMPV6_PKT_TOOBIG message is missing. So we might report a PMTU below 1280. Fix this by adding the required check. Fixes: ccd740cbc6e ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp: correct memory barrier usage in tcp_check_space()Jason Baron2017-12-091-1/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 56d806222ace4c3aeae516cd7a855340fb2839d8 ] sock_reset_flag() maps to __clear_bit() not the atomic version clear_bit(). Thus, we need smp_mb(), smp_mb__after_atomic() is not sufficient. Fixes: 3c7151275c0c ("tcp: add memory barriers to write space paths") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tipc: fix cleanup at module unloadParthasarathy Bhuvaragan2017-12-091-3/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 35e22e49a5d6a741ebe7f2dd280b2052c3003ef7 ] In tipc_server_stop(), we iterate over the connections with limiting factor as server's idr_in_use. We ignore the fact that this variable is decremented in tipc_close_conn(), leading to premature exit. In this commit, we iterate until the we have no connections left. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tipc: fix nametbl_lock soft lockup at module exitParthasarathy Bhuvaragan2017-12-091-11/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9dc3abdd1f7ea524e8552e0a3ef01219892ed1f4 ] Commit 333f796235a527 ("tipc: fix a race condition leading to subscriber refcnt bug") reveals a soft lockup while acquiring nametbl_lock. Before commit 333f796235a527, we call tipc_conn_shutdown() from tipc_close_conn() in the context of tipc_topsrv_stop(). In that context, we are allowed to grab the nametbl_lock. Commit 333f796235a527, moved tipc_conn_release (renamed from tipc_conn_shutdown) to the connection refcount cleanup. This allows either tipc_nametbl_withdraw() or tipc_topsrv_stop() to the cleanup. Since tipc_exit_net() first calls tipc_topsrv_stop() and then tipc_nametble_withdraw() increases the chances for the later to perform the connection cleanup. The soft lockup occurs in the call chain of tipc_nametbl_withdraw(), when it performs the tipc_conn_kref_release() as it tries to grab nametbl_lock again while holding it already. tipc_nametbl_withdraw() grabs nametbl_lock tipc_nametbl_remove_publ() tipc_subscrp_report_overlap() tipc_subscrp_send_event() tipc_conn_sendmsg() << if (con->flags != CF_CONNECTED) we do conn_put(), triggering the cleanup as refcount=0. >> tipc_conn_kref_release tipc_sock_release tipc_conn_release tipc_subscrb_delete tipc_subscrp_delete tipc_nametbl_unsubscribe << Soft Lockup >> The previous changes in this series fixes the race conditions fixed by commit 333f796235a527. Hence we can now revert the commit. Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber refcnt bug") Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: don't try to sleep in rate_control_rate_init()Johannes Berg2017-12-091-2/+0
| | | | | | | | | | | | | | | | [ Upstream commit 115865fa0826ed18ca04717cf72d0fe874c0fe7f ] In my previous patch, I missed that rate_control_rate_init() is called from some places that cannot sleep, so it cannot call ieee80211_recalc_min_chandef(). Remove that call for now to fix the context bug, we'll have to find a different way to fix the minimum channel width issue. Fixes: 96aa2e7cf126 ("mac80211: calculate min channel width correctly") Reported-by: Xiaolong Ye (via lkp-robot) <xiaolong.ye@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sctp: fix array overrun read on sctp_timer_tblColin Ian King2017-12-091-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit 0e73fc9a56f22f2eec4d2b2910c649f7af67b74d ] The comparison on the timeout can lead to an array overrun read on sctp_timer_tbl because of an off-by-one error. Fix this by using < instead of <= and also compare to the array size rather than SCTP_EVENT_TIMEOUT_MAX. Fixes CoverityScan CID#1397639 ("Out-of-bounds read") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: prevent skb/txq mismatchMichal Kazior2017-12-091-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit dbef53621116474bb883f76f0ba6b7640bc42332 ] Station structure is considered as not uploaded (to driver) until drv_sta_state() finishes. This call is however done after the structure is attached to mac80211 internal lists and hashes. This means mac80211 can lookup (and use) station structure before it is uploaded to a driver. If this happens (structure exists, but sta->uploaded is false) fast_tx path can still be taken. Deep in the fastpath call the sta->uploaded is checked against to derive "pubsta" argument for ieee80211_get_txq(). If sta->uploaded is false (and sta is actually non-NULL) ieee80211_get_txq() effectively downgraded to vif->txq. At first glance this may look innocent but coerces mac80211 into a state that is almost guaranteed (codel may drop offending skb) to crash because a station-oriented skb gets queued up on vif-oriented txq. The ieee80211_tx_dequeue() ends up looking at info->control.flags and tries to use txq->sta which in the fail case is NULL. It's probably pointless to pretend one can downgrade skb from sta-txq to vif-txq. Since downgrading unicast traffic to vif->txq must not be done there's no txq to put a frame on if sta->uploaded is false. Therefore the code is made to fall back to regular tx() op path if the described condition is hit. Only drivers using wake_tx_queue were affected. Example crash dump before fix: Unable to handle kernel paging request at virtual address ffffe26c PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211] [<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from [<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core]) [<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from [<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core]) [<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core]) [<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci]) [<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c0572e90>] (net_rx_action+0xac/0x160) Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: calculate min channel width correctlyJohannes Berg2017-12-092-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 96aa2e7cf126773b16c6c19b7474a8a38d3c707e ] In the current minimum chandef code there's an issue in that the recalculation can happen after rate control is initialized for a station that has a wider bandwidth than the current chanctx, and then rate control can immediately start using those higher rates which could cause problems. Observe that first of all that this problem is because we don't take non-associated and non-uploaded stations into account. The restriction to non-associated is quite pointless and is one of the causes for the problem described above, since the rate init will happen before the station is set to associated; no frames could actually be sent until associated, but the rate table can already contain higher rates and that might cause problems. Also, rejecting non-uploaded stations is wrong, since the rate control can select higher rates for those as well. Secondly, it's then necessary to recalculate the minimal config before initializing rate control, so that when rate control is initialized, the higher rates are already available. This can be done easily by adding the necessary function call in rate init. Change-Id: Ib9bc02d34797078db55459d196993f39dcd43070 Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: qrtr: Mark 'buf' as little endianStephen Boyd2017-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3512a1ad56174308a9fd3e10f4b1e3e152e9ec01 ] Failure to mark this pointer as __le32 causes checkers like sparse to complain: net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident> Silence it. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vti6: fix device register to report IFLA_INFO_KINDDavid Forster2017-12-091-1/+1
| | | | | | | | | | | | | [ Upstream commit 93e246f783e6bd1bc64fdfbfe68b18161f69b28e ] vti6 interface is registered before the rtnl_link_ops block is attached. As a result the resulting RTM_NEWLINK is missing IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix. Signed-off-by: Dave Forster <dforster@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookupsGuillaume Nault2017-12-092-27/+12
| | | | | | | | | | | | | | | | [ Upstream commit a9b2dff80be979432484afaf7f8d8e73f9e8838a ] For connected sockets, __l2tp_ip{,6}_bind_lookup() needs to check the remote IP when looking for a matching socket. Otherwise a connected socket can receive traffic not originating from its peer. Drop l2tp_ip_bind_lookup() and l2tp_ip6_bind_lookup() instead of updating their prototype, as these functions aren't used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: nf_tables: fix oob accessFlorian Westphal2017-11-301-1/+1
| | | | | | | | | | | | | | | [ Upstream commit 3e38df136e453aa69eb4472108ebce2fb00b1ba6 ] BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8 Read of size 8 by task nft/1607 When we've destroyed last valid expr, nft_expr_next() returns an invalid expr. We must not dereference it unless it passes != nft_expr_last() check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: nft_queue: use raw_smp_processor_id()Pablo Neira Ayuso2017-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c2e756ff9e699865d294cdc112acfc36419cf5cc ] Using smp_processor_id() causes splats with PREEMPT_RCU: [19379.552780] BUG: using smp_processor_id() in preemptible [00000000] code: ping/32389 [19379.552793] caller is debug_smp_processor_id+0x17/0x19 [...] [19379.552823] Call Trace: [19379.552832] [<ffffffff81274e9e>] dump_stack+0x67/0x90 [19379.552837] [<ffffffff8129a4d4>] check_preemption_disabled+0xe5/0xf5 [19379.552842] [<ffffffff8129a4fb>] debug_smp_processor_id+0x17/0x19 [19379.552849] [<ffffffffa07c42dd>] nft_queue_eval+0x35/0x20c [nft_queue] No need to disable preemption since we only fetch the numeric value, so let's use raw_smp_processor_id() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: Suppress NEW_PEER_CANDIDATE event if no roomMasashi Honma2017-11-301-6/+8
| | | | | | | | | | | | | [ Upstream commit 11197d006bcfabf0173a7820a163fcaac420d10e ] Previously, kernel sends NEW_PEER_CANDIDATE event to user land even if the found peer does not have any room to accept other peer. This causes continuous connection trials. Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: Remove invalid flag operations in mesh TSF synchronizationMasashi Honma2017-11-303-15/+0
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 76f43b4c0a9337af22827d78de4f2b8fd5328489 ] mesh_sync_offset_adjust_tbtt() implements Extensible synchronization framework ([1] 13.13.2 Extensible synchronization framework). It shall not operate the flag "TBTT Adjusting subfield" ([1] 8.4.2.100.8 Mesh Capability), since it is used only for MBCA ([1] 13.13.4 Mesh beacon collision avoidance, see 13.13.4.4.3 TBTT scanning and adjustment procedures for detail). So this patch remove the flag operations. [1] IEEE Std 802.11 2012 Signed-off-by: Masashi Honma <masashi.honma@gmail.com> [remove adjusting_tbtt entirely, since it's now unused] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: Allow IP_MULTICAST_IF to set index to L3 slaveDavid Ahern2017-11-302-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7bb387c5ab12aeac3d5eea28686489ff46b53ca9 ] IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index does not match it. e.g., ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument Relax the check in setsockopt to allow setting mc_index to an L3 slave if sk_bound_dev_if points to an L3 master. Make a similar change for IPv6. In this case change the device lookup to take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for the lookup of a potential L3 master device. This really only silences a setsockopt failure since uses of mc_index are secondary to sk_bound_dev_if if it is set. In both cases, if either index is an L3 slave or master, lookups are directed to the same FIB table so relaxing the check at setsockopt time causes no harm. Patch is based on a suggested change by Darwin for a problem noted in their code base. Suggested-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* RDS: RDMA: fix the ib_map_mr_sg_zbva() argumentSantosh Shilimkar2017-11-301-2/+3
| | | | | | | | | | [ Upstream commit 3e56c2f856d7aba6a03feea834d68f9c05f7d0b6 ] Fixes warning: Using plain integer as NULL pointer Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* RDS: RDMA: return appropriate error on rdma map failuresSantosh Shilimkar2017-11-301-1/+10
| | | | | | | | | | | | | | | | | | [ Upstream commit 584a8279a44a800dea5a5c1e9d53a002e03016b4 ] The first message to a remote node should prompt a new connection even if it is RDMA operation. For RDMA operation the MR mapping can fail because connections is not yet up. Since the connection establishment is asynchronous, we make sure the map failure because of unavailable connection reach to the user by appropriate error code. Before returning to the user, lets trigger the connection so that its ready for the next retry. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* RDS: make message size limit compliant with specAvinash Repaka2017-11-303-1/+42
| | | | | | | | | | | | | [ Upstream commit f9fb69adb6c7acca60977a4db5a5f95b8e66c041 ] RDS support max message size as 1M but the code doesn't check this in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size and its enforced irrespective of underlying transport. Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/9p: Switch to wait_event_killable()Tuomas Tynkkynen2017-11-302-9/+7
| | | | | | | | | | | | | | | | | | | commit 9523feac272ccad2ad8186ba4fcc89103754de52 upstream. Because userspace gets Very Unhappy when calls like stat() and execve() return -EINTR on 9p filesystem mounts. For instance, when bash is looking in PATH for things to execute and some SIGCHLD interrupts stat(), bash can throw a spurious 'command not found' since it doesn't retry the stat(). In practice, hitting the problem is rare and needs a really slow/bogged down 9p server. Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFC: fix device-allocation error returnJohan Hovold2017-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | commit c45e3e4c5b134b081e8af362109905427967eb19 upstream. A recent change fixing NFC device allocation itself introduced an error-handling bug by returning an error pointer in case device-id allocation failed. This is clearly broken as the callers still expected NULL to be returned on errors as detected by Dan's static checker. Fix this up by returning NULL in the event that we've run out of memory when allocating a new device id. Note that the offending commit is marked for stable (3.8) so this fix needs to be backported along with it. Fixes: 20777bc57c34 ("NFC: fix broken device allocation") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* libceph: don't WARN() if user tries to add invalid keyEric Biggers2017-11-301-1/+3
| | | | | | | | | | | | | | | | | | | | | commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream. The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a user tries to add a key of type "ceph" with an invalid payload as follows (assuming CONFIG_CEPH_LIB=y): echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \ | keyctl padd ceph desc @s This can be hit by fuzzers. As this is merely bad input and not a kernel bug, replace the WARN_ON() with return -EINVAL. Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: use new wait API for vsock_stream_sendmsg()WANG Cong2017-11-301-13/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 499fde662f1957e3cb8d192a94a099ebe19c714b upstream. As reported by Michal, vsock_stream_sendmsg() could still sleep at vsock_stream_has_space() after prepare_to_wait(): vsock_stream_has_space vmci_transport_stream_has_space vmci_qpair_produce_free_space qp_lock qp_acquire_queue_mutex mutex_lock Just switch to the new wait API like we did for commit d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()"). Reported-by: Michal Kubecek <mkubecek@suse.cz> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: "Jorgen S. Hansen" <jhansen@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTERWANG Cong2017-11-301-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | commit 76da0704507bbc51875013f6557877ab308cfd0a upstream. In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy <jeffy.chen@rock-chips.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/sctp: Always set scope_id in sctp_inet6_skb_msgnameEric W. Biederman2017-11-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7c8a61d9ee1df0fb4747879fa67a99614eb62fec ] Alexandar Potapenko while testing the kernel with KMSAN and syzkaller discovered that in some configurations sctp would leak 4 bytes of kernel stack. Working with his reproducer I discovered that those 4 bytes that are leaked is the scope id of an ipv6 address returned by recvmsg. With a little code inspection and a shrewd guess I discovered that sctp_inet6_skb_msgname only initializes the scope_id field for link local ipv6 addresses to the interface index the link local address pertains to instead of initializing the scope_id field for all ipv6 addresses. That is almost reasonable as scope_id's are meaniningful only for link local addresses. Set the scope_id in all other cases to 0 which is not a valid interface index to make it clear there is nothing useful in the scope_id field. There should be no danger of breaking userspace as the stack leak guaranteed that previously meaningless random data was being returned. Fixes: 372f525b495c ("SCTP: Resync with LKSCTP tree.") History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: do not peel off an assoc from one netns to another oneXin Long2017-11-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit df80cd9b28b9ebaa284a41df611dbf3a2d05ca74 ] Now when peeling off an association to the sock in another netns, all transports in this assoc are not to be rehashed and keep use the old key in hashtable. As a transport uses sk->net as the hash key to insert into hashtable, it would miss removing these transports from hashtable due to the new netns when closing the sock and all transports are being freeed, then later an use-after-free issue could be caused when looking up an asoc and dereferencing those transports. This is a very old issue since very beginning, ChunYu found it with syzkaller fuzz testing with this series: socket$inet6_sctp() bind$inet6() sendto$inet6() unshare(0x40000000) getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST() getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF() This patch is to block this call when peeling one assoc off from one netns to another one, so that the netns of all transport would not go out-sync with the key in hashtable. Note that this patch didn't fix it by rehashing transports, as it's difficult to handle the situation when the tuple is already in use in the new netns. Besides, no one would like to peel off one assoc to another netns, considering ipaddrs, ifaces, etc. are usually different. Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* af_netlink: ensure that NLMSG_DONE never fails in dumpsJason A. Donenfeld2017-11-242-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0642840b8bb008528dbdf929cec9f65ac4231ad0 ] The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data. However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part: nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in. In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy. This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vlan: fix a use-after-free in vlan_device_event()Cong Wang2017-11-241-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 052d41c01b3a2e3371d66de569717353af489d63 ] After refcnt reaches zero, vlan_vid_del() could free dev->vlan_info via RCU: RCU_INIT_POINTER(dev->vlan_info, NULL); call_rcu(&vlan_info->rcu, vlan_info_rcu_free); However, the pointer 'grp' still points to that memory since it is set before vlan_vid_del(): vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) goto out; grp = &vlan_info->grp; Depends on when that RCU callback is scheduled, we could trigger a use-after-free in vlan_group_for_each_dev() right following this vlan_vid_del(). Fix it by moving vlan_vid_del() before setting grp. This is also symmetric to the vlan_vid_add() we call in vlan_device_event(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct") Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter/ipvs: clear ipvs_property flag when SKB net namespace changedYe Yin2017-11-241-0/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ] When run ipvs in two different network namespace at the same host, and one ipvs transport network traffic to the other network namespace ipvs. 'ipvs_property' flag will make the second ipvs take no effect. So we should clear 'ipvs_property' when SKB network namespace changed. Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()") Signed-off-by: Ye Yin <hustcat@gmail.com> Signed-off-by: Wei Zhou <chouryzhou@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp: do not mangle skb->cb[] in tcp_make_synack()Eric Dumazet2017-11-241-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3b11775033dc87c3d161996c54507b15ba26414a ] Christoph Paasch sent a patch to address the following issue : tcp_make_synack() is leaving some TCP private info in skb->cb[], then send the packet by other means than tcp_transmit_skb() tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse IPv4/IPV6 stacks, but we have no such cleanup for SYNACK. tcp_make_synack() should not use tcp_init_nondata_skb() : tcp_init_nondata_skb() really should be limited to skbs put in write/rtx queues (the ones that are only sent via tcp_transmit_skb()) This patch fixes the issue and should even save few cpu cycles ;) Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp_nv: fix division by zero in tcpnv_acked()Konstantin Khlebnikov2017-11-241-1/+1
| | | | | | | | | | | | [ Upstream commit 4eebff27ca4182bbf5f039dd60d79e2d7c0a707e ] Average RTT could become zero. This happened in real life at least twice. This patch treats zero as 1us. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Lawrence Brakmo <Brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp: provide timestamps for partial writesSoheil Hassas Yeganeh2017-11-211-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ad02c4f547826167a709dab8a89a1caefd2c1f50 ] For TCP sockets, TX timestamps are only captured when the user data is successfully and fully written to the socket. In many cases, however, TCP writes can be partial for which no timestamp is collected. Collect timestamps whenever any user data is (fully or partially) copied into the socket. Pass tcp_write_queue_tail to tcp_tx_timestamp instead of the local skb pointer since it can be set to NULL on the error path. Note that tcp_write_queue_tail can be NULL, even if bytes have been copied to the socket. This is because acknowledgements are being processed in tcp_sendmsg(), and by the time tcp_tx_timestamp is called tcp_write_queue_tail can be NULL. For such cases, this patch does not collect any timestamps (i.e., it is best-effort). This patch is written with suggestions from Willem de Bruijn and Eric Dumazet. Change-log V1 -> V2: - Use sockc.tsflags instead of sk->sk_tsflags. - Use the same code path for normal writes and errors. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"Florian Westphal2017-11-181-78/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream. This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1. It was not a good idea. The custom hash table was a much better fit for this purpose. A fast lookup is not essential, in fact for most cases there is no lookup at all because original tuple is not taken and can be used as-is. What needs to be fast is insertion and deletion. rhlist removal however requires a rhlist walk. We can have thousands of entries in such a list if source port/addresses are reused for multiple flows, if this happens removal requests are so expensive that deletions of a few thousand flows can take several seconds(!). The advantages that we got from rhashtable are: 1) table auto-sizing 2) multiple locks 1) would be nice to have, but it is not essential as we have at most one lookup per new flow, so even a million flows in the bysource table are not a problem compared to current deletion cost. 2) is easy to add to custom hash table. I tried to add hlist_node to rhlist to speed up rhltable_remove but this isn't doable without changing semantics. rhltable_remove_fast will check that the to-be-deleted object is part of the table and that requires a list walk that we want to avoid. Furthermore, using hlist_node increases size of struct rhlist_head, which in turn increases nf_conn size. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821 Reported-by: Ivan Babrou <ibobrik@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: nat: avoid use of nf_conn_nat extensionFlorian Westphal2017-11-182-15/+5
| | | | | | | | | | | | | commit 6e699867f84c0f358fed233fe6162173aca28e04 upstream. successful insert into the bysource hash sets IPS_SRC_NAT_DONE status bit so we can check that instead of presence of nat extension which requires extra deref. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: don't compare TKIP TX MIC key in reinstall preventionJohannes Berg2017-11-181-2/+34
| | | | | | | | | | | | | | | | | | | | commit cfbb0d90a7abb289edc91833d0905931f8805f12 upstream. For the reinstall prevention, the code I had added compares the whole key. It turns out though that iwlwifi firmware doesn't provide the TKIP TX MIC key as it's not needed in client mode, and thus the comparison will always return false. For client mode, thus always zero out the TX MIC key part before doing the comparison in order to avoid accepting the reinstall of the key with identical encryption and RX MIC key, but not the same TX MIC key (since the supplicant provides the real one.) Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: use constant time comparison with keysJason A. Donenfeld2017-11-181-1/+2
| | | | | | | | | | | | | commit 2bdd713b92a9cade239d3c7d15205a09f556624d upstream. Otherwise we risk leaking information via timing side channel. Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: accept key reinstall without changing anythingJohannes Berg2017-11-181-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | commit fdf7cb4185b60c68e1a75e61691c4afdc15dea0e upstream. When a key is reinstalled we can reset the replay counters etc. which can lead to nonce reuse and/or replay detection being impossible, breaking security properties, as described in the "KRACK attacks". In particular, CVE-2017-13080 applies to GTK rekeying that happened in firmware while the host is in D3, with the second part of the attack being done after the host wakes up. In this case, the wpa_supplicant mitigation isn't sufficient since wpa_supplicant doesn't know the GTK material. In case this happens, simply silently accept the new key coming from userspace but don't take any action on it since it's the same key; this keeps the PN replay counters intact. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net_sched: avoid matching qdisc with zero handleCong Wang2017-11-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 50317fce2cc70a2bbbc4b42c31bbad510382a53c ] Davide found the following script triggers a NULL pointer dereference: ip l a name eth0 type dummy tc q a dev eth0 parent :1 handle 1: htb This is because for a freshly created netdevice noop_qdisc is attached and when passing 'parent :1', kernel actually tries to match the major handle which is 0 and noop_qdisc has handle 0 so is matched by mistake. Commit 69012ae425d7 tries to fix a similar bug but still misses this case. Handle 0 is not a valid one, should be just skipped. In fact, kernel uses it as TC_H_UNSPEC. Fixes: 69012ae425d7 ("net: sched: fix handling of singleton qdiscs with qdisc_hash") Fixes: 59cc1f61f09c ("net: sched:convert qdisc linked list to hashtable") Reported-by: Davide Caratti <dcaratti@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: reset owner sk for data chunks on out queues when migrating a sockXin Long2017-11-181-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d04adf1b355181e737b6b1e23d801b07f0b7c4c0 ] Now when migrating sock to another one in sctp_sock_migrate(), it only resets owner sk for the data in receive queues, not the chunks on out queues. It would cause that data chunks length on the sock is not consistent with sk sk_wmem_alloc. When closing the sock or freeing these chunks, the old sk would never be freed, and the new sock may crash due to the overflow sk_wmem_alloc. syzbot found this issue with this series: r0 = socket$inet_sctp() sendto$inet(r0) listen(r0) accept4(r0) close(r0) Although listen() should have returned error when one TCP-style socket is in connecting (I may fix this one in another patch), it could also be reproduced by peeling off an assoc. This issue is there since very beginning. This patch is to reset owner sk for the chunks on out queues so that sk sk_wmem_alloc has correct value after accept one sock or peeloff an assoc to one sock. Note that when resetting owner sk for chunks on outqueue, it has to sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk first and then sctp_set_owner_w them after changing assoc->base.sk, due to that sctp_wfree and it's callees are using assoc->base.sk. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmitXin Long2017-11-181-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8aec4959d832bae0889a8e2f348973b5e4abffef ] When receiving a Toobig icmpv6 packet, ip6gre_err would just set tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may still be using the old value, it has no chance to be updated with tunnel dev's mtu. Jianlin found this issue by reducing route's mtu while running netperf, the performance went to 0. ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig to upper socket after setting tunnel dev's mtu. We couldn't do that for ip6_gre, as gre's inner packet could be any protocol, it's difficult to handle them (like lookup upper dst) in a good way. So this patch is to fix it by updating skb_dst(skb)'s pmtu when dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no performance regression can be caused by this. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>