summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
...
* net:tipc: Fix a double free in tipc_sk_mcast_rcvLv Yunlong2021-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 6bf24dc0cc0cc43b29ba344b66d78590e687e046 ] In the if(skb_peek(arrvq) == skb) branch, it calls __skb_dequeue(arrvq) to get the skb by skb = skb_peek(arrvq). Then __skb_dequeue() unlinks the skb from arrvq and returns the skb which equals to skb_peek(arrvq). After __skb_dequeue(arrvq) finished, the skb is freed by kfree_skb(__skb_dequeue(arrvq)) in the first time. Unfortunately, the same skb is freed in the second time by kfree_skb(skb) after the branch completed. My patch removes kfree_skb() in the if(skb_peek(arrvq) == skb) branch, because this skb will be freed by kfree_skb(skb) finally. Fixes: cb1b728096f54 ("tipc: eliminate race condition at multicast reception") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: dsa: Fix type was not set for devlink portMaxim Kochetkov2021-04-141-1/+7
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit fb6ec87f7229b92baa81b35cbc76f2626d5bfadb ] If PHY is not available on DSA port (described at devicetree but absent or failed to detect) then kernel prints warning after 3700 secs: [ 3707.948771] ------------[ cut here ]------------ [ 3707.948784] Type was not set for devlink port. [ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8 We should unregister the devlink port as a user port and re-register it as an unused port before executing "continue" in case of dsa_port_setup error. Fixes: 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZEOliver Hartkopp2021-04-141-4/+7
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f522d9559b07854c231cf8f0b8cb5a3578f8b44e ] Since commit f5223e9eee65 ("can: extend sockaddr_can to include j1939 members") the sockaddr_can has been extended in size and a new CAN_REQUIRED_SIZE macro has been introduced to calculate the protocol specific needed size. The ABI for the msg_name and msg_namelen has not been adapted to the new CAN_REQUIRED_SIZE macro for the other CAN protocols which leads to a problem when an existing binary reads the (increased) struct sockaddr_can in msg_name. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Reported-by: Richard Weinberger <richard@nod.at> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Link: https://lore.kernel.org/linux-can/1135648123.112255.1616613706554.JavaMail.zimbra@nod.at/T/#t Link: https://lore.kernel.org/r/20210325125850.1620-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZEOliver Hartkopp2021-04-142-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9e9714742fb70467464359693a73b911a630226f ] Since commit f5223e9eee65 ("can: extend sockaddr_can to include j1939 members") the sockaddr_can has been extended in size and a new CAN_REQUIRED_SIZE macro has been introduced to calculate the protocol specific needed size. The ABI for the msg_name and msg_namelen has not been adapted to the new CAN_REQUIRED_SIZE macro for the other CAN protocols which leads to a problem when an existing binary reads the (increased) struct sockaddr_can in msg_name. Fixes: f5223e9eee65 ("can: extend sockaddr_can to include j1939 members") Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Link: https://lore.kernel.org/linux-can/1135648123.112255.1616613706554.JavaMail.zimbra@nod.at/T/#t Link: https://lore.kernel.org/r/20210325125850.1620-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: Provide private skb extensions for segmented and hw offloaded ESP packetsSteffen Klassert2021-04-143-4/+20
| | | | | | | | | | | | | | | | | [ Upstream commit c7dbf4c08868d9db89b8bfe8f8245ca61b01ed2f ] Commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") added a XFRM_XMIT flag to avoid duplicate ESP trailer insertion on HW offload. This flag is set on the secpath that is shared amongst segments. This lead to a situation where some segments are not transformed correctly when segmentation happens at layer 3. Fix this by using private skb extensions for segmented and hw offloaded ESP packets. Fixes: 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* esp: delete NETIF_F_SCTP_CRC bit from features for esp offloadXin Long2021-04-142-4/+8
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 154deab6a3ba47792936edf77f2f13a1cbc4351d ] Now in esp4/6_gso_segment(), before calling inner proto .gso_segment, NETIF_F_CSUM_MASK bits are deleted, as HW won't be able to do the csum for inner proto due to the packet encrypted already. So the UDP/TCP packet has to do the checksum on its own .gso_segment. But SCTP is using CRC checksum, and for that NETIF_F_SCTP_CRC should be deleted to make SCTP do the csum in own .gso_segment as well. In Xiumei's testing with SCTP over IPsec/veth, the packets are kept dropping due to the wrong CRC checksum. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: xfrm: Localize sequence counter per network namespaceAhmed S. Darwish2021-04-141-5/+5
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit e88add19f68191448427a6e4eb059664650a837f ] A sequence counter write section must be serialized or its internal state can get corrupted. The "xfrm_state_hash_generation" seqcount is global, but its write serialization lock (net->xfrm.xfrm_state_lock) is instantiated per network namespace. The write protection is thus insufficient. To provide full protection, localize the sequence counter per network namespace instead. This should be safe as both the seqcount read and write sections access data exclusively within the network namespace. It also lays the foundation for transforming "xfrm_state_hash_generation" data type from seqcount_t to seqcount_LOCKNAME_t in further commits. Fixes: b65e3d7be06f ("xfrm: state: add sequence count to detect hash resizes") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: Use actual socket sk instead of skb socket for xfrm_output_resumeEvan Nimmo2021-04-145-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9ab1265d52314fce1b51e8665ea6dbc9ac1a027c ] A situation can occur where the interface bound to the sk is different to the interface bound to the sk attached to the skb. The interface bound to the sk is the correct one however this information is lost inside xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume instead. This assumes that the sk bound interface and the bound interface attached to the sk within the skb are the same which can lead to lookup failures inside ip_route_me_harder resulting in the packet being dropped. We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the global VRF however we have an encapsulated dot1q tunnel interface that is within a different VRF. We also have a mangle rule that marks the packets causing them to be processed inside ip_route_me_harder. Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this worked fine as the sk attached to the skb was changed from the dot1q encapsulated interface to the sk for the tunnel which meant the interface bound to the sk and the interface bound to the skb were identical. Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk when routing harder") fixed some of these issues however a similar problem existed in the xfrm code. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: interface: fix ipv4 pmtu check to honor ip header dfEyal Birger2021-04-141-0/+3
| | | | | | | | | | | | | | | | [ Upstream commit 8fc0e3b6a8666d656923d214e4dc791e9a17164a ] Frag needed should only be sent if the header enables DF. This fix allows packets larger than MTU to pass the xfrm interface and be fragmented after encapsulation, aligning behavior with non-interface xfrm. Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: fix err handler in tcf_action_init()Vlad Buslov2021-04-142-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b3650bf76a32380d4d80a3e21b5583e7303f216c ] With recent changes that separated action module load from action initialization tcf_action_init() function error handling code was modified to manually release the loaded modules if loading/initialization of any further action in same batch failed. For the case when all modules successfully loaded and some of the actions were initialized before one of them failed in init handler. In this case for all previous actions the module will be released twice by the error handler: First time by the loop that manually calls module_put() for all ops, and second time by the action destroy code that puts the module after destroying the action. Reproduction: $ sudo tc actions add action simple sdata \"2\" index 2 $ sudo tc actions add action simple sdata \"1\" index 1 \ action simple sdata \"2\" index 2 RTNETLINK answers: File exists We have an error talking to the kernel $ sudo tc actions ls action simple total acts 1 action order 0: Simple <"2"> index 2 ref 1 bind 0 $ sudo tc actions flush action simple $ sudo tc actions ls action simple $ sudo tc actions add action simple sdata \"2\" index 2 Error: Failed to load TC action module. We have an error talking to the kernel $ lsmod | grep simple act_simple 20480 -1 Fix the issue by modifying module reference counting handling in action initialization code: - Get module reference in tcf_idr_create() and put it in tcf_idr_release() instead of taking over the reference held by the caller. - Modify users of tcf_action_init_1() to always release the module reference which they obtain before calling init function instead of assuming that created action takes over the reference. - Finally, modify tcf_action_init_1() to not release the module reference when overwriting existing action as this is no longer necessary since both upper and lower layers obtain and manage their own module references independently. Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: let skb_orphan_partial wake-up waiters.Paolo Abeni2021-04-141-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9adc89af724f12a03b47099cd943ed54e877cd59 upstream. Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Even for TCP sockets not calling the sk destructor could have bad effects on TSQ. Address the issue using skb_orphan to release the sk wmem before setting the new sock_efree destructor. Additionally bundle the whole ownership update in a new helper, so that later other potential users could avoid duplicate code. v1 -> v2: - use skb_orphan() instead of sort of open coding it (Eric) - provide an helper for the ownership change (Eric) Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()Maciej Żenczykowski2021-04-142-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 630e4576f83accf90366686f39808d665d8dbecc upstream. Found by virtue of ipv6 raw sockets not honouring the per-socket IP{,V6}_FREEBIND setting. Based on hits found via: git grep '[.]ip_nonlocal_bind' We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT, and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl. The helper is defined as: static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent; } so this change only widens the accepted opt-outs and is thus a clean bugfix. I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug, but IMHO this should be applied to stable kernels as far back as possible. As such I'm adding a 'fixes' tag with the commit that originally added the helper, which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14) would presumably require open-coding it or backporting the helper as well. Other possibly relevant commits: v4.18-rc6-1502-g83ba4645152d net: add helpers checking if socket can be bound to nonlocal address v4.18-rc6-1431-gd0c1f01138c4 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind v4.14-rc5-271-gb71d21c274ef sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND v4.7-rc7-1883-g9b9742022888 sctp: support ipv6 nonlocal bind v4.1-12247-g35a256fee52c ipv6: Nonlocal bind Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: hsr: Reset MAC header for Tx pathKurt Kanzenbach2021-04-142-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9d6803921a16f4d768dc41a75375629828f4d91e upstream. Reset MAC header in HSR Tx path. This is needed, because direct packet transmission, e.g. by specifying PACKET_QDISC_BYPASS does not reset the MAC header. This has been observed using the following setup: |$ ip link add name hsr0 type hsr slave1 lan0 slave2 lan1 supervision 45 version 1 |$ ifconfig hsr0 up |$ ./test hsr0 The test binary is using mmap'ed sockets and is specifying the PACKET_QDISC_BYPASS socket option. This patch resolves the following warning on a non-patched kernel: |[ 112.725394] ------------[ cut here ]------------ |[ 112.731418] WARNING: CPU: 1 PID: 257 at net/hsr/hsr_forward.c:560 hsr_forward_skb+0x484/0x568 |[ 112.739962] net/hsr/hsr_forward.c:560: Malformed frame (port_src hsr0) The warning can be safely removed, because the other call sites of hsr_forward_skb() make sure that the skb is prepared correctly. Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: fix TXQ AC confusionJohannes Berg2021-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1153a74768a9212daadbb50767aa400bc6a0c9b0 upstream. Normally, TXQs have txq->tid = tid; txq->ac = ieee80211_ac_from_tid(tid); However, the special management TXQ actually has txq->tid = IEEE80211_NUM_TIDS; // 16 txq->ac = IEEE80211_AC_VO; This makes sense, but ieee80211_ac_from_tid(16) is the same as ieee80211_ac_from_tid(0) which is just IEEE80211_AC_BE. Now, normally this is fine. However, if the netdev queues were stopped, then the code in ieee80211_tx_dequeue() will propagate the stop from the interface (vif->txqs_stopped[]) if the AC 2 (ieee80211_ac_from_tid(txq->tid)) is marked as stopped. On wake, however, __ieee80211_wake_txqs() will wake the TXQ if AC 0 (txq->ac) is woken up. If a driver stops all queues with ieee80211_stop_tx_queues() and then wakes them again with ieee80211_wake_tx_queues(), the ieee80211_wake_txqs() tasklet will run to resync queue and TXQ state. If all queues were woken, then what'll happen is that _ieee80211_wake_txqs() will run in order of HW queues 0-3, typically (and certainly for iwlwifi) corresponding to ACs 0-3, so it'll call __ieee80211_wake_txqs() for each AC in order 0-3. When __ieee80211_wake_txqs() is called for AC 0 (VO) that'll wake up the management TXQ (remember its tid is 16), and the driver's wake_tx_queue() will be called. That tries to get a frame, which will immediately *stop* the TXQ again, because now we check against AC 2, and AC 2 hasn't yet been marked as woken up again in sdata->vif.txqs_stopped[] since we're only in the __ieee80211_wake_txqs() call for AC 0. Thus, the management TXQ will never be started again. Fix this by checking txq->ac directly instead of calculating the AC as ieee80211_ac_from_tid(txq->tid). Fixes: adf8ed01e4fd ("mac80211: add an optional TXQ for other PS-buffered frames") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20210323210500.bf4d50afea4a.I136ffde910486301f8818f5442e3c9bf8670a9c4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: fix time-is-after bug in mlmeBen Greear2021-04-141-1/+4
| | | | | | | | | | | | | | | | | commit 7d73cd946d4bc7d44cdc5121b1c61d5d71425dea upstream. The incorrect timeout check caused probing to happen when it did not need to happen. This in turn caused tx performance drop for around 5 seconds in ath10k-ct driver. Possibly that tx drop is due to a secondary issue, but fixing the probe to not happen when traffic is running fixes the symptom. Signed-off-by: Ben Greear <greearb@candelatech.com> Fixes: 9abf4e49830d ("mac80211: optimize station connection monitor") Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210330230749.14097-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cfg80211: check S1G beacon compat element lengthJohannes Berg2021-04-141-6/+8
| | | | | | | | | | | | commit b5ac0146492fc5c199de767e492be8a66471011a upstream. We need to check the length of this element so that we don't access data beyond its end. Fix that. Fixes: 9eaffe5078ca ("cfg80211: convert S1G beacon to scan results") Link: https://lore.kernel.org/r/20210408142826.f6f4525012de.I9fdeff0afdc683a6024e5ea49d2daa3cd2459d11@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nl80211: fix potential leak of ACL paramsJohannes Berg2021-04-141-2/+2
| | | | | | | | | | | | | commit abaf94ecc9c356d0b885a84edef4905cdd89cfdd upstream. In case nl80211_parse_unsol_bcast_probe_resp() results in an error, need to "goto out" instead of just returning to free possibly allocated data. Fixes: 7443dcd1f171 ("nl80211: Unsolicited broadcast probe response support") Link: https://lore.kernel.org/r/20210408142833.d8bc2e2e454a.If290b1ba85789726a671ff0b237726d4851b5b0f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nl80211: fix beacon head validationJohannes Berg2021-04-141-1/+5
| | | | | | | | | | | | | | | | | | | commit 9a6847ba1747858ccac53c5aba3e25c54fbdf846 upstream. If the beacon head attribute (NL80211_ATTR_BEACON_HEAD) is too short to even contain the frame control field, we access uninitialized data beyond the buffer. Fix this by checking the minimal required size first. We used to do this until S1G support was added, where the fixed data portion has a different size. Reported-and-tested-by: syzbot+72b99dcf4607e8c770f3@syzkaller.appspotmail.com Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 1d47f1198d58 ("nl80211: correctly validate S1G beacon head") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210408154518.d9b06d39b4ee.Iff908997b2a4067e8d456b3cb96cab9771d252b8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: fix action overwrite reference countingVlad Buslov2021-04-142-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 87c750e8c38bce706eb32e4d8f1e3402f2cebbd4 upstream. Action init code increments reference counter when it changes an action. This is the desired behavior for cls API which needs to obtain action reference for every classifier that points to action. However, act API just needs to change the action and releases the reference before returning. This sequence breaks when the requested action doesn't exist, which causes act API init code to create new action with specified index, but action is still released before returning and is deleted (unless it was referenced concurrently by cls API). Reproduction: $ sudo tc actions ls action gact $ sudo tc actions change action gact drop index 1 $ sudo tc actions ls action gact Extend tcf_action_init() to accept 'init_res' array and initialize it with action->ops->init() result. In tcf_action_add() remove pointers to created actions from actions array before passing it to tcf_action_put_many(). Fixes: cae422f379f3 ("net: sched: use reference counting action init") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: sch_teql: fix null-pointer dereferencePavel Tikhomirov2021-04-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1ffbc7ea91606e4abd10eb60de5367f1c86daf5e upstream. Reproduce: modprobe sch_teql tc qdisc add dev teql0 root teql0 This leads to (for instance in Centos 7 VM) OOPS: [ 532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0 [ 532.366906] Oops: 0000 [#1] SMP [ 532.366987] Modules linked in: sch_teql ... [ 532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 [ 532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014 [ 532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000 [ 532.368224] RIP: 0010:[<ffffffffc06124a8>] [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.368320] RSP: 0018:ffff8b7c9fdbf8e0 EFLAGS: 00010286 [ 532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000 [ 532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00 [ 532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600 [ 532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8 [ 532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8 [ 532.368800] FS: 00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000 [ 532.368885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0 [ 532.369046] Call Trace: [ 532.369159] [<ffffffff84c8192e>] qdisc_create+0x36e/0x450 [ 532.369268] [<ffffffff846a9b49>] ? ns_capable+0x29/0x50 [ 532.369366] [<ffffffff849afde2>] ? nla_parse+0x32/0x120 [ 532.369442] [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610 [ 532.371508] [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260 [ 532.372668] [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90 [ 532.373790] [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890 [ 532.374914] [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0 [ 532.376055] [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30 [ 532.377204] [<ffffffff84c8d400>] netlink_unicast+0x170/0x210 [ 532.378333] [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420 [ 532.379465] [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0 [ 532.380710] [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs] [ 532.381868] [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs] [ 532.383037] [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100 [ 532.384144] [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400 [ 532.385268] [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0 [ 532.386387] [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500 [ 532.387472] [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90 [ 532.388560] [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20 [ 532.389636] [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a [ 532.390704] [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146 [ 532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00 [ 532.394036] RIP [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.395127] RSP <ffff8b7c9fdbf8e0> [ 532.396179] CR2: 00000000000000a8 Null pointer dereference happens on master->slaves dereference in teql_destroy() as master is null-pointer. When qdisc_create() calls teql_qdisc_init() it imediately fails after check "if (m->dev == dev)" because both devices are teql0, and it does not set qdisc_priv(sch)->m leaving it zero on error path, then qdisc_create() imediately calls teql_destroy() which does not expect zero master pointer and we get OOPS. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf, sockmap: Fix incorrect fwd_alloc accountingJohn Fastabend2021-04-141-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 144748eb0c445091466c9b741ebd0bfcc5914f3d upstream. Incorrect accounting fwd_alloc can result in a warning when the socket is torn down, [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230 [...] [18455.319543] Call Trace: [18455.319556] inet_csk_destroy_sock+0xba/0x1f0 [18455.319577] tcp_rcv_state_process+0x1b4e/0x2380 [18455.319593] ? lock_downgrade+0x3a0/0x3a0 [18455.319617] ? tcp_finish_connect+0x1e0/0x1e0 [18455.319631] ? sk_reset_timer+0x15/0x70 [18455.319646] ? tcp_schedule_loss_probe+0x1b2/0x240 [18455.319663] ? lock_release+0xb2/0x3f0 [18455.319676] ? __release_sock+0x8a/0x1b0 [18455.319690] ? lock_downgrade+0x3a0/0x3a0 [18455.319704] ? lock_release+0x3f0/0x3f0 [18455.319717] ? __tcp_close+0x2c6/0x790 [18455.319736] ? tcp_v4_do_rcv+0x168/0x370 [18455.319750] tcp_v4_do_rcv+0x168/0x370 [18455.319767] __release_sock+0xbc/0x1b0 [18455.319785] __tcp_close+0x2ee/0x790 [18455.319805] tcp_close+0x20/0x80 This currently happens because on redirect case we do skb_set_owner_r() with the original sock. This increments the fwd_alloc memory accounting on the original sock. Then on redirect we may push this into the queue of the psock we are redirecting to. When the skb is flushed from the queue we give the memory back to the original sock. The problem is if the original sock is destroyed/closed with skbs on another psocks queue then the original sock will not have a way to reclaim the memory before being destroyed. Then above warning will be thrown sockA sockB sk_psock_strp_read() sk_psock_verdict_apply() -- SK_REDIRECT -- sk_psock_skb_redirect() skb_queue_tail(psock_other->ingress_skb..) sk_close() sock_map_unref() sk_psock_put() sk_psock_drop() sk_psock_zap_ingress() At this point we have torn down our own psock, but have the outstanding skb in psock_other. Note that SK_PASS doesn't have this problem because the sk_psock_drop() logic releases the skb, its still associated with our psock. To resolve lets only account for sockets on the ingress queue that are still associated with the current socket. On the redirect case we will check memory limits per 6fa9201a89898, but will omit fwd_alloc accounting until skb is actually enqueued. When the skb is sent via skb_send_sock_locked or received with sk_psock_skb_ingress memory will be claimed on psock_other. Fixes: 6fa9201a89898 ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved fieldTetsuo Handa2021-04-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | commit 08c27f3322fec11950b8f1384aa0f3b11d028528 upstream. KMSAN found uninitialized value at batadv_tt_prepare_tvlv_local_data() [1], for commit ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16 in TT code") inserted 'reserved' field into "struct batadv_tvlv_tt_data" and commit 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN specific") moved that field to "struct batadv_tvlv_tt_vlan_data" but left that field uninitialized. [1] https://syzkaller.appspot.com/bug?id=07f3e6dba96f0eb3cabab986adcd8a58b9bdbe9d Reported-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16 in TT code") Fixes: 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN specific") Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ethtool: fix incorrect datatype in set_eee opsWong Vee Khee2021-04-141-2/+2
| | | | | | | | | | | | | | | | commit 63cf32389925e234d166fb1a336b46de7f846003 upstream. The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool header file. Hence, we should use ethnl_update_u32() in set_eee ops. Fixes: fd77be7bd43c ("ethtool: set EEE settings with EEE_SET request") Cc: <stable@vger.kernel.org> # 5.10.x Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlhMuhammad Usama Anjum2021-04-141-3/+5
| | | | | | | | | | | | | | commit 864db232dc7036aa2de19749c3d5be0143b24f8f upstream. nlh is being checked for validtity two times when it is dereferenced in this function. Check for validity again when updating the flags through nlh pointer to make the dereferencing safe. CC: <stable@vger.kernel.org> Addresses-Coverity: ("NULL pointer dereference") Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfc: Avoid endless loops caused by repeated llcp_sock_connect()Xiaoming Ni2021-04-141-0/+4
| | | | | | | | | | | | | | | | | | | | commit 4b5db93e7f2afbdfe3b78e37879a85290187e6f1 upstream. When sock_wait_state() returns -EINPROGRESS, "sk->sk_state" is LLCP_CONNECTING. In this case, llcp_sock_connect() is repeatedly invoked, nfc_llcp_sock_link() will add sk to local->connecting_sockets twice. sk->sk_node->next will point to itself, that will make an endless loop and hang-up the system. To fix it, check whether sk->sk_state is LLCP_CONNECTING in llcp_sock_connect() to avoid repeated invoking. Fixes: b4011239a08e ("NFC: llcp: Fix non blocking sockets connections") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.11 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfc: fix memory leak in llcp_sock_connect()Xiaoming Ni2021-04-141-0/+2
| | | | | | | | | | | | | | | | | | | commit 7574fcdbdcb335763b6b322f6928dc0fd5730451 upstream. In llcp_sock_connect(), use kmemdup to allocate memory for "llcp_sock->service_name". The memory is not released in the sock_unlink label of the subsequent failure branch. As a result, memory leakage occurs. fix CVE-2020-25672 Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.3 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfc: fix refcount leak in llcp_sock_connect()Xiaoming Ni2021-04-141-0/+2
| | | | | | | | | | | | | | | | | | commit 8a4cd82d62b5ec7e5482333a72b58a4eea4979f0 upstream. nfc_llcp_local_get() is invoked in llcp_sock_connect(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25671 Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfc: fix refcount leak in llcp_sock_bind()Xiaoming Ni2021-04-141-0/+2
| | | | | | | | | | | | | | | | | | commit c33b1cc62ac05c1dbb1cdafe2eb66da01c76ca8d upstream. nfc_llcp_local_get() is invoked in llcp_sock_bind(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25670 Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfrm/compat: Cleanup WARN()s that can be user-triggeredDmitry Safonov2021-04-141-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ef19e111337f6c3dca7019a8bad5fbc6fb18d635 upstream. Replace WARN_ONCE() that can be triggered from userspace with pr_warn_once(). Those still give user a hint what's the issue. I've left WARN()s that are not possible to trigger with current code-base and that would mean that the code has issues: - relying on current compat_msg_min[type] <= xfrm_msg_min[type] - expected 4-byte padding size difference between compat_msg_min[type] and xfrm_msg_min[type] - compat_policy[type].len <= xfrma_policy[type].len (for every type) Reported-by: syzbot+834ffd1afc7212eb8147@syzkaller.appspotmail.com Fixes: 5f3eea6b7e8f ("xfrm/compat: Attach xfrm dumps to 64=>32 bit translator") Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: nftables: skip hook overlap logic if flowtable is stalePablo Neira Ayuso2021-04-101-0/+3
| | | | | | | | | | | [ Upstream commit 86fe2c19eec4728fd9a42ba18f3b47f0d5f9fd7c ] If the flowtable has been previously removed in this batch, skip the hook overlap checks. This fixes spurious EEXIST errors when removing and adding the flowtable in the same batch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: conntrack: Fix gre tunneling over ipv6Ludovic Senecaux2021-04-101-3/+0
| | | | | | | | | | | [ Upstream commit 8b2030b4305951f44afef80225f1475618e25a73 ] This fix permits gre connections to be tracked within ip6tables rules Signed-off-by: Ludovic Senecaux <linuxludo@free.fr> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mac80211: choose first enabled channel for monitorKarthikeyan Kathirvel2021-04-101-1/+12
| | | | | | | | | | | | | | | | | [ Upstream commit 041c881a0ba8a75f71118bd9766b78f04beed469 ] Even if the first channel from sband channel list is invalid or disabled mac80211 ends up choosing it as the default channel for monitor interfaces, making them not usable. Fix this by assigning the first available valid or enabled channel instead. Signed-off-by: Karthikeyan Kathirvel <kathirve@codeaurora.org> Link: https://lore.kernel.org/r/1615440547-7661-1-git-send-email-kathirve@codeaurora.org [reword commit message, comment, code cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mac80211: Check crypto_aead_encrypt for errorsDaniel Phan2021-04-102-4/+6
| | | | | | | | | | | | | | [ Upstream commit 58d25626f6f0ea5bcec3c13387b9f835d188723d ] crypto_aead_encrypt returns <0 on error, so if these calls are not checked, execution may continue with failed encrypts. It also seems that these two crypto_aead_encrypt calls are the only instances in the codebase that are not checked for errors. Signed-off-by: Daniel Phan <daniel.phan36@gmail.com> Link: https://lore.kernel.org/r/20210309204137.823268-1-daniel.phan36@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf: Remove MTU check in __bpf_skb_max_lenJesper Dangaard Brouer2021-04-071-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6306c1189e77a513bf02720450bb43bd4ba5d8ae upstream. Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: 9p: advance iov on empty readJisheng Zhang2021-04-071-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d65614a01d24704b016635abf5cc028a54e45a62 ] I met below warning when cating a small size(about 80bytes) txt file on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we miss iov_iter_advance() if the read count is 0 for zerocopy case, so we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is full. Fix it by removing the exception for 0 to ensure to call iov_iter_advance() even on empty read for zerocopy case. [ 8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40 [ 8.280028] Modules linked in: [ 8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6 [ 8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40 [ 8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b [ 8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246 [ 8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000 [ 8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90 [ 8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058 [ 8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050 [ 8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600 [ 8.285439] FS: 00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 8.285844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0 [ 8.286710] Call Trace: [ 8.288279] generic_file_splice_read+0x31/0x1a0 [ 8.289273] ? do_splice_to+0x2f/0x90 [ 8.289511] splice_direct_to_actor+0xcc/0x220 [ 8.289788] ? pipe_to_sendpage+0xa0/0xa0 [ 8.290052] do_splice_direct+0x8b/0xd0 [ 8.290314] do_sendfile+0x1ad/0x470 [ 8.290576] do_syscall_64+0x2d/0x40 [ 8.290818] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.291409] RIP: 0033:0x7f24fd7dca0a [ 8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8 [ 8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028 [ 8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a [ 8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 [ 8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003 [ 8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 [ 8.295782] ---[ end trace 63317af81b3ca24b ]--- Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* appletalk: Fix skb allocation size in loopback caseDoug Brown2021-04-071-12/+21
| | | | | | | | | | | | | | | | | | | [ Upstream commit 39935dccb21c60f9bbf1bb72d22ab6fd14ae7705 ] If a DDP broadcast packet is sent out to a non-gateway target, it is also looped back. There is a potential for the loopback device to have a longer hardware header length than the original target route's device, which can result in the skb not being created with enough room for the loopback device's hardware header. This patch fixes the issue by determining that a loopback will be necessary prior to allocating the skb, and if so, ensuring the skb has enough room. This was discovered while testing a new driver that creates a LocalTalk network interface (LTALK_HLEN = 1). It caused an skb_under_panic. Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: introduce CAN specific pointer in the struct net_deviceOleksij Rempel2021-04-074-61/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4e096a18867a5a989b510f6999d9c6b6622e8f7b ] Since 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") the CAN framework uses per device specific data in the AF_CAN protocol. For this purpose the struct net_device->ml_priv is used. Later the ml_priv usage in CAN was extended for other users, one of them being CAN_J1939. Later in the kernel ml_priv was converted to an union, used by other drivers. E.g. the tun driver started storing it's stats pointer. Since tun devices can claim to be a CAN device, CAN specific protocols will wrongly interpret this pointer, which will cause system crashes. Mostly this issue is visible in the CAN_J1939 stack. To fix this issue, we request a dedicated CAN pointer within the net_device struct. Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com Fixes: 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") Fixes: ffd956eef69b ("can: introduce CAN midlayer private and allocate it automatically") Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Fixes: 497a5757ce4e ("tun: switch to net core provided statistics counters") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* flow_dissector: fix TTL and TOS dissection on IPv4 fragmentsDavide Caratti2021-04-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d2126838050ccd1dadf310ffb78b2204f3b032b9 ] the following command: # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop doesn't drop all IPv4 packets that match the configured TTL / destination address. In particular, if "fragment offset" or "more fragments" have non zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the correct behavior. Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* rpc: fix NULL dereference on kmalloc failureJ. Bruce Fields2021-04-071-4/+7
| | | | | | | | | | | | | | | | | | | [ Upstream commit 0ddc942394013f08992fc379ca04cffacbbe3dae ] I think this is unlikely but possible: svc_authenticate sets rq_authop and calls svcauth_gss_accept. The kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL, and returning SVC_DENIED. This causes svc_process_common to go to err_bad_auth, and eventually call svc_authorise. That calls ->release == svcauth_gss_release, which tries to dereference rq_auth_data. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mac80211: fix double free in ibss_leaveMarkus Theil2021-03-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3bd801b14e0c5d29eeddc7336558beb3344efaa3 upstream. Clear beacon ie pointer and ie length after free in order to prevent double free. ================================================================== BUG: KASAN: double-free or invalid-free \ in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 CPU: 0 PID: 8472 Comm: syz-executor100 Not tainted 5.11.0-rc6-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2c6 mm/kasan/report.c:230 kasan_report_invalid_free+0x51/0x80 mm/kasan/report.c:355 ____kasan_slab_free+0xcc/0xe0 mm/kasan/common.c:341 kasan_slab_free include/linux/kasan.h:192 [inline] __cache_free mm/slab.c:3424 [inline] kfree+0xed/0x270 mm/slab.c:3760 ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 rdev_leave_ibss net/wireless/rdev-ops.h:545 [inline] __cfg80211_leave_ibss+0x19a/0x4c0 net/wireless/ibss.c:212 __cfg80211_leave+0x327/0x430 net/wireless/core.c:1172 cfg80211_leave net/wireless/core.c:1221 [inline] cfg80211_netdev_notifier_call+0x9e8/0x12c0 net/wireless/core.c:1335 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2040 call_netdevice_notifiers_extack net/core/dev.c:2052 [inline] call_netdevice_notifiers net/core/dev.c:2066 [inline] __dev_close_many+0xee/0x2e0 net/core/dev.c:1586 __dev_close net/core/dev.c:1624 [inline] __dev_change_flags+0x2cb/0x730 net/core/dev.c:8476 dev_change_flags+0x8a/0x160 net/core/dev.c:8549 dev_ifsioc+0x210/0xa70 net/core/dev_ioctl.c:265 dev_ioctl+0x1b1/0xc40 net/core/dev_ioctl.c:511 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060 sock_ioctl+0x477/0x6a0 net/socket.c:1177 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+93976391bf299d425f44@syzkaller.appspotmail.com Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20210213133653.367130-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* can: dev: Move device back to init netns on owning netns deleteMartin Willi2021-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream. When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "netfilter: x_tables: Update remaining dereference to RCU"Mark Tomlinson2021-03-303-3/+3
| | | | | | | | | | | | | | | | | | [ Upstream commit abe7034b9a8d57737e80cc16d60ed3666990bdbf ] This reverts commit 443d6e86f821a165fae3fc3fc13086d27ac140b1. This (and the following) patch basically re-implemented the RCU mechanisms of patch 784544739a25. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: x_tables: Use correct memory barriers.Mark Tomlinson2021-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 175e476b8cdf2a4de7432583b49c871345e4f8a1 ] When a new table value was assigned, it was followed by a write memory barrier. This ensured that all writes before this point would complete before any writes after this point. However, to determine whether the rules are unused, the sequence counter is read. To ensure that all writes have been done before these reads, a full memory barrier is needed, not just a write memory barrier. The same argument applies when incrementing the counter, before the rules are read. Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic reported in cc00bcaa5899 (which is still present), while still maintaining the same speed of replacing tables. The smb_mb() barriers potentially slow the packet path, however testing has shown no measurable change in performance on a 4-core MIPS64 platform. Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Revert "netfilter: x_tables: Switch synchronization to RCU"Mark Tomlinson2021-03-304-36/+55
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit d3d40f237480abf3268956daf18cdc56edd32834 ] This reverts commit cc00bcaa589914096edef7fb87ca5cee4a166b5c. This (and the preceding) patch basically re-implemented the RCU mechanisms of patch 784544739a25. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Prior to using RCU a script calling "iptables" approx. 200 times was taking 1.16s. With RCU this increased to 11.59s. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net, bpf: Fix ip6ip6 crash with collect_md populated skbsDaniel Borkmann2021-03-301-22/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a188bb5638d41aa99090ebf2f85d3505ab13fba5 ] I ran into a crash where setting up a ip6ip6 tunnel device which was /not/ set to collect_md mode was receiving collect_md populated skbs for xmit. The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is assigning special metadata dst entry and then redirecting the skb to the device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit() and in the latter it performs a neigh lookup based on skb_dst(skb) where we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since the md_dst_ops do not populate neigh_lookup callback with a fake handler. Transform the md_dst_ops into generic dst_blackhole_ops that can also be reused elsewhere when needed, and use them for the metadata dst entries as callback ops. Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}() from dst_init() which free the skb the same way modulo the splat. Given we will be able to recover just fine from there, avoid any potential splats iff this gets ever triggered in future (or worse, panic on warns when set). Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: Consolidate common blackhole dst opsDaniel Borkmann2021-03-303-64/+55
| | | | | | | | | | | | | [ Upstream commit c4c877b2732466b4c63217baad05c96f775912c7 ] Move generic blackhole dst ops to the core and use them from both ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No functional change otherwise. We need these also in other locations and having to define them over and over again is not great. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: bridge: don't notify switchdev for local FDB addressesVladimir Oltean2021-03-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6ab4c3117aec4e08007d9e971fa4133e1de1082d ] As explained in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug. The bridge would not say that this entry is local: ip link add br0 type bridge ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master local and the switchdev driver would be more than happy to offload it as a normal static FDB entry. This is despite the fact that 'local' and non-'local' entries have completely opposite directions: a local entry is locally terminated and not forwarded, whereas a static entry is forwarded and not locally terminated. So, for example, DSA would install this entry on swp0 instead of installing it on the CPU port as it should. There is an even sadder part, which is that the 'local' flag is implicit if 'static' is not specified, meaning that this command produces the same result of adding a 'local' entry: bridge fdb add dev swp0 00:01:02:03:04:05 master I've updated the man pages for 'bridge', and after reading it now, it should be pretty clear to any user that the commands above were broken and should have never resulted in the 00:01:02:03:04:05 address being forwarded (this behavior is coherent with non-switchdev interfaces): https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/ If you're a user reading this and this is what you want, just use: bridge fdb add dev swp0 00:01:02:03:04:05 master static Because switchdev should have given drivers the means from day one to classify FDB entries as local/non-local, but didn't, it means that all drivers are currently broken. So we can just as well omit the switchdev notifications for local FDB entries, which is exactly what this patch does to close the bug in stable trees. For further development work where drivers might want to trap the local FDB entries to the host, we can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and selectively make drivers act upon that bit, while all the others ignore those entries if the 'is_local' bit is set. Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: isotp: tx-path: zero initialize outgoing CAN framesOliver Hartkopp2021-03-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b5f020f82a8e41201c6ede20fa00389d6980b223 ] Commit d4eb538e1f48 ("can: isotp: TX-path: ensure that CAN frame flags are initialized") ensured the TX flags to be properly set for outgoing CAN frames. In fact the root cause of the issue results from a missing initialization of outgoing CAN frames created by isotp. This is no problem on the CAN bus as the CAN driver only picks the correctly defined content from the struct can(fd)_frame. But when the outgoing frames are monitored (e.g. with candump) we potentially leak some bytes in the unused content of struct can(fd)_frame. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Cc: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20210319100619.10858-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selinux: vsock: Set SID for socket returned by accept()David Brazdil2021-03-301-0/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 1f935e8e72ec28dddb2dc0650b3b6626a293d94b ] For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: check all name nodes in __dev_alloc_nameJiri Bohac2021-03-301-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6c015a2256801597fadcbc11d287774c9c512fa5 ] __dev_alloc_name(), when supplied with a name containing '%d', will search for the first available device number to generate a unique device name. Since commit ff92741270bf8b6e78aa885f166b68c7a67ab13a ("net: introduce name_node struct to be used in hashlist") network devices may have alternate names. __dev_alloc_name() does take these alternate names into account, possibly generating a name that is already taken and failing with -ENFILE as a result. This demonstrates the bug: # rmmod dummy 2>/dev/null # ip link property add dev lo altname dummy0 # modprobe dummy numdummies=1 modprobe: ERROR: could not insert 'dummy': Too many open files in system Instead of creating a device named dummy1, modprobe fails. Fix this by checking all the names in the d->name_node list, not just d->name. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>