summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.Kuniyuki Iwashima2023-02-222-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | commit ca43ccf41224b023fc290073d5603a755fd12eed upstream. Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/sched: tcindex: update imperfect hash filters respecting rcuPedro Tammela2023-02-221-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ee059170b1f7e94e55fa6cadee544e176a6e59c2 upstream. The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext. CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF] Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed. Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis <sec@valis.email> Suggested-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: sctp_sock_filter(): avoid list_entry() on possibly empty listPietro Borrello2023-02-221-3/+1
| | | | | | | | | | | | | | commit a1221703a0f75a9d81748c516457e0fc76951496 upstream. Use list_is_first() to check whether tsp->asoc matches the first element of ep->asocs, as the list is not guaranteed to have an entry. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: Fix unwanted sign extension in netdev_stats_to_stats64()Felix Riemann2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | commit 9b55d3f0a69af649c62cbc2633e6d695bb3cc583 upstream. When converting net_device_stats to rtnl_link_stats64 sign extension is triggered on ILP32 machines as 6c1c509778 changed the previous "ulong -> u64" conversion to "long -> u64" by accessing the net_device_stats fields through a (signed) atomic_long_t. This causes for example the received bytes counter to jump to 16EiB after having received 2^31 bytes. Casting the atomic value to "unsigned long" beforehand converting it into u64 avoids this. Fixes: 6c1c5097781f ("net: add atomic_long_t to net_device_stats fields") Signed-off-by: Felix Riemann <felix.riemann@sma.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: sch: Bounds check priorityKees Cook2023-02-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit de5ca4c3852f896cacac2bf259597aab5e17d9e3 ] Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13: ../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~ Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/rose: Fix to not accept on connected socketHyunwoo Kim2023-02-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 14caefcf9837a2be765a566005ad82cd0d2a429f ] If you call listen() and accept() on an already connect()ed rose socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and rose_accept() dequeues the skb waiting in the sk->sk_receive_queue. This creates a child socket with the sk of the parent rose socket, which can cause confusion. Fix rose_listen() to return -EINVAL if the socket has already been successfully connected, and add lock_sock to prevent this issue. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230125105944.GA133314@ubuntu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itselfJakub Sitnicki2023-02-221-27/+34
| | | | | | | | | | | | | | | | | [ Upstream commit 5b4a79ba65a1ab479903fff2e604865d229b70a9 ] sock_map proto callbacks should never call themselves by design. Protect against bugs like [1] and break out of the recursive loop to avoid a stack overflow in favor of a resource leak. [1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-1-1e0ee7ac2f90@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mptcp: fix locking for in-kernel listener creationPaolo Abeni2023-02-222-5/+7
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ad2171009d968104ccda9dc517f5a3ba891515db ] For consistency, in mptcp_pm_nl_create_listen_socket(), we need to call the __mptcp_nmpc_socket() under the msk socket lock. Note that as a side effect, mptcp_subflow_create_socket() needs a 'nested' lockdep annotation, as it will acquire the subflow (kernel) socket lock under the in-kernel listener msk socket lock. The current lack of locking is almost harmless, because the relevant socket is not exposed to the user space, but in future we will add more complexity to the mentioned helper, let's play safe. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mptcp: deduplicate error paths on endpoint creationPaolo Abeni2023-02-221-22/+13
| | | | | | | | | | | | | | | | | [ Upstream commit 976d302fb6165ad620778d7ba834cde6e3fe9f9f ] When endpoint creation fails, we need to free the newly allocated entry and eventually destroy the paired mptcp listener socket. Consolidate such action in a single point let all the errors path reach it. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: ad2171009d96 ("mptcp: fix locking for in-kernel listener creation") Signed-off-by: Sasha Levin <sashal@kernel.org>
* mptcp: fix locking for setsockopt corner-casePaolo Abeni2023-02-221-2/+9
| | | | | | | | | | | | | | | | [ Upstream commit 21e43569685de4ad773fb060c11a15f3fd5e7ac4 ] We need to call the __mptcp_nmpc_socket(), and later subflow socket access under the msk socket lock, or e.g. a racing connect() could change the socket status under the hood, with unexpected results. Fixes: 54635bd04701 ("mptcp: add TCP_FASTOPEN_CONNECT socket option") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mptcp: sockopt: make 'tcp_fastopen_connect' genericMatthieu Baerts2023-02-221-5/+6
| | | | | | | | | | | | | | | | | | | [ Upstream commit d3d429047cc66ff49780c93e4fccd9527723d385 ] There are other socket options that need to act only on the first subflow, e.g. all TCP_FASTOPEN* socket options. This is similar to the getsockopt version. In the next commit, this new mptcp_setsockopt_first_sf_only() helper is used by other another option. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 21e43569685d ("mptcp: fix locking for setsockopt corner-case") Signed-off-by: Sasha Levin <sashal@kernel.org>
* mptcp: be careful on subflow status propagation on errorsPaolo Abeni2023-02-141-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1249db44a102d9d3541ed7798d4b01ffdcf03524 upstream. Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk. If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup. All the above causes increasing memory usage over time and sporadic self-tests failures. There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/339 Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mptcp: do not wait for bare sockets' timeoutPaolo Abeni2023-02-141-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | commit d4e85922e3e7ef2071f91f65e61629b60f3a9cf4 upstream. If the peer closes all the existing subflows for a given mptcp socket and later the application closes it, the current implementation let it survive until the timewait timeout expires. While the above is allowed by the protocol specification it consumes resources for almost no reason and additionally causes sporadic self-tests failures. Let's move the mptcp socket to the TCP_CLOSE state when there are no alive subflows at close time, so that the allocated resources will be freed immediately. Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* rds: rds_rm_zerocopy_callback() use list_first_entry()Pietro Borrello2023-02-141-3/+3
| | | | | | | | | | | | | | | | [ Upstream commit f753a68980cf4b59a80fe677619da2b1804f526d ] rds_rm_zerocopy_callback() uses list_entry() on the head of a list causing a type confusion. Use list_first_entry() to actually access the first element of the rs_zcookie_queue list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* txhash: fix sk->sk_txrehash defaultKevin Yang2023-02-144-4/+4
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c11204c78d6966c5bda6dd05c3ac5cbb193f93e3 ] This code fix a bug that sk->sk_txrehash gets its default enable value from sysctl_txrehash only when the socket is a TCP listener. We should have sysctl_txrehash to set the default sk->sk_txrehash, no matter TCP, nor listerner/connector. Tested by following packetdrill: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4 // SO_TXREHASH == 74, default to sysctl_txrehash == 1 +0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0 +0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0 Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: fix bug with DSCP copy to v6 from v4 tunnelChristian Hopps2023-02-141-2/+1
| | | | | | | | | | | | | | [ Upstream commit 6028da3f125fec34425dbd5fec18e85d372b2af6 ] When copying the DSCP bits for decap-dscp into IPv6 don't assume the outer encap is always IPv6. Instead, as with the inner IPv4 case, copy the DSCP bits from the correctly saved "tos" value in the control block. Fixes: 227620e29509 ("[IPSEC]: Separate inner/outer mode processing on input") Signed-off-by: Christian Hopps <chopps@chopps.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: annotate data-race around use_timeEric Dumazet2023-02-142-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0a9e5794b21e2d1303759ff8fe5f9215db7757ba ] KCSAN reported multiple cpus can update use_time at the same time. Adds READ_ONCE()/WRITE_ONCE() annotations. Note that 32bit arches are not fully protected, but they will probably no longer be supported/used in 2106. BUG: KCSAN: data-race in __xfrm_policy_check / __xfrm_policy_check write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 0: __xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664 __xfrm_policy_check2 include/net/xfrm.h:1174 [inline] xfrm_policy_check include/net/xfrm.h:1179 [inline] xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189 udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703 udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792 udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935 __udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020 udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133 ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439 ip6_input_finish net/ipv6/ip6_input.c:484 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493 dst_input include/net/dst.h:454 [inline] ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:302 [inline] ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5482 [inline] __netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596 process_backlog+0x23f/0x3b0 net/core/dev.c:5924 __napi_poll+0x65/0x390 net/core/dev.c:6485 napi_poll net/core/dev.c:6552 [inline] net_rx_action+0x37e/0x730 net/core/dev.c:6663 __do_softirq+0xf2/0x2c7 kernel/softirq.c:571 do_softirq+0xb1/0xf0 kernel/softirq.c:472 __local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396 __raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline] _raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284 wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 1: __xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664 __xfrm_policy_check2 include/net/xfrm.h:1174 [inline] xfrm_policy_check include/net/xfrm.h:1179 [inline] xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189 udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703 udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792 udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935 __udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020 udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133 ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439 ip6_input_finish net/ipv6/ip6_input.c:484 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493 dst_input include/net/dst.h:454 [inline] ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:302 [inline] ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5482 [inline] __netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596 process_backlog+0x23f/0x3b0 net/core/dev.c:5924 __napi_poll+0x65/0x390 net/core/dev.c:6485 napi_poll net/core/dev.c:6552 [inline] net_rx_action+0x37e/0x730 net/core/dev.c:6663 __do_softirq+0xf2/0x2c7 kernel/softirq.c:571 do_softirq+0xb1/0xf0 kernel/softirq.c:472 __local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396 __raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline] _raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284 wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0000000063c62d6f -> 0x0000000063c62d70 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4185 Comm: kworker/1:2 Tainted: G W 6.2.0-rc4-syzkaller-00009-gd532dd102151-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: wg-crypt-wg0 wg_packet_tx_worker Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()Eric Dumazet2023-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b6ee896385380aa621102e8ea402ba12db1cabff ] int type = nla_type(nla); if (type > XFRMA_MAX) { return -EOPNOTSUPP; } @type is then used as an array index and can be used as a Spectre v1 gadget. if (nla_len(nla) < compat_policy[type].len) { array_index_nospec() can be used to prevent leaking content of kernel memory to malicious users. Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* xfrm: compat: change expression for switch in xfrm_xlate64Anastasia Belova2023-02-141-1/+1
| | | | | | | | | | | | | | | | | [ Upstream commit eb6c59b735aa6cca77cdbb59cc69d69a0d63d986 ] Compare XFRM_MSG_NEWSPDINFO (value from netlink configuration messages enum) with nlh_src->nlmsg_type instead of nlh_src->nlmsg_type - XFRM_MSG_BASE. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 4e9505064f58 ("net/xfrm/compat: Copy xfrm_spdattr_type_t atributes") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Tested-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: j1939: do not wait 250 ms if the same addr was already claimedDevid Antonio Filoni2023-02-141-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4ae5e1e97c44f4654516c1d41591a462ed62fa7b upstream. The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states: d) No CF shall begin, or resume, transmission on the network until 250 ms after it has successfully claimed an address except when responding to a request for address-claimed. But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim prioritization" show that the CF begins the transmission after 250 ms from the first AC (address-claimed) message even if it sends another AC message during that time window to resolve the address contention with another CF. As stated in "4.4.2.3 - Address-claimed message": In order to successfully claim an address, the CF sending an address claimed message shall not receive a contending claim from another CF for at least 250 ms. As stated in "4.4.3.2 - NAME management (NM) message": 1) A commanding CF can d) request that a CF with a specified NAME transmit the address- claimed message with its current NAME. 2) A target CF shall d) send an address-claimed message in response to a request for a matching NAME Taking the above arguments into account, the 250 ms wait is requested only during network initialization. Do not restart the timer on AC message if both the NAME and the address match and so if the address has already been claimed (timer has expired) or the AC message has been sent to resolve the contention with another CF (timer is still running). Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* can: isotp: handle wait_event_interruptible() return valuesOliver Hartkopp2023-02-091-0/+4
| | | | | | | | | | | | | | | commit 823b2e42720f96f277940c37ea438b7c5ead51a4 upstream. When wait_event_interruptible() has been interrupted by a signal the tx.state value might not be ISOTP_IDLE. Force the state machines into idle state to inhibit the timer handlers to continue working. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Cc: stable@vger.kernel.org Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230112192347.1944-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* can: isotp: split tx timer into transmission and timeoutOliver Hartkopp2023-02-091-36/+29
| | | | | | | | | | | | | | | | | | | | | | | | | commit 4f027cba8216f42a18b544842efab134f8b1f9f4 upstream. The timer for the transmission of isotp PDUs formerly had two functions: 1. send two consecutive frames with a given time gap 2. monitor the timeouts for flow control frames and the echo frames This led to larger txstate checks and potentially to a problem discovered by syzbot which enabled the panic_on_warn feature while testing. The former 'txtimer' function is split into 'txfrtimer' and 'txtimer' to handle the two above functionalities with separate timer callbacks. The two simplified timers now run in one-shot mode and make the state transitions (especially with isotp_rcv_echo) better understandable. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # >= v6.0 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: qrtr: free memory on error path in radix_tree_insert()Natalia Petrova2023-02-091-1/+4
| | | | | | | | | | | | | | | | | | | | | commit 29de68c2b32ce58d64dea496d281e25ad0f551bd upstream. Function radix_tree_insert() returns errors if the node hasn't been initialized and added to the tree. "kfree(node)" and return value "NULL" of node_get() help to avoid using unclear node in other calls. Found by Linux Verification Center (linuxtesting.org) with SVACE. Cc: <stable@vger.kernel.org> # 5.7 Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/20230125134831.8090-1-n.petrova@fintech.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/x25: Fix to not accept on connected socketHyunwoo Kim2023-02-091-0/+6
| | | | | | | | | | | | | | | | | | | [ Upstream commit f2b0b5210f67c56a3bcdf92ff665fb285d6e0067 ] When listen() and accept() are called on an x25 socket that connect() succeeds, accept() succeeds immediately. This is because x25_connect() queues the skb to sk->sk_receive_queue, and x25_accept() dequeues it. This creates a child socket with the sk of the parent x25 socket, which can cause confusion. Fix x25_listen() to return -EINVAL if the socket has already been successfully connect()ed to avoid this issue. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: openvswitch: fix flow memory leak in ovs_flow_cmd_newFedor Pchelkin2023-02-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0c598aed445eb45b0ee7ba405f7ece99ee349c30 ] Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is not freed when an allocation of a key fails. BUG: memory leak unreferenced object 0xffff888116668000 (size 632): comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline] [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77 [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957 [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739 [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800 [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515 [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339 [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934 [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline] [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671 [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356 [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410 [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 To fix this the patch rearranges the goto labels to reflect the order of object allocations and adds appropriate goto statements on the error paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: raw: fix CAN FD frame transmissions over CAN XL devicesOliver Hartkopp2023-02-091-16/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3793301cbaa4a62d83e21f685307da7671f812ab ] A CAN XL device is always capable to process CAN FD frames. The former check when sending CAN FD frames relied on the existence of a CAN FD device and did not check for a CAN XL device that would be correct too. With this patch the CAN FD feature is enabled automatically when CAN XL is switched on - and CAN FD cannot be switch off while CAN XL is enabled. This precondition also leads to a clean up and reduction of checks in the hot path in raw_rcv() and raw_sendmsg(). Some conditions are reordered to handle simple checks first. changes since v1: https://lore.kernel.org/all/20230131091012.50553-1-socketcan@hartkopp.net - fixed typo: devive -> device changes since v2: https://lore.kernel.org/all/20230131091824.51026-1-socketcan@hartkopp.net/ - reorder checks in if statements to handle simple checks first Fixes: 626332696d75 ("can: raw: add CAN XL support") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230131105613.55228-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivateZiyang Xuan2023-02-091-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d0553680f94c49bbe0e39eb50d033ba563b4212d ] The conclusion "j1939_session_deactivate() should be called with a session ref-count of at least 2" is incorrect. In some concurrent scenarios, j1939_session_deactivate can be called with the session ref-count less than 2. But there is not any problem because it will check the session active state before session putting in j1939_session_deactivate_locked(). Here is the concurrent scenario of the problem reported by syzbot and my reproduction log. cpu0 cpu1 j1939_xtp_rx_eoma j1939_xtp_rx_abort_one j1939_session_get_by_addr [kref == 2] j1939_session_get_by_addr [kref == 3] j1939_session_deactivate [kref == 2] j1939_session_put [kref == 1] j1939_session_completed j1939_session_deactivate WARN_ON_ONCE(kref < 2) ===================================================== WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70 CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:j1939_session_deactivate+0x5f/0x70 Call Trace: j1939_session_deactivate_activate_next+0x11/0x28 j1939_xtp_rx_eoma+0x12a/0x180 j1939_tp_recv+0x4a2/0x510 j1939_can_recv+0x226/0x380 can_rcv_filter+0xf8/0x220 can_receive+0x102/0x220 ? process_backlog+0xf0/0x2c0 can_rcv+0x53/0xf0 __netif_receive_skb_one_core+0x67/0x90 ? process_backlog+0x97/0x2c0 __netif_receive_skb+0x22/0x80 Fixes: 0c71437dd50d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object") Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local addressThomas Winter2023-02-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 30e2291f61f93f7132c060190f8360df52644ec1 ] We recently found that our non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. Previously, addrconf_gre_config always would call addrconf_addr_gen and generate a EUI64 link local address for the tunnel. Then commit e5dd729460ca changed the code path so that add_v4_addrs is called but this only generates a compat IPv6 address for non-point-to-point tunnels. I assume the compat address is specifically for SIT tunnels so have kept that only for SIT - GRE tunnels now always generate link local addresses. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local addressThomas Winter2023-02-091-22/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 23ca0c2c93406bdb1150659e720bda1cec1fad04 ] For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. This issue would also have affected SIT tunnels. Commit e5dd729460ca changed the code path so that GRE tunnels generate an IPv6 address based on the tunnel source address. It also changed the code path so GRE tunnels don't call addrconf_addr_gen in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode when the IN6_ADDR_GEN_MODE is changed. This patch aims to fix this issue by moving the code in addrconf_notify which calls the addr gen for GRE and SIT into a separate function and calling it in the places that expect the IPv6 address to be generated. The previous addrconf_dev_config is renamed to addrconf_eth_config since it only expected eth type interfaces and follows the addrconf_gre/sit_config format. A part of this changes means that the loopback address will be attempted to be configured when changing addr_gen_mode for lo. This should not be a problem because the address should exist anyway and if does already exist then no error is produced. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sctp: do not check hb_timer.expires when resetting hb_timerXin Long2023-02-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8f35ae17ef565a605de5f409e04bcd49a55d7646 ] It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often"), and it only allows mod_timer when the new expires is after hb_timer.expires. It means even a much shorter interval for hb timer gets applied, it will have to wait until the current hb timer to time out. In sctp_do_8_2_transport_strike(), when a transport enters PF state, it expects to update the hb timer to resend a heartbeat every rto after calling sctp_transport_reset_hb_timer(), which will not work as the change mentioned above. The frequently hb_timer refresh was caused by sctp_transport_reset_timers() called in sctp_outq_flush() and it was already removed in the commit above. So we don't have to check hb_timer.expires when resetting hb_timer as it is now not called very often. Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: br_netfilter: disable sabotage_in hook after first suppressionFlorian Westphal2023-02-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2b272bb558f1d3a5aa95ed8a82253786fd1a48ba ] When using a xfrm interface in a bridged setup (the outgoing device is bridged), the incoming packets in the xfrm interface are only tracked in the outgoing direction. $ brctl show bridge name interfaces br_eth1 eth1 $ conntrack -L tcp 115 SYN_SENT src=192... dst=192... [UNREPLIED] ... If br_netfilter is enabled, the first (encrypted) packet is received onR eth1, conntrack hooks are called from br_netfilter emulation which allocates nf_bridge info for this skb. If the packet is for local machine, skb gets passed up the ip stack. The skb passes through ip prerouting a second time. br_netfilter ip_sabotage_in supresses the re-invocation of the hooks. After this, skb gets decrypted in xfrm layer and appears in network stack a second time (after decryption). Then, ip_sabotage_in is called again and suppresses netfilter hook invocation, even though the bridge layer never called them for the plaintext incarnation of the packet. Free the bridge info after the first suppression to avoid this. I was unable to figure out where the regression comes from, as far as i can see br_netfilter always had this problem; i did not expect that skb is looped again with different headers. Fixes: c4b0e771f906 ("netfilter: avoid using skb->nf_bridge directly") Reported-and-tested-by: Wolfgang Nothdurft <wolfgang@linogate.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/tls: tls_is_tx_ready() checked list_entryPietro Borrello2023-02-091-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit ffe2a22562444720b05bdfeb999c03e810d84cbb ] tls_is_tx_ready() checks that list_first_entry() does not return NULL. This condition can never happen. For empty lists, list_first_entry() returns the list_entry() of the head, which is a type confusion. Use list_first_entry_or_null() which returns NULL in case of empty lists. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-tls-v1-1-525bbfe6f0d0@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netrom: Fix use-after-free caused by accept on already connected socketHyunwoo Kim2023-02-091-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 611792920925fb088ddccbe2783c7f92fdfb6b64 ] If you call listen() and accept() on an already connect()ed AF_NETROM socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and nr_accept() dequeues the skb waiting in the sk->sk_receive_queue. As a result, nr_accept() allocates and returns a sock with the sk of the parent AF_NETROM socket. And here use-after-free can happen through complex race conditions: ``` cpu0 cpu1 1. socket_2 = socket(AF_NETROM) . . listen(socket_2) accepted_socket = accept(socket_2) 2. socket_1 = socket(AF_NETROM) nr_create() // sk refcount : 1 connect(socket_1) 3. write(accepted_socket) nr_sendmsg() nr_output() nr_kick() nr_send_iframe() nr_transmit_buffer() nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() nr_process_rx_frame(sk, skb); // sk : socket_1's sk nr_state3_machine() nr_queue_rx_frame() sock_queue_rcv_skb() sock_queue_rcv_skb_reason() __sock_queue_rcv_skb() __skb_queue_tail(list, skb); // list : socket_1's sk->sk_receive_queue 4. listen(socket_1) nr_listen() uaf_socket = accept(socket_1) nr_accept() skb_dequeue(&sk->sk_receive_queue); 5. close(accepted_socket) nr_release() nr_write_internal(sk, NR_DISCREQ) nr_transmit_buffer() // NR_DISCREQ nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() // sk : socket_1's sk nr_process_rx_frame() // NR_STATE_3 nr_state3_machine() // NR_DISCREQ nr_disconnect() nr_sk(sk)->state = NR_STATE_0; 6. close(socket_1) // sk refcount : 3 nr_release() // NR_STATE_0 sock_put(sk); // sk refcount : 0 sk_free(sk); close(uaf_socket) nr_release() sock_hold(sk); // UAF ``` KASAN report by syzbot: ``` BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520 Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x461 mm/kasan/report.c:417 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:102 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline] __refcount_add include/linux/refcount.h:193 [inline] __refcount_inc include/linux/refcount.h:250 [inline] refcount_inc include/linux/refcount.h:267 [inline] sock_hold include/net/sock.h:775 [inline] nr_release+0x66/0x460 net/netrom/af_netrom.c:520 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6c19e3c9b9 Code: Unable to access opcode bytes at 0x7f6c19e3c98f. RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9 RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006 RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0 R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] ____kasan_kmalloc mm/kasan/common.c:330 [inline] __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0xd0 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x140/0x290 net/core/sock.c:2038 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433 __sock_create+0x359/0x790 net/socket.c:1515 sock_create net/socket.c:1566 [inline] __sys_socket_create net/socket.c:1603 [inline] __sys_socket_create net/socket.c:1588 [inline] __sys_socket+0x133/0x250 net/socket.c:1636 __do_sys_socket net/socket.c:1649 [inline] __se_sys_socket net/socket.c:1647 [inline] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] __cache_free mm/slab.c:3394 [inline] __do_kmem_cache_free mm/slab.c:3580 [inline] __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587 sk_prot_free net/core/sock.c:2074 [inline] __sk_destruct+0x5df/0x750 net/core/sock.c:2166 sk_destruct net/core/sock.c:2181 [inline] __sk_free+0x175/0x460 net/core/sock.c:2192 sk_free+0x7c/0xa0 net/core/sock.c:2203 sock_put include/net/sock.h:1991 [inline] nr_release+0x39e/0x460 net/netrom/af_netrom.c:554 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd ``` To fix this issue, nr_listen() returns -EINVAL for sockets that successfully nr_connect(). Reported-by: syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* skb: Do mix page pool and page referenced frags in GROAlexander Duyck2023-02-091-0/+9
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7d2c89b325874a35564db5630a459966afab04cc ] GSO should not merge page pool recycled frames with standard reference counted frames. Traditionally this didn't occur, at least not often. However as we start looking at adding support for wireless adapters there becomes the potential to mix the two due to A-MSDU repartitioning frames in the receive path. There are possibly other places where this may have occurred however I suspect they must be few and far between as we have not seen this issue until now. Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Reported-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* use less confusing names for iov_iter direction initializersAl Viro2023-02-0918-39/+40
| | | | | | | | | | | | | | | | | [ Upstream commit de4eda9de2d957ef2d6a8365a01e26a435e958cb ] READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response") Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listenerJakub Sitnicki2023-02-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ddce1e091757d0259107c6c0c7262df201de2b66 ] A listening socket linked to a sockmap has its sk_prot overridden. It points to one of the struct proto variants in tcp_bpf_prots. The variant depends on the socket's family and which sockmap programs are attached. A child socket cloned from a TCP listener initially inherits their sk_prot. But before cloning is finished, we restore the child's proto to the listener's original non-tcp_bpf_prots one. This happens in tcp_create_openreq_child -> tcp_bpf_clone. Today, in tcp_bpf_clone we detect if the child's proto should be restored by checking only for the TCP_BPF_BASE proto variant. This is not correct. The sk_prot of listening socket linked to a sockmap can point to to any variant in tcp_bpf_prots. If the listeners sk_prot happens to be not the TCP_BPF_BASE variant, then the child socket unintentionally is left if the inherited sk_prot by tcp_bpf_clone. This leads to issues like infinite recursion on close [1], because the child state is otherwise not set up for use with tcp_bpf_prot operations. Adjust the check in tcp_bpf_clone to detect all of tcp_bpf_prots variants. Note that it wouldn't be sufficient to check the socket state when overriding the sk_prot in tcp_bpf_update_proto in order to always use the TCP_BPF_BASE variant for listening sockets. Since commit b8b8315e39ff ("bpf, sockmap: Remove unhash handler for BPF sockmap usage") it is possible for a socket to transition to TCP_LISTEN state while already linked to a sockmap, e.g. connect() -> insert into map -> connect(AF_UNSPEC) -> listen(). [1]: https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/ Fixes: e80251555f0b ("tcp_bpf: Don't let child socket inherit parent protocol ops on copy") Reported-by: syzbot+04c21ed96d861dccc5cd@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-2-1e0ee7ac2f90@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: mctp: purge receive queues on sk destructionJeremy Kerr2023-02-061-0/+6
| | | | | | | | | | | | | | | | commit 60bd1d9008a50cc78c4033a16a6f5d78210d481c upstream. We may have pending skbs in the receive queue when the sk is being destroyed; add a destructor to purge the queue. MCTP doesn't use the error queue, so only the receive_queue is purged. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230126064551.464468-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: fix NULL pointer in skb_segment_listYan Zhai2023-02-061-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 876e8ca8366735a604bac86ff7e2732fc9d85d2d upstream. Commit 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") introduced UDP listifyed GRO. The segmentation relies on frag_list being untouched when passing through the network stack. This assumption can be broken sometimes, where frag_list itself gets pulled into linear area, leaving frag_list being NULL. When this happens it can trigger following NULL pointer dereference, and panic the kernel. Reverse the test condition should fix it. [19185.577801][ C1] BUG: kernel NULL pointer dereference, address: ... [19185.663775][ C1] RIP: 0010:skb_segment_list+0x1cc/0x390 ... [19185.834644][ C1] Call Trace: [19185.841730][ C1] <TASK> [19185.848563][ C1] __udp_gso_segment+0x33e/0x510 [19185.857370][ C1] inet_gso_segment+0x15b/0x3e0 [19185.866059][ C1] skb_mac_gso_segment+0x97/0x110 [19185.874939][ C1] __skb_gso_segment+0xb2/0x160 [19185.883646][ C1] udp_queue_rcv_skb+0xc3/0x1d0 [19185.892319][ C1] udp_unicast_rcv_skb+0x75/0x90 [19185.900979][ C1] ip_protocol_deliver_rcu+0xd2/0x200 [19185.910003][ C1] ip_local_deliver_finish+0x44/0x60 [19185.918757][ C1] __netif_receive_skb_one_core+0x8b/0xa0 [19185.927834][ C1] process_backlog+0x88/0x130 [19185.935840][ C1] __napi_poll+0x27/0x150 [19185.943447][ C1] net_rx_action+0x27e/0x5f0 [19185.951331][ C1] ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core] [19185.960848][ C1] __do_softirq+0xbc/0x25d [19185.968607][ C1] irq_exit_rcu+0x83/0xb0 [19185.976247][ C1] common_interrupt+0x43/0xa0 [19185.984235][ C1] asm_common_interrupt+0x22/0x40 ... [19186.094106][ C1] </TASK> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: Fix MLO address translation for multiple bss caseSriram R2023-02-061-0/+3
| | | | | | | | | | | | | | | | | | [ Upstream commit fa22b51ace8aa106267636f36170e940e676809c ] When multiple interfaces are present in the local interface list, new skb copy is taken before rx processing except for the first interface. The address translation happens each time only on the original skb since the hdr pointer is not updated properly to the newly created skb. As a result frames start to drop in userspace when address based checks or search fails. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Link: https://lore.kernel.org/r/20221208040050.25922-1-quic_srirrama@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: conntrack: unify established states for SCTP pathsSriram Yagnaraman2023-02-012-62/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit a44b7651489f26271ac784b70895e8a85d0cebf4 upstream. An SCTP endpoint can start an association through a path and tear it down over another one. That means the initial path will not see the shutdown sequence, and the conntrack entry will remain in ESTABLISHED state for 5 days. By merging the HEARTBEAT_ACKED and ESTABLISHED states into one ESTABLISHED state, there remains no difference between a primary or secondary path. The timeout for the merged ESTABLISHED state is set to 210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a path doesn't see the shutdown sequence, it will expire in a reasonable amount of time. With this change in place, there is now more than one state from which we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so handle the setting of ASSURED bit whenever a state change has happened and the new state is ESTABLISHED. Removed the check for dir==REPLY since the transition to ESTABLISHED can happen only in the reply direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: mctp: mark socks as dead on unhash, prevent re-addJeremy Kerr2023-02-012-0/+7
| | | | | | | | | | | | | | | | | | [ Upstream commit b98e1a04e27fddfdc808bf46fe78eca30db89ab3 ] Once a socket has been unhashed, we want to prevent it from being re-used in a sk_key entry as part of a routing operation. This change marks the sk as SOCK_DEAD on unhash, which prevents addition into the net's key list. We need to do this during the key add path, rather than key lookup, as we release the net keys_lock between those operations. Fixes: 4a992bbd3650 ("mctp: Implement message fragmentation & reassembly") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: mctp: hold key reference when looking up a general keyPaolo Abeni2023-02-011-7/+7
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 6e54ea37e344f145665c2dc3cc534b92529e8de5 ] Currently, we have a race where we look up a sock through a "general" (ie, not directly associated with the (src,dest,tag) tuple) key, then drop the key reference while still holding the key's sock. This change expands the key reference until we've finished using the sock, and hence the sock reference too. Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>. Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: mctp: move expiry timer delete to unhashJeremy Kerr2023-02-011-3/+6
| | | | | | | | | | | | | | | | [ Upstream commit 5f41ae6fca9d40ab3cb9b0507931ef7a9b3ea50b ] Currently, we delete the key expiry timer (in sk->close) before unhashing the sk. This means that another thread may find the sk through its presence on the key list, and re-queue the timer. This change moves the timer deletion to the unhash, after we have made the key no longer observable, so the timer cannot be re-queued. Fixes: 7b14e15ae6f4 ("mctp: Implement a timeout for tags") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: mctp: add an explicit reference from a mctp_sk_key to sockJeremy Kerr2023-02-011-6/+8
| | | | | | | | | | | | | | | | | | [ Upstream commit de8a6b15d9654c3e4f672d76da9d9df8ee06331d ] Currently, we correlate the mctp_sk_key lifetime to the sock lifetime through the sock hash/unhash operations, but this is pretty tenuous, and there are cases where we may have a temporary reference to an unhashed sk. This change makes the reference more explicit, by adding a hold on the sock when it's associated with a mctp_sk_key, released on final key unref. Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sctp: fail if no bound addresses can be used for a given scopeMarcelo Ricardo Leitner2023-02-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 458e279f861d3f61796894cd158b780765a1569f ] Currently, if you bind the socket to something like: servaddr.sin6_family = AF_INET6; servaddr.sin6_port = htons(0); servaddr.sin6_scope_id = 0; inet_pton(AF_INET6, "::1", &servaddr.sin6_addr); And then request a connect to: connaddr.sin6_family = AF_INET6; connaddr.sin6_port = htons(20000); connaddr.sin6_scope_id = if_nametoindex("lo"); inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr); What the stack does is: - bind the socket - create a new asoc - to handle the connect - copy the addresses that can be used for the given scope - try to connect But the copy returns 0 addresses, and the effect is that it ends up trying to connect as if the socket wasn't bound, which is not the desired behavior. This unexpected behavior also allows KASLR leaks through SCTP diag interface. The fix here then is, if when trying to copy the addresses that can be used for the scope used in connect() it returns 0 addresses, bail out. This is what TCP does with a similar reproducer. Reported-by: Pietro Borrello <borrello@diag.uniroma1.it> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/sched: sch_taprio: do not schedule in taprio_reset()Eric Dumazet2023-02-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ea4fdbaa2f7798cb25adbe4fd52ffc6356f097bb ] As reported by syzbot and hinted by Vinicius, I should not have added a qdisc_synchronize() call in taprio_reset() taprio_reset() can be called with qdisc spinlock held (and BH disabled) as shown in included syzbot report [1]. Only taprio_destroy() needed this synchronization, as explained in the blamed commit changelog. [1] BUG: scheduling while atomic: syz-executor150/5091/0x00000202 2 locks held by syz-executor150/5091: Modules linked in: Preemption disabled at: [<0000000000000000>] 0x0 Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ... CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f52203 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 panic+0x2cc/0x626 kernel/panic.c:318 check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238 __schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836 schedule_debug kernel/sched/core.c:5865 [inline] __schedule+0x34e4/0x5450 kernel/sched/core.c:6500 schedule+0xde/0x1b0 kernel/sched/core.c:6682 schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167 schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline] msleep+0xb6/0x100 kernel/time/timer.c:2322 qdisc_synchronize include/net/sch_generic.h:1295 [inline] taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703 qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022 dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285 netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline] dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351 dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374 qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080 tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689 rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559 do_syscall_x64 arch/x86/entry/common.c:50 [inline] Fixes: 3a415d59c1db ("net/sched: sch_taprio: fix possible use-after-free") Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/ Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netrom: Fix use-after-free of a listening socket.Kuniyuki Iwashima2023-02-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 409db27e3a2eb5e8ef7226ca33be33361b3ed1c9 ] syzbot reported a use-after-free in do_accept(), precisely nr_accept() as sk_prot_alloc() allocated the memory and sock_put() frees it. [0] The issue could happen if the heartbeat timer is fired and nr_heartbeat_expiry() calls nr_destroy_socket(), where a socket has SOCK_DESTROY or a listening socket has SOCK_DEAD. In this case, the first condition cannot be true. SOCK_DESTROY is flagged in nr_release() only when the file descriptor is close()d, but accept() is being called for the listening socket, so the second condition must be true. Usually, the AF_NETROM listener neither starts timers nor sets SOCK_DEAD. However, the condition is met if connect() fails before listen(). connect() starts the t1 timer and heartbeat timer, and t1timer calls nr_disconnect() when timeout happens. Then, SOCK_DEAD is set, and if we call listen(), the heartbeat timer calls nr_destroy_socket(). nr_connect nr_establish_data_link(sk) nr_start_t1timer(sk) nr_start_heartbeat(sk) nr_t1timer_expiry nr_disconnect(sk, ETIMEDOUT) nr_sk(sk)->state = NR_STATE_0 sk->sk_state = TCP_CLOSE sock_set_flag(sk, SOCK_DEAD) nr_listen if (sk->sk_state != TCP_LISTEN) sk->sk_state = TCP_LISTEN nr_heartbeat_expiry switch (nr->state) case NR_STATE_0 if (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD)) nr_destroy_socket(sk) This path seems expected, and nr_destroy_socket() is called to clean up resources. Initially, there was sock_hold() before nr_destroy_socket() so that the socket would not be freed, but the commit 517a16b1a88b ("netrom: Decrease sock refcount when sock timers expire") accidentally removed it. To fix use-after-free, let's add sock_hold(). [0]: BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848 Read of size 8 at addr ffff88807978d398 by task syz-executor.3/5315 CPU: 0 PID: 5315 Comm: syz-executor.3 Not tainted 6.2.0-rc3-syzkaller-00165-gd9fc1511728c #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x461 mm/kasan/report.c:417 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517 do_accept+0x483/0x510 net/socket.c:1848 __sys_accept4_file net/socket.c:1897 [inline] __sys_accept4+0x9a/0x120 net/socket.c:1927 __do_sys_accept net/socket.c:1944 [inline] __se_sys_accept net/socket.c:1941 [inline] __x64_sys_accept+0x75/0xb0 net/socket.c:1941 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fa436a8c0c9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fa437784168 EFLAGS: 00000246 ORIG_RAX: 000000000000002b RAX: ffffffffffffffda RBX: 00007fa436bac050 RCX: 00007fa436a8c0c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007fa436ae7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffebc6700df R14: 00007fa437784300 R15: 0000000000022000 </TASK> Allocated by task 5294: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] ____kasan_kmalloc mm/kasan/common.c:330 [inline] __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0xd0 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x140/0x290 net/core/sock.c:2038 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433 __sock_create+0x359/0x790 net/socket.c:1515 sock_create net/socket.c:1566 [inline] __sys_socket_create net/socket.c:1603 [inline] __sys_socket_create net/socket.c:1588 [inline] __sys_socket+0x133/0x250 net/socket.c:1636 __do_sys_socket net/socket.c:1649 [inline] __se_sys_socket net/socket.c:1647 [inline] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 14: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] __cache_free mm/slab.c:3394 [inline] __do_kmem_cache_free mm/slab.c:3580 [inline] __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587 sk_prot_free net/core/sock.c:2074 [inline] __sk_destruct+0x5df/0x750 net/core/sock.c:2166 sk_destruct net/core/sock.c:2181 [inline] __sk_free+0x175/0x460 net/core/sock.c:2192 sk_free+0x7c/0xa0 net/core/sock.c:2203 sock_put include/net/sock.h:1991 [inline] nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148 call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700 expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751 __run_timers kernel/time/timer.c:2022 [inline] __run_timers kernel/time/timer.c:1995 [inline] run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035 __do_softirq+0x1fb/0xadc kernel/softirq.c:571 Fixes: 517a16b1a88b ("netrom: Decrease sock refcount when sock timers expire") Reported-by: syzbot+5fafd5cfe1fc91f6b352@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230120231927.51711-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETESriram Yagnaraman2023-02-011-9/+16
| | | | | | | | | | | | | | | | | [ Upstream commit a9993591fa94246b16b444eea55d84c54608282a ] RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk MUST be accepted if the vtag of the packet matches its own tag and the T bit is not set OR if it is set to its peer's vtag and the T bit is set in chunk flags. Otherwise the packet MUST be silently dropped. Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above description. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ipv4: prevent potential spectre v1 gadget in fib_metrics_match()Eric Dumazet2023-02-011-0/+2
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 5e9398a26a92fc402d82ce1f97cc67d832527da0 ] if (!type) continue; if (type > RTAX_MAX) return false; ... fi_val = fi->fib_metrics->metrics[type - 1]; @type being used as an array index, we need to prevent cpu speculation or risk leaking kernel memory content. Fixes: 5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()Eric Dumazet2023-02-011-0/+2
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 1d1d63b612801b3f0a39b7d4467cad0abd60e5c8 ] if (!type) continue; if (type > RTAX_MAX) return -EINVAL; ... metrics[type - 1] = val; @type being used as an array index, we need to prevent cpu speculation or risk leaking kernel memory content. Fixes: 6cf9dfd3bd62 ("net: fib: move metrics parsing to a helper") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>