summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* rbd: harden get_lock_owner_info() a bitIlya Dryomov2023-08-031-0/+1
| | | | | | | | | | | | | commit 8ff2c64c9765446c3cef804fb99da04916603e27 upstream. - we want the exclusive lock type, so test for it directly - use sscanf() to actually parse the lock cookie and avoid admitting invalid handles - bail if locker has a blank address Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mptcp: more accurate NL event generationPaolo Abeni2023-08-031-2/+1
| | | | | | | | | | | | | | | | | commit 21d9b73a7d5241905367098d260a3c68b811da32 upstream. Currently the mptcp code generate a "new listener" event even if the actual listen() syscall fails. Address the issue moving the event generation call under the successful branch. Cc: stable@vger.kernel.org Fixes: f8c9dfbd875b ("mptcp: add pm listener events") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-2-6f60fe7137a9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tipc: stop tipc crypto on failure in tipc_node_createFedor Pchelkin2023-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit de52e17326c3e9a719c9ead4adb03467b8fae0ef ] If tipc_link_bc_create() fails inside tipc_node_create() for a newly allocated tipc node then we should stop its tipc crypto and free the resources allocated with a call to tipc_crypto_start(). As the node ref is initialized to one to that point, just put the ref on tipc_link_bc_create() error case that would lead to tipc_node_free() be eventually executed and properly clean the node and its crypto resources. Found by Linux Verification Center (linuxtesting.org). Fixes: cb8092d70a6f ("tipc: move bc link creation back to tipc_node_create") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tipc: check return value of pskb_trim()Yuanjun Gong2023-08-031-1/+2
| | | | | | | | | | | | | | [ Upstream commit e46e06ffc6d667a89b979701288e2264f45e6a7b ] goto free_skb if an unexpected result is returned by pskb_tirm() in tipc_crypto_rcv_complete(). Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20230725064810.5820-1-ruc_gongyuanjun@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64Lin Ma2023-08-031-0/+14
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 6c58c8816abb7b93b21fa3b1d0c1726402e5e568 ] The nla_for_each_nested parsing in function mqprio_parse_nlattr() does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as 8 byte integer and passed to priv->max_rate/min_rate. This patch adds the check based on nla_len() when check the nla_type(), which ensures that the length of these two attribute must equals sizeof(u64). Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: disallow rule addition to bound chain via ↵Pablo Neira Ayuso2023-08-031-2/+3
| | | | | | | | | | | | | | | | | | | | NFTA_RULE_CHAIN_ID [ Upstream commit 0ebc1064e4874d5987722a2ddbc18f94aa53b211 ] Bail out with EOPNOTSUPP when adding rule to bound chain via NFTA_RULE_CHAIN_ID. The following warning splat is shown when adding a rule to a deleted bound chain: WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Kevin Rich <kevinrich1337@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERRORPablo Neira Ayuso2023-08-031-9/+18
| | | | | | | | | | | | | | | | | | | [ Upstream commit 0a771f7b266b02d262900c75f1e175c7fe76fec2 ] On error when building the rule, the immediate expression unbinds the chain, hence objects can be deactivated by the transaction records. Otherwise, it is possible to trigger the following warning: WARNING: CPU: 3 PID: 915 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 3 PID: 915 Comm: chain-bind-err- Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic") Reported-by: Kevin Rich <kevinrich1337@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nft_set_rbtree: fix overlap expiration walkFlorian Westphal2023-08-031-6/+14
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit f718863aca469a109895cb855e6b81fff4827d71 ] The lazy gc on insert that should remove timed-out entries fails to release the other half of the interval, if any. Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0 in nftables.git and kmemleak enabled kernel. Second bug is the use of rbe_prev vs. prev pointer. If rbe_prev() returns NULL after at least one iteration, rbe_prev points to element that is not an end interval, hence it should not be removed. Lastly, check the genmask of the end interval if this is active in the current generation. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new ↵Maciej Żenczykowski2023-08-031-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | temporary address [ Upstream commit 69172f0bcb6a09110c5d2a6d792627f5095a9018 ] currently on 6.4 net/main: # ip link add dummy1 type dummy # echo 1 > /proc/sys/net/ipv6/conf/dummy1/use_tempaddr # ip link set dummy1 up # ip -6 addr add 2000::1/64 mngtmpaddr dev dummy1 # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::44f3:581c:8ca:3983/64 scope global temporary dynamic valid_lft 604800sec preferred_lft 86172sec inet6 2000::1/64 scope global mngtmpaddr valid_lft forever preferred_lft forever inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever # ip -6 addr del 2000::44f3:581c:8ca:3983/64 dev dummy1 (can wait a few seconds if you want to, the above delete isn't [directly] the problem) # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::1/64 scope global mngtmpaddr valid_lft forever preferred_lft forever inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever # ip -6 addr del 2000::1/64 mngtmpaddr dev dummy1 # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::81c9:56b7:f51a:b98f/64 scope global temporary dynamic valid_lft 604797sec preferred_lft 86169sec inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever This patch prevents this new 'global temporary dynamic' address from being created by the deletion of the related (same subnet prefix) 'mngtmpaddr' (which is triggered by there already being no temporary addresses). Cc: Jiri Pirko <jiri@resnulli.us> Fixes: 53bd67491537 ("ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses") Reported-by: Xiao Ma <xiaom@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230720160022.1887942-1-maze@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around fastopenq.max_qlenEric Dumazet2023-07-272-3/+5
| | | | | | | | | | | | [ Upstream commit 70f360dd7042cb843635ece9d28335a4addff9eb ] This field can be read locklessly. Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around icsk->icsk_user_timeoutEric Dumazet2023-07-271-3/+3
| | | | | | | | | | | | [ Upstream commit 26023e91e12c68669db416b97234328a03d8e499 ] This field can be read locklessly from do_tcp_getsockopt() Fixes: dca43c75e7e5 ("tcp: Add TCP_USER_TIMEOUT socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->notsent_lowatEric Dumazet2023-07-271-2/+2
| | | | | | | | | | | | | [ Upstream commit 1aeb87bc1440c5447a7fa2d6e3c2cca52cbd206b ] tp->notsent_lowat can be read locklessly from do_tcp_getsockopt() and tcp_poll(). Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around rskq_defer_acceptEric Dumazet2023-07-271-5/+6
| | | | | | | | | | | | | [ Upstream commit ae488c74422fb1dcd807c0201804b3b5e8a322a3 ] do_tcp_getsockopt() reads rskq_defer_accept while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->linger2Eric Dumazet2023-07-271-4/+4
| | | | | | | | | | | | | [ Upstream commit 9df5335ca974e688389c875546e5819778a80d59 ] do_tcp_getsockopt() reads tp->linger2 while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around icsk->icsk_syn_retriesEric Dumazet2023-07-272-4/+4
| | | | | | | | | | | | | [ Upstream commit 3a037f0f3c4bfe44518f2fbb478aa2f99a9cd8bb ] do_tcp_getsockopt() and reqsk_timer_handler() read icsk->icsk_syn_retries while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->keepalive_probesEric Dumazet2023-07-271-2/+3
| | | | | | | | | | | | | [ Upstream commit 6e5e1de616bf5f3df1769abc9292191dfad9110a ] do_tcp_getsockopt() reads tp->keepalive_probes while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->keepalive_intvlEric Dumazet2023-07-271-2/+2
| | | | | | | | | | | | | [ Upstream commit 5ecf9d4f52ff2f1d4d44c9b68bc75688e82f13b4 ] do_tcp_getsockopt() reads tp->keepalive_intvl while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->keepalive_timeEric Dumazet2023-07-271-1/+2
| | | | | | | | | | | | | [ Upstream commit 4164245c76ff906c9086758e1c3f87082a7f5ef5 ] do_tcp_getsockopt() reads tp->keepalive_time while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->tsoffsetEric Dumazet2023-07-272-4/+5
| | | | | | | | | | | | | [ Upstream commit dd23c9f1e8d5c1d2e3d29393412385ccb9c7a948 ] do_tcp_getsockopt() reads tp->tsoffset while another cpu might change its value. Fixes: 93be6ce0e91b ("tcp: set and get per-socket timestamp") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tp->tcp_tx_delayEric Dumazet2023-07-271-2/+2
| | | | | | | | | | | | | [ Upstream commit 348b81b68b13ebd489a3e6a46aa1c384c731c919 ] do_tcp_getsockopt() reads tp->tcp_tx_delay while another cpu might change its value. Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: SCO: fix sco_conn related locking and validity issuesPauli Virtanen2023-07-271-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3dcaa192ac2159193bc6ab57bc5369dcb84edd8e ] Operations that check/update sk_state and access conn should hold lock_sock, otherwise they can race. The order of taking locks is hci_dev_lock > lock_sock > sco_conn_lock, which is how it is in connect/disconnect_cfm -> sco_conn_del -> sco_chan_del. Fix locking in sco_connect to take lock_sock around updating sk_state and conn. sco_conn_del must not occur during sco_connect, as it frees the sco_conn. Hold hdev->lock longer to prevent that. sco_conn_add shall return sco_conn with valid hcon. Make it so also when reusing an old SCO connection waiting for disconnect timeout (see __sco_sock_close where conn->hcon is set to NULL). This should not reintroduce the issue fixed in the earlier commit 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm"), the relevant fix of releasing lock_sock in sco_sock_connect before acquiring hdev->lock is retained. These changes mirror similar fixes earlier in ISO sockets. Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no linkSiddh Raman Pant2023-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b4066eb04bb67e7ff66e5aaab0db4a753f37eaad ] hci_connect_sco currently returns NULL when there is no link (i.e. when hci_conn_link() returns NULL). sco_connect() expects an ERR_PTR in case of any error (see line 266 in sco.c). Thus, hcon set as NULL passes through to sco_conn_add(), which tries to get hcon->hdev, resulting in dereferencing a NULL pointer as reported by syzkaller. The same issue exists for iso_connect_cis() calling hci_connect_cis(). Thus, make hci_connect_sco() and hci_connect_cis() return ERR_PTR instead of NULL. Reported-and-tested-by: syzbot+37acd5d80d00d609d233@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=37acd5d80d00d609d233 Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()Douglas Anderson2023-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit de6dfcefd107667ce2dbedf4d9337f5ed557a4a1 ] KASAN reports that there's a use-after-free in hci_remove_adv_monitor(). Trawling through the disassembly, you can see that the complaint is from the access in bt_dev_dbg() under the HCI_ADV_MONITOR_EXT_MSFT case. The problem case happens because msft_remove_monitor() can end up freeing the monitor structure. Specifically: hci_remove_adv_monitor() -> msft_remove_monitor() -> msft_remove_monitor_sync() -> msft_le_cancel_monitor_advertisement_cb() -> hci_free_adv_monitor() Let's fix the problem by just stashing the relevant data when it's still valid. Fixes: 7cf5c2978f23 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: ISO: fix iso_conn related locking and validity issuesPauli Virtanen2023-07-271-22/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d40ae85ee62e3666f45bc61864b22121346f88ef ] sk->sk_state indicates whether iso_pi(sk)->conn is valid. Operations that check/update sk_state and access conn should hold lock_sock, otherwise they can race. The order of taking locks is hci_dev_lock > lock_sock > iso_conn_lock, which is how it is in connect/disconnect_cfm -> iso_conn_del -> iso_chan_del. Fix locking in iso_connect_cis/bis and sendmsg/recvmsg to take lock_sock around updating sk_state and conn. iso_conn_del must not occur during iso_connect_cis/bis, as it frees the iso_conn. Hold hdev->lock longer to prevent that. This should not reintroduce the issue fixed in commit 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency"), since the we acquire locks in order. We retain the fix in iso_sock_connect to release lock_sock before iso_connect_* acquires hdev->lock. Similarly for commit 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency"). We retain the fix in iso_conn_ready to not acquire iso_conn_lock before lock_sock. iso_conn_add shall return iso_conn with valid hcon. Make it so also when reusing an old CIS connection waiting for disconnect timeout (see __iso_sock_close where conn->hcon is set to NULL). Trace with iso_conn_del after iso_chan_add in iso_connect_cis: =============================================================== iso_sock_create:771: sock 00000000be9b69b7 iso_sock_init:693: sk 000000004dff667e iso_sock_bind:827: sk 000000004dff667e 70:1a:b8:98:ff:a2 type 1 iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_connect:875: sk 000000004dff667e iso_connect_cis:353: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da hci_conn_add:1005: hci0 dst 28:3d:c2:4a:7e:da iso_conn_add:140: hcon 000000007b65d182 conn 00000000daf8625e __iso_chan_add:214: conn 00000000daf8625e iso_connect_cfm:1700: hcon 000000007b65d182 bdaddr 28:3d:c2:4a:7e:da status 12 iso_conn_del:187: hcon 000000007b65d182 conn 00000000daf8625e, err 16 iso_sock_clear_timer:117: sock 000000004dff667e state 3 <Note: sk_state is BT_BOUND (3), so iso_connect_cis is still running at this point> iso_chan_del:153: sk 000000004dff667e, conn 00000000daf8625e, err 16 hci_conn_del:1151: hci0 hcon 000000007b65d182 handle 65535 hci_conn_unlink:1102: hci0: hcon 000000007b65d182 hci_chan_list_flush:2780: hcon 000000007b65d182 iso_sock_getsockopt:1376: sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getsockopt:1376: sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_shutdown:1434: sock 00000000be9b69b7, sk 000000004dff667e, how 1 __iso_sock_close:632: sk 000000004dff667e state 5 socket 00000000be9b69b7 <Note: sk_state is BT_CONNECT (5), even though iso_chan_del sets BT_CLOSED (6). Only iso_connect_cis sets it to BT_CONNECT, so it must be that iso_chan_del occurred between iso_chan_add and end of iso_connect_cis.> BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 8000000006467067 P4D 8000000006467067 PUD 3f5f067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:__iso_sock_close (net/bluetooth/iso.c:664) bluetooth =============================================================== Trace with iso_conn_del before iso_chan_add in iso_connect_cis: =============================================================== iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da ... iso_conn_add:140: hcon 0000000093bc551f conn 00000000768ae504 hci_dev_put:1487: hci0 orig refcnt 21 hci_event_packet:7607: hci0: event 0x0e hci_cmd_complete_evt:4231: hci0: opcode 0x2062 hci_cc_le_set_cig_params:3846: hci0: status 0x07 hci_sent_cmd_data:3107: hci0 opcode 0x2062 iso_connect_cfm:1703: hcon 0000000093bc551f bdaddr 28:3d:c2:4a:7e:da status 7 iso_conn_del:187: hcon 0000000093bc551f conn 00000000768ae504, err 12 hci_conn_del:1151: hci0 hcon 0000000093bc551f handle 65535 hci_conn_unlink:1102: hci0: hcon 0000000093bc551f hci_chan_list_flush:2780: hcon 0000000093bc551f __iso_chan_add:214: conn 00000000768ae504 <Note: this conn was already freed in iso_conn_del above> iso_sock_clear_timer:117: sock 0000000098323f95 state 3 general protection fault, probably for non-canonical address 0x30b29c630930aec8: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1920 Comm: bluetoothd Tainted: G E 6.3.0-rc7+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:detach_if_pending+0x28/0xd0 Code: 90 90 0f 1f 44 00 00 48 8b 47 08 48 85 c0 0f 84 ad 00 00 00 55 89 d5 53 48 83 3f 00 48 89 fb 74 7d 66 90 48 8b 03 48 8b 53 08 <> RSP: 0018:ffffb90841a67d08 EFLAGS: 00010007 RAX: 0000000000000000 RBX: ffff9141bd5061b8 RCX: 0000000000000000 RDX: 30b29c630930aec8 RSI: ffff9141fdd21e80 RDI: ffff9141bd5061b8 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb90841a67b88 R10: 0000000000000003 R11: ffffffff8613f558 R12: ffff9141fdd21e80 R13: 0000000000000000 R14: ffff9141b5976010 R15: ffff914185755338 FS: 00007f45768bd840(0000) GS:ffff9141fdd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000619000424074 CR3: 0000000009f5e005 CR4: 0000000000170ee0 Call Trace: <TASK> timer_delete+0x48/0x80 try_to_grab_pending+0xdf/0x170 __cancel_work+0x37/0xb0 iso_connect_cis+0x141/0x400 [bluetooth] =============================================================== Trace with NULL conn->hcon in state BT_CONNECT: =============================================================== __iso_sock_close:619: sk 00000000f7c71fc5 state 1 socket 00000000d90c5fe5 ... __iso_sock_close:619: sk 00000000f7c71fc5 state 8 socket 00000000d90c5fe5 iso_chan_del:153: sk 00000000f7c71fc5, conn 0000000022c03a7e, err 104 ... iso_sock_connect:862: sk 00000000129b56c3 iso_connect_cis:348: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a hci_dev_hold:1495: hci0 orig refcnt 19 __iso_chan_add:214: conn 0000000022c03a7e <Note: reusing old conn> iso_sock_clear_timer:117: sock 00000000129b56c3 state 3 ... iso_sock_ready:1485: sk 00000000129b56c3 ... iso_sock_sendmsg:1077: sock 00000000e5013966, sk 00000000129b56c3 BUG: kernel NULL pointer dereference, address: 00000000000006a8 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1403 Comm: wireplumber Tainted: G E 6.3.0-rc7+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:iso_sock_sendmsg+0x63/0x2a0 [bluetooth] =============================================================== Fixes: 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency") Fixes: 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: hci_event: call disconnect callback before deleting connPauli Virtanen2023-07-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7f7cfcb6f0825652973b780f248603e23f16ee90 ] In hci_cs_disconnect, we do hci_conn_del even if disconnection failed. ISO, L2CAP and SCO connections refer to the hci_conn without hci_conn_get, so disconn_cfm must be called so they can clean up their conn, otherwise use-after-free occurs. ISO: ========================================================== iso_sock_connect:880: sk 00000000eabd6557 iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da ... iso_conn_add:140: hcon 000000001696f1fd conn 00000000b6251073 hci_dev_put:1487: hci0 orig refcnt 17 __iso_chan_add:214: conn 00000000b6251073 iso_sock_clear_timer:117: sock 00000000eabd6557 state 3 ... hci_rx_work:4085: hci0 Event packet hci_event_packet:7601: hci0: event 0x0f hci_cmd_status_evt:4346: hci0: opcode 0x0406 hci_cs_disconnect:2760: hci0: status 0x0c hci_sent_cmd_data:3107: hci0 opcode 0x0406 hci_conn_del:1151: hci0 hcon 000000001696f1fd handle 2560 hci_conn_unlink:1102: hci0: hcon 000000001696f1fd hci_conn_drop:1451: hcon 00000000d8521aaf orig refcnt 2 hci_chan_list_flush:2780: hcon 000000001696f1fd hci_dev_put:1487: hci0 orig refcnt 21 hci_dev_put:1487: hci0 orig refcnt 20 hci_req_cmd_complete:3978: opcode 0x0406 status 0x0c ... <no iso_* activity on sk/conn> ... iso_sock_sendmsg:1098: sock 00000000dea5e2e0, sk 00000000eabd6557 BUG: kernel NULL pointer dereference, address: 0000000000000668 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:iso_sock_sendmsg (net/bluetooth/iso.c:1112) bluetooth ========================================================== L2CAP: ================================================================== hci_cmd_status_evt:4359: hci0: opcode 0x0406 hci_cs_disconnect:2760: hci0: status 0x0c hci_sent_cmd_data:3085: hci0 opcode 0x0406 hci_conn_del:1151: hci0 hcon ffff88800c999000 handle 3585 hci_conn_unlink:1102: hci0: hcon ffff88800c999000 hci_chan_list_flush:2780: hcon ffff88800c999000 hci_chan_del:2761: hci0 hcon ffff88800c999000 chan ffff888018ddd280 ... BUG: KASAN: slab-use-after-free in hci_send_acl+0x2d/0x540 [bluetooth] Read of size 8 at addr ffff888018ddd298 by task bluetoothd/1175 CPU: 0 PID: 1175 Comm: bluetoothd Tainted: G E 6.4.0-rc4+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5b/0x90 print_report+0xcf/0x670 ? __virt_addr_valid+0xf8/0x180 ? hci_send_acl+0x2d/0x540 [bluetooth] kasan_report+0xa8/0xe0 ? hci_send_acl+0x2d/0x540 [bluetooth] hci_send_acl+0x2d/0x540 [bluetooth] ? __pfx___lock_acquire+0x10/0x10 l2cap_chan_send+0x1fd/0x1300 [bluetooth] ? l2cap_sock_sendmsg+0xf2/0x170 [bluetooth] ? __pfx_l2cap_chan_send+0x10/0x10 [bluetooth] ? lock_release+0x1d5/0x3c0 ? mark_held_locks+0x1a/0x90 l2cap_sock_sendmsg+0x100/0x170 [bluetooth] sock_write_iter+0x275/0x280 ? __pfx_sock_write_iter+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 do_iter_readv_writev+0x176/0x220 ? __pfx_do_iter_readv_writev+0x10/0x10 ? find_held_lock+0x83/0xa0 ? selinux_file_permission+0x13e/0x210 do_iter_write+0xda/0x340 vfs_writev+0x1b4/0x400 ? __pfx_vfs_writev+0x10/0x10 ? __seccomp_filter+0x112/0x750 ? populate_seccomp_data+0x182/0x220 ? __fget_light+0xdf/0x100 ? do_writev+0x19d/0x210 do_writev+0x19d/0x210 ? __pfx_do_writev+0x10/0x10 ? mark_held_locks+0x1a/0x90 do_syscall_64+0x60/0x90 ? lockdep_hardirqs_on_prepare+0x149/0x210 ? do_syscall_64+0x6c/0x90 ? lockdep_hardirqs_on_prepare+0x149/0x210 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7ff45cb23e64 Code: 15 d1 1f 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d 9d a7 0d 00 00 74 13 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff21ae09b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff45cb23e64 RDX: 0000000000000001 RSI: 00007fff21ae0aa0 RDI: 0000000000000017 RBP: 00007fff21ae0aa0 R08: 000000000095a8a0 R09: 0000607000053f40 R10: 0000000000000001 R11: 0000000000000202 R12: 00007fff21ae0ac0 R13: 00000fffe435c150 R14: 00007fff21ae0a80 R15: 000060f000000040 </TASK> Allocated by task 771: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 __kasan_kmalloc+0xaa/0xb0 hci_chan_create+0x67/0x1b0 [bluetooth] l2cap_conn_add.part.0+0x17/0x590 [bluetooth] l2cap_connect_cfm+0x266/0x6b0 [bluetooth] hci_le_remote_feat_complete_evt+0x167/0x310 [bluetooth] hci_event_packet+0x38d/0x800 [bluetooth] hci_rx_work+0x287/0xb20 [bluetooth] process_one_work+0x4f7/0x970 worker_thread+0x8f/0x620 kthread+0x17f/0x1c0 ret_from_fork+0x2c/0x50 Freed by task 771: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 ____kasan_slab_free+0x169/0x1c0 slab_free_freelist_hook+0x9e/0x1c0 __kmem_cache_free+0xc0/0x310 hci_chan_list_flush+0x46/0x90 [bluetooth] hci_conn_cleanup+0x7d/0x330 [bluetooth] hci_cs_disconnect+0x35d/0x530 [bluetooth] hci_cmd_status_evt+0xef/0x2b0 [bluetooth] hci_event_packet+0x38d/0x800 [bluetooth] hci_rx_work+0x287/0xb20 [bluetooth] process_one_work+0x4f7/0x970 worker_thread+0x8f/0x620 kthread+0x17f/0x1c0 ret_from_fork+0x2c/0x50 ================================================================== Fixes: b8d290525e39 ("Bluetooth: clean up connection in hci_cs_disconnect") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: use RCU for hci_conn_params and iterate safely in hci_syncPauli Virtanen2023-07-275-44/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 195ef75e19287b4bc413da3e3e3722b030ac881e ] hci_update_accept_list_sync iterates over hdev->pend_le_conns and hdev->pend_le_reports, and waits for controller events in the loop body, without holding hdev lock. Meanwhile, these lists and the items may be modified e.g. by le_scan_cleanup. This can invalidate the list cursor or any other item in the list, resulting to invalid behavior (eg use-after-free). Use RCU for the hci_conn_params action lists. Since the loop bodies in hci_sync block and we cannot use RCU or hdev->lock for the whole loop, copy list items first and then iterate on the copy. Only the flags field is written from elsewhere, so READ_ONCE/WRITE_ONCE should guarantee we read valid values. Free params everywhere with hci_conn_params_free so the cleanup is guaranteed to be done properly. This fixes the following, which can be triggered e.g. by BlueZ new mgmt-tester case "Add + Remove Device Nowait - Success", or by changing hci_le_set_cig_params to always return false, and running iso-tester: ================================================================== BUG: KASAN: slab-use-after-free in hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) Read of size 8 at addr ffff888001265018 by task kworker/u3:0/32 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Workqueue: hci0 hci_cmd_sync_work Call Trace: <TASK> dump_stack_lvl (./arch/x86/include/asm/irqflags.h:134 lib/dump_stack.c:107) print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) ? __virt_addr_valid (./include/linux/mmzone.h:1915 ./include/linux/mmzone.h:2011 arch/x86/mm/physaddr.c:65) ? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) kasan_report (mm/kasan/report.c:538) ? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) ? __pfx_hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2780) ? mutex_lock (kernel/locking/mutex.c:282) ? __pfx_mutex_lock (kernel/locking/mutex.c:282) ? __pfx_mutex_unlock (kernel/locking/mutex.c:538) ? __pfx_update_passive_scan_sync (net/bluetooth/hci_sync.c:2861) hci_cmd_sync_work (net/bluetooth/hci_sync.c:306) process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538) ? __pfx_worker_thread (kernel/workqueue.c:2480) kthread (kernel/kthread.c:376) ? __pfx_kthread (kernel/kthread.c:331) ret_from_fork (arch/x86/entry/entry_64.S:314) </TASK> Allocated by task 31: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383) hci_conn_params_add (./include/linux/slab.h:580 ./include/linux/slab.h:720 net/bluetooth/hci_core.c:2277) hci_connect_le_scan (net/bluetooth/hci_conn.c:1419 net/bluetooth/hci_conn.c:1589) hci_connect_cis (net/bluetooth/hci_conn.c:2266) iso_connect_cis (net/bluetooth/iso.c:390) iso_sock_connect (net/bluetooth/iso.c:899) __sys_connect (net/socket.c:2003 net/socket.c:2020) __x64_sys_connect (net/socket.c:2027) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) Freed by task 15: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) kasan_save_free_info (mm/kasan/generic.c:523) __kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244) __kmem_cache_free (mm/slub.c:1807 mm/slub.c:3787 mm/slub.c:3800) hci_conn_params_del (net/bluetooth/hci_core.c:2323) le_scan_cleanup (net/bluetooth/hci_conn.c:202) process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:314) ================================================================== Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: skip bound chain on rule flushPablo Neira Ayuso2023-07-271-0/+2
| | | | | | | | | | | | | | | | | | | [ Upstream commit 6eaf41e87a223ae6f8e7a28d6e78384ad7e407f8 ] Skip bound chain when flushing table rules, the rule that owns this chain releases these objects. Otherwise, the following warning is triggered: WARNING: CPU: 2 PID: 1217 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 2 PID: 1217 Comm: chain-flush Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Kevin Rich <kevinrich1337@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: skip bound chain in netns release pathPablo Neira Ayuso2023-07-271-0/+3
| | | | | | | | | | | | [ Upstream commit 751d460ccff3137212f47d876221534bf0490996 ] Skip bound chain from netns release path, the rule that owns this chain releases these objects. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nft_set_pipapo: fix improper element removalFlorian Westphal2023-07-271-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 87b5a5c209405cb6b57424cdfa226a6dbd349232 ] end key should be equal to start unless NFT_SET_EXT_KEY_END is present. Its possible to add elements that only have a start key ("{ 1.0.0.0 . 2.0.0.0 }") without an internval end. Insertion treats this via: if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END)) end = (const u8 *)nft_set_ext_key_end(ext)->data; else end = start; but removal side always uses nft_set_ext_key_end(). This is wrong and leads to garbage remaining in the set after removal next lookup/insert attempt will give: BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90 Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399 Call Trace: kasan_report+0x105/0x140 pipapo_get+0x8eb/0xb90 nft_pipapo_insert+0x1dc/0x1710 nf_tables_newsetelem+0x31f5/0x4e00 .. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: lonial con <kongln9170@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: can't schedule in nft_chain_validateFlorian Westphal2023-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 314c82841602a111c04a7210c21dc77e0d560242 ] Can be called via nft set element list iteration, which may acquire rcu and/or bh read lock (depends on set type). BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 2 locks held by nft/1232: #0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid #1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire Call Trace: nft_chain_validate nft_lookup_validate_setelem nft_pipapo_walk nft_lookup_validate nft_chain_validate nft_immediate_validate nft_chain_validate nf_tables_validate nf_tables_abort No choice but to move it to nf_tables_validate(). Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: fix spurious set element insertion failureFlorian Westphal2023-07-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ddbd8be68941985f166f5107109a90ce13147c44 ] On some platforms there is a padding hole in the nft_verdict structure, between the verdict code and the chain pointer. On element insertion, if the new element clashes with an existing one and NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as the data associated with duplicated element is the same as the existing one. The data equality check uses memcmp. For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT padding area leads to spurious failure even if the verdict data is the same. This then makes the insertion fail with 'already exists' error, even though the new "key : data" matches an existing entry and userspace told the kernel that it doesn't want to receive an error indication. Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* llc: Don't drop packet from non-root netns.Kuniyuki Iwashima2023-07-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6631463b6e6673916d2481f692938f393148aa82 ] Now these upper layer protocol handlers can be called from llc_rcv() as sap->rcv_func(), which is registered by llc_sap_open(). * function which is passed to register_8022_client() -> no in-kernel user calls register_8022_client(). * snap_rcv() `- proto->rcvfunc() : registered by register_snap_client() -> aarp_rcv() and atalk_rcv() drop packets from non-root netns * stp_pdu_rcv() `- garp_protos[]->rcv() : registered by stp_proto_register() -> garp_pdu_rcv() and br_stp_rcv() are netns-aware So, we can safely remove the netns restriction in llc_rcv(). Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Revert "tcp: avoid the lookup process failing to get sk in ehash table"Kuniyuki Iwashima2023-07-272-19/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 81b3ade5d2b98ad6e0a473b0e1e420a801275592 ] This reverts commit 3f4ca5fafc08881d7a57daa20449d171f2887043. Commit 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in ehash table") reversed the order in how a socket is inserted into ehash to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are swapped. However, it introduced another lookup failure. The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU and does not have SOCK_RCU_FREE, so the socket could be reused even while it is being referenced on another CPU doing RCU lookup. Let's say a socket is reused and inserted into the same hash bucket during lookup. After the blamed commit, a new socket is inserted at the end of the list. If that happens, we will skip sockets placed after the previous position of the reused socket, resulting in ehash lookup failure. As described in Documentation/RCU/rculist_nulls.rst, we should insert a new socket at the head of the list to avoid such an issue. This issue, the swap-lookup-failure, and another variant reported in [0] can all be handled properly by adding a locked ehash lookup suggested by Eric Dumazet [1]. However, this issue could occur for every packet, thus more likely than the other two races, so let's revert the change for now. Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0] Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1] Fixes: 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in ehash table") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net:ipv6: check return value of pskb_trim()Yuanjun Gong2023-07-271-1/+2
| | | | | | | | | | | | | | [ Upstream commit 4258faa130be4ea43e5e2d839467da421b8ff274 ] goto tx_err if an unexpected result is returned by pskb_tirm() in ip6erspan_tunnel_xmit(). Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: ipv4: Use kfree_sensitive instead of kfreeWang Ming2023-07-271-1/+1
| | | | | | | | | | | | | | [ Upstream commit daa751444fd9d4184270b1479d8af49aaf1a1ee6 ] key might contain private part of the key, so better use kfree_sensitive to free it. Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP") Signed-off-by: Wang Ming <machel@vivo.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tcp_rsk(req)->ts_recentEric Dumazet2023-07-274-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eba20811f32652bc1a52d5e7cc403859b86390d9 ] TCP request sockets are lockless, tcp_rsk(req)->ts_recent can change while being read by another cpu as syzbot noticed. This is harmless, but we should annotate the known races. Note that tcp_check_req() changes req->ts_recent a bit early, we might change this in the future. BUG: KCSAN: data-race in tcp_check_req / tcp_check_req write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1: tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:303 [inline] ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:468 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5493 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 do_softirq+0x7e/0xb0 kernel/softirq.c:472 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396 local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33 rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline] __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271 dev_queue_xmit include/linux/netdevice.h:3088 [inline] neigh_hh_output include/net/neighbour.h:528 [inline] neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:292 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431 dst_output include/net/dst.h:458 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533 ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547 __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399 tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline] tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693 __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877 tcp_push_pending_frames include/net/tcp.h:1952 [inline] __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline] tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343 rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52 rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422 rds_send_worker+0x42/0x1d0 net/rds/threads.c:200 process_one_work+0x3e6/0x750 kernel/workqueue.c:2408 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0: tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:303 [inline] ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:468 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5493 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x1cd237f1 -> 0x1cd237f2 Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: annotate data-races around tcp_rsk(req)->txhashEric Dumazet2023-07-274-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5e5265522a9a7f91d1b0bd411d634bdaf16c80cd ] TCP request sockets are lockless, some of their fields can change while being read by another cpu as syzbot noticed. This is usually harmless, but we should annotate the known races. This patch takes care of tcp_rsk(req)->txhash, a separate one is needed for tcp_rsk(req)->ts_recent. BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1: tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213 inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880 tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665 tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 netif_receive_skb_internal net/core/dev.c:5652 [inline] netif_receive_skb+0x4a/0x310 net/core/dev.c:5711 tun_rx_batched+0x3bf/0x400 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x4ab/0x7d0 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0: tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663 tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544 tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059 tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175 tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494 tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509 tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 netif_receive_skb_internal net/core/dev.c:5652 [inline] netif_receive_skb+0x4a/0x310 net/core/dev.c:5711 tun_rx_batched+0x3bf/0x400 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x4ab/0x7d0 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x91d25731 -> 0xe79325cd Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: ipv4: use consistent txhash in TIME_WAIT and SYN_RECVAntoine Tenart2023-07-272-6/+12
| | | | | | | | | | | | | | | | | | | [ Upstream commit c0a8966e2bc7d31f77a7246947ebc09c1ff06066 ] When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in TIME_WAIT and SYN_RECV where it's not set in the reply skb from ip_send_unicast_reply. Those packets will have a mismatched hash with others from the same flow as their hashes will be 0. IPv6 does not have the same issue as the hash is set from the socket txhash in those cases. This commits sets the hash in the reply skb from ip_send_unicast_reply, which makes the IPv4 code behaving like IPv6. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 5e5265522a9a ("tcp: annotate data-races around tcp_rsk(req)->txhash") Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: cls_bpf: Undo tcf_bind_filter in case of an errorVictor Nogueira2023-07-271-52/+47
| | | | | | | | | | | | | | | | | [ Upstream commit 26a22194927e8521e304ed75c2f38d8068d55fc7 ] If cls_bpf_offload errors out, we must also undo tcf_bind_filter that was done before the error. Fix that by calling tcf_unbind_filter in errout_parms. Fixes: eadb41489fd2 ("net: cls_bpf: add support for marking filters as hardware-only") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: cls_u32: Undo refcount decrement in case update failedVictor Nogueira2023-07-271-0/+7
| | | | | | | | | | | | | | | | | | | | [ Upstream commit e8d3d78c19be0264a5692bed477c303523aead31 ] In the case of an update, when TCA_U32_LINK is set, u32_set_parms will decrement the refcount of the ht_down (struct tc_u_hnode) pointer present in the older u32 filter which we are replacing. However, if u32_replace_hw_knode errors out, the update command fails and that ht_down pointer continues decremented. To fix that, when u32_replace_hw_knode fails, check if ht_down's refcount was decremented and undo the decrement. Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knodeVictor Nogueira2023-07-271-11/+30
| | | | | | | | | | | | | | | [ Upstream commit 9cb36faedeafb9720ac236aeae2ea57091d90a09 ] When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter operation done at u32_set_parms. Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after ↵Victor Nogueira2023-07-271-23/+12
| | | | | | | | | | | | | | | | | | | mall_set_parms [ Upstream commit b3d0e0489430735e2e7626aa37e6462cdd136e9d ] In case an error occurred after mall_set_parms executed successfully, we must undo the tcf_bind_filter call it issues. Fix that by calling tcf_unbind_filter in err_replace_hw_filter label. Fixes: ec2507d2a306 ("net/sched: cls_matchall: Fix error path") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* gso: fix dodgy bit handling for GSO_UDP_L4Yan Zhai2023-07-272-7/+12
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9840036786d90cea11a90d1f30b6dc003b34ee67 ] Commit 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") checks DODGY bit for UDP, but for packets that can be fed directly to the device after gso_segs reset, it actually falls through to fragmentation: https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/ This change restores the expected behavior of GSO_UDP_L4 packets. Fixes: 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bridge: Add extack warning when enabling STP in netns.Kuniyuki Iwashima2023-07-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 56a16035bb6effb37177867cea94c13a8382f745 ] When we create an L2 loop on a bridge in netns, we will see packets storm even if STP is enabled. # unshare -n # ip link add br0 type bridge # ip link add veth0 type veth peer name veth1 # ip link set veth0 master br0 up # ip link set veth1 master br0 up # ip link set br0 type bridge stp_state 1 # ip link set br0 up # sleep 30 # ip -s link show br0 2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 956553768 12861249 0 0 0 12861249 <-. Keep TX: bytes packets errors dropped carrier collsns | increasing 1027834 11951 0 0 0 0 <-' rapidly This is because llc_rcv() drops all packets in non-root netns and BPDU is dropped. Let's add extack warning when enabling STP in netns. # unshare -n # ip link add br0 type bridge # ip link set br0 type bridge stp_state 1 Warning: bridge: STP does not work in non-root netns. Note this commit will be reverted later when we namespacify the whole LLC infra. Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe") Suggested-by: Harry Coin <hcoin@quietfountain.com> Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/ Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()Gustavo A. R. Silva2023-07-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 71e7552c90db2a2767f5c17c7ec72296b0d92061 ] -Wstringop-overflow is legitimately warning us about extra_size pontentially being zero at some point, hence potenially ending up _allocating_ zero bytes of memory for extra pointer and then trying to access such object in a call to copy_from_user(). Fix this by adding a sanity check to ensure we never end up trying to allocate zero bytes of data for extra pointer, before continue executing the rest of the code in the function. Address the following -Wstringop-overflow warning seen when built m68k architecture with allyesconfig configuration: from net/wireless/wext-core.c:11: In function '_copy_from_user', inlined from 'copy_from_user' at include/linux/uaccess.h:183:7, inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:825:7: arch/m68k/include/asm/string.h:48:25: warning: '__builtin_memset' writing 1 or more bytes into a region of size 0 overflows the destination [-Wstringop-overflow=] 48 | #define memset(d, c, n) __builtin_memset(d, c, n) | ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/uaccess.h:153:17: note: in expansion of macro 'memset' 153 | memset(to + (n - res), 0, res); | ^~~~~~ In function 'kmalloc', inlined from 'kzalloc' at include/linux/slab.h:694:9, inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:819:10: include/linux/slab.h:577:16: note: at offset 1 into destination object of size 0 allocated by '__kmalloc' 577 | return __kmalloc(size, flags); | ^~~~~~~~~~~~~~~~~~~~~~ This help with the ongoing efforts to globally enable -Wstringop-overflow. Link: https://github.com/KSPP/linux/issues/315 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/ZItSlzvIpjdjNfd8@work Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* devlink: report devlink_port_type_warn source devicePetr Oros2023-07-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a52305a81d6bb74b90b400dfa56455d37872fe4b ] devlink_port_type_warn is scheduled for port devlink and warning when the port type is not set. But from this warning it is not easy found out which device (driver) has no devlink port set. [ 3709.975552] Type was not set for devlink port. [ 3709.975579] WARNING: CPU: 1 PID: 13092 at net/devlink/leftover.c:6775 devlink_port_type_warn+0x11/0x20 [ 3709.993967] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nfnetlink bluetooth rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs vhost_net vhost vhost_iotlb tap tun bridge stp llc qrtr intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal mlx5_ib intel_powerclamp coretemp dell_wmi ledtrig_audio sparse_keymap ipmi_ssif kvm_intel ib_uverbs rfkill ib_core video kvm iTCO_wdt acpi_ipmi intel_vsec irqbypass ipmi_si iTCO_vendor_support dcdbas ipmi_devintf mei_me ipmi_msghandler rapl mei intel_cstate isst_if_mmio isst_if_mbox_pci dell_smbios intel_uncore isst_if_common i2c_i801 dell_wmi_descriptor wmi_bmof i2c_smbus intel_pch_thermal pcspkr acpi_power_meter xfs libcrc32c sd_mod sg nvme_tcp mgag200 i2c_algo_bit nvme_fabrics drm_shmem_helper drm_kms_helper nvme syscopyarea ahci sysfillrect sysimgblt nvme_core fb_sys_fops crct10dif_pclmul libahci mlx5_core sfc crc32_pclmul nvme_common drm [ 3709.994030] crc32c_intel mtd t10_pi mlxfw libata tg3 mdio megaraid_sas psample ghash_clmulni_intel pci_hyperv_intf wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse [ 3710.108431] CPU: 1 PID: 13092 Comm: kworker/1:1 Kdump: loaded Not tainted 5.14.0-319.el9.x86_64 #1 [ 3710.108435] Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.8.2 09/14/2022 [ 3710.108437] Workqueue: events devlink_port_type_warn [ 3710.108440] RIP: 0010:devlink_port_type_warn+0x11/0x20 [ 3710.108443] Code: 84 76 fe ff ff 48 c7 03 20 0e 1a ad 31 c0 e9 96 fd ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 48 c7 c7 18 24 4e ad e8 ef 71 62 ff <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f6 87 [ 3710.108445] RSP: 0018:ff3b6d2e8b3c7e90 EFLAGS: 00010282 [ 3710.108447] RAX: 0000000000000000 RBX: ff366d6580127080 RCX: 0000000000000027 [ 3710.108448] RDX: 0000000000000027 RSI: 00000000ffff86de RDI: ff366d753f41f8c8 [ 3710.108449] RBP: ff366d658ff5a0c0 R08: ff366d753f41f8c0 R09: ff3b6d2e8b3c7e18 [ 3710.108450] R10: 0000000000000001 R11: 0000000000000023 R12: ff366d753f430600 [ 3710.108451] R13: ff366d753f436900 R14: 0000000000000000 R15: ff366d753f436905 [ 3710.108452] FS: 0000000000000000(0000) GS:ff366d753f400000(0000) knlGS:0000000000000000 [ 3710.108453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3710.108454] CR2: 00007f1c57bc74e0 CR3: 000000111d26a001 CR4: 0000000000773ee0 [ 3710.108456] PKRU: 55555554 [ 3710.108457] Call Trace: [ 3710.108458] <TASK> [ 3710.108459] process_one_work+0x1e2/0x3b0 [ 3710.108466] ? rescuer_thread+0x390/0x390 [ 3710.108468] worker_thread+0x50/0x3a0 [ 3710.108471] ? rescuer_thread+0x390/0x390 [ 3710.108473] kthread+0xdd/0x100 [ 3710.108477] ? kthread_complete_and_exit+0x20/0x20 [ 3710.108479] ret_from_fork+0x1f/0x30 [ 3710.108485] </TASK> [ 3710.108486] ---[ end trace 1b4b23cd0c65d6a0 ]--- After patch: [ 402.473064] ice 0000:41:00.0: Type was not set for devlink port. [ 402.473064] ice 0000:41:00.1: Type was not set for devlink port. Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230615095447.8259-1-poros@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* devlink: make health report on unregistered instance warn just onceJakub Kicinski2023-07-271-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit 6f4b98147b8dfcabacb19b5c6abd087af66d0049 ] Devlink health is involved in error recovery. Machines in bad state tend to be fairly unreliable, and occasionally get stuck in error loops. Even with a reasonable grace period devlink health may get a thousand reports in an hour. In case of reporting on an unregistered devlink instance the subsequent reports don't add much value. Switch to WARN_ON_ONCE() to avoid flooding dmesg and fleet monitoring dashboards. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230531015523.48961-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf: tcp: Avoid taking fast sock lock in iteratorAditi Ghag2023-07-271-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9378096e8a656fb5c4099b26b1370c56f056eab9 ] This is a preparatory commit to replace `lock_sock_fast` with `lock_sock`,and facilitate BPF programs executed from the TCP sockets iterator to be able to destroy TCP sockets using the bpf_sock_destroy kfunc (implemented in follow-up commits). Previously, BPF TCP iterator was acquiring the sock lock with BH disabled. This led to scenarios where the sockets hash table bucket lock can be acquired with BH enabled in some path versus disabled in other. In such situation, kernel issued a warning since it thinks that in the BH enabled path the same bucket lock *might* be acquired again in the softirq context (BH disabled), which will lead to a potential dead lock. Since bpf_sock_destroy also happens in a process context, the potential deadlock warning is likely a false alarm. Here is a snippet of annotated stack trace that motivated this change: ``` Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&h->lhash2[i].lock); local_bh_disable(); lock(&h->lhash2[i].lock); kernel imagined possible scenario: local_bh_disable(); /* Possible softirq */ lock(&h->lhash2[i].lock); *** Potential Deadlock *** process context: lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock with BH enabled __inet_hash+0x4b/0x210 inet_csk_listen_start+0xe6/0x100 inet_listen+0x95/0x1d0 __sys_listen+0x69/0xb0 __x64_sys_listen+0x14/0x20 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc bpf_sock_destroy run from iterator: lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock with BH disabled inet_unhash+0x9a/0x110 tcp_set_state+0x6a/0x210 tcp_abort+0x10d/0x200 bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9 bpf_iter_run_prog+0x1ff/0x340 ------> lock_sock_fast that acquires sock lock with BH disabled bpf_iter_tcp_seq_show+0xca/0x190 bpf_seq_read+0x177/0x450 ``` Also, Yonghong reported a deadlock for non-listening TCP sockets that this change resolves. Previously, `lock_sock_fast` held the sock spin lock with BH which was again being acquired in `tcp_abort`: ``` watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331] RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500 Call Trace: <TASK> _raw_spin_lock+0x84/0x90 tcp_abort+0x13c/0x1f0 bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89 bpf_iter_run_prog+0x1aa/0x2c0 ? preempt_count_sub+0x1c/0xd0 ? from_kuid_munged+0x1c8/0x210 bpf_iter_tcp_seq_show+0x14e/0x1b0 bpf_seq_read+0x36c/0x6a0 bpf_iter_tcp_seq_show lock_sock_fast __lock_sock_fast spin_lock_bh(&sk->sk_lock.slock); /* * Fast path return with bottom halves disabled and * sock::sk_lock.slock held.* */ ... tcp_abort local_bh_disable(); spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk) ``` With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning: ``` lock_sock lock_sock_nested spin_lock_bh(&sk->sk_lock.slock); : spin_unlock_bh(&sk->sk_lock.slock); ``` Acked-by: Yonghong Song <yhs@meta.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* can: bcm: Fix UAF in bcm_proc_show()YueHaibing2023-07-271-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 55c3b96074f3f9b0aee19bf93cd71af7516582bb upstream. BUG: KASAN: slab-use-after-free in bcm_proc_show+0x969/0xa80 Read of size 8 at addr ffff888155846230 by task cat/7862 CPU: 1 PID: 7862 Comm: cat Not tainted 6.5.0-rc1-00153-gc8746099c197 #230 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xd5/0x150 print_report+0xc1/0x5e0 kasan_report+0xba/0xf0 bcm_proc_show+0x969/0xa80 seq_read_iter+0x4f6/0x1260 seq_read+0x165/0x210 proc_reg_read+0x227/0x300 vfs_read+0x1d5/0x8d0 ksys_read+0x11e/0x240 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Allocated by task 7846: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x9e/0xa0 bcm_sendmsg+0x264b/0x44e0 sock_sendmsg+0xda/0x180 ____sys_sendmsg+0x735/0x920 ___sys_sendmsg+0x11d/0x1b0 __sys_sendmsg+0xfa/0x1d0 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 7846: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 ____kasan_slab_free+0x161/0x1c0 slab_free_freelist_hook+0x119/0x220 __kmem_cache_free+0xb4/0x2e0 rcu_core+0x809/0x1bd0 bcm_op is freed before procfs entry be removed in bcm_release(), this lead to bcm_proc_show() may read the freed bcm_op. Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230715092543.15548-1-yuehaibing@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/ncsi: change from ndo_set_mac_address to dev_set_mac_addressIvan Mikhaylov2023-07-231-2/+3
| | | | | | | | | | | | | | | | | | | | | commit 790071347a0a1a89e618eedcd51c687ea783aeb3 upstream. Change ndo_set_mac_address to dev_set_mac_address because dev_set_mac_address provides a way to notify network layer about MAC change. In other case, services may not aware about MAC change and keep using old one which set from network adapter driver. As example, DHCP client from systemd do not update MAC address without notification from net subsystem which leads to the problem with acquiring the right address from DHCP server. Fixes: cb10c7c0dfd9e ("net/ncsi: Add NCSI Broadcom OEM command") Cc: stable@vger.kernel.org # v6.0+ 2f38e84 net/ncsi: make one oem_gma function for all mfr id Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>