summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
...
* | | tcp: fix rate_app_limited to default to 1David Morley2023-01-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial default value of 0 for tp->rate_app_limited was incorrect, since a flow is indeed application-limited until it first sends data. Fixing the default to be 1 is generally correct but also specifically will help user-space applications avoid using the initial tcpi_delivery_rate value of 0 that persists until the connection has some non-zero bandwidth sample. Fixes: eb8329e0a04d ("tcp: export data delivery rate") Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David Morley <morleyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Tested-by: David Morley <morleyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/ulp: use consistent error code when blocking ULPPaolo Abeni2023-01-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The referenced commit changed the error code returned by the kernel when preventing a non-established socket from attaching the ktls ULP. Before to such a commit, the user-space got ENOTCONN instead of EINVAL. The existing self-tests depend on such error code, and the change caused a failure: RUN global.non_established ... tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107) non_established: Test failed at step #3 FAIL global.non_established In the unlikely event existing applications do the same, address the issue by restoring the prior error code in the above scenario. Note that the only other ULP performing similar checks at init time - smc_ulp_ops - also fails with ENOTCONN when trying to attach the ULP to a non-established socket. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | tcp: avoid the lookup process failing to get sk in ehash tableJason Xing2023-01-192-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While one cpu is working on looking up the right socket from ehash table, another cpu is done deleting the request socket and is about to add (or is adding) the big socket from the table. It means that we could miss both of them, even though it has little chance. Let me draw a call trace map of the server side. CPU 0 CPU 1 ----- ----- tcp_v4_rcv() syn_recv_sock() inet_ehash_insert() -> sk_nulls_del_node_init_rcu(osk) __inet_lookup_established() -> __sk_nulls_add_node_rcu(sk, list) Notice that the CPU 0 is receiving the data after the final ack during 3-way shakehands and CPU 1 is still handling the final ack. Why could this be a real problem? This case is happening only when the final ack and the first data receiving by different CPUs. Then the server receiving data with ACK flag tries to search one proper established socket from ehash table, but apparently it fails as my map shows above. After that, the server fetches a listener socket and then sends a RST because it finds a ACK flag in the skb (data), which obeys RST definition in RFC 793. Besides, Eric pointed out there's one more race condition where it handles tw socket hashdance. Only by adding to the tail of the list before deleting the old one can we avoid the race if the reader has already begun the bucket traversal and it would possibly miss the head. Many thanks to Eric for great help from beginning to end. Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/ Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | | net: sched: gred: prevent races when adding offloads to statsJakub Kicinski2023-01-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Naresh reports seeing a warning that gred is calling u64_stats_update_begin() with preemption enabled. Arnd points out it's coming from _bstats_update(). We should be holding the qdisc lock when writing to stats, they are also updated from the datapath. Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/all/CA+G9fYsTr9_r893+62u6UGD3dVaCE-kN9C-Apmb2m=hxjc1Cqg@mail.gmail.com/ Fixes: e49efd5288bd ("net: sched: gred: support reporting stats from offloads") Link: https://lore.kernel.org/r/20230113044137.1383067-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Merge tag 'wireless-2023-01-18' of ↵Jakub Kicinski2023-01-181-1/+0
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.2 Third set of fixes for v6.2. This time most of them are for drivers, only one revert for mac80211. For an important mt76 fix we had to cherry pick two commits from wireless-next. * tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()" wifi: mt76: dma: fix a regression in adding rx buffers wifi: mt76: handle possible mt76_rx_token_consume failures wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device wifi: brcmfmac: avoid handling disabled channels for survey dump ==================== Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"Eric Dumazet2023-01-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 13e5afd3d773c6fc6ca2b89027befaaaa1ea7293. ieee80211_if_free() is already called from free_netdev(ndev) because ndev->priv_destructor == ieee80211_if_free syzbot reported: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 0 PID: 10041 Comm: syz-executor.0 Not tainted 6.2.0-rc2-syzkaller-00388-g55b98837e37d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 RIP: 0010:pcpu_get_page_chunk mm/percpu.c:262 [inline] RIP: 0010:pcpu_chunk_addr_search mm/percpu.c:1619 [inline] RIP: 0010:free_percpu mm/percpu.c:2271 [inline] RIP: 0010:free_percpu+0x186/0x10f0 mm/percpu.c:2254 Code: 80 3c 02 00 0f 85 f5 0e 00 00 48 8b 3b 48 01 ef e8 cf b3 0b 00 48 ba 00 00 00 00 00 fc ff df 48 8d 78 20 48 89 f9 48 c1 e9 03 <80> 3c 11 00 0f 85 3b 0e 00 00 48 8b 58 20 48 b8 00 00 00 00 00 fc RSP: 0018:ffffc90004ba7068 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88823ffe2b80 RCX: 0000000000000004 RDX: dffffc0000000000 RSI: ffffffff81c1f4e7 RDI: 0000000000000020 RBP: ffffe8fffe8fc220 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 1ffffffff2179ab2 R12: ffff8880b983d000 R13: 0000000000000003 R14: 0000607f450fc220 R15: ffff88823ffe2988 FS: 00007fcb349de700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32220000 CR3: 000000004914f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> netdev_run_todo+0x6bf/0x1100 net/core/dev.c:10352 ieee80211_register_hw+0x2663/0x4040 net/mac80211/main.c:1411 mac80211_hwsim_new_radio+0x2537/0x4d80 drivers/net/wireless/mac80211_hwsim.c:4583 hwsim_new_radio_nl+0xa09/0x10f0 drivers/net/wireless/mac80211_hwsim.c:5176 genl_family_rcv_msg_doit.isra.0+0x1e6/0x2d0 net/netlink/genetlink.c:968 genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline] genl_rcv_msg+0x4ff/0x7e0 net/netlink/genetlink.c:1065 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 13e5afd3d773 ("wifi: mac80211: fix memory leak in ieee80211_if_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Zhengchao Shao <shaozhengchao@huawei.com> Cc: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230113124326.3533978-1-edumazet@google.com
* | | l2tp: prevent lockdep issue in l2tp_tunnel_register()Eric Dumazet2023-01-181-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep complains with the following lock/unlock sequence: lock_sock(sk); write_lock_bh(&sk->sk_callback_lock); [1] release_sock(sk); [2] write_unlock_bh(&sk->sk_callback_lock); We need to swap [1] and [2] to fix this issue. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Reported-by: syzbot+bbd35b345c7cab0d9a08@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20230114030137.672706-1-xiyou.wangcong@gmail.com/T/#m1164ff20628671b0f326a24cb106ab3239c70ce3 Cc: Cong Wang <cong.wang@bytedance.com> Cc: Guillaume Nault <gnault@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller2023-01-181-0/+15
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Niera Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Fix syn-retransmits until initiator gives up when connection is re-used due to rst marked as invalid, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: conntrack: handle tcp challenge acks during connection reuseFlorian Westphal2023-01-171-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a connection is re-used, following can happen: [ connection starts to close, fin sent in either direction ] > syn # initator quickly reuses connection < ack # peer sends a challenge ack > rst # rst, sequence number == ack_seq of previous challenge ack > syn # this syn is expected to pass Problem is that the rst will fail window validation, so it gets tagged as invalid. If ruleset drops such packets, we get repeated syn-retransmits until initator gives up or peer starts responding with syn/ack. Before the commit indicated in the "Fixes" tag below this used to work: The challenge-ack made conntrack re-init state based on the challenge ack itself, so the following rst would pass window validation. Add challenge-ack support: If we get ack for syn, record the ack_seq, and then check if the rst sequence number matches the last ack number seen in reverse direction. Fixes: c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only") Reported-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | Bluetooth: Fix possible deadlock in rfcomm_sk_state_changeYing Hsu2023-01-171-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports a possible deadlock in rfcomm_sk_state_change [1]. While rfcomm_sock_connect acquires the sk lock and waits for the rfcomm lock, rfcomm_sock_release could have the rfcomm lock and hit a deadlock for acquiring the sk lock. Here's a simplified flow: rfcomm_sock_connect: lock_sock(sk) rfcomm_dlc_open: rfcomm_lock() rfcomm_sock_release: rfcomm_sock_shutdown: rfcomm_lock() __rfcomm_dlc_close: rfcomm_k_state_change: lock_sock(sk) This patch drops the sk lock before calling rfcomm_dlc_open to avoid the possible deadlock and holds sk's reference count to prevent use-after-free after rfcomm_dlc_open completes. Reported-by: syzbot+d7ce59...@syzkaller.appspotmail.com Fixes: 1804fdf6e494 ("Bluetooth: btintel: Combine setting up MSFT extension") Link: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 [1] Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: ISO: Fix possible circular locking dependencyLuiz Augusto von Dentz2023-01-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This attempts to fix the following trace: iso-tester/52 is trying to acquire lock: ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at: iso_sock_listen+0x29e/0x440 but task is already holding lock: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x1a3/0x630 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 hci_le_remote_feat_complete_evt+0x17e/0x320 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #0 (&hdev->lock){+.+.}-{3:3}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 iso_sock_listen+0x29e/0x440 __sys_listen+0xe6/0x160 __x64_sys_listen+0x25/0x30 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by iso-tester/52: #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: hci_event: Fix Invalid wait contextLuiz Augusto von Dentz2023-01-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following trace caused by attempting to lock cmd_sync_work_lock while holding the rcu_read_lock: kworker/u3:2/212 is trying to lock: ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at: hci_cmd_sync_queue+0xad/0x140 other info that might help us debug this: context-{4:4} 4 locks held by kworker/u3:2/212: #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at: hci_cc_le_set_cig_params+0x64/0x4f0 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at: hci_cc_le_set_cig_params+0x2f9/0x4f0 Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: ISO: Fix possible circular locking dependencyLuiz Augusto von Dentz2023-01-171-35/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This attempts to fix the following trace: kworker/u3:1/184 is trying to acquire lock: ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_connect_cfm+0x2de/0x690 but task is already holding lock: ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at: iso_connect_cfm+0x265/0x690 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&conn->lock){+.+.}-{2:2}: lock_acquire+0x176/0x3d0 _raw_spin_lock+0x2a/0x40 __iso_sock_close+0x1dd/0x4f0 iso_sock_release+0xa0/0x1b0 sock_close+0x5e/0x120 __fput+0x102/0x410 task_work_run+0xf1/0x160 exit_to_user_mode_prepare+0x170/0x180 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x2de/0x690 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); *** DEADLOCK *** Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: hci_sync: fix memory leak in hci_update_adv_data()Zhengchao Shao2023-01-171-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When hci_cmd_sync_queue() failed in hci_update_adv_data(), inst_ptr is not freed, which will cause memory leak, convert to use ERR_PTR/PTR_ERR to pass the instance to callback so no memory needs to be allocated. Fixes: 651cd3d65b0f ("Bluetooth: convert hci_update_adv_data to hci_sync") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: hci_conn: Fix memory leaksZhengchao Shao2023-01-171-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When hci_cmd_sync_queue() failed in hci_le_terminate_big() or hci_le_big_terminate(), the memory pointed by variable d is not freed, which will cause memory leak. Add release process to error path. Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2Luiz Augusto von Dentz2023-01-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't try to use HCI_OP_LE_READ_BUFFER_SIZE_V2 if controller don't support ISO channels, but in order to check if ISO channels are supported HCI_OP_LE_READ_LOCAL_FEATURES needs to be done earlier so the features bits can be checked on hci_le_read_buffer_size_sync. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216817 Fixes: c1631dbc00c1 ("Bluetooth: hci_sync: Fix hci_read_buffer_size_sync") Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | | Bluetooth: Fix a buffer overflow in mgmt_mesh_add()Harshit Mogalapalli2023-01-171-1/+1
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Smatch Warning: net/bluetooth/mgmt_util.c:375 mgmt_mesh_add() error: __memcpy() 'mesh_tx->param' too small (48 vs 50) Analysis: 'mesh_tx->param' is array of size 48. This is the destination. u8 param[sizeof(struct mgmt_cp_mesh_send) + 29]; // 19 + 29 = 48. But in the caller 'mesh_send' we reject only when len > 50. len > (MGMT_MESH_SEND_SIZE + 31) // 19 + 31 = 50. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* | | l2tp: close all race conditions in l2tp_tunnel_register()Cong Wang2023-01-161-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in l2tp_tunnel_register() is racy in several ways: 1. It modifies the tunnel socket _after_ publishing it. 2. It calls setup_udp_tunnel_sock() on an existing socket without locking. 3. It changes sock lock class on fly, which triggers many syzbot reports. This patch amends all of them by moving socket initialization code before publishing and under sock lock. As suggested by Jakub, the l2tp lockdep class is not necessary as we can just switch to bh_lock_sock_nested(). Fixes: 37159ef2c1ae ("l2tp: fix a lockdep splat") Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation") Reported-by: syzbot+52866e24647f9a23403f@syzkaller.appspotmail.com Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | l2tp: convert l2tp_tunnel_list to idrCong Wang2023-01-161-43/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | l2tp uses l2tp_tunnel_list to track all registered tunnels and to allocate tunnel ID's. IDR can do the same job. More importantly, with IDR we can hold the ID before a successful registration so that we don't need to worry about late error handling, it is not easy to rollback socket changes. This is a preparation for the following fix. Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/sched: sch_taprio: fix possible use-after-freeEric Dumazet2023-01-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported a nasty crash [1] in net_tx_action() which made little sense until we got a repro. This repro installs a taprio qdisc, but providing an invalid TCA_RATE attribute. qdisc_create() has to destroy the just initialized taprio qdisc, and taprio_destroy() is called. However, the hrtimer used by taprio had already fired, therefore advance_sched() called __netif_schedule(). Then net_tx_action was trying to use a destroyed qdisc. We can not undo the __netif_schedule(), so we must wait until one cpu serviced the qdisc before we can proceed. Many thanks to Alexander Potapenko for his help. [1] BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline] BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline] BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline] BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138 queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline] do_raw_spin_trylock include/linux/spinlock.h:191 [inline] __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline] _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138 spin_trylock include/linux/spinlock.h:359 [inline] qdisc_run_begin include/net/sch_generic.h:187 [inline] qdisc_run+0xee/0x540 include/net/pkt_sched.h:125 net_tx_action+0x77c/0x9a0 net/core/dev.c:5086 __do_softirq+0x1cc/0x7fb kernel/softirq.c:571 run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934 smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164 kthread+0x31b/0x430 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 Uninit was created at: slab_post_alloc_hook mm/slab.h:732 [inline] slab_alloc_node mm/slub.c:3258 [inline] __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970 kmalloc_reserve net/core/skbuff.c:358 [inline] __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430 alloc_skb include/linux/skbuff.h:1257 [inline] nlmsg_new include/net/netlink.h:953 [inline] netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436 netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507 rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536 __sys_sendmsg net/socket.c:2565 [inline] __do_sys_sendmsg net/socket.c:2574 [inline] __se_sys_sendmsg net/socket.c:2572 [inline] __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller2023-01-162-3/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pable Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Increase timeout to 120 seconds for netfilter selftests to fix nftables transaction tests, from Florian Westphal. 2) Fix overflow in bitmap_ip_create() due to integer arithmetics in a 64-bit bitmask, from Gavrilov Ilia. 3) Fix incorrect arithmetics in nft_payload with double-tagged vlan matching. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bitsPablo Neira Ayuso2023-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the offset + length goes over the ethernet + vlan header, then the length is adjusted to copy the bytes that are within the boundaries of the vlan_ethhdr scratchpad area. The remaining bytes beyond ethernet + vlan header are copied directly from the skbuff data area. Fix incorrect arithmetic operator: subtract, not add, the size of the vlan header in case of double-tagged packets to adjust the length accordingly to address CVE-2023-0179. Reported-by: Davide Ornaghi <d.ornaghi97@gmail.com> Fixes: f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.Gavrilov Ilia2023-01-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of an arithmetic expression 2 << (netmask - mask_bits - 1) is subject to overflow due to a failure casting operands to a larger data type before performing the arithmetic. Note that it's harmless since the value will be checked at the next step. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9fed748185a ("netfilter: ipset: Check and reject crazy /0 input parameters") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htbRahul Rameshbabu2023-01-131-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Peek at old qdisc and graft only when deleting a leaf class in the htb, rather than when deleting the htb itself. Do not peek at the qdisc of the netdev queue when destroying the htb. The caller may already have grafted a new qdisc that is not part of the htb structure being destroyed. This fix resolves two use cases. 1. Using tc to destroy the htb. - Netdev was being prematurely activated before the htb was fully destroyed. 2. Using tc to replace the htb with another qdisc (which also leads to the htb being destroyed). - Premature netdev activation like previous case. Newly grafted qdisc was also getting accidentally overwritten when destroying the htb. Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230113005528.302625-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | mptcp: netlink: respect v4/v6-only socketsMatthieu Baerts2023-01-133-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY option has been set, the userspace PM would allow creating subflows using IPv4 addresses, e.g. mapped in v6. The kernel side of userspace PM will also accept creating subflows with local and remote addresses having different families. Depending on the subflow socket's family, different behaviours are expected: - If AF_INET is forced with a v6 address, the kernel will take the last byte of the IP and try to connect to that: a new subflow is created but to a non expected address. - If AF_INET6 is forced with a v4 address, the kernel will try to connect to a v4 address (v4-mapped-v6). A -EBADF error from the connect() part is then expected. It is then required to check the given families can be accepted. This is done by using a new helper for addresses family matching, taking care of IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by the in-kernel path-manager to use mixed IPv4 and IPv6 addresses. While at it, a clear error message is now reported if there are some conflicts with the families that have been passed by the userspace. Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | mptcp: explicitly specify sock family at subflow creation timePaolo Abeni2023-01-133-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let the caller specify the to-be-created subflow family. For a given MPTCP socket created with the AF_INET6 family, the current userspace PM can already ask the kernel to create subflows in v4 and v6. If "plain" IPv4 addresses are passed to the kernel, they are automatically mapped in v6 addresses "by accident". This can be problematic because the userspace will need to pass different addresses, now the v4-mapped-v6 addresses to destroy this new subflow. On the other hand, if the MPTCP socket has been created with the AF_INET family, the command to create a subflow in v6 will be accepted but the result will not be the one as expected as new subflow will be created in IPv4 using part of the v6 addresses passed to the kernel: not creating the expected subflow then. No functional change intended for the in-kernel PM where an explicit enforcement is currently in place. This arbitrary enforcement will be leveraged by other patches in a future version. Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | net: nfc: Fix use-after-free in local_cleanup()Jisoo Jang2023-01-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a use-after-free that occurs in kfree_skb() called from local_cleanup(). This could happen when killing nfc daemon (e.g. neard) after detaching an nfc device. When detaching an nfc device, local_cleanup() called from nfc_llcp_unregister_device() frees local->rx_pending and decreases local->ref by kref_put() in nfc_llcp_local_put(). In the terminating process, nfc daemon releases all sockets and it leads to decreasing local->ref. After the last release of local->ref, local_cleanup() called from local_release() frees local->rx_pending again, which leads to the bug. Setting local->rx_pending to NULL in local_cleanup() could prevent use-after-free when local_cleanup() is called twice. Found by a modified version of syzkaller. BUG: KASAN: use-after-free in kfree_skb() Call Trace: dump_stack_lvl (lib/dump_stack.c:106) print_address_description.constprop.0.cold (mm/kasan/report.c:306) kasan_check_range (mm/kasan/generic.c:189) kfree_skb (net/core/skbuff.c:955) local_cleanup (net/nfc/llcp_core.c:159) nfc_llcp_local_put.part.0 (net/nfc/llcp_core.c:172) nfc_llcp_local_put (net/nfc/llcp_core.c:181) llcp_sock_destruct (net/nfc/llcp_sock.c:959) __sk_destruct (net/core/sock.c:2133) sk_destruct (net/core/sock.c:2181) __sk_free (net/core/sock.c:2192) sk_free (net/core/sock.c:2203) llcp_sock_release (net/nfc/llcp_sock.c:646) __sock_release (net/socket.c:650) sock_close (net/socket.c:1365) __fput (fs/file_table.c:306) task_work_run (kernel/task_work.c:179) ptrace_notify (kernel/signal.c:2354) syscall_exit_to_user_mode_prepare (kernel/entry/common.c:278) syscall_exit_to_user_mode (kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:86) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:106) Allocated by task 4719: kasan_save_stack (mm/kasan/common.c:45) __kasan_slab_alloc (mm/kasan/common.c:325) slab_post_alloc_hook (mm/slab.h:766) kmem_cache_alloc_node (mm/slub.c:3497) __alloc_skb (net/core/skbuff.c:552) pn533_recv_response (drivers/nfc/pn533/usb.c:65) __usb_hcd_giveback_urb (drivers/usb/core/hcd.c:1671) usb_giveback_urb_bh (drivers/usb/core/hcd.c:1704) tasklet_action_common.isra.0 (kernel/softirq.c:797) __do_softirq (kernel/softirq.c:571) Freed by task 1901: kasan_save_stack (mm/kasan/common.c:45) kasan_set_track (mm/kasan/common.c:52) kasan_save_free_info (mm/kasan/genericdd.c:518) __kasan_slab_free (mm/kasan/common.c:236) kmem_cache_free (mm/slub.c:3809) kfree_skbmem (net/core/skbuff.c:874) kfree_skb (net/core/skbuff.c:931) local_cleanup (net/nfc/llcp_core.c:159) nfc_llcp_unregister_device (net/nfc/llcp_core.c:1617) nfc_unregister_device (net/nfc/core.c:1179) pn53x_unregister_nfc (drivers/nfc/pn533/pn533.c:2846) pn533_usb_disconnect (drivers/nfc/pn533/usb.c:579) usb_unbind_interface (drivers/usb/core/driver.c:458) device_release_driver_internal (drivers/base/dd.c:1279) bus_remove_device (drivers/base/bus.c:529) device_del (drivers/base/core.c:3665) usb_disable_device (drivers/usb/core/message.c:1420) usb_disconnect (drivers/usb/core.c:2261) hub_event (drivers/usb/core/hub.c:5833) process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/workqueue.h:108 kernel/workqueue.c:2281) worker_thread (include/linux/list.h:282 kernel/workqueue.c:2423) kthread (kernel/kthread.c:319) ret_from_fork (arch/x86/entry/entry_64.S:301) Fixes: 3536da06db0b ("NFC: llcp: Clean local timers and works when removing a device") Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Link: https://lore.kernel.org/r/20230111131914.3338838-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | ethtool: add netlink attr in rss get reply only if value is not nullSudheer Mogilappagari2023-01-121-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code for RSS_GET ethtool command includes netlink attributes in reply message to user space even if they are null. Added checks to include netlink attribute in reply message only if a value is received from driver. Drivers might return null for RSS indirection table or hash key. Instead of including attributes with empty value in the reply message, add netlink attribute only if there is content. Fixes: 7112a04664bf ("ethtool: add netlink based get rss support") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/20230111235607.85509-1-sudheer.mogilappagari@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | rxrpc: Fix wrong error return in rxrpc_connect_call()David Howells2023-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix rxrpc_connect_call() to return -ENOMEM rather than 0 if it fails to look up a peer. This generated a smatch warning: net/rxrpc/call_object.c:303 rxrpc_connect_call() warn: missing error code 'ret' I think this also fixes a syzbot-found bug: rxrpc: Assertion failed - 1(0x1) == 11(0xb) is false ------------[ cut here ]------------ kernel BUG at net/rxrpc/call_object.c:645! where the call being put is in the wrong state - as would be the case if we failed to clear up correctly after the error in rxrpc_connect_call(). Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reported-and-tested-by: syzbot+4bb6356bb29d6299360e@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/202301111153.9eZRYLf1-lkp@intel.com/ Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/2438405.1673460435@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | Merge tag 'wireless-2023-01-12' of ↵Jakub Kicinski2023-01-1211-187/+176
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Some fixes, stack only for now: * iTXQ conversion fixes, various bugs reported * properly reset multiple BSSID settings * fix for a link_sta crash * fix for AP VLAN checks * fix for MLO address translation * tag 'wireless-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: fix MLO + AP_VLAN check mac80211: Fix MLO address translation for multiple bss case wifi: mac80211: reset multiple BSSID options in stop_ap() wifi: mac80211: Fix iTXQ AMPDU fragmentation handling wifi: mac80211: sdata can be NULL during AMPDU start wifi: mac80211: Proper mark iTXQs for resumption wifi: mac80211: fix initialization of rx->link and rx->link_sta ==================== Link: https://lore.kernel.org/r/20230112111941.82408-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | wifi: mac80211: fix MLO + AP_VLAN checkFelix Fietkau2023-01-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of preventing adding AP_VLAN to MLO enabled APs, this check was preventing adding more than one 4-addr AP_VLAN regardless of the MLO status. Fix this by adding missing extra checks. Fixes: ae960ee90bb1 ("wifi: mac80211: prevent VLANs on MLDs") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221214130326.37756-1-nbd@nbd.name Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | mac80211: Fix MLO address translation for multiple bss caseSriram R2023-01-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When multiple interfaces are present in the local interface list, new skb copy is taken before rx processing except for the first interface. The address translation happens each time only on the original skb since the hdr pointer is not updated properly to the newly created skb. As a result frames start to drop in userspace when address based checks or search fails. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Link: https://lore.kernel.org/r/20221208040050.25922-1-quic_srirrama@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | wifi: mac80211: reset multiple BSSID options in stop_ap()Aloka Dixit2023-01-101-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reset multiple BSSID options when all AP related configurations are reset in ieee80211_stop_ap(). Stale values result in HWSIM test failures (e.g. p2p_group_cli_invalid), if run after 'he_ap_ema'. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Link: https://lore.kernel.org/r/20221221185616.11514-1-quic_alokad@quicinc.com Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | wifi: mac80211: Fix iTXQ AMPDU fragmentation handlingAlexander Wetzel2023-01-103-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mac80211 must not enable aggregation wile transmitting a fragmented MPDU. Enforce that for mac80211 internal TX queues (iTXQs). Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202301021738.7cd3e6ae-oliver.sang@intel.com Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20230106223141.98696-1-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | wifi: mac80211: sdata can be NULL during AMPDU startAlexander Wetzel2023-01-102-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ieee80211_tx_ba_session_handle_start() may get NULL for sdata when a deauthentication is ongoing. Here a trace triggering the race with the hostapd test multi_ap_fronthaul_on_ap: (gdb) list *drv_ampdu_action+0x46 0x8b16 is in drv_ampdu_action (net/mac80211/driver-ops.c:396). 391 int ret = -EOPNOTSUPP; 392 393 might_sleep(); 394 395 sdata = get_bss_sdata(sdata); 396 if (!check_sdata_in_driver(sdata)) 397 return -EIO; 398 399 trace_drv_ampdu_action(local, sdata, params); 400 wlan0: moving STA 02:00:00:00:03:00 to state 3 wlan0: associated wlan0: deauthenticating from 02:00:00:00:03:00 by local choice (Reason: 3=DEAUTH_LEAVING) wlan3.sta1: Open BA session requested for 02:00:00:00:00:00 tid 0 wlan3.sta1: dropped frame to 02:00:00:00:00:00 (unauthorized port) wlan0: moving STA 02:00:00:00:03:00 to state 2 wlan0: moving STA 02:00:00:00:03:00 to state 1 wlan0: Removed STA 02:00:00:00:03:00 wlan0: Destroyed STA 02:00:00:00:03:00 BUG: unable to handle page fault for address: fffffffffffffb48 PGD 11814067 P4D 11814067 PUD 11816067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 2 PID: 133397 Comm: kworker/u16:1 Tainted: G W 6.1.0-rc8-wt+ #59 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Workqueue: phy3 ieee80211_ba_session_work [mac80211] RIP: 0010:drv_ampdu_action+0x46/0x280 [mac80211] Code: 53 48 89 f3 be 89 01 00 00 e8 d6 43 bf ef e8 21 46 81 f0 83 bb a0 1b 00 00 04 75 0e 48 8b 9b 28 0d 00 00 48 81 eb 10 0e 00 00 <8b> 93 58 09 00 00 f6 c2 20 0f 84 3b 01 00 00 8b 05 dd 1c 0f 00 85 RSP: 0018:ffffc900025ebd20 EFLAGS: 00010287 RAX: 0000000000000000 RBX: fffffffffffff1f0 RCX: ffff888102228240 RDX: 0000000080000000 RSI: ffffffff918c5de0 RDI: ffff888102228b40 RBP: ffffc900025ebd40 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888118c18ec0 R13: 0000000000000000 R14: ffffc900025ebd60 R15: ffff888018b7efb8 FS: 0000000000000000(0000) GS:ffff88817a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffb48 CR3: 0000000105228006 CR4: 0000000000170ee0 Call Trace: <TASK> ieee80211_tx_ba_session_handle_start+0xd0/0x190 [mac80211] ieee80211_ba_session_work+0xff/0x2e0 [mac80211] process_one_work+0x29f/0x620 worker_thread+0x4d/0x3d0 ? process_one_work+0x620/0x620 kthread+0xfb/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20221230121850.218810-2-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | wifi: mac80211: Proper mark iTXQs for resumptionAlexander Wetzel2023-01-105-50/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a running wake_tx_queue() call is aborted due to a hw queue stop the corresponding iTXQ is not always correctly marked for resumption: wake_tx_push_queue() can stops the queue run without setting @IEEE80211_TXQ_STOP_NETIF_TX. Without the @IEEE80211_TXQ_STOP_NETIF_TX flag __ieee80211_wake_txqs() will not schedule a new queue run and remaining frames in the queue get stuck till another frame is queued to it. Fix the issue for all drivers - also the ones with custom wake_tx_queue callbacks - by moving the logic into ieee80211_tx_dequeue() and drop the redundant @txqs_stopped. @IEEE80211_TXQ_STOP_NETIF_TX is also renamed to @IEEE80211_TXQ_DIRTY to better describe the flag. Fixes: c850e31f79f0 ("wifi: mac80211: add internal handler for wake_tx_queue") Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20221230121850.218810-1-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | wifi: mac80211: fix initialization of rx->link and rx->link_staFelix Fietkau2023-01-101-123/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some codepaths that do not initialize rx->link_sta properly. This causes a crash in places which assume that rx->link_sta is valid if rx->sta is valid. One known instance is triggered by __ieee80211_rx_h_amsdu being called from fast-rx. It results in a crash like this one: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 1 PID: 506 Comm: mt76-usb-rx phy Tainted: G E 6.1.0-debian64x+1.7 #3 Hardware name: ZOTAC ZBOX-ID92/ZBOX-IQ01/ZBOX-ID92/ZBOX-IQ01, BIOS B220P007 05/21/2014 RIP: 0010:ieee80211_deliver_skb+0x62/0x1f0 [mac80211] Code: 00 48 89 04 24 e8 9e a7 c3 df 89 c0 48 03 1c c5 a0 ea 39 a1 4c 01 6b 08 48 ff 03 48 83 7d 28 00 74 11 48 8b 45 30 48 63 55 44 <48> 83 84 d0 a8 00 00 00 01 41 8b 86 c0 11 00 00 8d 50 fd 83 fa 01 RSP: 0018:ffff999040803b10 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffb9903f496480 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff999040803ce0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d21828ac900 R13: 000000000000004a R14: ffff8d2198ed89c0 R15: ffff8d2198ed8000 FS: 0000000000000000(0000) GS:ffff8d24afe80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 0000000429810002 CR4: 00000000001706e0 Call Trace: <TASK> __ieee80211_rx_h_amsdu+0x1b5/0x240 [mac80211] ? ieee80211_prepare_and_rx_handle+0xcdd/0x1320 [mac80211] ? __local_bh_enable_ip+0x3b/0xa0 ieee80211_prepare_and_rx_handle+0xcdd/0x1320 [mac80211] ? prepare_transfer+0x109/0x1a0 [xhci_hcd] ieee80211_rx_list+0xa80/0xda0 [mac80211] mt76_rx_complete+0x207/0x2e0 [mt76] mt76_rx_poll_complete+0x357/0x5a0 [mt76] mt76u_rx_worker+0x4f5/0x600 [mt76_usb] ? mt76_get_min_avg_rssi+0x140/0x140 [mt76] __mt76_worker_fn+0x50/0x80 [mt76] kthread+0xed/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Since the initialization of rx->link and rx->link_sta is rather convoluted and duplicated in many places, clean it up by using a helper function to set it. Fixes: ccdde7c74ffd ("wifi: mac80211: properly implement MLO key handling") Fixes: b320d6c456ff ("wifi: mac80211: use correct rx link_sta instead of default") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221230200747.19040-1-nbd@nbd.name [remove unnecessary rx->sta->sta.mlo check] Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | Merge tag 'net-6.2-rc4' of ↵Linus Torvalds2023-01-1229-1719/+1470
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc. The rxrpc changes are noticeable large: to address a recent regression has been necessary completing the threaded refactor. Current release - regressions: - rxrpc: - only disconnect calls in the I/O thread - move client call connection to the I/O thread - fix incoming call setup race - eth: mlx5: - restore pkt rate policing support - fix memory leak on updating vport counters Previous releases - regressions: - gro: take care of DODGY packets - ipv6: deduct extension header length in rawv6_push_pending_frames - tipc: fix unexpected link reset due to discovery messages Previous releases - always broken: - sched: disallow noqueue for qdisc classes - eth: ice: fix potential memory leak in ice_gnss_tty_write() - eth: ixgbe: fix pci device refcount leak - eth: mlx5: - fix command stats access after free - fix macsec possible null dereference when updating MAC security entity (SecY)" * tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) r8152: add vendor/device ID pair for Microsoft Devkit net: stmmac: add aux timestamps fifo clearance wait bnxt: make sure we return pages to the pool net: hns3: fix wrong use of rss size during VF rss config ipv6: raw: Deduct extension header length in rawv6_push_pending_frames net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit() net: sched: disallow noqueue for qdisc classes iavf/iavf_main: actually log ->src mask when talking about it igc: Fix PPS delta between two synchronized end-points ixgbe: fix pci device refcount leak octeontx2-pf: Fix resource leakage in VF driver unbind selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure. selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns. selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad". net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY) net/mlx5e: Fix macsec ssci attribute handling in offload path net/mlx5: E-switch, Coverity: overlapping copy net/mlx5e: Don't support encap rules with gbp option net/mlx5: Fix ptp max frequency adjustment range net/mlx5e: Fix memory leak on updating vport counters ...
| * | | ipv6: raw: Deduct extension header length in rawv6_push_pending_framesHerbert Xu2023-01-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The total cork length created by ip6_append_data includes extension headers, so we must exclude them when comparing them against the IPV6_CHECKSUM offset which does not include extension headers. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: sched: disallow noqueue for qdisc classesFrederick Lawler2023-01-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While experimenting with applying noqueue to a classful queue discipline, we discovered a NULL pointer dereference in the __dev_queue_xmit() path that generates a kernel OOPS: # dev=enp0s5 # tc qdisc replace dev $dev root handle 1: htb default 1 # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit # tc qdisc add dev $dev parent 1:1 handle 10: noqueue # ping -I $dev -w 1 -c 1 1.1.1.1 [ 2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2.173217] #PF: supervisor instruction fetch in kernel mode ... [ 2.178451] Call Trace: [ 2.178577] <TASK> [ 2.178686] htb_enqueue+0x1c8/0x370 [ 2.178880] dev_qdisc_enqueue+0x15/0x90 [ 2.179093] __dev_queue_xmit+0x798/0xd00 [ 2.179305] ? _raw_write_lock_bh+0xe/0x30 [ 2.179522] ? __local_bh_enable_ip+0x32/0x70 [ 2.179759] ? ___neigh_create+0x610/0x840 [ 2.179968] ? eth_header+0x21/0xc0 [ 2.180144] ip_finish_output2+0x15e/0x4f0 [ 2.180348] ? dst_output+0x30/0x30 [ 2.180525] ip_push_pending_frames+0x9d/0xb0 [ 2.180739] raw_sendmsg+0x601/0xcb0 [ 2.180916] ? _raw_spin_trylock+0xe/0x50 [ 2.181112] ? _raw_spin_unlock_irqrestore+0x16/0x30 [ 2.181354] ? get_page_from_freelist+0xcd6/0xdf0 [ 2.181594] ? sock_sendmsg+0x56/0x60 [ 2.181781] sock_sendmsg+0x56/0x60 [ 2.181958] __sys_sendto+0xf7/0x160 [ 2.182139] ? handle_mm_fault+0x6e/0x1d0 [ 2.182366] ? do_user_addr_fault+0x1e1/0x660 [ 2.182627] __x64_sys_sendto+0x1b/0x30 [ 2.182881] do_syscall_64+0x38/0x90 [ 2.183085] entry_SYSCALL_64_after_hwframe+0x63/0xcd ... [ 2.187402] </TASK> Previously in commit d66d6c3152e8 ("net: sched: register noqueue qdisc"), NULL was set for the noqueue discipline on noqueue init so that __dev_queue_xmit() falls through for the noqueue case. This also sets a bypass of the enqueue NULL check in the register_qdisc() function for the struct noqueue_disc_ops. Classful queue disciplines make it past the NULL check in __dev_queue_xmit() because the discipline is set to htb (in this case), and then in the call to __dev_xmit_skb(), it calls into htb_enqueue() which grabs a leaf node for a class and then calls qdisc_enqueue() by passing in a queue discipline which assumes ->enqueue() is not set to NULL. Fix this by not allowing classes to be assigned to the noqueue discipline. Linux TC Notes states that classes cannot be set to the noqueue discipline. [1] Let's enforce that here. Links: 1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc") Cc: stable@vger.kernel.org Signed-off-by: Frederick Lawler <fred@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net/sched: act_mpls: Fix warning during failed attribute validationIdo Schimmel2023-01-091-1/+7
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'TCA_MPLS_LABEL' attribute is of 'NLA_U32' type, but has a validation type of 'NLA_VALIDATE_FUNCTION'. This is an invalid combination according to the comment above 'struct nla_policy': " Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN: NLA_BINARY Validation function called for the attribute. All other Unused - but note that it's a union " This can trigger the warning [1] in nla_get_range_unsigned() when validation of the attribute fails. Despite being of 'NLA_U32' type, the associated 'min'/'max' fields in the policy are negative as they are aliased by the 'validate' field. Fix by changing the attribute type to 'NLA_BINARY' which is consistent with the above comment and all other users of NLA_POLICY_VALIDATE_FN(). As a result, move the length validation to the validation function. No regressions in MPLS tests: # ./tdc.py -f tc-tests/actions/mpls.json [...] # echo $? 0 [1] WARNING: CPU: 0 PID: 17743 at lib/nlattr.c:118 nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 Modules linked in: CPU: 0 PID: 17743 Comm: syz-executor.0 Not tainted 6.1.0-rc8 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 [...] Call Trace: <TASK> __netlink_policy_dump_write_attr+0x23d/0x990 net/netlink/policy.c:310 netlink_policy_dump_write_attr+0x22/0x30 net/netlink/policy.c:411 netlink_ack_tlv_fill net/netlink/af_netlink.c:2454 [inline] netlink_ack+0x546/0x760 net/netlink/af_netlink.c:2506 netlink_rcv_skb+0x1b7/0x240 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6109 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2482 ___sys_sendmsg net/socket.c:2536 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2565 __do_sys_sendmsg net/socket.c:2574 [inline] __se_sys_sendmsg net/socket.c:2572 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2572 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/netdev/CAO4mrfdmjvRUNbDyP0R03_DrD_eFCLCguz6OxZ2TYRSv0K9gxA@mail.gmail.com/ Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reported-by: Wei Chen <harperchen1110@gmail.com> Tested-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230107171004.608436-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | gro: take care of DODGY packetsEric Dumazet2023-01-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jaroslav reported a recent throughput regression with virtio_net caused by blamed commit. It is unclear if DODGY GSO packets coming from user space can be accepted by GRO engine in the future with minimal changes, and if there is any expected gain from it. In the meantime, make sure to detect and flush DODGY packets. Fixes: 5eddb24901ee ("gro: add support of (hw)gro packets to gro stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'rxrpc-fixes-20230107' of ↵David S. Miller2023-01-0724-1712/+1443
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix race between call connection, data transmit and call disconnect Here are patches to fix an oops[1] caused by a race between call connection, initial packet transmission and call disconnection which results in something like: kernel BUG at net/rxrpc/peer_object.c:413! when the syzbot test is run. The problem is that the connection procedure is effectively split across two threads and can get expanded by taking an interrupt, thereby adding the call to the peer error distribution list *after* it has been disconnected (say by the rxrpc socket shutting down). The easiest solution is to look at the fourth set of I/O thread conversion/SACK table expansion patches that didn't get applied[2] and take from it those patches that move call connection and disconnection into the I/O thread. Moving these things into the I/O thread means that the sequencing is managed by all being done in the same thread - and the race can no longer happen. This is preferable to introducing an extra lock as adding an extra lock would make the I/O thread have to wait for the app thread in yet another place. The changes can be considered as a number of logical parts: (1) Move all of the call state changes into the I/O thread. (2) Make client connection ID space per-local endpoint so that the I/O thread doesn't need locks to access it. (3) Move actual abort generation into the I/O thread and clean it up. If sendmsg or recvmsg want to cause an abort, they have to delegate it. (4) Offload the setting up of the security context on a connection to the thread of one of the apps that's starting a call. We don't want to be doing any sort of crypto in the I/O thread. (5) Connect calls (ie. assign them to channel slots on connections) in the I/O thread. Calls are set up by sendmsg/kafs and passed to the I/O thread to connect. Connections are allocated in the I/O thread after this. (6) Disconnect calls in the I/O thread. I've also added a patch for an unrelated bug that cropped up during testing, whereby a race can occur between an incoming call and socket shutdown. Note that whilst this fixes the original syzbot bug, another bug may get triggered if this one is fixed: INFO: rcu detected stall in corrupted rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T rcu: blocking rcu_node structures (internal RCU debug): It doesn't look this should be anything to do with rxrpc, though, as I've tested an additional patch[3] that removes practically all the RCU usage from rxrpc and it still occurs. It seems likely that it is being caused by something in the tunnelling setup that the syzbot test does, but there's not enough info to go on. It also seems unlikely to be anything to do with the afs driver as the test doesn't use that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | rxrpc: Fix incoming call setup raceDavid Howells2023-01-074-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An incoming call can race with rxrpc socket destruction, leading to a leaked call. This may result in an oops when the call timer eventually expires: BUG: kernel NULL pointer dereference, address: 0000000000000874 RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50 Call Trace: <IRQ> try_to_wake_up+0x59/0x550 ? __local_bh_enable_ip+0x37/0x80 ? rxrpc_poke_call+0x52/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] call_timer_fn+0x24/0x120 with a warning in the kernel log looking something like: rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)! incurred during rmmod of rxrpc. The 1061d is the call flags: RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED, IS_SERVICE, RELEASED but no DISCONNECTED flag (0x800), so it's an incoming (service) call and it's still connected. The race appears to be that: (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state and allocates a call - then pauses, possibly for an interrupt. (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer, discards the prealloc and releases all calls attached to the socket. (3) rxrpc_new_incoming_call() resumes, launching the new call, including its timer and attaching it to the socket. Fix this by read-locking local->services_lock to access the AF_RXRPC socket providing the service rather than RCU in rxrpc_new_incoming_call(). There's no real need to use RCU here as local->services_lock is only write-locked by the socket side in two places: when binding and when shutting down. Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Move client call connection to the I/O threadDavid Howells2023-01-0613-527/+295
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Move the client conn cache management to the I/O threadDavid Howells2023-01-066-86/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Remove call->state_lockDavid Howells2023-01-0612-184/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Move call state changes from recvmsg to I/O threadDavid Howells2023-01-064-111/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Move call state changes from sendmsg to I/O threadDavid Howells2023-01-062-58/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move all the call state changes that are made in rxrpc_sendmsg() to the I/O thread. This is a step towards removing the call state lock. This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is added to ->tx_sendmsg by sendmsg(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
| | * | rxrpc: Wrap accesses to get call state to put the barrier in one placeDavid Howells2023-01-065-24/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org