summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx4_en: update moderation when config resetKevin(Yudong) Yang2021-03-053-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug that the moderation config will not be applied when calling mlx4_en_reset_config. For example, when turning on rx timestamping, mlx4_en_reset_config() will be called, causing the NIC to forget previous moderation config. This fix is in phase with a previous fix: commit 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Tested: Before this patch, on a host with NIC using mlx4, run netserver and stream TCP to the host at full utilization. $ sar -I SUM 1 INTR intr/s 14:03:56 sum 48758.00 After rx hwtstamp is enabled: $ sar -I SUM 1 14:10:38 sum 317771.00 We see the moderation is not working properly and issued 7x more interrupts. After the patch, and turned on rx hwtstamp, the rate of interrupts is as expected: $ sar -I SUM 1 14:52:11 sum 49332.00 Fixes: 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> CC: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2021-03-059-40/+128
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf 2021-03-04 The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 4 day(s) which contain a total of 9 files changed, 128 insertions(+), 40 deletions(-). The main changes are: 1) Fix 32-bit cmpxchg, from Brendan. 2) Fix atomic+fetch logic, from Ilya. 3) Fix usage of bpf_csum_diff in selftests, from Yauheni. ====================
| * bpf: Explicitly zero-extend R0 after 32-bit cmpxchgBrendan Jackman2021-03-044-1/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Ilya and explained in the new comment, there's a discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads the value from memory into r0, while x86 only does so when r0 and the value in memory are different. The same issue affects s390. At first this might sound like pure semantics, but it makes a real difference when the comparison is 32-bit, since the load will zero-extend r0/rax. The fix is to explicitly zero-extend rax after doing such a CMPXCHG. Since this problem affects multiple archs, this is done in the verifier by patching in a BPF_ZEXT_REG instruction after every 32-bit cmpxchg. Any archs that don't need such manual zero-extension can do a look-ahead with insn_is_zext to skip the unnecessary mov. Note this still goes on top of Ilya's patch: https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com/T/#u Differences v5->v6[1]: - Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it and removed 'inline' to match the style of the is_*_function helpers. - Fixed up comments in verifier test (thanks for the careful review, Martin!) Differences v4->v5[1]: - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin for suggesting this. Differences v3->v4[1]: - Moved the optimization against pointless zext into the correct place: opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls. Differences v2->v3[1]: - Moved patching into fixup_bpf_calls (patch incoming to rename this function) - Added extra commentary on bpf_jit_needs_zext - Added check to avoid adding a pointless zext(r0) if there's already one there. Difference v1->v2[1]: Now solved centrally in the verifier instead of specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions! [1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg") Signed-off-by: Brendan Jackman <jackmanb@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Account for BPF_FETCH in insn_has_def32()Ilya Leoshkevich2021-03-041-31/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | insn_has_def32() returns false for 32-bit BPF_FETCH insns. This makes adjust_insn_aux_data() incorrectly set zext_dst, as can be seen in [1]. This happens because insn_no_def() does not know about the BPF_FETCH variants of BPF_STX. Fix in two steps. First, replace insn_no_def() with insn_def_regno(), which returns the register an insn defines. Normally insn_no_def() calls are followed by insn->dst_reg uses; replace those with the insn_def_regno() return value. Second, adjust the BPF_STX special case in is_reg64() to deal with queries made from opt_subreg_zext_lo32_rnd_hi32(), where the state information is no longer available. Add a comment, since the purpose of this special case is not clear at first glance. [1] https://lore.kernel.org/bpf/20210223150845.1857620-1-jackmanb@google.com/ Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com
| * libbpf: Clear map_info before each bpf_obj_get_info_by_fdMaciej Fijalkowski2021-03-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xsk_lookup_bpf_maps, based on prog_fd, looks whether current prog has a reference to XSKMAP. BPF prog can include insns that work on various BPF maps and this is covered by iterating through map_ids. The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling needs to be cleared at each iteration, so that it doesn't contain any outdated fields and that is currently missing in the function of interest. To fix that, zero-init map_info via memset before each bpf_obj_get_info_by_fd call. Also, since the area of this code is touched, in general strcmp is considered harmful, so let's convert it to strncmp and provide the size of the array name for current map_info. While at it, do s/continue/break/ once we have found the xsks_map to terminate the search. Fixes: 5750902a6e9b ("libbpf: proper XSKMAP cleanup") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210303185636.18070-4-maciej.fijalkowski@intel.com
| * samples, bpf: Add missing munmap in xdpsockMaciej Fijalkowski2021-03-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | We mmap the umem region, but we never munmap it. Add the missing call at the end of the cleanup. Fixes: 3945b37a975d ("samples/bpf: use hugepages in xdpsock app") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210303185636.18070-3-maciej.fijalkowski@intel.com
| * xsk: Remove dangling function declaration from header fileMaciej Fijalkowski2021-03-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | xdp_umem_query() is dead for a long time, drop the declaration from include/linux/netdevice.h Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210303185636.18070-2-maciej.fijalkowski@intel.com
| * selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifierYauheni Kaliuta2021-03-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The verifier test labelled "valid read map access into a read-only array 2" calls the bpf_csum_diff() helper and checks its return value. However, architecture implementations of csum_partial() (which is what the helper uses) differ in whether they fold the return value to 16 bit or not. For example, x86 version has ... if (unlikely(odd)) { result = from32to16(result); result = ((result >> 8) & 0xff) | ((result & 0xff) << 8); } ... while generic lib/checksum.c does: result = from32to16(result); if (odd) result = ((result >> 8) & 0xff) | ((result & 0xff) << 8); This makes the helper return different values on different architectures, breaking the test on non-x86. To fix this, add an additional instruction to always mask the return value to 16 bits, and update the expected return value accordingly. Fixes: fb2abb73e575 ("bpf, selftest: test {rd, wr}only flags and direct value access") Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210228103017.320240-1-yauheni.kaliuta@redhat.com
| * selftests/bpf: Use the last page in test_snprintf_btf on s390Ilya Leoshkevich2021-03-021-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test_snprintf_btf fails on s390, because NULL points to a readable struct lowcore there. Fix by using the last page instead. Error message example: printing fffffffffffff000 should generate error, got (361) Fixes: 076a95f5aff2 ("selftests/bpf: Add bpf_snprintf_btf helper tests") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210227051726.121256-1-iii@linux.ibm.com
* | cipso,calipso: resolve a number of problems with the DOI refcountsPaul Moore2021-03-043-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current CIPSO and CALIPSO refcounting scheme for the DOI definitions is a bit flawed in that we: 1. Don't correctly match gets/puts in netlbl_cipsov4_list(). 2. Decrement the refcount on each attempt to remove the DOI from the DOI list, only removing it from the list once the refcount drops to zero. This patch fixes these problems by adding the missing "puts" to netlbl_cipsov4_list() and introduces a more conventional, i.e. not-buggy, refcounting mechanism to the DOI definitions. Upon the addition of a DOI to the DOI list, it is initialized with a refcount of one, removing a DOI from the list removes it from the list and drops the refcount by one; "gets" and "puts" behave as expected with respect to refcounts, increasing and decreasing the DOI's refcount by one. Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts") Fixes: d7cce01504a0 ("netlabel: Add support for removing a CALIPSO DOI.") Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ibmvnic: always store valid MAC addressJiri Wiesner2021-03-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last change to ibmvnic_set_mac(), 8fc3672a8ad3, meant to prevent users from setting an invalid MAC address on an ibmvnic interface that has not been brought up yet. The change also prevented the requested MAC address from being stored by the adapter object for an ibmvnic interface when the state of the ibmvnic interface is VNIC_PROBED - that is after probing has finished but before the ibmvnic interface is brought up. The MAC address stored by the adapter object is used and sent to the hypervisor for checking when an ibmvnic interface is brought up. The ibmvnic driver ignoring the requested MAC address when in VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more than one slave to malfunction. The bonding code must be able to change the MAC address of its slaves before they are brought up during enslaving. The inability of kernels with 8fc3672a8ad3 to set the MAC addresses of bonding slaves is observable in the output of "ip address show". The MAC addresses of the slaves are the same as the MAC address of the bond on a working system whereas the slaves retain their original MAC addresses on a system with a malfunctioning LACP bond. Fixes: 8fc3672a8ad3 ("ibmvnic: fix ibmvnic_set_mac") Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netdevsim: init u64 stats for 32bit hardwareHillf Danton2021-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Init the u64 stats in order to avoid the lockdep prints on the 32bit hardware like INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0 Hardware name: ARM-Versatile Express Backtrace: [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252) [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline]) [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120) [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline]) [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247) [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711) [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442) [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415) [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline]) [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline]) [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline]) [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70) [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405) [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211) [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783) [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798) [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline]) [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline]) [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839) [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103) [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317) [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941) [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline]) [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119) [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287) [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554) [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740) [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846) [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431) [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914) [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961) [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491) [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109) [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182) [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline]) [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215) [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122) [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139) [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296) [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline]) [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline]) [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605) [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658) [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline]) [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667) [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64) Fixes: 83c9e13aa39a ("netdevsim: add software driver for testing offloads") Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'mptcp-fixes'David S. Miller2021-03-042-67/+112
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mat Martineau says: ==================== mptcp: Fixes for v5.12 These patches from the MPTCP tree fix a few multipath TCP issues: Patches 1 and 5 clear some stale pointers when subflows close. Patches 2, 4, and 9 plug some memory leaks. Patch 3 fixes a memory accounting error identified by syzkaller. Patches 6 and 7 fix a race condition that slowed data transmission. Patch 8 adds missing wakeups when write buffer space is freed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: free resources when the port number is mismatchedGeliang Tang2021-03-041-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the port number is mismatched with the announced ones, use 'goto dispose_child' to free the resources instead of using 'goto out'. This patch also moves the port number checking code in subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx will fail in dispose_child. Fixes: 5bc56388c74f ("mptcp: add port number check for MP_JOIN") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: fix missing wakeupPaolo Abeni2021-03-041-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __mptcp_clean_una() can free write memory and should wake-up user-space processes when needed. When such function is invoked by the MPTCP receive path, the wakeup is not needed, as the TCP stack will later trigger subflow_write_space which will do the wakeup as needed. Other __mptcp_clean_una() call sites need an additional wakeup check Let's bundle the relevant code in a new helper and use it. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165 Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions") Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: fix race in release_cbPaolo Abeni2021-03-041-12/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we receive a MPTCP_PUSH_PENDING even from a subflow when mptcp_release_cb() is serving the previous one, the latter will be delayed up to the next release_sock(msk). Address the issue implementing a test/serve loop for such event. Additionally rename the push helper to __mptcp_push_pending() to be more consistent with the existing code. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: factor out __mptcp_retrans helper()Paolo Abeni2021-03-041-43/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | Will simplify the following patch, no functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: reset 'first' and ack_hint on subflow closeFlorian Westphal2021-03-041-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like with last_snd, we have to NULL 'first' on subflow close. ack_hint isn't strictly required (its never dereferenced), but better to clear this explicitly as well instead of making it an exception. msk->first is dereferenced unconditionally at accept time, but at that point the ssk is not on the conn_list yet -- this means worker can't see it when iterating the conn_list. Reported-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: dispose initial struct socket when its subflow is closedFlorian Westphal2021-03-041-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph Paasch reported following crash: dst_release underflow WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175 CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70 Call Trace: rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503 rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612 __mkroute_output net/ipv4/route.c:2484 [inline] ... The worker leaves msk->subflow alone even when it happened to close the subflow ssk associated with it. Fixes: 866f26f2a9c33b ("mptcp: always graft subflow socket to parent") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157 Reported-by: Christoph Paasch <cpaasch@apple.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: fix memory accounting on allocation errorPaolo Abeni2021-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136 Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: put subflow sock on connect errorFlorian Westphal2021-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mptcp_add_pending_subflow() performs a sock_hold() on the subflow, then adds the subflow to the join list. Without a sock_put the subflow sk won't be freed in case connect() fails. unreferenced object 0xffff88810c03b100 (size 3000): [..] sk_prot_alloc.isra.0+0x2f/0x110 sk_alloc+0x5d/0xc20 inet6_create+0x2b7/0xd30 __sock_create+0x17f/0x410 mptcp_subflow_create_socket+0xff/0x9c0 __mptcp_subflow_connect+0x1da/0xaf0 mptcp_pm_nl_work+0x6e0/0x1120 mptcp_worker+0x508/0x9a0 Fixes: 5b950ff4331ddda ("mptcp: link MPC subflow into msk only after accept") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mptcp: reset last_snd on subflow closeFlorian Westphal2021-03-041-0/+5
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | Send logic caches last active subflow in the msk, so it needs to be cleared when the cached subflow is closed. Fixes: d5f49190def61c ("mptcp: allow picking different xmit subflows") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155 Reported-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: avoid duplicates in classes dumpMaximilian Heyne2021-03-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow up of commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") which has fixed the issue only for the qdisc dump. The duplicate printing also occurs when dumping the classes via tc class show dev eth0 Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: usb: qmi_wwan: allow qmimux add/del with master upDaniele Palmas2021-03-041-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason for preventing the creation and removal of qmimux network interfaces when the underlying interface is up. This makes qmi_wwan mux implementation more similar to the rmnet one, simplifying userspace management of the same logical interfaces. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Reported-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: sja1105: fix ucast/bcast flooding always remaining enabledVladimir Oltean2021-03-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the blamed patch I managed to introduce a bug while moving code around: the same logic is applied to the ucast_egress_floods and bcast_egress_floods variables both on the "if" and the "else" branches. This is clearly an unintended change compared to how the code used to be prior to that bugfix, so restore it. Fixes: 7f7ccdea8c73 ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of ↵Vladimir Oltean2021-03-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPEED_10 When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is read before resetting the switch so it can be reprogrammed afterwards. This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps, because SPEED_10 is actually 0, so AND-ing anything with 0 is false, therefore that last branch is dead code. Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just remove the check for SPEED_10, let it fall into the default case. Fixes: ffe10e679cec ("net: dsa: sja1105: Add support for the SGMII port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mscc: ocelot: properly reject destination IP keys in VCAP IS1Vladimir Oltean2021-03-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | An attempt is made to warn the user about the fact that VCAP IS1 cannot offload keys matching on destination IP (at least given the current half key format), but sadly that warning fails miserably in practice, due to the fact that it operates on an uninitialized "match" variable. We must first decode the keys from the flow rule. Fixes: 75944fda1dfe ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'nexthop-blackhole'David S. Miller2021-03-042-3/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== nexthop: Do not flush blackhole nexthops when loopback goes down Patch #1 prevents blackhole nexthops from being flushed when the loopback device goes down given that as far as user space is concerned, these nexthops do not have a nexthop device. Patch #2 adds a test case. There are no regressions in fib_nexthops.sh with this change: # ./fib_nexthops.sh ... Tests passed: 165 Tests failed: 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | selftests: fib_nexthops: Test blackhole nexthops when loopback goes downIdo Schimmel2021-03-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test that blackhole nexthops are not flushed when the loopback device goes down. Output without previous patch: # ./fib_nexthops.sh -t basic Basic functional tests ---------------------- TEST: List with nothing defined [ OK ] TEST: Nexthop get on non-existent id [ OK ] TEST: Nexthop with no device or gateway [ OK ] TEST: Nexthop with down device [ OK ] TEST: Nexthop with device that is linkdown [ OK ] TEST: Nexthop with device only [ OK ] TEST: Nexthop with duplicate id [ OK ] TEST: Blackhole nexthop [ OK ] TEST: Blackhole nexthop with other attributes [ OK ] TEST: Blackhole nexthop with loopback device down [FAIL] TEST: Create group [ OK ] TEST: Create group with blackhole nexthop [FAIL] TEST: Create multipath group where 1 path is a blackhole [ OK ] TEST: Multipath group can not have a member replaced by blackhole [ OK ] TEST: Create group with non-existent nexthop [ OK ] TEST: Create group with same nexthop multiple times [ OK ] TEST: Replace nexthop with nexthop group [ OK ] TEST: Replace nexthop group with nexthop [ OK ] TEST: Nexthop group and device [ OK ] TEST: Test proto flush [ OK ] TEST: Nexthop group and blackhole [ OK ] Tests passed: 19 Tests failed: 2 Output with previous patch: # ./fib_nexthops.sh -t basic Basic functional tests ---------------------- TEST: List with nothing defined [ OK ] TEST: Nexthop get on non-existent id [ OK ] TEST: Nexthop with no device or gateway [ OK ] TEST: Nexthop with down device [ OK ] TEST: Nexthop with device that is linkdown [ OK ] TEST: Nexthop with device only [ OK ] TEST: Nexthop with duplicate id [ OK ] TEST: Blackhole nexthop [ OK ] TEST: Blackhole nexthop with other attributes [ OK ] TEST: Blackhole nexthop with loopback device down [ OK ] TEST: Create group [ OK ] TEST: Create group with blackhole nexthop [ OK ] TEST: Create multipath group where 1 path is a blackhole [ OK ] TEST: Multipath group can not have a member replaced by blackhole [ OK ] TEST: Create group with non-existent nexthop [ OK ] TEST: Create group with same nexthop multiple times [ OK ] TEST: Replace nexthop with nexthop group [ OK ] TEST: Replace nexthop group with nexthop [ OK ] TEST: Nexthop group and device [ OK ] TEST: Test proto flush [ OK ] TEST: Nexthop group and blackhole [ OK ] Tests passed: 21 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthop: Do not flush blackhole nexthops when loopback goes downIdo Schimmel2021-03-041-3/+7
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As far as user space is concerned, blackhole nexthops do not have a nexthop device and therefore should not be affected by the administrative or carrier state of any netdev. However, when the loopback netdev goes down all the blackhole nexthops are flushed. This happens because internally the kernel associates blackhole nexthops with the loopback netdev. This behavior is both confusing to those not familiar with kernel internals and also diverges from the legacy API where blackhole IPv4 routes are not flushed when the loopback netdev goes down: # ip route add blackhole 198.51.100.0/24 # ip link set dev lo down # ip route show 198.51.100.0/24 blackhole 198.51.100.0/24 Blackhole IPv6 routes are flushed, but at least user space knows that they are associated with the loopback netdev: # ip -6 route show 2001:db8:1::/64 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium Fix this by only flushing blackhole nexthops when the loopback netdev is unregistered. Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Donald Sharp <sharpd@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sctp: trivial: fix typo in commentDrew Fustini2021-03-041-1/+1
| | | | | | | | | | | | | | | | Fix typo of 'overflow' for comment in sctp_tsnmap_check(). Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Drew Fustini <drew@beagleboard.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch '10GbE' of ↵David S. Miller2021-03-043-2/+14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-03-03 This series contains updates to ixgbe and ixgbevf drivers. Bartosz Golaszewski does not error on -ENODEV from ixgbe_mii_bus_init() as this is valid for some devices with a shared bus for ixgbe. Antony Antony adds a check to fail for non transport mode SA with offload as this is not supported for ixgbe and ixgbevf. Dinghao Liu fixes a memory leak on failure to program a perfect filter for ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ixgbe: Fix memleak in ixgbe_configure_clsu32Dinghao Liu2021-03-041-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ixgbe_fdir_write_perfect_filter_82599() fails, input allocated by kzalloc() has not been freed, which leads to memleak. Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ixgbe: fail to create xfrm offload of IPsec tunnel mode SAAntony Antony2021-03-042-0/+10
|/ / | | | | | | | | | | | | | | | | | | | | | | | | Based on talks and indirect references ixgbe IPsec offlod do not support IPsec tunnel mode offload. It can only support IPsec transport mode offload. Now explicitly fail when creating non transport mode SA with offload to avoid false performance expectations. Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA") Signed-off-by: Antony Antony <antony@phenome.org> Acked-by: Shannon Nelson <snelson@pensando.io> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | rtnetlink: using dev_base_seq from target netzhang kai2021-03-031-1/+1
| | | | | | | | | | Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: 9p: advance iov on empty readJisheng Zhang2021-03-031-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I met below warning when cating a small size(about 80bytes) txt file on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we miss iov_iter_advance() if the read count is 0 for zerocopy case, so we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is full. Fix it by removing the exception for 0 to ensure to call iov_iter_advance() even on empty read for zerocopy case. [ 8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40 [ 8.280028] Modules linked in: [ 8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6 [ 8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40 [ 8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b [ 8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246 [ 8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000 [ 8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90 [ 8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058 [ 8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050 [ 8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600 [ 8.285439] FS: 00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 8.285844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0 [ 8.286710] Call Trace: [ 8.288279] generic_file_splice_read+0x31/0x1a0 [ 8.289273] ? do_splice_to+0x2f/0x90 [ 8.289511] splice_direct_to_actor+0xcc/0x220 [ 8.289788] ? pipe_to_sendpage+0xa0/0xa0 [ 8.290052] do_splice_direct+0x8b/0xd0 [ 8.290314] do_sendfile+0x1ad/0x470 [ 8.290576] do_syscall_64+0x2d/0x40 [ 8.290818] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.291409] RIP: 0033:0x7f24fd7dca0a [ 8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8 [ 8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028 [ 8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a [ 8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 [ 8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003 [ 8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 [ 8.295782] ---[ end trace 63317af81b3ca24b ]--- Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"Hayes Wang2021-03-031-29/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 134f98bcf1b898fb9d6f2b91bc85dd2e5478b4b8. The r8153_mac_clk_spd() is used for RTL8153A only, because the register table of RTL8153B is different from RTL8153A. However, this function would be called when RTL8153B calls r8153_first_init() and r8153_enter_oob(). That causes RTL8153B becomes unstable when suspending and resuming. The worst case may let the device stop working. Besides, revert this commit to disable MAC clock speed down for RTL8153A. It would avoid the known issue when enabling U1. The data of the first control transfer may be wrong when exiting U1. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: l2tp: reduce log level of messages in receive path, add counter insteadMatthias Schiffer2021-03-034-19/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5ee759cda51b ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose. In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP). Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets. With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited(). [1] https://github.com/wlanslovenija/tunneldigger/issues/160 Fixes: 5ee759cda51b ("l2tp: use standard API for warning log messages") Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: macb: Add default usrio config to default gem configAtish Patra2021-03-031-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | There is no usrio config defined for default gem config leading to a kernel panic devices that don't define a data. This issue can be reprdouced with microchip polar fire soc where compatible string is defined as "cdns,macb". Fixes: edac63861db7 ("add userio bits as platform configuration") Signed-off-by: Atish Patra <atish.patra@wdc.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'wireless-drivers-2021-03-03' of ↵David S. Miller2021-03-033-3/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.12 Second set of fixes for v5.12. Only three iwlwifi fixes this time, the crash with MVM being the most important one and reported by multiple people. iwlwifi * fix kernel crash regression when using LTO with MVM devices * fix printk format warnings * fix potential deadlock found by lockdep ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep ↵Jiri Kosina2021-03-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | warning in iwl_pcie_rx_handle()) We can't call netif_napi_add() with rxq-lock held, as there is a potential for deadlock as spotted by lockdep (see below). rxq->lock is not protecting anything over the netif_napi_add() codepath anyway, so let's drop it just before calling into NAPI. ======================================================== WARNING: possible irq lock inversion dependency detected 5.12.0-rc1-00002-gbada49429032 #5 Not tainted -------------------------------------------------------- irq/136-iwlwifi/565 just changed the state of lock: ffff89f28433b0b0 (&rxq->lock){+.-.}-{2:2}, at: iwl_pcie_rx_handle+0x7f/0x960 [iwlwifi] but this lock took another, SOFTIRQ-unsafe lock in the past: (napi_hash_lock){+.+.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(napi_hash_lock); local_irq_disable(); lock(&rxq->lock); lock(napi_hash_lock); <Interrupt> lock(&rxq->lock); *** DEADLOCK *** 1 lock held by irq/136-iwlwifi/565: #0: ffff89f2b1440170 (sync_cmd_lockdep_map){+.+.}-{0:0}, at: iwl_pcie_irq_handler+0x5/0xb30 the shortest dependencies between 2nd lock and 1st lock: -> (napi_hash_lock){+.+.}-{2:2} { HARDIRQ-ON-W at: lock_acquire+0x277/0x3d0 _raw_spin_lock+0x2c/0x40 netif_napi_add+0x14b/0x270 e1000_probe+0x2fe/0xee0 [e1000e] local_pci_probe+0x42/0x90 pci_device_probe+0x10b/0x1c0 really_probe+0xef/0x4b0 driver_probe_device+0xde/0x150 device_driver_attach+0x4f/0x60 __driver_attach+0x9c/0x140 bus_for_each_dev+0x79/0xc0 bus_add_driver+0x18d/0x220 driver_register+0x5b/0xf0 do_one_initcall+0x5b/0x300 do_init_module+0x5b/0x21c load_module+0x1dae/0x22c0 __do_sys_finit_module+0xad/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae SOFTIRQ-ON-W at: lock_acquire+0x277/0x3d0 _raw_spin_lock+0x2c/0x40 netif_napi_add+0x14b/0x270 e1000_probe+0x2fe/0xee0 [e1000e] local_pci_probe+0x42/0x90 pci_device_probe+0x10b/0x1c0 really_probe+0xef/0x4b0 driver_probe_device+0xde/0x150 device_driver_attach+0x4f/0x60 __driver_attach+0x9c/0x140 bus_for_each_dev+0x79/0xc0 bus_add_driver+0x18d/0x220 driver_register+0x5b/0xf0 do_one_initcall+0x5b/0x300 do_init_module+0x5b/0x21c load_module+0x1dae/0x22c0 __do_sys_finit_module+0xad/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae INITIAL USE at: lock_acquire+0x277/0x3d0 _raw_spin_lock+0x2c/0x40 netif_napi_add+0x14b/0x270 e1000_probe+0x2fe/0xee0 [e1000e] local_pci_probe+0x42/0x90 pci_device_probe+0x10b/0x1c0 really_probe+0xef/0x4b0 driver_probe_device+0xde/0x150 device_driver_attach+0x4f/0x60 __driver_attach+0x9c/0x140 bus_for_each_dev+0x79/0xc0 bus_add_driver+0x18d/0x220 driver_register+0x5b/0xf0 do_one_initcall+0x5b/0x300 do_init_module+0x5b/0x21c load_module+0x1dae/0x22c0 __do_sys_finit_module+0xad/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae } ... key at: [<ffffffffae84ef38>] napi_hash_lock+0x18/0x40 ... acquired at: _raw_spin_lock+0x2c/0x40 netif_napi_add+0x14b/0x270 _iwl_pcie_rx_init+0x1f4/0x710 [iwlwifi] iwl_pcie_rx_init+0x1b/0x3b0 [iwlwifi] iwl_trans_pcie_start_fw+0x2ac/0x6a0 [iwlwifi] iwl_mvm_load_ucode_wait_alive+0x116/0x460 [iwlmvm] iwl_run_init_mvm_ucode+0xa4/0x3a0 [iwlmvm] iwl_op_mode_mvm_start+0x9ed/0xbf0 [iwlmvm] _iwl_op_mode_start.isra.4+0x42/0x80 [iwlwifi] iwl_opmode_register+0x71/0xe0 [iwlwifi] iwl_mvm_init+0x34/0x1000 [iwlmvm] do_one_initcall+0x5b/0x300 do_init_module+0x5b/0x21c load_module+0x1dae/0x22c0 __do_sys_finit_module+0xad/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae [ ... lockdep output trimmed .... ] Fixes: 25edc8f259c7106 ("iwlwifi: pcie: properly implement NAPI") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2103021134060.12405@cbobk.fhfr.pm
| * | iwlwifi: fix ARCH=i386 compilation warningsPierre-Louis Bossart2021-03-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An unsigned long variable should rely on '%lu' format strings, not '%zd' Fixes: a1a6a4cf49ece ("iwlwifi: pnvm: implement reading PNVM from UEFI") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Acked-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210302011640.1276636-1-pierre-louis.bossart@linux.intel.com
| * | iwlwifi: mvm: add terminate entry for dmi_system_id tablesWei Yongjun2021-03-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure dmi_system_id tables are NULL terminated. This crashed when LTO was enabled: BUG: KASAN: global-out-of-bounds in dmi_check_system+0x5a/0x70 Read of size 1 at addr ffffffffc16af750 by task NetworkManager/1913 CPU: 4 PID: 1913 Comm: NetworkManager Not tainted 5.12.0-rc1+ #10057 Hardware name: LENOVO 20THCTO1WW/20THCTO1WW, BIOS N2VET27W (1.12 ) 12/21/2020 Call Trace: dump_stack+0x90/0xbe print_address_description.constprop.0+0x1d/0x140 ? dmi_check_system+0x5a/0x70 ? dmi_check_system+0x5a/0x70 kasan_report.cold+0x7b/0xd4 ? dmi_check_system+0x5a/0x70 __asan_load1+0x4d/0x50 dmi_check_system+0x5a/0x70 iwl_mvm_up+0x1360/0x1690 [iwlmvm] ? iwl_mvm_send_recovery_cmd+0x270/0x270 [iwlmvm] ? setup_object.isra.0+0x27/0xd0 ? kasan_poison+0x20/0x50 ? ___slab_alloc.constprop.0+0x483/0x5b0 ? mempool_kmalloc+0x17/0x20 ? ftrace_graph_ret_addr+0x2a/0xb0 ? kasan_poison+0x3c/0x50 ? cfg80211_iftype_allowed+0x2e/0x90 [cfg80211] ? __kasan_check_write+0x14/0x20 ? mutex_lock+0x86/0xe0 ? __mutex_lock_slowpath+0x20/0x20 __iwl_mvm_mac_start+0x49/0x290 [iwlmvm] iwl_mvm_mac_start+0x37/0x50 [iwlmvm] drv_start+0x73/0x1b0 [mac80211] ieee80211_do_open+0x53e/0xf10 [mac80211] ? ieee80211_check_concurrent_iface+0x266/0x2e0 [mac80211] ieee80211_open+0xb9/0x100 [mac80211] __dev_open+0x1b8/0x280 Fixes: a2ac0f48a07c ("iwlwifi: mvm: implement approved list for the PPAG feature") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Victor Michel <vic.michel.web@gmail.com> Acked-by: Luca Coelho <luciano.coelho@intel.com> [kvalo@codeaurora.org: improve commit log] Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210223140039.1708534-1-weiyongjun1@huawei.com
* | | docs: networking: drop special stable handlingJakub Kicinski2021-03-033-77/+6
| | | | | | | | | | | | | | | | | | | | | Leave it to Greg. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10Ong Boon Leong2021-03-031-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We introduce dwmac410_dma_init_channel() here for both EQoS v4.10 and above which use different DMA_CH(n)_Interrupt_Enable bit definitions for NIE and AIE. Fixes: 48863ce5940f ("stmmac: add DMA support for GMAC 4.xx") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Ramesh Babu B <ramesh.babu.b@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.Michal Suchanek2021-03-031-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 7.5 reports: ../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init': ../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized] ../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized] The variable is initialized only if(reset) and used only if(reset && something) so this is a false positive. However, there is no reason to not initialize the variables unconditionally avoiding the warning. Fixes: 635e442f4a48 ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | octeontx2-af: cn10k: fix an array overflow in is_lmac_valid()Dan Carpenter2021-03-031-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of "lmac_id" can be controlled by the user and if it is larger then the number of bits in long then it reads outside the bitmap. The highest valid value is less than MAX_LMAC_PER_CGX (4). Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: ethernet: mtk-star-emac: fix wrong unmap in RX handlingBiao Huang2021-03-021-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtk_star_dma_unmap_rx() should unmap the dma_addr of old skb rather than that of new skb. Assign new_dma_addr to desc_data.dma_addr after all handling of old skb ends to avoid unexpected receive side error. Fixes: f96e9641e92b ("net: ethernet: mtk-star-emac: fix error path in RX handling") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | stmmac: intel: Fix mdio bus registration issue for TGL-H/ADL-SWong Vee Khee2021-03-021-13/+41
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Intel platforms which consist of two Ethernet Controllers such as TGL-H and ADL-S, a unique MDIO bus id is required for MDIO bus to be successful registered: [ 13.076133] sysfs: cannot create duplicate filename '/class/mdio_bus/stmmac-1' [ 13.083404] CPU: 8 PID: 1898 Comm: systemd-udevd Tainted: G U 5.11.0-net-next #106 [ 13.092410] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020 [ 13.105709] Call Trace: [ 13.108176] dump_stack+0x64/0x7c [ 13.111553] sysfs_warn_dup+0x56/0x70 [ 13.115273] sysfs_do_create_link_sd.isra.2+0xbd/0xd0 [ 13.120371] device_add+0x4df/0x840 [ 13.123917] ? complete_all+0x2a/0x40 [ 13.127636] __mdiobus_register+0x98/0x310 [libphy] [ 13.132572] stmmac_mdio_register+0x1c5/0x3f0 [stmmac] [ 13.137771] ? stmmac_napi_add+0xa5/0xf0 [stmmac] [ 13.142493] stmmac_dvr_probe+0x806/0xee0 [stmmac] [ 13.147341] intel_eth_pci_probe+0x1cb/0x250 [dwmac_intel] [ 13.152884] pci_device_probe+0xd2/0x150 [ 13.156897] really_probe+0xf7/0x4d0 [ 13.160527] driver_probe_device+0x5d/0x140 [ 13.164761] device_driver_attach+0x4f/0x60 [ 13.168996] __driver_attach+0xa2/0x140 [ 13.172891] ? device_driver_attach+0x60/0x60 [ 13.177300] bus_for_each_dev+0x76/0xc0 [ 13.181188] bus_add_driver+0x189/0x230 [ 13.185083] ? 0xffffffffc0795000 [ 13.188446] driver_register+0x5b/0xf0 [ 13.192249] ? 0xffffffffc0795000 [ 13.195577] do_one_initcall+0x4d/0x210 [ 13.199467] ? kmem_cache_alloc_trace+0x2ff/0x490 [ 13.204228] do_init_module+0x5b/0x21c [ 13.208031] load_module+0x2a0c/0x2de0 [ 13.211838] ? __do_sys_finit_module+0xb1/0x110 [ 13.216420] __do_sys_finit_module+0xb1/0x110 [ 13.220825] do_syscall_64+0x33/0x40 [ 13.224451] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 13.229515] RIP: 0033:0x7fc2b1919ccd [ 13.233113] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48 [ 13.251912] RSP: 002b:00007ffcea2e5b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 13.259527] RAX: ffffffffffffffda RBX: 0000560558920f10 RCX: 00007fc2b1919ccd [ 13.266706] RDX: 0000000000000000 RSI: 00007fc2b1a881e3 RDI: 0000000000000012 [ 13.273887] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000 [ 13.281036] R10: 0000000000000012 R11: 0000000000000246 R12: 00007fc2b1a881e3 [ 13.288183] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcea2e5d58 [ 13.295389] libphy: mii_bus stmmac-1 failed to register Fixes: 88af9bd4efbd ("stmmac: intel: Add ADL-S 1Gbps PCI IDs") Fixes: 8450e23f142f ("stmmac: intel: Add PCI IDs for TGL-H platform") Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: add sanity tests to TCP_QUEUE_SEQEric Dumazet2021-03-011-8/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qingyu Li reported a syzkaller bug where the repro changes RCV SEQ _after_ restoring data in the receive queue. mprotect(0x4aa000, 12288, PROT_READ) = 0 mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000 mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000 mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000 socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0 setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0 setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0 recvfrom(3, NULL, 20, 0, NULL, NULL) = -1 ECONNRESET (Connection reset by peer) syslog shows: [ 111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0 [ 111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0 This should not be allowed. TCP_QUEUE_SEQ should only be used when queues are empty. This patch fixes this case, and the tx path as well. Fixes: ee9952831cfd ("tcp: Initial repair mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005 Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>