summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | Convert filldir[64]() from __put_user() to unsafe_put_user()Linus Torvalds2019-10-051-35/+93
| | |_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We really should avoid the "__{get,put}_user()" functions entirely, because they can easily be mis-used and the original intent of being used for simple direct user accesses no longer holds in a post-SMAP/PAN world. Manually optimizing away the user access range check makes no sense any more, when the range check is generally much cheaper than the "enable user accesses" code that the __{get,put}_user() functions still need. So instead of __put_user(), use the unsafe_put_user() interface with user_access_{begin,end}() that really does generate better code these days, and which is generally a nicer interface. Under some loads, the multiple user writes that filldir() does are actually quite noticeable. This also makes the dirent name copy use unsafe_put_user() with a couple of macros. We do not want to make function calls with SMAP/PAN disabled, and the code this generates is quite good when the architecture uses "asm goto" for unsafe_put_user() like x86 does. Note that this doesn't bother with the legacy cases. Nobody should use them anyway, so performance doesn't really matter there. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2019-10-0599-281/+539
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix ieeeu02154 atusb driver use-after-free, from Johan Hovold. 2) Need to validate TCA_CBQ_WRROPT netlink attributes, from Eric Dumazet. 3) txq null deref in mac80211, from Miaoqing Pan. 4) ionic driver needs to select NET_DEVLINK, from Arnd Bergmann. 5) Need to disable bh during nft_connlimit GC, from Pablo Neira Ayuso. 6) Avoid division by zero in taprio scheduler, from Vladimir Oltean. 7) Various xgmac fixes in stmmac driver from Jose Abreu. 8) Avoid 64-bit division in mlx5 leading to link errors on 32-bit from Michal Kubecek. 9) Fix bad VLAN check in rtl8366 DSA driver, from Linus Walleij. 10) Fix sleep while atomic in sja1105, from Vladimir Oltean. 11) Suspend/resume deadlock in stmmac, from Thierry Reding. 12) Various UDP GSO fixes from Josh Hunt. 13) Fix slab out of bounds access in tcp_zerocopy_receive(), from Eric Dumazet. 14) Fix OOPS in __ipv6_ifa_notify(), from David Ahern. 15) Memory leak in NFC's llcp_sock_bind, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) selftests/net: add nettest to .gitignore net: qlogic: Fix memory leak in ql_alloc_large_buffers nfc: fix memory leak in llcp_sock_bind() sch_dsmark: fix potential NULL deref in dsmark_init() net: phy: at803x: use operating parameters from PHY-specific status net: phy: extract pause mode net: phy: extract link partner advertisement reading net: phy: fix write to mii-ctrl1000 register ipv6: Handle missing host route in __ipv6_ifa_notify net: phy: allow for reset line to be tied to a sleepy GPIO controller net: ipv4: avoid mixed n_redirects and rate_tokens usage r8152: Set macpassthru in reset_resume callback cxgb4:Fix out-of-bounds MSI-X info array access Revert "ipv6: Handle race in addrconf_dad_work" net: make sock_prot_memory_pressure() return "const char *" rxrpc: Fix rxrpc_recvmsg tracepoint qmi_wwan: add support for Cinterion CLS8 devices tcp: fix slab-out-of-bounds in tcp_zerocopy_receive() lib: textsearch: fix escapes in example code udp: only do GSO if # of segs > 1 ...
| * | | | | | | selftests/net: add nettest to .gitignoreJakub Kicinski2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nettest is missing from gitignore. Fixes: acda655fefae ("selftests: Add nettest") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: qlogic: Fix memory leak in ql_alloc_large_buffersNavid Emamdoost2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb. This skb should be released if pci_dma_mapping_error fails. Fixes: 0f8ab89e825f ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | nfc: fix memory leak in llcp_sock_bind()Eric Dumazet2019-10-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysbot reported a memory leak after a bind() has failed. While we are at it, abort the operation if kmemdup() has failed. BUG: memory leak unreferenced object 0xffff888105d83ec0 (size 32): comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s) hex dump (first 32 bytes): 00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34 .ile read.net:[4 30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00 026533097]...... backtrace: [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline] [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline] [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline] [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline] [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670 [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120 [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline] [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107 [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647 [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline] [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline] [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656 [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296 [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 30cc4587659e ("NFC: Move LLCP code to the NFC top level diirectory") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | sch_dsmark: fix potential NULL deref in dsmark_init()Eric Dumazet2019-10-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure TCA_DSMARK_INDICES was provided by the user. syzbot reported : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline] RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline] RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339 Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09 RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004 RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159 R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940 R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000 FS: 0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237 tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x803/0x920 net/socket.c:2311 __sys_sendmsg+0x105/0x1d0 net/socket.c:2356 __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg net/socket.c:2363 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440369 Fixes: 758cc43c6d73 ("[PKT_SCHED]: Fix dsmark to apply changes consistent") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Merge branch 'Fix-regression-with-AR8035-speed-downgrade'David S. Miller2019-10-046-32/+138
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Russell King says: ==================== Fix regression with AR8035 speed downgrade The following series attempts to address an issue spotted by tinywrkb with the AR8035 on the Cubox-i2 in a situation where the PHY downgrades the negotiated link. This is version 2, not much has changed other than rebasing on the current net tree. Changes have happend to patch 2 due to conflicts, so I dropped Andrew's reviewed-by. Minor context changes to patch 4 which I don't consider important enough to warrant dropping the reviewed-by. Before commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status"), we would read not only the link partner's advertisement, but also our own advertisement from the PHY registers, and use both to derive the PHYs current link mode. This works when the AR8035 downgrades the speed, because it appears that the AR8035 clears link mode bits in the advertisement registers as part of the downgrade. Commentary: what is not yet known is whether the AR8035 restores the advertisement register when the link goes down to the previous state. However, since the above referenced commit, we no longer use the PHYs advertisement registers, instead converting the link partner's advertisement to the ethtool link mode array, and combine that with phylib's cached version of our advertisement - which is not updated on speed downgrade. This results in phylib disagreeing with the actual operating mode of the PHY. Commentary: I wonder how many more PHY drivers are broken by this commit, but have yet to be discovered. The obvious way to address this would be to disable the downgrade feature, and indeed this does fix the problem in tinywrkb's case - his link partner instead downgrades the speed by reducing its advertisement, resulting in phylib correctly evaluating a slower speed. However, it has a serious drawback - the gigabit control register (MII register 9) appears to become read only. It seems the only way to update the register is to re-enable the downgrade feature, reset the PHY, changing register 9, disable the downgrade feature, and reset the PHY again. This series attempts to address the problem using a different approach, similar to the approach taken with Marvell PHYs. The AR8031, AR8033 and AR8035 have a PHY-Specific Status register which reports the actual operating mode of the PHY - both speed and duplex. This register correctly reports the operating mode irrespective of whether autoneg is enabled or not. We use this register to fill in phylib's speed and duplex parameters. In detail: Patch 1 fixes a bug where writing to register 9 does not update phylib's advertisement mask in the same way that writing register 4 does; this looks like an omission from when gigabit PHY support came into being. Patch 2 seperates the generic phylib code which reads the link partners advertisement from the PHY, so that we can re-use this in the Atheros PHY driver. Patch 3 seperates the generic phylib pause mode; phylib provides no help for MAC drivers to ascertain the negotiated pause mode, it merely copies the link partner's pause mode bits into its own variables. Commentary: Both the aforementioned Atheros PHYs and Marvell PHYs provide the resolved pause modes in terms of whether we should transmit pause frames, or whether we should allow reception of pause frames. Surely the resolution of this should be in phylib? Patch 4 provides the Atheros PHY driver with a private "read_status" implementation that fills in phylib's speed and duplex settings depending on the PHY-Specific status register. This ensures that phylib and the MAC driver match the operating mode that the PHY has decided to use. Since the register also gives us MDIX status, we can trivially fill that status in as well. Note that, although the bits mentioned in this patch for this register match those in th Marvell PHY driver, and it is located at the same address, the meaning of other register bits varies between the PHYs. Therefore, I do not feel that it would be appropriate to make this some kind of generic function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: phy: at803x: use operating parameters from PHY-specific statusRussell King2019-10-041-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Read the PHY-specific status register for the current operating mode (speed and duplex) of the PHY. This register reflects the actual mode that the PHY has resolved depending on either the advertisements of autoneg is enabled, or the forced mode if autoneg is disabled. This ensures that phylib's software state always tracks the hardware state. It seems both AR8033 (which uses the AR8031 ID) and AR8035 support this status register. AR8030 is not known at the present time. This patch depends on "net: phy: extract pause mode" and "net: phy: extract link partner advertisement reading". Reported-by: tinywrkb <tinywrkb@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: tinywrkb <tinywrkb@gmail.com> Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: phy: extract pause modeRussell King2019-10-042-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the update of phylib's software pause mode state from genphy_read_status(), so that we can re-use this functionality with PHYs that have alternative ways to read the negotiation results. Tested-by: tinywrkb <tinywrkb@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: phy: extract link partner advertisement readingRussell King2019-10-042-25/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move reading the link partner advertisement out of genphy_read_status() into its own separate function. This will allow re-use of this code by PHY drivers that are able to read the resolved status from the PHY. Tested-by: tinywrkb <tinywrkb@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: phy: fix write to mii-ctrl1000 registerRussell King2019-10-042-0/+14
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When userspace writes to the MII_ADVERTISE register, we update phylib's advertising mask and trigger a renegotiation. However, writing to the MII_CTRL1000 register, which contains the gigabit advertisement, does neither. This can lead to phylib's copy of the advertisement becoming de-synced with the values in the PHY register set, which can result in incorrect negotiation resolution. Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6: Handle missing host route in __ipv6_ifa_notifyDavid Ahern2019-10-041-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in __ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. Since the DAD sequence can not be aborted, add a check for the missing host route in __ipv6_ifa_notify. The only way this should happen is due to the previously mentioned race. The host route is created when the address is added to an interface; it is only removed on a down event where the address is kept. Add a warning if the host route is missing AND the device is up; this is a situation that should never happen. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: phy: allow for reset line to be tied to a sleepy GPIO controllerAndrea Merello2019-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mdio_device_reset() makes use of the atomic-pretending API flavor for handling the PHY reset GPIO line. I found no hint that mdio_device_reset() is called from atomic context and indeed it uses usleep_range() since long time, so I would assume that it is OK to sleep there. This patch switch to gpiod_set_value_cansleep() in mdio_device_reset(). This is relevant if e.g. the PHY reset line is tied to a I2C GPIO controller. This has been tested on a ZynqMP board running an upstream 4.19 kernel and then hand-ported on current kernel tree. Signed-off-by: Andrea Merello <andrea.merello@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: ipv4: avoid mixed n_redirects and rate_tokens usagePaolo Abeni2019-10-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit c09551c6ff7f ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets") we use 'n_redirects' to account for redirect packets, but we still use 'rate_tokens' to compute the redirect packets exponential backoff. If the device sent to the relevant peer any ICMP error packet after sending a redirect, it will also update 'rate_token' according to the leaking bucket schema; typically 'rate_token' will raise above BITS_PER_LONG and the redirect packets backoff algorithm will produce undefined behavior. Fix the issue using 'n_redirects' to compute the exponential backoff in ip_rt_send_redirect(). Note that we still clear rate_tokens after a redirect silence period, to avoid changing an established behaviour. The root cause predates git history; before the mentioned commit in the critical scenario, the kernel stopped sending redirects, after the mentioned commit the behavior more randomic. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: c09551c6ff7f ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | r8152: Set macpassthru in reset_resume callbackKai-Heng Feng2019-10-041-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r8152 may fail to establish network connection after resume from system suspend. If the USB port connects to r8152 lost its power during system suspend, the MAC address was written before is lost. The reason is that The MAC address doesn't get written again in its reset_resume callback. So let's set MAC address again in reset_resume callback. Also remove unnecessary lock as no other locking attempt will happen during reset_resume. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | cxgb4:Fix out-of-bounds MSI-X info array accessVishal Kulkarni2019-10-041-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When fetching free MSI-X vectors for ULDs, check for the error code before accessing MSI-X info array. Otherwise, an out-of-bounds access is attempted, which results in kernel panic. Fixes: 94cdb8bb993a ("cxgb4: Add support for dynamic allocation of resources for ULD") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Revert "ipv6: Handle race in addrconf_dad_work"David Ahern2019-10-041-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit a3ce2a21bb8969ae27917281244fa91bf5f286d7. Eric reported tests failings with commit. After digging into it, the bottom line is that the DAD sequence is not to be messed with. There are too many cases that are expected to proceed regardless of whether a device is up. Revert the patch and I will send a different solution for the problem Rajendra reported. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: make sock_prot_memory_pressure() return "const char *"Alexey Dobriyan2019-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function returns string literals which are "const char *". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | rxrpc: Fix rxrpc_recvmsg tracepointDavid Howells2019-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call parameter. Fixes: a25e21f0bcd2 ("rxrpc, afs: Use debug_ids rather than pointers in traces") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | qmi_wwan: add support for Cinterion CLS8 devicesReinhard Speyerer2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for Cinterion CLS8 devices. Use QMI_QUIRK_SET_DTR as required for Qualcomm MDM9x07 chipsets. T: Bus=01 Lev=03 Prnt=05 Port=01 Cnt=02 Dev#= 25 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1e2d ProdID=00b0 Rev= 3.18 S: Manufacturer=GEMALTO S: Product=USB Modem C:* #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=89(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()Eric Dumazet2019-10-031-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apparently a refactoring patch brought a bug, that was caught by syzbot [1] Original code was correct, do not try to be smarter than the compiler :/ [1] BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline] BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654 Read of size 4 at addr ffff8880943cf188 by task syz-executor.2/17508 CPU: 0 PID: 17508 Comm: syz-executor.2 Not tainted 5.3.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351 __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482 kasan_report+0x12/0x17 mm/kasan/common.c:618 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131 tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline] do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654 tcp_getsockopt+0xbf/0xe0 net/ipv4/tcp.c:3680 sock_common_getsockopt+0x94/0xd0 net/core/sock.c:3098 __sys_getsockopt+0x16d/0x310 net/socket.c:2129 __do_sys_getsockopt net/socket.c:2144 [inline] __se_sys_getsockopt net/socket.c:2141 [inline] __x64_sys_getsockopt+0xbe/0x150 net/socket.c:2141 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 Fixes: d8e18a516f8f ("net: Use skb accessors in network core") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | lib: textsearch: fix escapes in example codeRandy Dunlap2019-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This textsearch code example does not need the '\' escapes and they can be misleading to someone reading the example. Also, gcc and sparse warn that the "\%d" is an unknown escape sequence. Fixes: 5968a70d7af5 ("textsearch: fix kernel-doc warnings and add kernel-api section") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | udp: only do GSO if # of segs > 1Josh Hunt2019-10-033-20/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this change an application sending <= 1MSS worth of data and enabling UDP GSO would fail if the system had SW GSO enabled, but the same send would succeed if HW GSO offload is enabled. In addition to this inconsistency the error in the SW GSO case does not get back to the application if sending out of a real device so the user is unaware of this failure. With this change we only perform GSO if the # of segments is > 1 even if the application has enabled segmentation. I've also updated the relevant udpgso selftests. Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT") Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | udp: fix gso_segs calculationsJosh Hunt2019-10-032-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload") added gso_segs calculation, but incorrectly got sizeof() the pointer and not the underlying data type. In addition let's fix the v6 case. Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT") Fixes: dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload") Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6: drop incoming packets having a v4mapped source addressEric Dumazet2019-10-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This began with a syzbot report. syzkaller was injecting IPv6 TCP SYN packets having a v4mapped source address. After an unsuccessful 4-tuple lookup, TCP creates a request socket (SYN_RECV) and calls reqsk_queue_hash_req() reqsk_queue_hash_req() calls sk_ehashfn(sk) At this point we have AF_INET6 sockets, and the heuristic used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr) For the particular spoofed packet, we end up hashing V4 addresses which were not initialized by the TCP IPv6 stack, so KMSAN fired a warning. I first fixed sk_ehashfn() to test both source and destination addresses, but then faced various problems, including user-space programs like packetdrill that had similar assumptions. Instead of trying to fix the whole ecosystem, it is better to admit that we have a dual stack behavior, and that we can not build linux kernels without V4 stack anyway. The dual stack API automatically forces the traffic to be IPv4 if v4mapped addresses are used at bind() or connect(), so it makes no sense to allow IPv6 traffic to use the same v4mapped class. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: stmmac: Avoid deadlock on suspend/resumeThierry Reding2019-10-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stmmac driver will try to acquire its private mutex during suspend via phylink_resolve() -> stmmac_mac_link_down() -> stmmac_eee_init(). However, the phylink configuration is updated with the private mutex held already, which causes a deadlock during suspend. Fix this by moving the phylink configuration updates out of the region of code protected by the private mutex. Fixes: 19e13cb27b99 ("net: stmmac: Hold rtnl lock in suspend/resume callbacks") Suggested-by: Bitan Biswas <bbiswas@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: hisilicon: Fix usage of uninitialized variable in function ↵Yizhuo2019-10-021-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mdio_sc_cfg_reg_write() In function mdio_sc_cfg_reg_write(), variable "reg_value" could be uninitialized if regmap_read() fails. However, "reg_value" is used to decide the control flow later in the if statement, which is potentially unsafe. Signed-off-by: Yizhuo <yzhai003@ucr.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2019-10-0232-46/+46
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Remove the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). Patch from Florian Westphal. 2) Fix deadlock in nft_connlimit between packet path updates and the garbage collector. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | netfilter: nft_connlimit: disable bh on garbage collectionPablo Neira Ayuso2019-10-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BH must be disabled when invoking nf_conncount_gc_list() to perform garbage collection, otherwise deadlock might happen. nf_conncount_add+0x1f/0x50 [nf_conncount] nft_connlimit_eval+0x4c/0xe0 [nft_connlimit] nft_dynset_eval+0xb5/0x100 [nf_tables] nft_do_chain+0xea/0x420 [nf_tables] ? sch_direct_xmit+0x111/0x360 ? noqueue_init+0x10/0x10 ? __qdisc_run+0x84/0x510 ? tcp_packet+0x655/0x1610 [nf_conntrack] ? ip_finish_output2+0x1a7/0x430 ? tcp_error+0x130/0x150 [nf_conntrack] ? nf_conntrack_in+0x1fc/0x4c0 [nf_conntrack] nft_do_chain_ipv4+0x66/0x80 [nf_tables] nf_hook_slow+0x44/0xc0 ip_rcv+0xb5/0xd0 ? ip_rcv_finish_core.isra.19+0x360/0x360 __netif_receive_skb_one_core+0x52/0x70 netif_receive_skb_internal+0x34/0xe0 napi_gro_receive+0xba/0xe0 e1000_clean_rx_irq+0x1e9/0x420 [e1000e] e1000e_poll+0xbe/0x290 [e1000e] net_rx_action+0x149/0x3b0 __do_softirq+0xde/0x2d8 irq_exit+0xba/0xc0 do_IRQ+0x85/0xd0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:nf_conncount_gc_list+0x3b/0x130 [nf_conncount] Fixes: 2f971a8f4255 ("netfilter: nf_conncount: move all list iterations under spinlock") Reported-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | netfilter: drop bridge nf reset from nf_resetFlorian Westphal2019-10-0131-45/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 174e23810cd31 ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi recycle always drop skb extensions. The additional skb_ext_del() that is performed via nf_reset on napi skb recycle is not needed anymore. Most nf_reset() calls in the stack are there so queued skb won't block 'rmmod nf_conntrack' indefinitely. This removes the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). In a few selected places, add a call to skb_ext_reset to make sure that no active extensions remain. I am submitting this for "net", because we're still early in the release cycle. The patch applies to net-next too, but I think the rename causes needless divergence between those trees. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | | | ptp_qoriq: Initialize the registers' spinlock before calling ptp_qoriq_settimeVladimir Oltean2019-10-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because ptp_qoriq_settime is being called prior to spin_lock_init, the following stack trace can be seen at driver probe time: [ 2.269117] the code is fine but needs lockdep annotation. [ 2.274569] turning off the locking correctness validator. [ 2.280027] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc7-01478-g01eaa67a4797 #263 [ 2.288073] Hardware name: Freescale LS1021A [ 2.292337] [<c0313cb4>] (unwind_backtrace) from [<c030e11c>] (show_stack+0x10/0x14) [ 2.300045] [<c030e11c>] (show_stack) from [<c1219440>] (dump_stack+0xcc/0xf8) [ 2.307235] [<c1219440>] (dump_stack) from [<c03b9b44>] (register_lock_class+0x730/0x73c) [ 2.315372] [<c03b9b44>] (register_lock_class) from [<c03b6190>] (__lock_acquire+0x78/0x270c) [ 2.323856] [<c03b6190>] (__lock_acquire) from [<c03b90cc>] (lock_acquire+0xe0/0x22c) [ 2.331649] [<c03b90cc>] (lock_acquire) from [<c123c310>] (_raw_spin_lock_irqsave+0x54/0x68) [ 2.340048] [<c123c310>] (_raw_spin_lock_irqsave) from [<c0e73fe4>] (ptp_qoriq_settime+0x38/0x80) [ 2.348878] [<c0e73fe4>] (ptp_qoriq_settime) from [<c0e746d4>] (ptp_qoriq_init+0x1f8/0x484) [ 2.357189] [<c0e746d4>] (ptp_qoriq_init) from [<c0e74aac>] (ptp_qoriq_probe+0xd0/0x184) [ 2.365243] [<c0e74aac>] (ptp_qoriq_probe) from [<c0b0a07c>] (platform_drv_probe+0x48/0x9c) [ 2.373555] [<c0b0a07c>] (platform_drv_probe) from [<c0b07a14>] (really_probe+0x1c4/0x400) [ 2.381779] [<c0b07a14>] (really_probe) from [<c0b07e28>] (driver_probe_device+0x78/0x1b8) [ 2.390003] [<c0b07e28>] (driver_probe_device) from [<c0b081d0>] (device_driver_attach+0x58/0x60) [ 2.398832] [<c0b081d0>] (device_driver_attach) from [<c0b082d4>] (__driver_attach+0xfc/0x160) [ 2.407402] [<c0b082d4>] (__driver_attach) from [<c0b05a84>] (bus_for_each_dev+0x68/0xb4) [ 2.415539] [<c0b05a84>] (bus_for_each_dev) from [<c0b06b68>] (bus_add_driver+0x104/0x20c) [ 2.423763] [<c0b06b68>] (bus_add_driver) from [<c0b0909c>] (driver_register+0x78/0x10c) [ 2.431815] [<c0b0909c>] (driver_register) from [<c030313c>] (do_one_initcall+0x8c/0x3ac) [ 2.439954] [<c030313c>] (do_one_initcall) from [<c1f013f4>] (kernel_init_freeable+0x468/0x548) [ 2.448610] [<c1f013f4>] (kernel_init_freeable) from [<c12344d8>] (kernel_init+0x8/0x10c) [ 2.456745] [<c12344d8>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20) [ 2.464273] Exception stack(0xea89ffb0 to 0xea89fff8) [ 2.469297] ffa0: 00000000 00000000 00000000 00000000 [ 2.477432] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.485566] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 Fixes: ff54571a747b ("ptp_qoriq: convert to use ptp_qoriq_init/free") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Merge branch 'SJA1105-DSA-locking-fixes-for-PTP'David S. Miller2019-10-023-10/+26
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vladimir Oltean says: ==================== SJA1105 DSA locking fixes for PTP This series fixes the locking API usage problems spotted when compiling the kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_DEBUG_SPINLOCK=y. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: dsa: sja1105: Fix sleeping while atomic in .port_hwtstamp_setVladimir Oltean2019-10-023-10/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y: [ 41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909 [ 41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l [ 41.583212] INFO: lockdep is turned off. [ 41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827 [ 41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14) [ 41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100) [ 41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4) [ 41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8) [ 41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24) [ 41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c) [ 41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc) [ 41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330) [ 41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8) [ 41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8) [ 41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10) [ 41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58) [ 41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28) [ 41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0) [ 41.707796] 5fa0: beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80 [ 41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000 [ 41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c [ 41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002 [ 41.735682] INFO: lockdep is turned off. Enabling RX timestamping will logically disturb the fastpath (processing of meta frames). Replace bool hwts_rx_en with a bit that is checked atomically from the fastpath and temporarily unset from the sleepable context during a change of the RX timestamping process (a destructive operation anyways, requires switch reset). If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any received meta frame and not take the meta_lock at all. Fixes: a602afd200f5 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: dsa: sja1105: Initialize the meta_lockVladimir Oltean2019-10-021-0/+1
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise, with CONFIG_DEBUG_SPINLOCK=y, this stack trace gets printed when enabling RX timestamping and receiving a PTP frame: [ 318.537078] INFO: trying to register non-static key. [ 318.542040] the code is fine but needs lockdep annotation. [ 318.547500] turning off the locking correctness validator. [ 318.552972] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-13257-g0825b0669811-dirty #1962 [ 318.561283] Hardware name: Freescale LS1021A [ 318.565566] [<c03144bc>] (unwind_backtrace) from [<c030e164>] (show_stack+0x10/0x14) [ 318.573289] [<c030e164>] (show_stack) from [<c11b9f50>] (dump_stack+0xd4/0x100) [ 318.580579] [<c11b9f50>] (dump_stack) from [<c03b9b40>] (register_lock_class+0x728/0x734) [ 318.588731] [<c03b9b40>] (register_lock_class) from [<c03b60c4>] (__lock_acquire+0x78/0x25cc) [ 318.597227] [<c03b60c4>] (__lock_acquire) from [<c03b8ef8>] (lock_acquire+0xd8/0x234) [ 318.605033] [<c03b8ef8>] (lock_acquire) from [<c11db934>] (_raw_spin_lock+0x44/0x54) [ 318.612755] [<c11db934>] (_raw_spin_lock) from [<c1164370>] (sja1105_rcv+0x1f8/0x4e8) [ 318.620561] [<c1164370>] (sja1105_rcv) from [<c115d7cc>] (dsa_switch_rcv+0x80/0x204) [ 318.628283] [<c115d7cc>] (dsa_switch_rcv) from [<c0f58c80>] (__netif_receive_skb_one_core+0x50/0x6c) [ 318.637386] [<c0f58c80>] (__netif_receive_skb_one_core) from [<c0f58f04>] (netif_receive_skb_internal+0xac/0x264) [ 318.647611] [<c0f58f04>] (netif_receive_skb_internal) from [<c0f59e98>] (napi_gro_receive+0x1d8/0x338) [ 318.656887] [<c0f59e98>] (napi_gro_receive) from [<c0c298a4>] (gfar_clean_rx_ring+0x328/0x724) [ 318.665472] [<c0c298a4>] (gfar_clean_rx_ring) from [<c0c29e60>] (gfar_poll_rx_sq+0x34/0x94) [ 318.673795] [<c0c29e60>] (gfar_poll_rx_sq) from [<c0f5b40c>] (net_rx_action+0x128/0x4f8) [ 318.681860] [<c0f5b40c>] (net_rx_action) from [<c03022f0>] (__do_softirq+0x148/0x5ac) [ 318.689666] [<c03022f0>] (__do_softirq) from [<c0355af4>] (irq_exit+0x160/0x170) [ 318.697040] [<c0355af4>] (irq_exit) from [<c03c6818>] (__handle_domain_irq+0x60/0xb4) [ 318.704847] [<c03c6818>] (__handle_domain_irq) from [<c07e9440>] (gic_handle_irq+0x58/0x9c) [ 318.713172] [<c07e9440>] (gic_handle_irq) from [<c0301a70>] (__irq_svc+0x70/0x98) [ 318.720622] Exception stack(0xc2001f18 to 0xc2001f60) [ 318.725656] 1f00: 00000001 00000006 [ 318.733805] 1f20: 00000000 c20165c0 ffffe000 c2010cac c2010cf4 00000001 00000000 c2010c88 [ 318.741955] 1f40: c1f7a5a8 00000000 00000000 c2001f68 c03ba140 c030a288 200e0013 ffffffff [ 318.750110] [<c0301a70>] (__irq_svc) from [<c030a288>] (arch_cpu_idle+0x24/0x3c) [ 318.757486] [<c030a288>] (arch_cpu_idle) from [<c038a480>] (do_idle+0x1b8/0x2a4) [ 318.764859] [<c038a480>] (do_idle) from [<c038a94c>] (cpu_startup_entry+0x18/0x1c) [ 318.772407] [<c038a94c>] (cpu_startup_entry) from [<c1e00f10>] (start_kernel+0x4cc/0x4fc) Fixes: 844d7edc6a34 ("net: dsa: sja1105: Add a global sja1105_tagger_data structure") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net/rds: Fix error handling in rds_ib_add_one()Dotan Barak2019-10-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized after allocation some resources such as protection domain. If allocation of such resources fail, then these uninitialized variables are accessed in rds_ib_dev_free() in failure path. This can potentially crash the system. The code has been updated to initialize these variables very early in the function. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: dsa: rtl8366: Check VLAN ID and not portsLinus Walleij2019-10-021-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There has been some confusion between the port number and the VLAN ID in this driver. What we need to check for validity is the VLAN ID, nothing else. The current confusion came from assigning a few default VLANs for default routing and we need to rewrite that properly. Instead of checking if the port number is a valid VLAN ID, check the actual VLAN IDs passed in to the callback one by one as expected. Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | mlx5: avoid 64-bit division in dr_icm_pool_mr_create()Michal Kubecek2019-10-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently added code introduces 64-bit division in dr_icm_pool_mr_create() so that build on 32-bit architectures fails with ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined! As the divisor is always a power of 2, we can use bitwise operation instead. Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | tipc: fix unlimited bundling of small messagesTuong Lien2019-10-022-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have identified a problem with the "oversubscription" policy in the link transmission code. When small messages are transmitted, and the sending link has reached the transmit window limit, those messages will be bundled and put into the link backlog queue. However, bundles of data messages are counted at the 'CRITICAL' level, so that the counter for that level, instead of the counter for the real, bundled message's level is the one being increased. Subsequent, to-be-bundled data messages at non-CRITICAL levels continue to be tested against the unchanged counter for their own level, while contributing to an unrestrained increase at the CRITICAL backlog level. This leaves a gap in congestion control algorithm for small messages that can result in starvation for other users or a "real" CRITICAL user. Even that eventually can lead to buffer exhaustion & link reset. We fix this by keeping a 'target_bskb' buffer pointer at each levels, then when bundling, we only bundle messages at the same importance level only. This way, we know exactly how many slots a certain level have occupied in the queue, so can manage level congestion accurately. By bundling messages at the same level, we even have more benefits. Let consider this: - One socket sends 64-byte messages at the 'CRITICAL' level; - Another sends 4096-byte messages at the 'LOW' level; When a 64-byte message comes and is bundled the first time, we put the overhead of message bundle to it (+ 40-byte header, data copy, etc.) for later use, but the next message can be a 4096-byte one that cannot be bundled to the previous one. This means the last bundle carries only one payload message which is totally inefficient, as for the receiver also! Later on, another 64-byte message comes, now we make a new bundle and the same story repeats... With the new bundling algorithm, this will not happen, the 64-byte messages will be bundled together even when the 4096-byte message(s) comes in between. However, if the 4096-byte messages are sent at the same level i.e. 'CRITICAL', the bundling algorithm will again cause the same overhead. Also, the same will happen even with only one socket sending small messages at a rate close to the link transmit's one, so that, when one message is bundled, it's transmitted shortly. Then, another message comes, a new bundle is created and so on... We will solve this issue radically by another patch. Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Reported-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | xen-netfront: do not use ~0U as error return value for xennet_fill_frags()Dongli Zhang2019-10-011-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xennet_fill_frags() uses ~0U as return value when the sk_buff is not able to cache extra fragments. This is incorrect because the return type of xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for ring buffer index. In the situation when the rsp_cons is approaching 0xffffffff, the return value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the caller) would regard as error. As a result, queue->rx.rsp_cons is set incorrectly because it is updated only when there is error. If there is no error, xennet_poll() would be responsible to update queue->rx.rsp_cons. Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL. This leads to NULL pointer access in the next iteration to process rx ring buffer entries. The symptom is similar to the one fixed in commit 00b368502d18 ("xen-netfront: do not assume sk_buff_head list is empty in error handling"). This patch changes the return type of xennet_fill_frags() to indicate whether it is successful or failed. The queue->rx.rsp_cons will be always updated inside this function. Fixes: ad4f15dc2c70 ("xen/netfront: don't bug in case of too many frags") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | ipv6: Handle race in addrconf_dad_workDavid Ahern2019-10-011-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. This scenario does not occur when the ipv6 address is not kept (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the state of the ifp to DEAD. Handle when the addresses are kept by checking IF_READY which is reset by addrconf_ifdown. The 'dead' flag for an inet6_addr is set only under rtnl, in addrconf_ifdown and it means the device is getting removed (or IPv6 is disabled). The interesting cases for changing the idev flag are addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown (reset the flag). The former does not have the idev lock - only rtnl; the latter has both. Based on that the existing dead + IF_READY check can be moved to right after the rtnl_lock in addrconf_dad_work. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | tcp: adjust rto_base in retransmits_timed_out()Eric Dumazet2019-10-011-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cited commit exposed an old retransmits_timed_out() bug which assumed it could call tcp_model_timeout() with TCP_RTO_MIN as rto_base for all states. But flows in SYN_SENT or SYN_RECV state uses a different RTO base (1 sec instead of 200 ms, unless BPF choses another value) This caused a reduction of SYN retransmits from 6 to 4 with the default /proc/sys/net/ipv4/tcp_syn_retries value. Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Marek Majkowski <marek@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | vsock: Fix a lockdep warning in __vsock_release()Dexuan Cui2019-10-013-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep is unhappy if two locks from the same class are held. Fix the below warning for hyperv and virtio sockets (vmci socket code doesn't have the issue) by using lock_sock_nested() when __vsock_release() is called recursively: ============================================ WARNING: possible recursive locking detected 5.3.0+ #1 Not tainted -------------------------------------------- server/1795 is trying to acquire lock: ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock] but task is already holding lock: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_VSOCK); lock(sk_lock-AF_VSOCK); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by server/1795: #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0 #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock] stack backtrace: CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1 Call Trace: dump_stack+0x67/0x90 __lock_acquire.cold.67+0xd2/0x20b lock_acquire+0xb5/0x1c0 lock_sock_nested+0x6d/0x90 hvs_release+0x10/0x120 [hv_sock] __vsock_release+0x24/0xf0 [vsock] __vsock_release+0xa0/0xf0 [vsock] vsock_release+0x12/0x30 [vsock] __sock_release+0x37/0xa0 sock_close+0x14/0x20 __fput+0xc1/0x250 task_work_run+0x98/0xc0 do_exit+0x344/0xc60 do_group_exit+0x47/0xb0 get_signal+0x15c/0xc50 do_signal+0x30/0x720 exit_to_usermode_loop+0x50/0xa0 do_syscall_64+0x24e/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f4184e85f31 Tested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | hso: fix NULL-deref on tty openJohan Hovold2019-10-011-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix NULL-pointer dereference on tty open due to a failure to handle a missing interrupt-in endpoint when probing modem ports: BUG: kernel NULL pointer dereference, address: 0000000000000006 ... RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso] ... Call Trace: hso_start_serial_device+0xdc/0x140 [hso] hso_serial_open+0x118/0x1b0 [hso] tty_open+0xf1/0x490 Fixes: 542f54823614 ("tty: Modem functions for the HSO driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: ag71xx: fix mdio subnode supportOleksij Rempel2019-10-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is syncing driver with actual devicetree documentation: Documentation/devicetree/bindings/net/qca,ar71xx.txt |Optional subnodes: |- mdio : specifies the mdio bus, used as a container for phy nodes | according to phy.txt in the same directory The driver was working with fixed phy without any noticeable issues. This bug was uncovered by introducing dsa ar9331-switch driver. Since no one reported this bug until now, I assume no body is using it and this patch should not brake existing system. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Merge branch 'stmmac-fixes'David S. Miller2019-10-016-21/+28
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jose Abreu says: ==================== net: stmmac: Fixes for -net Misc fixes for -net tree. More info in commit logs. v2 is just a rebase of v1 against -net and we added a new patch (09/09) to fix RSS feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: stmmac: xgmac: Fix RSS writing wrong keysJose Abreu2019-10-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b6b6cc9acd7b, changed the call to dwxgmac2_rss_write_reg() passing it the variable cfg->key[i]. As key is an u8 but we write 32 bits at a time we need to cast it into an u32 so that the correct key values are written. Notice that the for loop already takes this into account so we don't try to write past the keys size. Fixes: b6b6cc9acd7b ("net: stmmac: selftest: avoid large stack usage") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: stmmac: xgmac: Fix RSS not writing all Keys to HWJose Abreu2019-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sizeof(cfg->key) is != ARRAY_SIZE(cfg->key). Fix it. This warning is triggered when running with cc flag -Wsizeof-array-div. Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Fixes: 76067459c686 ("net: stmmac: Implement RSS and enable it in XGMAC core") Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: stmmac: xgmac: Disable the Timestamp interrupt by defaultJose Abreu2019-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't use it anyway as XGMAC only supports polling for timestamp (in current SW implementation). This greatly reduces the system load by reducing the number of interrupts. Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: stmmac: Do not stop PHY if WoL is enabledJose Abreu2019-10-011-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If WoL is enabled we can't really stop the PHY, otherwise we will not receive the WoL packet. Fix this by telling phylink that only the MAC is down and only stop the PHY if WoL is not enabled. Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: stmmac: Correctly take timestamp for PTPv2Jose Abreu2019-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The case for PTPV2_EVENT requires event packets to be captured so add this setting to the list of enabled captures. Fixes: 891434b18ec0 ("stmmac: add IEEE PTPv1 and PTPv2 support.") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>