summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* clk: x86: Stop marking clocks as CLK_IS_CRITICALHans de Goede2018-09-171-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the firmware"), which added the code to mark clocks as CLK_IS_CRITICAL, causes all unclaimed PMC clocks on Cherry Trail devices to be on all the time, resulting on the device not being able to reach S0i3 when suspended. The reason for this commit is that on some Bay Trail / Cherry Trail devices the r8169 ethernet controller uses pmc_plt_clk_4. Now that the clk-pmc-atom driver exports an "ether_clk" alias for pmc_plt_clk_4 and the r8169 driver has been modified to get and enable this clock (if present) the marking of the clocks as CLK_IS_CRITICAL is no longer necessary. This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail devices not being able to reach S0i3 greatly decreasing their battery drain when suspended. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102 Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861 Cc: Johannes Stezenbach <js@sig21.net> Cc: Carlo Caione <carlo@endlessm.com> Reported-by: Johannes Stezenbach <js@sig21.net> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* r8169: Get and enable optional ether_clk clockHans de Goede2018-09-171-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | On some boards a platform clock is used as clock for the r8169 chip, this commit adds support for getting and enabling this clock (assuming it has an "ether_clk" alias set on it). This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the firmware") which is a previous attempt to fix this for some x86 boards, but this causes all Cherry Trail SoC using boards to not reach there lowest power states when suspending. This commit (together with an atom-pmc-clk driver commit adding the alias) fixes things properly by making the r8169 get the clock and enable it when it needs it. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102 Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861 Cc: Johannes Stezenbach <js@sig21.net> Cc: Carlo Caione <carlo@endlessm.com> Reported-by: Johannes Stezenbach <js@sig21.net> Acked-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* clk: x86: add "ether_clk" alias for Bay Trail / Cherry TrailHans de Goede2018-09-171-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the firmware") causes all unclaimed PMC clocks on Cherry Trail devices to be on all the time, resulting on the device not being able to reach S0i2 or S0i3 when suspended. The reason for this commit is that on some Bay Trail / Cherry Trail devices the ethernet controller uses pmc_plt_clk_4. This commit adds an "ether_clk" alias, so that the relevant ethernet drivers can try to (optionally) use this, without needing X86 specific code / hacks, thus fixing ethernet on these devices without breaking S0i3 support. This commit uses clkdev_hw_create() to create the alias, mirroring the code for the already existing "mclk" alias for pmc_plt_clk_3. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102 Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861 Cc: Johannes Stezenbach <js@sig21.net> Cc: Carlo Caione <carlo@endlessm.com> Reported-by: Johannes Stezenbach <js@sig21.net> Acked-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* r8169: enable ASPM on RTL8106EKai-Heng Feng2018-09-171-0/+3
| | | | | | | | | | | | The Intel SoC was prevented from entering lower idle state because of RTL8106E's ASPM was not enabled. So enable ASPM on RTL8106E (chip version 39). Now the Intel SoC can enter lower idle state, power consumption and temperature are much lower. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* r8169: Align ASPM/CLKREQ setting function with vendor driverKai-Heng Feng2018-09-171-1/+3
| | | | | | | | | | | | | There's a small delay after setting ASPM in vendor drivers, r8101 and r8168. In addition, those drivers enable ASPM before ClkReq, also change that to align with vendor driver. I haven't seen anything bad becasue of this, but I think it's better to keep in sync with vendor driver. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: Fix a unused function warning.zhong jiang2018-09-171-3/+3
| | | | | | | | | | | | Fix the following compile warning: drivers/net/ethernet/microchip/lan743x_main.c:2964:12: warning: ‘lan743x_pm_suspend’ defined but not used [-Wunused-function] static int lan743x_pm_suspend(struct device *dev) drivers/net/ethernet/microchip/lan743x_main.c:2987:12: warning: ‘lan743x_pm_resume’ defined but not used [-Wunused-function] static int lan743x_pm_resume(struct device *dev) Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: mv88e6xxx: Fix ATU Miss ViolationAndrew Lunn2018-09-172-2/+2
| | | | | | | | | Fix a cut/paste error and a typo which results in ATU miss violations not being reported. Fixes: 0977644c5005 ("net: dsa: mv88e6xxx: Decode ATU problem interrupt") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* hv_netvsc: pair VF based on serial numberStephen Hemminger2018-09-172-25/+36
| | | | | | | | | | Matching network device based on MAC address is problematic since a non VF network device can be creted with a duplicate MAC address causing confusion and problems. The VMBus API does provide a serial number that is a better matching method. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* PCI: hv: support reporting serial number as slot informationStephen Hemminger2018-09-171-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | The Hyper-V host API for PCI provides a unique "serial number" which can be used as basis for sysfs PCI slot table. This can be useful for cases where userspace wants to find the PCI device based on serial number. When an SR-IOV NIC is added, the host sends an attach message with serial number. The kernel doesn't use the serial number, but it is useful when doing the same thing in a userspace driver such as the DPDK. By having /sys/bus/pci/slots/N it provides a direct way to find the matching PCI device. There maybe some cases where serial number is not unique such as when using GPU's. But the PCI slot infrastructure will handle that. This has a side effect which may also be useful. The common udev network device naming policy uses the slot information (rather than PCI address). Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bnxt_en: Fix VF mac address regression.Michael Chan2018-09-173-7/+13
| | | | | | | | | | | | | | | | | | | | The recent commit to always forward the VF MAC address to the PF for approval may not work if the PF driver or the firmware is older. This will cause the VF driver to fail during probe: bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF bnxt_en 0000:00:03.0: Unable to initialize mac address. bnxt_en: probe of 0000:00:03.0 failed with error -99 We fix it by treating the error as fatal only if the VF MAC address is locally generated by the VF. Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.") Reported-by: Seth Forshee <seth.forshee@canonical.com> Reported-by: Siwei Liu <loseweigh@gmail.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: hp100: fix always-true check for link up stateColin Ian King2018-09-171-1/+1
| | | | | | | | | | | | | The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value that is always non-zero and hence the wait for the link to drop always terminates prematurely. Fix this by using a logical not operator instead of a bitwise complement. This issue has been in the driver since pre-2.6.12-rc2. Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: macb: disable scatter-gather for macb on sama5d3Nicolas Ferre2018-09-171-0/+8
| | | | | | | | | Create a new configuration for the sama5d3-macb new compatibility string. This configuration disables scatter-gather because we experienced lock down of the macb interface of this particular SoC under very high load. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: mvpp2: let phylink manage the carrier stateAntoine Tenart2018-09-171-15/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Net drivers using phylink shouldn't mess with the link carrier themselves and should let phylink manage it. The mvpp2 driver wasn't following this best practice as the mac_config() function made calls to change the link carrier state. This led to wrongly reported carrier link state which then triggered other issues. This patch fixes this behaviour. But the PPv2 driver relied on this misbehaviour in two cases: for fixed links and when not using phylink (ACPI mode). The later was fixed by adding an explicit call to link_up(), which when the ACPI mode will use phylink should be removed. The fixed link case was relying on the mac_config() function to set the link up, as we found an issue in phylink_start() which assumes the carrier is off. If not, the link_up() function is never called. To fix this, a call to netif_carrier_off() is added just before phylink_start() so that we do not introduce a regression in the driver. Fixes: 4bb043262878 ("net: mvpp2: phylink support") Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pppoe: fix reception of frames with no mac headerGuillaume Nault2018-09-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pppoe_rcv() needs to look back at the Ethernet header in order to lookup the PPPoE session. Therefore we need to ensure that the mac header is big enough to contain an Ethernet header. Otherwise eth_hdr(skb)->h_source might access invalid data. ================================================================== BUG: KMSAN: uninit-value in __get_item drivers/net/ppp/pppoe.c:172 [inline] BUG: KMSAN: uninit-value in get_item drivers/net/ppp/pppoe.c:236 [inline] BUG: KMSAN: uninit-value in pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450 CPU: 0 PID: 4543 Comm: syz-executor355 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 __get_item drivers/net/ppp/pppoe.c:172 [inline] get_item drivers/net/ppp/pppoe.c:236 [inline] pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450 __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x7fb/0x9f0 fs/read_write.c:482 vfs_write+0x463/0x8d0 fs/read_write.c:544 SYSC_write+0x172/0x360 fs/read_write.c:589 SyS_write+0x55/0x80 fs/read_write.c:581 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x4447c9 RSP: 002b:00007fff64c8fc28 EFLAGS: 00000297 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004447c9 RDX: 000000000000fd87 RSI: 0000000020000600 RDI: 0000000000000004 RBP: 00000000006cf018 R08: 00007fff64c8fda8 R09: 00007fff00006bda R10: 0000000000005fe7 R11: 0000000000000297 R12: 00000000004020d0 R13: 0000000000402160 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 tun_alloc_skb drivers/net/tun.c:1532 [inline] tun_get_user+0x2242/0x7c60 drivers/net/tun.c:1829 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x7fb/0x9f0 fs/read_write.c:482 vfs_write+0x463/0x8d0 fs/read_write.c:544 SYSC_write+0x172/0x360 fs/read_write.c:589 SyS_write+0x55/0x80 fs/read_write.c:581 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ================================================================== Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+f5f6080811c849739212@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: ti: add missing GENERIC_ALLOCATOR dependencyCorentin Labbe2018-09-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch mades TI_DAVINCI_CPDMA select GENERIC_ALLOCATOR. without that, the following sparc64 build failure happen drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_check_free_tx_desc': (.text+0x278): undefined reference to `gen_pool_avail' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_chan_submit': (.text+0x340): undefined reference to `gen_pool_alloc' (.text+0x5c4): undefined reference to `gen_pool_free' drivers/net/ethernet/ti/davinci_cpdma.o: In function `__cpdma_chan_free': davinci_cpdma.c:(.text+0x64c): undefined reference to `gen_pool_free' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_desc_pool_destroy.isra.6': davinci_cpdma.c:(.text+0x17ac): undefined reference to `gen_pool_size' davinci_cpdma.c:(.text+0x17b8): undefined reference to `gen_pool_avail' davinci_cpdma.c:(.text+0x1824): undefined reference to `gen_pool_size' davinci_cpdma.c:(.text+0x1830): undefined reference to `gen_pool_avail' drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_ctlr_create': (.text+0x19f8): undefined reference to `devm_gen_pool_create' (.text+0x1a90): undefined reference to `gen_pool_add_virt' Makefile:1011: recipe for target 'vmlinux' failed Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* veth: Orphan skb before GROToshiaki Makita2018-09-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRO expects skbs not to be owned by sockets, but when XDP is enabled veth passed skbs owned by sockets. It caused corrupted sk_wmem_alloc. Paolo Abeni reported the following splat: [ 362.098904] refcount_t overflow at skb_set_owner_w+0x5e/0xa0 in iperf3[1644], uid/euid: 0/0 [ 362.108239] WARNING: CPU: 0 PID: 1644 at kernel/panic.c:648 refcount_error_report+0xa0/0xa4 [ 362.117547] Modules linked in: tcp_diag inet_diag veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif iTCO_wdt sg ipmi_si iTCO_vendor_support ipmi_devintf mxm_wmi ipmi_msghandler pcspkr dcdbas mei_me wmi mei lpc_ich acpi_power_meter pcc_cpufreq xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe igb ttm ahci mdio libahci ptp crc32c_intel drm pps_core libata i2c_algo_bit dca dm_mirror dm_region_hash dm_log dm_mod [ 362.176622] CPU: 0 PID: 1644 Comm: iperf3 Not tainted 4.19.0-rc2.vanilla+ #2025 [ 362.184777] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 [ 362.193124] RIP: 0010:refcount_error_report+0xa0/0xa4 [ 362.198758] Code: 08 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 80 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 18 4d e7 9d 31 c0 e8 30 fa ff ff <0f> 0b eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc [ 362.219711] RSP: 0018:ffff9ee6ff603c20 EFLAGS: 00010282 [ 362.225538] RAX: 0000000000000000 RBX: ffffffff9de83e10 RCX: 0000000000000000 [ 362.233497] RDX: 0000000000000001 RSI: ffff9ee6ff6167d8 RDI: ffff9ee6ff6167d8 [ 362.241457] RBP: ffff9ee6ff603d78 R08: 0000000000000490 R09: 0000000000000004 [ 362.249416] R10: 0000000000000000 R11: ffff9ee6ff603990 R12: ffff9ee664b94500 [ 362.257377] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff9de615f9 [ 362.265337] FS: 00007f1d22d28740(0000) GS:ffff9ee6ff600000(0000) knlGS:0000000000000000 [ 362.274363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 362.280773] CR2: 00007f1d222f35d0 CR3: 0000001fddfec003 CR4: 00000000001606f0 [ 362.288733] Call Trace: [ 362.291459] <IRQ> [ 362.293702] ex_handler_refcount+0x4e/0x80 [ 362.298269] fixup_exception+0x35/0x40 [ 362.302451] do_trap+0x109/0x150 [ 362.306048] do_error_trap+0xd5/0x130 [ 362.315766] invalid_op+0x14/0x20 [ 362.319460] RIP: 0010:skb_set_owner_w+0x5e/0xa0 [ 362.324512] Code: ef ff ff 74 49 48 c7 43 60 20 7b 4a 9d 8b 85 f4 01 00 00 85 c0 75 16 8b 83 e0 00 00 00 f0 01 85 44 01 00 00 0f 88 d8 23 16 00 <5b> 5d c3 80 8b 91 00 00 00 01 8b 85 f4 01 00 00 89 83 a4 00 00 00 [ 362.345465] RSP: 0018:ffff9ee6ff603e20 EFLAGS: 00010a86 [ 362.351291] RAX: 0000000000001100 RBX: ffff9ee65deec700 RCX: ffff9ee65e829244 [ 362.359250] RDX: 0000000000000100 RSI: ffff9ee65e829100 RDI: ffff9ee65deec700 [ 362.367210] RBP: ffff9ee65e829100 R08: 000000000002a380 R09: 0000000000000000 [ 362.375169] R10: 0000000000000002 R11: fffff1a4bf77bb00 R12: ffffc0754661d000 [ 362.383130] R13: ffff9ee65deec200 R14: ffff9ee65f597000 R15: 00000000000000aa [ 362.391092] veth_xdp_rcv+0x4e4/0x890 [veth] [ 362.399357] veth_poll+0x4d/0x17a [veth] [ 362.403731] net_rx_action+0x2af/0x3f0 [ 362.407912] __do_softirq+0xdd/0x29e [ 362.411897] do_softirq_own_stack+0x2a/0x40 [ 362.416561] </IRQ> [ 362.418899] do_softirq+0x4b/0x70 [ 362.422594] __local_bh_enable_ip+0x50/0x60 [ 362.427258] ip_finish_output2+0x16a/0x390 [ 362.431824] ip_output+0x71/0xe0 [ 362.440670] __tcp_transmit_skb+0x583/0xab0 [ 362.445333] tcp_write_xmit+0x247/0xfb0 [ 362.449609] __tcp_push_pending_frames+0x2d/0xd0 [ 362.454760] tcp_sendmsg_locked+0x857/0xd30 [ 362.459424] tcp_sendmsg+0x27/0x40 [ 362.463216] sock_sendmsg+0x36/0x50 [ 362.467104] sock_write_iter+0x87/0x100 [ 362.471382] __vfs_write+0x112/0x1a0 [ 362.475369] vfs_write+0xad/0x1a0 [ 362.479062] ksys_write+0x52/0xc0 [ 362.482759] do_syscall_64+0x5b/0x180 [ 362.486841] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 362.492473] RIP: 0033:0x7f1d22293238 [ 362.496458] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 c5 54 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55 [ 362.517409] RSP: 002b:00007ffebaef8008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 362.525855] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007f1d22293238 [ 362.533816] RDX: 0000000000002800 RSI: 00007f1d22d36000 RDI: 0000000000000005 [ 362.541775] RBP: 00007f1d22d36000 R08: 00000002db777a30 R09: 0000562b70712b20 [ 362.549734] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005 [ 362.557693] R13: 0000000000002800 R14: 00007ffebaef8060 R15: 0000562b70712260 In order to avoid this, orphan the skb before entering GRO. Fixes: 948d4f214fde ("veth: Add driver XDP") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* stmmac: fix valid numbers of unicast filter entriesJongsung Kim2018-09-161-3/+2
| | | | | | | | | | Synopsys DWC Ethernet MAC can be configured to have 1..32, 64, or 128 unicast filter entries. (Table 7-8 MAC Address Registers from databook) Fix dwmac1000_validate_ucast_entries() to accept values between 1 and 32 in addition. Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-upstream' of ↵David S. Miller2018-09-131-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2018-09-13 A few Bluetooth fixes for the 4.19-rc series: - Fixed rw_semaphore leak in hci_ldisc - Fixed local Out-of-Band pairing data handling Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * Bluetooth: hci_ldisc: Free rw_semaphore on closeHermes Zhang2018-09-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The percpu_rw_semaphore is not currently freed, and this leads to a crash when the stale rcu callback is invoked. DEBUG_OBJECTS detects this. ODEBUG: free active (active state 1) object type: rcu_head hint: (null) ------------[ cut here ]------------ WARNING: CPU: 1 PID: 2024 at debug_print_object+0xac/0xc8 PC is at debug_print_object+0xac/0xc8 LR is at debug_print_object+0xac/0xc8 Call trace: [<ffffff80082e2c2c>] debug_print_object+0xac/0xc8 [<ffffff80082e40b0>] debug_check_no_obj_freed+0x1e8/0x228 [<ffffff8008191254>] kfree+0x1cc/0x250 [<ffffff80083cc03c>] hci_uart_tty_close+0x54/0x108 [<ffffff800832e118>] tty_ldisc_close.isra.1+0x40/0x58 [<ffffff800832e14c>] tty_ldisc_kill+0x1c/0x40 [<ffffff800832e3dc>] tty_ldisc_release+0x94/0x170 [<ffffff8008325554>] tty_release_struct+0x1c/0x58 [<ffffff8008326400>] tty_release+0x3b0/0x490 [<ffffff80081a3fe8>] __fput+0x88/0x1d0 [<ffffff80081a418c>] ____fput+0xc/0x18 [<ffffff80080c0624>] task_work_run+0x9c/0xc0 [<ffffff80080a9e24>] do_exit+0x24c/0x8a0 [<ffffff80080aa4e0>] do_group_exit+0x38/0xa0 [<ffffff80080aa558>] __wake_up_parent+0x0/0x28 [<ffffff8008082c00>] el0_svc_naked+0x34/0x38 ---[ end trace bfe08cbd89098cdf ]--- Signed-off-by: Hermes Zhang <chenhuiz@axis.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* | net/appletalk: fix minor pointer leak to userspace in SIOCFINDIPDDPRTWilly Tarreau2018-09-131-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Fields ->dev and ->next of struct ipddp_route may be copied to userspace on the SIOCFINDIPDDPRT ioctl. This is only accessible to CAP_NET_ADMIN though. Let's manually copy the relevant fields instead of using memcpy(). BugLink: http://blog.infosectcbr.com.au/2018/09/linux-kernel-infoleaks.html Cc: Jann Horn <jannh@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | hv_netvsc: fix schedule in RCU contextStephen Hemminger2018-09-131-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When netvsc device is removed it can call reschedule in RCU context. This happens because canceling the subchannel setup work could (in theory) cause a reschedule when manipulating the timer. To reproduce, run with lockdep enabled kernel and unbind a network device from hv_netvsc (via sysfs). [ 160.682011] WARNING: suspicious RCU usage [ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted [ 160.709937] ----------------------------- [ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 160.723691] [ 160.723691] other info that might help us debug this: [ 160.723691] [ 160.730955] [ 160.730955] rcu_scheduler_active = 2, debug_locks = 1 [ 160.762813] 5 locks held by rebind-eth.sh/1812: [ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0 [ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0 [ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0 [ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250 [ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.828629] [ 160.828629] stack backtrace: [ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2 [ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012 [ 160.832952] Call Trace: [ 160.832952] dump_stack+0x85/0xcb [ 160.832952] ___might_sleep+0x1a3/0x240 [ 160.832952] __flush_work+0x57/0x2e0 [ 160.832952] ? __mutex_lock+0x83/0x990 [ 160.832952] ? __kernfs_remove+0x24f/0x2e0 [ 160.832952] ? __kernfs_remove+0x1b2/0x2e0 [ 160.832952] ? mark_held_locks+0x50/0x80 [ 160.832952] ? get_work_pool+0x90/0x90 [ 160.832952] __cancel_work_timer+0x13c/0x1e0 [ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.832952] ? __lock_is_held+0x55/0x90 [ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc] [ 160.832952] vmbus_remove+0x26/0x30 [ 160.832952] device_release_driver_internal+0x18a/0x250 [ 160.832952] unbind_store+0xb4/0x180 [ 160.832952] kernfs_fop_write+0x113/0x1a0 [ 160.832952] __vfs_write+0x36/0x1a0 [ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80 [ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60 [ 160.832952] ? __sb_start_write+0x141/0x1a0 [ 160.832952] ? vfs_write+0x184/0x1b0 [ 160.832952] vfs_write+0xbe/0x1b0 [ 160.832952] ksys_write+0x55/0xc0 [ 160.832952] do_syscall_64+0x60/0x1b0 [ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 160.832952] RIP: 0033:0x7fe48f4c8154 Resolve this by getting RTNL earlier. This is safe because the subchannel work queue does trylock on RTNL and will detect the race. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xen/netfront: don't bug in case of too many fragsJuergen Gross2018-09-131-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 57f230ab04d291 ("xen/netfront: raise max number of slots in xennet_get_responses()") raised the max number of allowed slots by one. This seems to be problematic in some configurations with netback using a larger MAX_SKB_FRAGS value (e.g. old Linux kernel with MAX_SKB_FRAGS defined as 18 instead of nowadays 17). Instead of BUG_ON() in this case just fall back to retransmission. Fixes: 57f230ab04d291 ("xen/netfront: raise max number of slots in xennet_get_responses()") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2018-09-1216-114/+297
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm nouveau fixes from Dave Airlie: "I'm sending this separately as it's a bit larger than I generally like for one driver, but it does contain a bunch of make my nvidia laptop not die (runpm) and a bunch to make my docking station and monitor display stuff (mst) fixes. Lyude has spent a lot of time on these, and we are putting the fixes into distro kernels as well asap, as it helps a bunch of standard Lenovo laptops, so I'm fairly happy things are better than they were before these patches, but I decided to split them out just for clarification" * tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm: drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels drm/nouveau/disp: fix DP disable race drm/nouveau/disp: move eDP panel power handling drm/nouveau/disp: remove unused struct member drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer drm/nouveau: fix oops in client init failure path drm/nouveau: Fix nouveau_connector_ddc_detect() drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early drm/nouveau: Reset MST branching unit before enabling drm/nouveau: Only write DP_MSTM_CTRL when needed drm/nouveau: Remove useless poll_enable() call in drm_load() drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state() drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state() drm/nouveau: Fix deadlocks in nouveau_connector_detect() drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend() drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
| * \ Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-fixesDave Airlie2018-09-1116-114/+297
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bunch of fixes for MST/runpm problems and races, as well as fixes for issues that prevent more recent laptops from booting. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Ben Skeggs <bskeggs@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==GF63dy8a9j611=-0x8G6FRu7uC-ZQypsLO_hqV4OAcA@mail.gmail.com
| | * | drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP ↵Ben Skeggs2018-09-074-3/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | panels Fixes eDP backlight issues on more recent laptops. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/disp: fix DP disable raceBen Skeggs2018-09-074-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a HPD pulse signalling the need to retrain the link occurs between the KMS driver releasing the output and the supervisor interrupt that finishes the teardown, it was possible get a NULL-ptr deref. Avoid this by marking the link as inactive earlier. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/disp: move eDP panel power handlingBen Skeggs2018-09-072-25/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to do this earlier to prevent aux channel timeouts in resume paths on certain systems. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/disp: remove unused struct memberBen Skeggs2018-09-072-2/+0
| | | | | | | | | | | | | | | | Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOSBen Skeggs2018-09-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This Falcon application doesn't appear to be present on some newer systems, so let's not fail init if we can't find it. TBD: is there a way to determine whether it *should* be there? Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointerBen Skeggs2018-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fixes oopses in certain failure paths. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: fix oops in client init failure pathBen Skeggs2018-09-071-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The NV_ERROR macro requires drm->client to be initialised, which it may not be at this stage of the init process. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Fix nouveau_connector_ddc_detect()Lyude Paul2018-09-071-21/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It looks like that when we moved over to using drm_connector_for_each_possible_encoder() in nouveau, that one rather important part of this function got dropped by accident: /* Right v here */ for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) { int id = connector->encoder_ids[i]; if (id == 0) break; Since it's rather difficult to notice: the conditional in this loop is actually: nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER Meaning that all early breaks result in nv_encoder keeping it's value, otherwise nv_encoder = NULL. Ugh. Since this got dropped, nouveau_connector_ddc_detect() now returns an encoder for every single connector, regardless of whether or not it's detected: [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2 So: fix this to ensure we only return an encoder if we actually found one, and clean up the rest of the function while we're at it since it's nearly impossible to read properly. Changes since v1: - Don't skip ddc probing for LVDS if we can't switch DDC through vga-switcheroo, just do the DDC probing without calling vga_switcheroo_lock_ddc() - skeggsb Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: ddba766dd07e ("drm/nouveau: Use drm_connector_for_each_possible_encoder()") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unloadLyude Paul2018-09-073-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there's nothing in nouveau that actually cancels this work struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky enough hpd_work might try to keep running up until the system is suspended. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too earlyLyude Paul2018-09-071-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On most systems with ACPI hotplugging support, it seems that we always receive a hotplug event once we re-enable EC interrupts even if the GPU hasn't even been resumed yet. This can cause problems since even though we schedule hpd_work to handle connector reprobing for us, hpd_work synchronizes on pm_runtime_get_sync() to wait until the device is ready to perform reprobing. Since runtime suspend/resume callbacks are disabled before the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during this period will grab a runtime PM ref and return immediately with -EACCES. Because we schedule hpd_work from our ACPI HPD handler, and hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch a connector reprobe immediately even if the GPU isn't actually resumed just yet. This causes various warnings in dmesg and occasionally, also prevents some displays connected to the dedicated GPU from coming back up after suspend. Example: usb 1-4: USB disconnect, device number 14 usb 1-4.1: USB disconnect, device number 15 WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau] CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1 Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018 Workqueue: events nouveau_display_hpd_work [nouveau] RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau] RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000 RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002 RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000 R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418 FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? _cond_resched+0x15/0x30 nouveau_connector_detect+0x2ce/0x520 [nouveau] ? _cond_resched+0x15/0x30 ? ww_mutex_lock+0x12/0x40 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper] drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper] nouveau_display_hpd_work+0x2a/0x60 [nouveau] process_one_work+0x187/0x340 worker_thread+0x2e/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff ---[ end trace 55d811b38fc8e71a ]--- So, to fix this we attempt to grab a runtime PM reference in the ACPI handler itself asynchronously. If the GPU is already awake (it will have normal hotplugging at this point) or runtime PM callbacks are currently disabled on the device, we drop our reference without updating the autosuspend delay. We only schedule connector reprobes when we successfully managed to queue up a resume request with our asynchronous PM ref. This also has the added benefit of preventing redundant connector reprobes from ACPI while the GPU is runtime resumed! Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <kherbst@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41 Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Reset MST branching unit before enablingLyude Paul2018-09-071-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When probing a new MST device, it's not safe to make any assumptions about it's current state. While most well mannered MST hubs will just disable the branching unit on hotplug disconnects, this isn't enough to save us from various other scenarios that might have resulted in something writing to the MST branching unit before we got control of it. This could happen if a previous probe we tried failed, if we're booting in kexec context and the hub is still in the state the last kernel put it in, etc. Luckily; there is no reason we can't just reset the branching unit every time we enable a new topology. So, fix this by resetting it on enabling new topologies to ensure that we always start off with a clean, unmodified topology state on MST sinks. This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX times out all DPCD trasactions) observed after multiple docks, undocks, and module reloads. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Only write DP_MSTM_CTRL when neededLyude Paul2018-09-071-9/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST hub every time it receives a long HPD pulse on DP. This isn't actually necessary and additionally, has some unintended side effects. With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to make it rather likely (1 out of 5 times usually) that bringing up MST with it's ThinkPad dock will fail and result in sideband messages timing out in the middle. Afterwards, successive probes don't manage to get the dock to communicate properly over MST sideband properly. Many times sideband message timeouts from MST hubs are indicative of either the source or the sink dropping an ESI event, which can cause DRM's perspective of the topology's current state to go out of sync with reality. While it's tough to really know for sure what's happening to the dock, using userspace tools to write to DP_MSTM_CTRL in the middle of the MST link probing process does appear to make things flaky. It's possible that when we write to DP_MSTM_CTRL, the function that gets triggered to respond in the dock's firmware temporarily puts it in a state where it might end up not reporting an ESI to the source, or ends up dropping a sideband message we sent it. So, to fix this we make it so that when probing an MST topology, we respect it's current state. If the dock's already enabled, we simply read DP_MSTM_CTRL and disable the topology if it's value is not what we expected. Otherwise, we perform the normal MST probing dance. We avoid taking any action except if the state of the MST topology actually changes. This fixes MST sideband message timeouts and detection failures on my P50 with its ThinkPad dock. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Remove useless poll_enable() call in drm_load()Lyude Paul2018-09-071-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Again, this doesn't do anything. drm_kms_helper_poll_enable() will have already been called in nouveau_display_init() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()Lyude Paul2018-09-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This won't do anything but potentially make us miss hotplugs. We already call drm_kms_helper_poll_disable() in nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()Lyude Paul2018-09-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't do anything, drm_kms_helper_poll_enable() gets called in nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init() already. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Fix deadlocks in nouveau_connector_detect()Lyude Paul2018-09-071-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()Lyude Paul2018-09-071-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's true we can't resume the device from poll workers in nouveau_connector_detect(). We can however, prevent the autosuspend timer from elapsing immediately if it hasn't already without risking any sort of deadlock with the runtime suspend/resume operations. So do that instead of entirely avoiding grabbing a power reference. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requestsLyude Paul2018-09-074-2/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, nouveau uses the generic drm_fb_helper_output_poll_changed() function provided by DRM as it's output_poll_changed callback. Unfortunately however, this function doesn't grab runtime PM references early enough and even if it did-we can't block waiting for the device to resume in output_poll_changed() since it's very likely that we'll need to grab the fb_helper lock at some point during the runtime resume process. This currently results in deadlocking like so: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= After trying dozens of different solutions, I found one very simple one that should also have the benefit of preventing us from having to fight locking for the rest of our lives. So, we work around these deadlocks by deferring all fbcon hotplug events that happen after the runtime suspend process starts until after the device is resumed again. Changes since v7: - Fixup commit message - Daniel Vetter Changes since v6: - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia Changes since v5: - Come up with the (hopefully final) solution for solving this dumb problem, one that is a lot less likely to cause issues with locking in the future. This should work around all deadlock conditions with fbcon brought up thus far. Changes since v4: - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock condition that Lukas described - Just move all of this out of drm_fb_helper. It seems that other DRM drivers have already figured out other workarounds for this. If other drivers do end up needing this in the future, we can just move this back into drm_fb_helper again. Changes since v3: - Actually check if fb_helper is NULL in both new helpers - Actually check drm_fbdev_emulation in both new helpers - Don't fire off a fb_helper hotplug unconditionally; only do it if the following conditions are true (as otherwise, calling this in the wrong spot will cause Bad Things to happen): - fb_helper hotplug handling was actually inhibited previously - fb_helper actually has a delayed hotplug pending - fb_helper is actually bound - fb_helper is actually initialized - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no situation where a driver would actually want to use this without checking the return value, so enforce that - Rewrite and clarify the documentation for both helpers. - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION isn't enabled - Actually grab the toplevel fb_helper lock in drm_fb_helper_resume_hotplug(), since it's possible other activity (such as a hotplug) could be going on at the same time the driver calls drm_fb_helper_resume_hotplug(). We need this to check whether or not drm_fb_helper_hotplug_event() needs to be called anyway Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()Lyude Paul2018-09-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since actual hotplug notifications don't get disabled until nouveau_display_fini() is called, all this will do is cause any hotplugs that happen between this drm_kms_helper_poll_disable() call and the actual hotplug disablement to potentially be dropped if ACPI isn't around to help us. Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | * | drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placementLyude Paul2018-09-071-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turns out this part is my fault for not noticing when reviewing 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work(). This makes basically no sense however, because that means we're calling drm_kms_helper_poll_enable() every time we schedule the hotplug detection work. This is also against the advice mentioned in drm_kms_helper_poll_enable()'s documentation: Note that calls to enable and disable polling must be strictly ordered, which is automatically the case when they're only call from suspend/resume callbacks. Of course, hotplugs can't really be ordered. They could even happen immediately after we called drm_kms_helper_poll_disable() in nouveau_display_fini(), which can lead to all sorts of issues. Additionally; enabling polling /after/ we call drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to probe connectors so long as polling is disabled. So; simply move this back into nouveau_display_init() again. The race condition that both of these patches attempted to work around has already been fixed properly in d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend") Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling") Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2018-09-1232-257/+323
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix up several Kconfig dependencies in netfilter, from Martin Willi and Florian Westphal. 2) Memory leak in be2net driver, from Petr Oros. 3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem. 4) mlx5_attach_interface needs to check for errors, from Huy Nguyen. 5) tipc_release() needs to orphan the sock, from Cong Wang. 6) Need to program TxConfig register after TX/RX is enabled in r8169 driver, not beforehand, from Maciej S. Szmigiero. 7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal. 8) Fix crash regression in ip_do_fragment(), from Taehee Yoo. 9) syzbot can create conditions where kernel log is flooded with synflood warnings due to creation of many listening sockets, fix that. From Willem de Bruijn. 10) Fix RCU issues in rds socket layer, from Cong Wang. 11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) nfp: flower: reject tunnel encap with ipv6 outer headers for offloading nfp: flower: fix vlan match by checking both vlan id and vlan pcp tipc: check return value of __tipc_dump_start() s390/qeth: don't dump past end of unknown HW header s390/qeth: use vzalloc for QUERY OAT buffer s390/qeth: switch on SG by default for IQD devices s390/qeth: indicate error when netdev allocation fails rds: fix two RCU related problems r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED erspan: fix error handling for erspan tunnel erspan: return PACKET_REJECT when the appropriate tunnel is not found tcp: rate limit synflood warnings further MIPS: lantiq: dma: add dev pointer netfilter: xt_hashlimit: use s->file instead of s->private netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT netfilter: conntrack: reset tcp maxwin on re-register qmi_wwan: Support dynamic config on Quectel EP06 ethernet: renesas: convert to SPDX identifiers ...
| * | | | nfp: flower: reject tunnel encap with ipv6 outer headers for offloadingLouis Peens2018-09-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a bug where ipv6 tunnels would report that it is getting offloaded to hardware but would actually be rejected by hardware. Fixes: b27d6a95a70d ("nfp: compile flower vxlan tunnel set actions") Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | nfp: flower: fix vlan match by checking both vlan id and vlan pcpPieter Jansen van Vuuren2018-09-123-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we only checked if the vlan id field is present when trying to match a vlan tag. The vlan id and vlan pcp field should be treated independently. Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | s390/qeth: don't dump past end of unknown HW headerJulian Wiedmann2018-09-122-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For inbound data with an unsupported HW header format, only dump the actual HW header. We have no idea how much payload follows it, and what it contains. Worst case, we dump past the end of the Inbound Buffer and access whatever is located next in memory. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | s390/qeth: use vzalloc for QUERY OAT bufferWenjia Zhang2018-09-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qeth_query_oat_command() currently allocates the kernel buffer for the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with fragmented memory, large allocations may fail (eg. the qethqoat tool by default uses 132KB). Solve this issue by using vzalloc, backing the allocation with non-contiguous memory. Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | s390/qeth: switch on SG by default for IQD devicesJulian Wiedmann2018-09-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scatter-gather transmit brings a nice performance boost. Considering the rather large MTU sizes at play, it's also totally the Right Thing To Do. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>