summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* drm/shmem-helper: Check for purged buffers in fault handlerNeil Roberts2021-03-111-4/+14
| | | | | | | | | | | | | | When a buffer is madvised as not needed and then purged, any attempts to access the buffer from user-space should cause a bus fault. This patch adds a check for that. Cc: stable@vger.kernel.org Fixes: 17acb9f35ed7 ("drm/shmem: Add madvise state and purge helpers") Signed-off-by: Neil Roberts <nroberts@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210223155125.199577-2-nroberts@igalia.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* qxl: Fix uninitialised struct field head.surface_idColin Ian King2021-03-111-0/+1
| | | | | | | | | | | | | | The surface_id struct field in head is not being initialized and static analysis warns that this is being passed through to dev->monitors_config->heads[i] on an assignment. Clear up this warning by initializing it to zero. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: a6d3c4d79822 ("qxl: hook monitors_config updates into crtc, not encoder.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: http://patchwork.freedesktop.org/patch/msgid/20210304094928.2280722-1-colin.king@canonical.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm/ttm: Fix TTM page pool accountingAnthony DeRossi2021-03-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Freed pages are not subtracted from the allocated_pages counter in ttm_pool_type_fini(), causing a leak in the count on device removal. The next shrinker invocation loops forever trying to free pages that are no longer in the pool: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 3-....: (9998 ticks this GP) idle=54e/1/0x4000000000000000 softirq=434857/434857 fqs=2237 (t=10001 jiffies g=2194533 q=49211) NMI backtrace for cpu 3 CPU: 3 PID: 1034 Comm: kswapd0 Tainted: P O 5.11.0-com #1 Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1405 11/19/2019 Call Trace: <IRQ> ... </IRQ> sysvec_apic_timer_interrupt+0x77/0x80 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:mutex_unlock+0x16/0x20 Code: e7 48 8b 70 10 e8 7a 53 77 ff eb aa e8 43 6c ff ff 0f 1f 00 65 48 8b 14 25 00 6d 01 00 31 c9 48 89 d0 f0 48 0f b1 0f 48 39 c2 <74> 05 e9 e3 fe ff ff c3 66 90 48 8b 47 20 48 85 c0 74 0f 8b 50 10 RSP: 0018:ffffbdb840797be8 EFLAGS: 00000246 RAX: ffff9ff445a41c00 RBX: ffffffffc02a9ef8 RCX: 0000000000000000 RDX: ffff9ff445a41c00 RSI: ffffbdb840797c78 RDI: ffffffffc02a9ac0 RBP: 0000000000000080 R08: 0000000000000000 R09: ffffbdb840797c80 R10: 0000000000000000 R11: fffffffffffffff5 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02a9a60 ttm_pool_shrink+0x7d/0x90 [ttm] ttm_pool_shrinker_scan+0x5/0x20 [ttm] do_shrink_slab+0x13a/0x1a0 ... debugfs shows the incorrect total: $ cat /sys/kernel/debug/dri/0/ttm_page_pool --- 0--- --- 1--- --- 2--- --- 3--- --- 4--- --- 5--- --- 6--- --- 7--- --- 8--- --- 9--- ---10--- wc : 0 0 0 0 0 0 0 0 0 0 0 uc : 0 0 0 0 0 0 0 0 0 0 0 wc 32 : 0 0 0 0 0 0 0 0 0 0 0 uc 32 : 0 0 0 0 0 0 0 0 0 0 0 DMA uc : 0 0 0 0 0 0 0 0 0 0 0 DMA wc : 0 0 0 0 0 0 0 0 0 0 0 DMA : 0 0 0 0 0 0 0 0 0 0 0 total : 3029 of 8244261 Using ttm_pool_type_take() to remove pages from the pool before freeing them correctly accounts for the freed pages. Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3") Signed-off-by: Anthony DeRossi <ajderossi@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210303011723.22512-1-ajderossi@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm/ttm: soften TTM warningsChristian König2021-03-111-2/+6
| | | | | | | | | | | | QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files. Make sure we warn only once until the QXL driver is fixed. Signed-off-by: Christian König <christian.koenig@amd.com> References: https://lore.kernel.org/lkml/YD+eYcMMcdlXB8PY@alley/ Link: https://patchwork.freedesktop.org/patch/422834/ Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm: Use USB controller's DMA mask when importing dmabufsThomas Zimmermann2021-03-117-8/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | USB devices cannot perform DMA and hence have no dma_mask set in their device structure. Therefore importing dmabuf into a USB-based driver fails, which breaks joining and mirroring of display in X11. For USB devices, pick the associated USB controller as attachment device. This allows the DRM import helpers to perform the DMA setup. If the DMA controller does not support DMA transfers, we're out of luck and cannot import. Our current USB-based DRM drivers don't use DMA, so the actual DMA device is not important. Tested by joining/mirroring displays of udl and radeon under Gnome/X11. v8: * release dmadev if device initialization fails (Noralf) * fix commit description (Noralf) v7: * fix use-before-init bug in gm12u320 (Dan) v6: * implement workaround in DRM drivers and hold reference to DMA device while USB device is in use * remove dev_is_usb() (Greg) * collapse USB helper into usb_intf_get_dma_device() (Alan) * integrate Daniel's TODO statement (Daniel) * fix typos (Greg) v5: * provide a helper for USB interfaces (Alan) * add FIXME item to documentation and TODO list (Daniel) v4: * implement workaround with USB helper functions (Greg) * use struct usb_device->bus->sysdev as DMA device (Takashi) v3: * drop gem_create_object * use DMA mask of USB controller, if any (Daniel, Christian, Noralf) v2: * move fix to importer side (Christian, Daniel) * update SHMEM and CMA helpers for new PRIME callbacks Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 6eb0233ec2d0 ("usb: don't inherity DMA properties for USB devices") Tested-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Noralf Trønnes <noralf@tronnes.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210303133229.3288-1-tzimmermann@suse.de Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* MAINTAINERS: update drm bug reporting URLPavel Turinský2021-03-111-1/+1
| | | | | | | | | | | The original bugzilla seems to be read-only now, linking to the gitlab for new bugs. Signed-off-by: Pavel Turinský <ledoian@kam.mff.cuni.cz> Cc: trivial@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210228163658.54962-1-ledoian@kam.mff.cuni.cz Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* fbdev: atyfb: use LCD management functions for PPC_PMAC alsoRandy Dunlap2021-03-111-4/+5
| | | | | | | | | | | | | | | | | | | | Include PPC_PMAC in the configs that use aty_ld_lcd() and aty_st_lcd() implementations so that the PM code may work correctly for PPC_PMAC. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210226173008.18236-1-rdunlap@infradead.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* fbdev: atyfb: always declare aty_{ld,st}_lcd()Randy Dunlap2021-03-111-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previously added stubs for aty_{ld,}st_lcd() make it so that these functions are used regardless of the config options that were guarding them, so remove the #ifdef/#endif lines and make their declarations always visible. This fixes build warnings that were reported by clang: drivers/video/fbdev/aty/atyfb_base.c:180:6: warning: no previous prototype for function 'aty_st_lcd' [-Wmissing-prototypes] void aty_st_lcd(int index, u32 val, const struct atyfb_par *par) ^ drivers/video/fbdev/aty/atyfb_base.c:180:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void aty_st_lcd(int index, u32 val, const struct atyfb_par *par) drivers/video/fbdev/aty/atyfb_base.c:183:5: warning: no previous prototype for function 'aty_ld_lcd' [-Wmissing-prototypes] u32 aty_ld_lcd(int index, const struct atyfb_par *par) ^ drivers/video/fbdev/aty/atyfb_base.c:183:1: note: declare 'static' if the function is not intended to be used outside of this translation unit u32 aty_ld_lcd(int index, const struct atyfb_par *par) They should not be marked as static since they are used in mach64_ct.c. Fixes: bfa5782b9caa ("fbdev: atyfb: add stubs for aty_{ld,st}_lcd()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210224215528.822-1-rdunlap@infradead.org Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm/qxl: fix lockdep issue in qxl_alloc_release_reservedGerd Hoffmann2021-03-111-3/+10
| | | | | | | | | | | | | Call qxl_bo_unpin (which does a reservation) without holding the release_mutex lock. Fixes lockdep (correctly) warning on a possible deadlock. Fixes: e8dd3506dcf3 ("drm/qxl: unpin release objects") Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: http://patchwork.freedesktop.org/patch/msgid/20210217123213.2199186-5-kraxel@redhat.com (cherry picked from commit 19089b760e56c97458c272e90e43da761b05cf12) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm/qxl: unpin release objectsGerd Hoffmann2021-03-111-0/+1
| | | | | | | | | | | Balances the qxl_create_bo(..., pinned=true, ...); call in qxl_release_bo_alloc(). Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: http://patchwork.freedesktop.org/patch/msgid/20210204145712.1531203-5-kraxel@redhat.com (cherry picked from commit 65ffea3c6e738f37bb15ff3ee480415c793df893) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* drm/fb-helper: only unmap if buffer not nullTong Zhang2021-03-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drm_fbdev_cleanup() can be called when fb_helper->buffer is null, hence fb_helper->buffer should be checked before calling drm_client_buffer_vunmap(). This buffer is also checked in drm_client_framebuffer_delete(), so we should also do the same thing for drm_client_buffer_vunmap(). [ 199.128742] RIP: 0010:drm_client_buffer_vunmap+0xd/0x20 [ 199.129031] Code: 43 18 48 8b 53 20 49 89 45 00 49 89 55 08 5b 44 89 e0 41 5c 41 5d 41 5e 5d c3 0f 1f 00 53 48 89 fb 48 8d 7f 10 e8 73 7d a1 ff <48> 8b 7b 10 48 8d 73 18 5b e9 75 53 fc ff 0 f 1f 44 00 00 48 b8 00 [ 199.130041] RSP: 0018:ffff888103f3fc88 EFLAGS: 00010282 [ 199.130329] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8214d46d [ 199.130733] RDX: 1ffffffff079c6b9 RSI: 0000000000000246 RDI: ffffffff83ce35c8 [ 199.131119] RBP: ffff888103d25458 R08: 0000000000000001 R09: fffffbfff0791761 [ 199.131505] R10: ffffffff83c8bb07 R11: fffffbfff0791760 R12: 0000000000000000 [ 199.131891] R13: ffff888103d25468 R14: ffff888103d25418 R15: ffff888103f18120 [ 199.132277] FS: 00007f36fdcbb6a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000 [ 199.132721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 199.133033] CR2: 0000000000000010 CR3: 0000000103d26000 CR4: 00000000000006f0 [ 199.133420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 199.133807] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 199.134195] Call Trace: [ 199.134333] drm_fbdev_cleanup+0x179/0x1a0 [ 199.134562] drm_fbdev_client_unregister+0x2b/0x40 [ 199.134828] drm_client_dev_unregister+0xa8/0x180 [ 199.135088] drm_dev_unregister+0x61/0x110 [ 199.135315] mgag200_pci_remove+0x38/0x52 [mgag200] [ 199.135586] pci_device_remove+0x62/0xe0 [ 199.135806] device_release_driver_internal+0x148/0x270 [ 199.136094] driver_detach+0x76/0xe0 [ 199.136294] bus_remove_driver+0x7e/0x100 [ 199.136521] pci_unregister_driver+0x28/0xf0 [ 199.136759] __x64_sys_delete_module+0x268/0x300 [ 199.137016] ? __ia32_sys_delete_module+0x300/0x300 [ 199.137285] ? call_rcu+0x3e4/0x580 [ 199.137481] ? fpregs_assert_state_consistent+0x4d/0x60 [ 199.137767] ? exit_to_user_mode_prepare+0x2f/0x130 [ 199.138037] do_syscall_64+0x33/0x40 [ 199.138237] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 199.138517] RIP: 0033:0x7f36fdc3dcf7 Signed-off-by: Tong Zhang <ztong0001@gmail.com> Fixes: 763aea17bf57 ("drm/fb-helper: Unmap client buffer during shutdown") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.11+ Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210228044625.171151-1-ztong0001@gmail.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
* Linux 5.12-rc2v5.12-rc2Linus Torvalds2021-03-051-1/+1
|
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2021-03-058-62/+76
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma fixes from Jason Gunthorpe: "Nothing special here, though Bob's regression fixes for rxe would have made it before the rc cycle had there not been such strong winter weather! - Fix corner cases in the rxe reference counting cleanup that are causing regressions in blktests for SRP - Two kdoc fixes so W=1 is clean - Missing error return in error unwind for mlx5 - Wrong lock type nesting in IB CM" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/rxe: Fix errant WARN_ONCE in rxe_completer() RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt() RDMA/rxe: Fix missed IB reference counting in loopback RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc RDMA/mlx5: Set correct kernel-doc identifier IB/mlx5: Add missing error code RDMA/rxe: Fix missing kconfig dependency on CRYPTO RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
| * RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()Bob Pearson2021-03-051-32/+23
| | | | | | | | | | | | | | | | | | | | | | | | In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb which triggered a warning at 'done:' and could possibly at 'exit:'. The WARN_ONCE() calls are not actually needed. The call to free_pkt() is moved to the end to clearly show that all skbs are freed. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()Bob Pearson2021-03-051-24/+35
| | | | | | | | | | | | | | | | | | | | | | rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error occurred causing an underflow on the reference counter. This code is cleaned up to be clearer and easier to read. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rxe: Fix missed IB reference counting in loopbackBob Pearson2021-03-051-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | When the noted patch below extending the reference taken by rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it was not matched by a reference in the loopback path resulting in underflows. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/uverbs: Fix kernel-doc warning of _uverbs_allocLeon Romanovsky2021-03-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | Fix the following W=1 compilation warning: drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead Fixes: 461bb2eee4e1 ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle") Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/mlx5: Set correct kernel-doc identifierLeon Romanovsky2021-03-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The W=1 allmodconfig build produces the following warning: drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line: * Parse a series of data segments for page fault handling. Fix it by changing /** to be /* as it is written in kernel-doc documentation. Fixes: 5e769e444d26 ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'") Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * IB/mlx5: Add missing error codeYueHaibing2021-03-011-1/+3
| | | | | | | | | | | | | | | | | | | | Set err to -ENOMEM if kzalloc fails instead of 0. Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rxe: Fix missing kconfig dependency on CRYPTOJulian Braha2021-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When RDMA_RXE is enabled and CRYPTO is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for CRYPTO_CRC32 Depends on [n]: CRYPTO [=n] Selected by [y]: - RDMA_RXE [=y] && (INFINIBAND_USER_ACCESS [=y] || !INFINIBAND_USER_ACCESS [=y]) && INET [=y] && PCI [=y] && INFINIBAND [=y] && INFINIBAND_VIRT_DMA [=y] This is because RDMA_RXE selects CRYPTO_CRC32, without depending on or selecting CRYPTO, despite that config option being subordinate to CRYPTO. Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible") Signed-off-by: Julian Braha <julianbraha@gmail.com> Link: https://lore.kernel.org/r/21525878.NYvzQUHefP@ubuntu-mate-laptop Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_repSaeed Mahameed2021-03-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ib_send_cm_sidr_rep() { spin_lock_irqsave() cm_send_sidr_rep_locked() { ... spin_lock_irq() .... spin_unlock_irq() <--- this will enable interrupts } spin_unlock_irqrestore() } spin_unlock_irqrestore() expects interrupts to be disabled but the internal spin_unlock_irq() will always enable hard interrupts. Fix this by replacing the internal spin_{lock,unlock}_irq() with irqsave/restore variants. It fixes the following kernel trace: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 Call Trace: _raw_spin_unlock_irqrestore+0x4e/0x50 ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm] cma_send_sidr_rep+0xa1/0x160 [rdma_cm] rdma_accept+0x25e/0x350 [rdma_cm] ucma_accept+0x132/0x1cc [rdma_ucm] ucma_write+0xbf/0x140 [rdma_ucm] vfs_write+0xc1/0x340 ksys_write+0xb3/0xe0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 87c4c774cbef ("RDMA/cm: Protect access to remote_sidr_table") Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | Merge tag 'gcc-plugins-v5.12-rc2' of ↵Linus Torvalds2021-03-052-3/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc-plugins fixes from Kees Cook: "Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have been reported a couple times now by static analyzers, so best to get them fixed to reduce the noise. :) - Fix coding style issues (Jason Yan)" * tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: latent_entropy: remove unneeded semicolon gcc-plugins: structleak: remove unneeded variable 'ret'
| * | gcc-plugins: latent_entropy: remove unneeded semicolonJason Yan2021-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following coccicheck warning: scripts/gcc-plugins/latent_entropy_plugin.c:539:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200418070521.10931-1-yanaijie@huawei.com
| * | gcc-plugins: structleak: remove unneeded variable 'ret'Jason Yan2021-03-011-2/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | Fix the following coccicheck warning: scripts/gcc-plugins/structleak_plugin.c:177:14-17: Unneeded variable: "ret". Return "0" on line 207 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200418070505.10715-1-yanaijie@huawei.com
* | Merge tag 'pstore-v5.12-rc2' of ↵Linus Torvalds2021-03-052-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fixes from Kees Cook: - Rate-limit ECC warnings (Dmitry Osipenko) - Fix error path check for NULL (Tetsuo Handa) * tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Rate-limit "uncorrectable error in header" message pstore: Fix warning in pstore_kill_sb()
| * | pstore/ram: Rate-limit "uncorrectable error in header" messageDmitry Osipenko2021-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a quite huge "uncorrectable error in header" flood in KMSG on a clean system boot since there is no pstore buffer saved in RAM. Let's silence the redundant noisy messages by rate-limiting the printk message. Now there are maximum 10 messages printed repeatedly instead of 35+. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210302095850.30894-1-digetx@gmail.com
| * | pstore: Fix warning in pstore_kill_sb()Tetsuo Handa2021-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the assumption that pstore_sb != NULL is wrong because pstore_fill_super() will not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL (due to memory allocation fault injection). Since mount_single() calls pstore_kill_sb() when pstore_fill_super() failed, pstore_kill_sb() needs to be aware of such failure path. [1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0 Reported-by: syzbot <syzbot+d0cf0ad6513e9a1da5df@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210214031307.57903-1-penguin-kernel@I-love.SAKURA.ne.jp
* | | Merge tag 'for-5.12/dm-fixes' of ↵Linus Torvalds2021-03-052-11/+16
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Fix DM verity target's optional Forward Error Correction (FEC) for Reed-Solomon roots that are unaligned to block size" * tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm verity: fix FEC for RS roots unaligned to block size dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
| * | | dm verity: fix FEC for RS roots unaligned to block sizeMilan Broz2021-03-041-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optional Forward Error Correction (FEC) code in dm-verity uses Reed-Solomon code and should support roots from 2 to 24. The error correction parity bytes (of roots lengths per RS block) are stored on a separate device in sequence without any padding. Currently, to access FEC device, the dm-verity-fec code uses dm-bufio client with block size set to verity data block (usually 4096 or 512 bytes). Because this block size is not divisible by some (most!) of the roots supported lengths, data repair cannot work for partially stored parity bytes. This fix changes FEC device dm-bufio block size to "roots << SECTOR_SHIFT" where we can be sure that the full parity data is always available. (There cannot be partial FEC blocks because parity must cover whole sectors.) Because the optional FEC starting offset could be unaligned to this new block size, we have to use dm_bufio_set_sector_offset() to configure it. The problem is easily reproduced using veritysetup, e.g. for roots=13: # create verity device with RS FEC dd if=/dev/urandom of=data.img bs=4096 count=8 status=none veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=13 | awk '/^Root hash/{ print $3 }' >roothash # create an erasure that should be always repairable with this roots setting dd if=/dev/zero of=data.img conv=notrunc bs=1 count=8 seek=4088 status=none # try to read it through dm-verity veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=13 $(cat roothash) dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer # wait for possible recursive recovery in kernel udevadm settle veritysetup close test With this fix, errors are properly repaired. device-mapper: verity-fec: 7:1: FEC 0: corrected 8 errors ... Without it, FEC code usually ends on unrecoverable failure in RS decoder: device-mapper: verity-fec: 7:1: FEC 0: failed to correct: -74 ... This problem is present in all kernels since the FEC code's introduction (kernel 4.5). It is thought that this problem is not visible in Android ecosystem because it always uses a default RS roots=2. Depends-on: a14e5ec66a7a ("dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size") Signed-off-by: Milan Broz <gmazyland@gmail.com> Tested-by: Jérôme Carretero <cJ-ko@zougloub.eu> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm bufio: subtract the number of initial sectors in dm_bufio_get_device_sizeMikulas Patocka2021-03-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm_bufio_get_device_size returns the device size in blocks. Before returning the value, we must subtract the nubmer of starting sectors. The number of starting sectors may not be divisible by block size. Note that currently, no target is using dm_bufio_set_sector_offset and dm_bufio_get_device_size simultaneously, so this change has no effect. However, an upcoming dm-verity-fec fix needs this change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Milan Broz <gmazyland@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | | | Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-blockLinus Torvalds2021-03-0516-69/+75
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe fixes: - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan) - fix a hwmon error return (Daniel Wagner) - fix the keep alive timeout initialization (Martin George) - ensure the model_number can't be changed on a used subsystem (Max Gurtovoy) - rsxx missing -EFAULT on copy_to_user() failure (Dan) - rsxx remove unused linux.h include (Tian) - kill unused RQF_SORTED (Jean) - updated outdated BFQ comments (Joseph) - revert work-around commit for bd_size_lock, since we removed the offending user in this merge window (Damien) * tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block: nvmet: model_number must be immutable once set nvme-fabrics: fix kato initialization nvme-hwmon: Return error code when registration fails nvme-pci: add quirks for Lexar 256GB SSD nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST. rsxx: Return -EFAULT if copy_to_user() fails block/bfq: update comments and default value in docs for fifo_expire rsxx: remove unused including <linux/version.h> block: Drop leftover references to RQF_SORTED block: revert "block: fix bd_size_lock use"
| * \ \ \ Merge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12Jens Axboe2021-03-057-47/+62
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe fixes from Christoph: "nvme fixes for 5.12: - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan) - fix a hwmon error return (Daniel Wagner) - fix the keep alive timeout initialization (Martin George) - ensure the model_number can't be changed on a used subsystem (Max Gurtovoy)" * tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme: nvmet: model_number must be immutable once set nvme-fabrics: fix kato initialization nvme-hwmon: Return error code when registration fails nvme-pci: add quirks for Lexar 256GB SSD nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
| | * | | | nvmet: model_number must be immutable once setMax Gurtovoy2021-03-054-45/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case we have already established connection to nvmf target, it shouldn't be allowed to change the model_number. E.g. if someone will identify ctrl and get model_number of "my_model" later on will change the model_numbel via configfs to "my_new_model" this will break the NVMe specification for "Get Log Page – Persistent Event Log" that refers to Model Number as: "This field contains the same value as reported in the Model Number field of the Identify Controller data structure, bytes 63:24." Although it doesn't mentioned explicitly that this field can't be changed, we can assume it. So allow setting this field only once: using configfs or in the first identify ctrl operation. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | nvme-fabrics: fix kato initializationMartin George2021-03-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently kato is initialized to NVME_DEFAULT_KATO for both discovery & i/o controllers. This is a problem specifically for non-persistent discovery controllers since it always ends up with a non-zero kato value. Fix this by initializing kato to zero instead, and ensuring various controllers are assigned appropriate kato values as follows: non-persistent controllers - kato set to zero persistent controllers - kato set to NVMF_DEV_DISC_TMO (or any positive int via nvme-cli) i/o controllers - kato set to NVME_DEFAULT_KATO (or any positive int via nvme-cli) Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | nvme-hwmon: Return error code when registration failsDaniel Wagner2021-03-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hwmon pointer wont be NULL if the registration fails. Though the exit code path will assign it to ctrl->hwmon_device. Later nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by returning the error code from hwmon_device_register_with_info(). Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | nvme-pci: add quirks for Lexar 256GB SSDPascal Terjan2021-03-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN quirks for this buggy device. Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417 Signed-off-by: Pascal Terjan <pterjan@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | nvme-pci: mark Kingston SKC2000 as not supporting the deepest power stateZoltán Böszörményi2021-03-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My 2TB SKC2000 showed the exact same symptoms that were provided in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed cold boot to get it back. According to some sources, the A2000 is simply a rebadged SKC2000 with a slightly optimized firmware. Adding the SKC2000 PCI ID to the quirk list with the same workaround as the A2000 made my laptop survive a 5 hours long Yocto bootstrap buildfest which reliably triggered the SSD lockup previously. Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.Julian Einwag2021-03-051-1/+2
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel fails to fully detect these SSDs, only the character devices are present: [ 10.785605] nvme nvme0: pci function 0000:04:00.0 [ 10.876787] nvme nvme1: pci function 0000:81:00.0 [ 13.198614] nvme nvme0: missing or invalid SUBNQN field. [ 13.198658] nvme nvme1: missing or invalid SUBNQN field. [ 13.206896] nvme nvme0: Shutdown timeout set to 20 seconds [ 13.215035] nvme nvme1: Shutdown timeout set to 20 seconds [ 13.225407] nvme nvme0: 16/0/0 default/read/poll queues [ 13.233602] nvme nvme1: 16/0/0 default/read/poll queues [ 13.239627] nvme nvme0: Identify Descriptors failed (8194) [ 13.246315] nvme nvme1: Identify Descriptors failed (8194) Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679 Signed-off-by: Julian Einwag <jeinwag-nvme@marcapo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
| * | | | rsxx: Return -EFAULT if copy_to_user() failsDan Carpenter2021-03-031-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT to the user if it can't complete the copy. The "st" variable only holds zero on success or negative error codes on failure so the type should be int. Fixes: 36f988e978f8 ("rsxx: Adding in debugfs entries.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block/bfq: update comments and default value in docs for fifo_expireJoseph Qi2021-03-022-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct the comments since bfq_fifo_expire[0] is for async request, while bfq_fifo_expire[1] is for sync request. Also update docs, according the source code, the default fifo_expire_async is 250ms, and fifo_expire_sync is 125ms. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | rsxx: remove unused including <linux/version.h>Tian Tao2021-03-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove including <linux/version.h> that don't need it. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: Drop leftover references to RQF_SORTEDJean Delvare2021-03-013-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a1ce35fa49852db60fc6e268038530be533c5b15 ("block: remove dead elevator code") removed all users of RQF_SORTED. However it is still defined, and there is one reference left to it (which in effect is dead code). Clear it all up. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: revert "block: fix bd_size_lock use"Damien Le Moal2021-02-282-7/+4
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the removal of the skd driver, using IRQ safe locking of a bdev bd_size_lock spinlock to protect the bdev inode size is not necessary anymore as there is no other known driver using this lock under an IRQ disabled context (e.g. calling set_capacity() with IRQ disabled). Revert commit 0fe37724f8e7 ("block: fix bd_size_lock use") which introduced the IRQ safe change. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-blockLinus Torvalds2021-03-056-439/+361
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "A bit of a mix between fallout from the worker change, cleanups and reductions now possible from that change, and fixes in general. In detail: - Fully serialize manager and worker creation, fixing races due to that. - Clean up some naming that had gone stale. - SQPOLL fixes. - Fix race condition around task_work rework that went into this merge window. - Implement unshare. Used for when the original task does unshare(2) or setuid/seteuid and friends, drops the original workers and forks new ones. - Drop the only remaining piece of state shuffling we had left, which was cred. Move it into issue instead, and we can drop all of that code too. - Kill f_op->flush() usage. That was such a nasty hack that we had out of necessity, we no longer need it. - Following from ->flush() removal, we can also drop various bits of ctx state related to SQPOLL and cancelations. - Fix an issue with IOPOLL retry, which originally was fallout from a filemap change (removing iov_iter_revert()), but uncovered an issue with iovec re-import too late. - Fix an issue with system suspend. - Use xchg() for fallback work, instead of cmpxchg(). - Properly destroy io-wq on exec. - Add create_io_thread() core helper, and use that in io-wq and io_uring. This allows us to remove various silly completion events related to thread setup. - A few error handling fixes. This should be the grunt of fixes necessary for the new workers, next week should be quieter. We've got a pending series from Pavel on cancelations, and how tasks and rings are indexed. Outside of that, should just be minor fixes. Even with these fixes, we're still killing a net ~80 lines" * tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits) io_uring: don't restrict issue_flags for io_openat io_uring: make SQPOLL thread parking saner io-wq: kill hashed waitqueue before manager exits io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return io_uring: don't keep looping for more events if we can't flush overflow io_uring: move to using create_io_thread() kernel: provide create_io_thread() helper io_uring: reliably cancel linked timeouts io_uring: cancel-match based on flags io-wq: ensure all pending work is canceled on exit io_uring: ensure that threads freeze on suspend io_uring: remove extra in_idle wake up io_uring: inline __io_queue_async_work() io_uring: inline io_req_clean_work() io_uring: choose right tctx->io_wq for try cancel io_uring: fix -EAGAIN retry with IOPOLL io-wq: fix error path leak of buffered write hash map io_uring: remove sqo_task io_uring: kill sqo_dead and sqo submission halting io_uring: ignore double poll add on the same waitqueue head ...
| * | | | io_uring: don't restrict issue_flags for io_openatPavel Begunkov2021-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 45d189c606292 ("io_uring: replace force_nonblock with flags") did something strange for io_openat() slicing all issue_flags but IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: make SQPOLL thread parking sanerJens Axboe2021-03-051-35/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have this weird true/false return from parking, and then some of the callers decide to look at that. It can lead to unbalanced parks and sqd locking. Have the callers check the thread status once it's parked. We know we have the lock at that point, so it's either valid or it's NULL. Fix race with parking on thread exit. We need to be careful here with ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit. Rename sqd->completion to sqd->parked to reflect that this is the only thing this completion event doesn. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io-wq: kill hashed waitqueue before manager exitsJens Axboe2021-03-051-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we race with shutting down the io-wq context and someone queueing a hashed entry, then we can exit the manager with it armed. If it then triggers after the manager has exited, we can have a use-after-free where io_wqe_hash_wake() attempts to wake a now gone manager process. Move the killing of the hashed write queue into the manager itself, so that we know we've killed it before the task exits. Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED returnJens Axboe2021-03-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callback can only be armed, if we get -EIOCBQUEUED returned. It's important that we clear the WAITQ bit for other cases, otherwise we can queue for async retry and filemap will assume that we're armed and return -EAGAIN instead of just blocking for the IO. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: don't keep looping for more events if we can't flush overflowJens Axboe2021-03-051-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It doesn't make sense to wait for more events to come in, if we can't even flush the overflow we already have to the ring. Return -EBUSY for that condition, just like we do for attempts to submit with overflow pending. Cc: stable@vger.kernel.org # 5.11 Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: move to using create_io_thread()Jens Axboe2021-03-053-109/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows us to do task creation and setup without needing to use completions to try and synchronize with the starting thread. Get rid of the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup completion events - we can now do setup before the task is running. Signed-off-by: Jens Axboe <axboe@kernel.dk>