summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* habanalabs: fix endianness handling for packets from userBen Segal2019-09-062-13/+32
| | | | | | | | | | | | | | | | [ Upstream commit 213ad5ad016a0da975b35f54e8cd236c3b04724b ] Packets that arrive from the user and need to be parsed by the driver are assumed to be in LE format. This patch fix all the places where the code handles these packets and use the correct endianness macros to handle them, as the driver handles the packets in CPU format (LE or BE depending on the arch). Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* habanalabs: fix DRAM usage accounting on context tear downTomer Tayar2019-09-061-0/+2
| | | | | | | | | | | | | [ Upstream commit c8113756ba27298d6e95403c087dc5881b419a99 ] The patch fix the DRAM usage accounting by adding a missing update of the DRAM memory consumption, when a context is being torn down without an organized release of the allocated memory. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_altBenjamin Herrenschmidt2019-09-061-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4a56a478a525d6427be90753451c40e1327caa1a ] If fsg_disable() and fsg_set_alt() are called too closely to each other (for example due to a quick reset/reconnect), what can happen is that fsg_set_alt sets common->new_fsg from an interrupt while handle_exception is trying to process the config change caused by fsg_disable(): fsg_disable() ... handle_exception() sets state back to FSG_STATE_NORMAL hasn't yet called do_set_interface() or is inside it. ---> interrupt fsg_set_alt sets common->new_fsg queues a new FSG_STATE_CONFIG_CHANGE <--- Now, the first handle_exception can "see" the updated new_fsg, treats it as if it was a fsg_set_alt() response, call usb_composite_setup_continue() etc... But then, the thread sees the second FSG_STATE_CONFIG_CHANGE, and goes back down the same path, wipes and reattaches a now active fsg, and .. calls usb_composite_setup_continue() which at this point is wrong. Not only we get a backtrace, but I suspect the second set_interface wrecks some state causing the host to get upset in my case. This fixes it by replacing "new_fsg" by a "state argument" (same principle) which is set in the same lock section as the state update, and retrieved similarly. That way, there is never any discrepancy between the dequeued state and the observed value of it. We keep the ability to have the latest reconfig operation take precedence, but we guarantee that once "dequeued" the argument (new_fsg) will not be clobbered by any new event. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* usb: gadget: composite: Clear "suspended" on reset/disconnectBenjamin Herrenschmidt2019-09-061-0/+1
| | | | | | | | | | | | [ Upstream commit 602fda17c7356bb7ae98467d93549057481d11dd ] In some cases, one can get out of suspend with a reset or a disconnect followed by a reconnect. Previously we would leave a stale suspended flag set. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* iommu/dma: Handle SG length overflow betterRobin Murphy2019-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ab2cbeb0ed301a9f0460078e91b09f39958212ef ] Since scatterlist dimensions are all unsigned ints, in the relatively rare cases where a device's max_segment_size is set to UINT_MAX, then the "cur_len + s_length <= max_len" check in __finalise_sg() will always return true. As a result, the corner case of such a device mapping an excessively large scatterlist which is mergeable to or beyond a total length of 4GB can lead to overflow and a bogus truncated dma_length in the resulting segment. As we already assume that any single segment must be no longer than max_len to begin with, this can easily be addressed by reshuffling the comparison. Fixes: 809eac54cdd6 ("iommu/dma: Implement scatterlist segment merging") Reported-by: Nicolin Chen <nicoleotsuka@gmail.com> Tested-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* omap-dma/omap_vout_vrfb: fix off-by-one fi valueHans Verkuil2019-09-062-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d555c34338cae844b207564c482e5a3fb089d25e ] The OMAP 4 TRM specifies that when using double-index addressing the address increases by the ES plus the EI value minus 1 within a frame. When a full frame is transferred, the address increases by the ES plus the frame index (FI) value minus 1. The omap-dma code didn't account for the 'minus 1' in the FI register. To get correct addressing, add 1 to the src_icg value. This was found when testing a hacked version of the media m2m-deinterlace.c driver on a Pandaboard. The only other source that uses this feature is omap_vout_vrfb.c, and that adds a + 1 when setting the dst_icg. This is a workaround for the broken omap-dma.c behavior. So remove the workaround at the same time that we fix omap-dma.c. I tested the omap_vout driver with a Beagle XM board to check that the '+ 1' in omap_vout_vrfb.c was indeed a workaround for the omap-dma bug. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Link: https://lore.kernel.org/r/952e7f51-f208-9333-6f58-b7ed20d2ea0b@xs4all.nl Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* dmaengine: stm32-mdma: Fix a possible null-pointer dereference in ↵Jia-Ju Bai2019-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | stm32_mdma_irq_handler() [ Upstream commit 39c71a5b8212f4b502d9a630c6706ac723abd422 ] In stm32_mdma_irq_handler(), chan is checked on line 1368. When chan is NULL, it is still used on line 1369: dev_err(chan2dev(chan), "MDMA channel not initialized\n"); Thus, a possible null-pointer dereference may occur. To fix this bug, "dev_dbg(mdma2dev(dmadev), ...)" is used instead. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Fixes: a4ffb13c8946 ("dmaengine: Add STM32 MDMA driver") Link: https://lore.kernel.org/r/20190729020849.17971-1-baijiaju1990@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* IB/mlx5: Fix implicit MR release flowYishai Hadas2019-09-062-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f591822c3cf314442819486f45ff7dc1f690e0c0 ] Once implicit MR is being called to be released by ib_umem_notifier_release() its leaves were marked as "dying". However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is called, it skips running the mr_leaf_free_action (i.e. umem_odp->work) when those leaves were marked as "dying". As such ib_umem_release() for the leaves won't be called and their MRs will be leaked as well. When an application exits/killed without calling dereg_mr we might hit the above flow. This fatal scenario is reported by WARN_ON() upon mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the call trace can be seen below. Originally the "dying" mark as part of ib_umem_notifier_release() was introduced to prevent pagefault_mr() from returning a success response once this happened. However, we already have today the completion mechanism so no need for that in those flows any more. Even in case a success response will be returned the firmware will not find the pages and an error will be returned in the following call as a released mm will cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero(). Fix the above issue by dropping the "dying" from the above flows. The other flows that are using "dying" are still needed it for their synchronization purposes. WARNING: CPU: 1 PID: 7218 at drivers/infiniband/hw/mlx5/main.c:2004 mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib] CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G E 5.2.0-rc6+ #13 Call Trace: uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs] ib_uverbs_close+0x1f/0x80 [ib_uverbs] __fput+0xbe/0x250 task_work_run+0x88/0xa0 do_exit+0x2cb/0xc30 ? __fput+0x14b/0x250 do_group_exit+0x39/0xb0 get_signal+0x191/0x920 ? _raw_spin_unlock_bh+0xa/0x20 ? inet_csk_accept+0x229/0x2f0 do_signal+0x36/0x5e0 ? put_unused_fd+0x5b/0x70 ? __sys_accept4+0x1a6/0x1e0 ? inet_hash+0x35/0x40 ? release_sock+0x43/0x90 ? _raw_spin_unlock_bh+0xa/0x20 ? inet_listen+0x9f/0x120 exit_to_usermode_loop+0x5c/0xc6 do_syscall_64+0x182/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* auxdisplay: panel: need to delete scan_timer when misc_register fails in ↵zhengbin2019-09-061-0/+2
| | | | | | | | | | | | | | panel_attach [ Upstream commit b33d567560c1aadf3033290d74d4fd67af47aa61 ] In panel_attach, if misc_register fails, we need to delete scan_timer, which was setup in keypad_init->init_scan_timer. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* soundwire: cadence_master: fix definitions for INTSTAT0/1Pierre-Louis Bossart2019-09-061-2/+2
| | | | | | | | | | | | [ Upstream commit 664b16589f882202b8fa8149d0074f3159bade76 ] Two off-by-one errors: INTSTAT0 missed BIT(31) and INTSTAT1 is only defined on first 16 bits. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190725234032.21152-15-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* soundwire: cadence_master: fix register definition for SLAVE_STATEPierre-Louis Bossart2019-09-061-2/+2
| | | | | | | | | | | [ Upstream commit b07dd9b400981f487940a4d84292d3a0e7cd9362 ] wrong prefix and wrong macro. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190725234032.21152-14-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme-pci: Fix async probe remove raceKeith Busch2019-09-061-1/+2
| | | | | | | | | | | | | | | [ Upstream commit bd46a90634302bfe791e93ad5496f98f165f7ae0 ] Ensure the controller is not in the NEW state when nvme_probe() exits. This will always allow a subsequent nvme_remove() to set the state to DELETING, fixing a potential race between the initial asynchronous probe and device removal. Reported-by: Li Zhong <lizhongfs@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: fix controller removal race with scan workSagi Grimberg2019-09-063-8/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0157ec8dad3c8fc9bc9790f76e0831ffdaf2e7f0 ] With multipath enabled, nvme_scan_work() can read from the device (through nvme_mpath_add_disk()) and hang [1]. However, with fabrics, once ctrl->state is set to NVME_CTRL_DELETING, the reads will hang (see nvmf_check_ready()) and the mpath stack device make_request will block if head->list is not empty. However, when the head->list consistst of only DELETING/DEAD controllers, we should actually not block, but rather fail immediately. In addition, before we go ahead and remove the namespaces, make sure to clear the current path and kick the requeue list so that the request will fast fail upon requeuing. [1]: -- INFO: task kworker/u4:3:166 blocked for more than 120 seconds. Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u4:3 D 0 166 2 0x80004000 Workqueue: nvme-wq nvme_scan_work Call Trace: __schedule+0x851/0x1400 schedule+0x99/0x210 io_schedule+0x21/0x70 do_read_cache_page+0xa57/0x1330 read_cache_page+0x4a/0x70 read_dev_sector+0xbf/0x380 amiga_partition+0xc4/0x1230 check_partition+0x30f/0x630 rescan_partitions+0x19a/0x980 __blkdev_get+0x85a/0x12f0 blkdev_get+0x2a5/0x790 __device_add_disk+0xe25/0x1250 device_add_disk+0x13/0x20 nvme_mpath_set_live+0x172/0x2b0 nvme_update_ns_ana_state+0x130/0x180 nvme_set_ns_ana_state+0x9a/0xb0 nvme_parse_ana_log+0x1c3/0x4a0 nvme_mpath_add_disk+0x157/0x290 nvme_validate_ns+0x1017/0x1bd0 nvme_scan_work+0x44d/0x6a0 process_one_work+0x7d7/0x1240 worker_thread+0x8e/0xff0 kthread+0x2c3/0x3b0 ret_from_fork+0x35/0x40 INFO: task kworker/u4:1:1034 blocked for more than 120 seconds. Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u4:1 D 0 1034 2 0x80004000 Workqueue: nvme-delete-wq nvme_delete_ctrl_work Call Trace: __schedule+0x851/0x1400 schedule+0x99/0x210 schedule_timeout+0x390/0x830 wait_for_completion+0x1a7/0x310 __flush_work+0x241/0x5d0 flush_work+0x10/0x20 nvme_remove_namespaces+0x85/0x3d0 nvme_do_delete_ctrl+0xb4/0x1e0 nvme_delete_ctrl_work+0x15/0x20 process_one_work+0x7d7/0x1240 worker_thread+0x8e/0xff0 kthread+0x2c3/0x3b0 ret_from_fork+0x35/0x40 -- Reported-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme-rdma: fix possible use-after-free in connect error flowSagi Grimberg2019-09-061-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d94211b8bad3787e0655a67284105f57db728cb1 ] When start_queue fails, we need to make sure to drain the queue cq before freeing the rdma resources because we might still race with the completion path. Have start_queue() error path safely stop the queue. -- [30371.808111] nvme nvme1: Failed reconnect attempt 11 [30371.808113] nvme nvme1: Reconnecting in 10 seconds... [...] [30382.069315] nvme nvme1: creating 4 I/O queues. [30382.257058] nvme nvme1: Connect Invalid SQE Parameter, qid 4 [30382.257061] nvme nvme1: failed to connect queue: 4 ret=386 [30382.305001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [30382.305022] IP: qedr_poll_cq+0x8a3/0x1170 [qedr] [30382.305028] PGD 0 P4D 0 [30382.305037] Oops: 0000 [#1] SMP PTI [...] [30382.305153] Call Trace: [30382.305166] ? __switch_to_asm+0x34/0x70 [30382.305187] __ib_process_cq+0x56/0xd0 [ib_core] [30382.305201] ib_poll_handler+0x26/0x70 [ib_core] [30382.305213] irq_poll_softirq+0x88/0x110 [30382.305223] ? sort_range+0x20/0x20 [30382.305232] __do_softirq+0xde/0x2c6 [30382.305241] ? sort_range+0x20/0x20 [30382.305249] run_ksoftirqd+0x1c/0x60 [30382.305258] smpboot_thread_fn+0xef/0x160 [30382.305265] kthread+0x113/0x130 [30382.305273] ? kthread_create_worker_on_cpu+0x50/0x50 [30382.305281] ret_from_fork+0x35/0x40 -- Reported-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: fix a possible deadlock when passthru commands sent to a multipath deviceSagi Grimberg2019-09-063-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b9156daeb1601d69007b7e50efcf89d69d72ec1d ] When the user issues a command with side effects, we will end up freezing the namespace request queue when updating disk info (and the same for the corresponding mpath disk node). However, we are not freezing the mpath node request queue, which means that mpath I/O can still come in and block on blk_queue_enter (called from nvme_ns_head_make_request -> direct_make_request). This is a deadlock, because blk_queue_enter will block until the inner namespace request queue is unfroze, but that process is blocked because the namespace revalidation is trying to update the mpath disk info and freeze its request queue (which will never complete because of the I/O that is blocked on blk_queue_enter). Fix this by freezing all the subsystem nsheads request queues before executing the passthru command. Given that these commands are infrequent we should not worry about this temporary I/O freeze to keep things sane. Here is the matching hang traces: -- [ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds. [ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42 [ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 374.487274] systemd-udevd D 0 17994 1 0x00000000 [ 374.493407] Call Trace: [ 374.496145] __schedule+0x2ef/0x620 [ 374.500047] schedule+0x38/0xa0 [ 374.503569] blk_queue_enter+0x139/0x220 [ 374.507959] ? remove_wait_queue+0x60/0x60 [ 374.512540] direct_make_request+0x60/0x130 [ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core] [ 374.523740] ? generic_make_request_checks+0x307/0x6f0 [ 374.529484] generic_make_request+0x10d/0x2e0 [ 374.534356] submit_bio+0x75/0x140 [ 374.538163] ? guard_bio_eod+0x32/0xe0 [ 374.542361] submit_bh_wbc+0x171/0x1b0 [ 374.546553] block_read_full_page+0x1ed/0x330 [ 374.551426] ? check_disk_change+0x70/0x70 [ 374.556008] ? scan_shadow_nodes+0x30/0x30 [ 374.560588] blkdev_readpage+0x18/0x20 [ 374.564783] do_read_cache_page+0x301/0x860 [ 374.569463] ? blkdev_writepages+0x10/0x10 [ 374.574037] ? prep_new_page+0x88/0x130 [ 374.578329] ? get_page_from_freelist+0xa2f/0x1280 [ 374.583688] ? __alloc_pages_nodemask+0x179/0x320 [ 374.588947] read_cache_page+0x12/0x20 [ 374.593142] read_dev_sector+0x2d/0xd0 [ 374.597337] read_lba+0x104/0x1f0 [ 374.601046] find_valid_gpt+0xfa/0x720 [ 374.605243] ? string_nocheck+0x58/0x70 [ 374.609534] ? find_valid_gpt+0x720/0x720 [ 374.614016] efi_partition+0x89/0x430 [ 374.618113] ? string+0x48/0x60 [ 374.621632] ? snprintf+0x49/0x70 [ 374.625339] ? find_valid_gpt+0x720/0x720 [ 374.629828] check_partition+0x116/0x210 [ 374.634214] rescan_partitions+0xb6/0x360 [ 374.638699] __blkdev_reread_part+0x64/0x70 [ 374.643377] blkdev_reread_part+0x23/0x40 [ 374.647860] blkdev_ioctl+0x48c/0x990 [ 374.651956] block_ioctl+0x41/0x50 [ 374.655766] do_vfs_ioctl+0xa7/0x600 [ 374.659766] ? locks_lock_inode_wait+0xb1/0x150 [ 374.664832] ksys_ioctl+0x67/0x90 [ 374.668539] __x64_sys_ioctl+0x1a/0x20 [ 374.672732] do_syscall_64+0x5a/0x1c0 [ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds. [ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42 [ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 374.760170] nvmeadm D 0 49141 36333 0x00004080 [ 374.766301] Call Trace: [ 374.769038] __schedule+0x2ef/0x620 [ 374.772939] schedule+0x38/0xa0 [ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100 [ 374.781614] ? remove_wait_queue+0x60/0x60 [ 374.786192] blk_mq_freeze_queue+0x1a/0x20 [ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core] [ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core] [ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core] [ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core] [ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core] [ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core] [ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core] [ 374.833114] do_vfs_ioctl+0xa7/0x600 [ 374.837117] ? __audit_syscall_entry+0xdd/0x130 [ 374.842184] ksys_ioctl+0x67/0x90 [ 374.845891] __x64_sys_ioctl+0x1a/0x20 [ 374.850082] do_syscall_64+0x5a/0x1c0 [ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9 -- Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com> Tested-by: James Puthukattukaran <james.puthukattukaran@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme-core: Fix extra device_put() call on error pathLogan Gunthorpe2019-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8c36e66fb407ce076535a7db98ab9f6d720b866a ] In the error path for nvme_init_subsystem(), nvme_put_subsystem() will call device_put(), but it will get called again after the mutex_unlock(). The device_put() only needs to be called if device_add() fails. This bug caused a KASAN use-after-free error when adding and removing subsytems in a loop: BUG: KASAN: use-after-free in device_del+0x8d9/0x9a0 Read of size 8 at addr ffff8883cdaf7120 by task multipathd/329 CPU: 0 PID: 329 Comm: multipathd Not tainted 5.2.0-rc6-vmlocalyes-00019-g70a2b39005fd-dirty #314 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x7b/0xb5 print_address_description+0x6f/0x280 ? device_del+0x8d9/0x9a0 __kasan_report+0x148/0x199 ? device_del+0x8d9/0x9a0 ? class_release+0x100/0x130 ? device_del+0x8d9/0x9a0 kasan_report+0x12/0x20 __asan_report_load8_noabort+0x14/0x20 device_del+0x8d9/0x9a0 ? device_platform_notify+0x70/0x70 nvme_destroy_subsystem+0xf9/0x150 nvme_free_ctrl+0x280/0x3a0 device_release+0x72/0x1d0 kobject_put+0x144/0x410 put_device+0x13/0x20 nvme_free_ns+0xc4/0x100 nvme_release+0xb3/0xe0 __blkdev_put+0x549/0x6e0 ? kasan_check_write+0x14/0x20 ? bd_set_size+0xb0/0xb0 ? kasan_check_write+0x14/0x20 ? mutex_lock+0x8f/0xe0 ? __mutex_lock_slowpath+0x20/0x20 ? locks_remove_file+0x239/0x370 blkdev_put+0x72/0x2c0 blkdev_close+0x8d/0xd0 __fput+0x256/0x770 ? _raw_read_lock_irq+0x40/0x40 ____fput+0xe/0x10 task_work_run+0x10c/0x180 ? filp_close+0xf7/0x140 exit_to_usermode_loop+0x151/0x170 do_syscall_64+0x240/0x2e0 ? prepare_exit_to_usermode+0xd5/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5a79af05d7 Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 06 fc ff ff 8b 44 24 RSP: 002b:00007f5a7799c810 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007f5a79af05d7 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008 RBP: 00007f5a58000f98 R08: 0000000000000002 R09: 00007f5a7935ee80 R10: 0000000000000000 R11: 0000000000000293 R12: 000055e432447240 R13: 0000000000000000 R14: 0000000000000001 R15: 000055e4324a9cf0 Allocated by task 1236: save_stack+0x21/0x80 __kasan_kmalloc.constprop.6+0xab/0xe0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x102/0x210 nvme_init_identify+0x13c3/0x3820 nvme_loop_configure_admin_queue+0x4fa/0x5e0 nvme_loop_create_ctrl+0x469/0xf40 nvmf_dev_write+0x19a3/0x21ab __vfs_write+0x66/0x120 vfs_write+0x154/0x490 ksys_write+0x104/0x240 __x64_sys_write+0x73/0xb0 do_syscall_64+0xa5/0x2e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 329: save_stack+0x21/0x80 __kasan_slab_free+0x129/0x190 kasan_slab_free+0xe/0x10 kfree+0xa7/0x200 nvme_release_subsystem+0x49/0x60 device_release+0x72/0x1d0 kobject_put+0x144/0x410 put_device+0x13/0x20 klist_class_dev_put+0x31/0x40 klist_put+0x8f/0xf0 klist_del+0xe/0x10 device_del+0x3a7/0x9a0 nvme_destroy_subsystem+0xf9/0x150 nvme_free_ctrl+0x280/0x3a0 device_release+0x72/0x1d0 kobject_put+0x144/0x410 put_device+0x13/0x20 nvme_free_ns+0xc4/0x100 nvme_release+0xb3/0xe0 __blkdev_put+0x549/0x6e0 blkdev_put+0x72/0x2c0 blkdev_close+0x8d/0xd0 __fput+0x256/0x770 ____fput+0xe/0x10 task_work_run+0x10c/0x180 exit_to_usermode_loop+0x151/0x170 do_syscall_64+0x240/0x2e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 32fd90c40768 ("nvme: change locking for the per-subsystem controller list") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvmet-file: fix nvmet_file_flush() always returning an errorLogan Gunthorpe2019-09-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cfc1a1af56200362d1508b82b9a3cc3acb2eae0c ] Presently, nvmet_file_flush() always returns a call to errno_to_nvme_status() but that helper doesn't take into account the case when errno=0. So nvmet_file_flush() always returns an error code. All other callers of errno_to_nvme_status() check for success before calling it. To fix this, ensure errno_to_nvme_status() returns success if the errno is zero. This should prevent future mistakes like this from happening. Fixes: c6aa3542e010 ("nvmet: add error log support for file backend") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvmet-loop: Flush nvme_delete_wq when removing the portLogan Gunthorpe2019-09-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 86b9a63e595ff03f9d0a7b92b6acc231fecefc29 ] After calling nvme_loop_delete_ctrl(), the controllers will not yet be deleted because nvme_delete_ctrl() only schedules work to do the delete. This means a race can occur if a port is removed but there are still active controllers trying to access that memory. To fix this, flush the nvme_delete_wq before returning from nvme_loop_remove_port() so that any controllers that might be in the process of being deleted won't access a freed port. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvmet: Fix use-after-free bug when a port is removedLogan Gunthorpe2019-09-063-0/+16
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 3aed86731ee2b23e4dc4d2c6d943d33992cd551b ] When a port is removed through configfs, any connected controllers are still active and can still send commands. This causes a use-after-free bug which is detected by KASAN for any admin command that dereferences req->port (like in nvmet_execute_identify_ctrl). To fix this, disconnect all active controllers when a subsystem is removed from a port. This ensures there are no active controllers when the port is eventually removed. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_nsAnthony Iliopoulos2019-09-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit fab7772bfbcfe8fb8e3e352a6a8fcaf044cded17 ] When CONFIG_NVME_MULTIPATH is set, only the hidden gendisk associated with the per-controller ns is run through revalidate_disk when a rescan is triggered, while the visible blockdev never gets its size (bdev->bd_inode->i_size) updated to reflect any capacity changes that may have occurred. This prevents online resizing of nvme block devices and in extension of any filesystems atop that will are unable to expand while mounted, as userspace relies on the blockdev size for obtaining the disk capacity (via BLKGETSIZE/64 ioctls). Fix this by explicitly revalidating the actual namespace gendisk in addition to the per-controller gendisk, when multipath is enabled. Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* dmaengine: ste_dma40: fix unneeded variable warningArnd Bergmann2019-09-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5d6fb560729a5d5554e23db8d00eb57cd0021083 ] clang-9 points out that there are two variables that depending on the configuration may only be used in an ARRAY_SIZE() expression but not referenced: drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] static u32 d40_backup_regs[] = { ^ drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] static u32 d40_backup_regs_chan[] = { Mark these __maybe_unused to shut up the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* dm zoned: fix potential NULL dereference in dmz_do_reclaim()Dan Carpenter2019-08-291-2/+2
| | | | | | | | | | | | | | [ Upstream commit e0702d90b79d430b0ccc276ead4f88440bb51352 ] This function is supposed to return error pointers so it matches the dmz_get_rnd_zone_for_reclaim() function. The current code could lead to a NULL dereference in dmz_do_reclaim() Fixes: b234c6d7a703 ("dm zoned: improve error handling in reclaim") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* IB/hfi1: Drop stale TID RDMA packetsKaike Wan2019-08-291-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | commit d58c1834bf0d218a0bc00f8fb44874551b21da84 upstream. In a congested fabric with adaptive routing enabled, traces show that the sender could receive stale TID RDMA NAK packets that contain newer KDETH PSNs and older Verbs PSNs. If not dropped, these packets could cause the incorrect rewinding of the software flows and the incorrect completion of TID RDMA WRITE requests, and eventually leading to memory corruption and kernel crash. The current code drops stale TID RDMA ACK/NAK packets solely based on KDETH PSNs, which may lead to erroneous processing. This patch fixes the issue by also checking the Verbs PSN. Addition checks are added before rewinding the TID RDMA WRITE DATA packets. Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm zoned: properly handle backing device failureDmitry Fomichev2019-08-294-14/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 75d66ffb48efb30f2dd42f041ba8b39c5b2bd115 upstream. dm-zoned is observed to lock up or livelock in case of hardware failure or some misconfiguration of the backing zoned device. This patch adds a new dm-zoned target function that checks the status of the backing device. If the request queue of the backing device is found to be in dying state or the SCSI backing device enters offline state, the health check code sets a dm-zoned target flag prompting all further incoming I/O to be rejected. In order to detect backing device failures timely, this new function is called in the request mapping path, at the beginning of every reclaim run and before performing any metadata I/O. The proper way out of this situation is to do dmsetup remove <dm-zoned target> and recreate the target when the problem with the backing device is resolved. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm zoned: improve error handling in i/o map codeDmitry Fomichev2019-08-291-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | commit d7428c50118e739e672656c28d2b26b09375d4e0 upstream. Some errors are ignored in the I/O path during queueing chunks for processing by chunk works. Since at least these errors are transient in nature, it should be possible to retry the failed incoming commands. The fix - Errors that can happen while queueing chunks are carried upwards to the main mapping function and it now returns DM_MAPIO_REQUEUE for any incoming requests that can not be properly queued. Error logging/debug messages are added where needed. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm zoned: improve error handling in reclaimDmitry Fomichev2019-08-292-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b234c6d7a703661b5045c5bf569b7c99d2edbf88 upstream. There are several places in reclaim code where errors are not propagated to the main function, dmz_reclaim(). This function is responsible for unlocking zones that might be still locked at the end of any failed reclaim iterations. As the result, some device zones may be left permanently locked for reclaim, degrading target's capability to reclaim zones. This patch fixes these issues as follows - Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and dmz_reclaim_rnd_data() return error codes to the caller. dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid clashing with "struct dmz_reclaim" and is modified to return the error to the caller. dmz_get_zone_for_reclaim() now returns an error instead of NULL pointer and reclaim code checks for that error. Error logging/debug messages are added where necessary. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm table: fix invalid memory accesses with too high sector numberMikulas Patocka2019-08-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | commit 1cfd5d3399e87167b7f9157ef99daa0e959f395d upstream. If the sector number is too high, dm_table_find_target() should return a pointer to a zeroed dm_target structure (the caller should test it with dm_target_is_valid). However, for some table sizes, the code in dm_table_find_target() that performs btree lookup will access out of bound memory structures. Fix this bug by testing the sector number at the beginning of dm_table_find_target(). Also, add an "inline" keyword to the function dm_table_get_size() because this is a hot path. Fixes: 512875bd9661 ("dm: table detect io beyond device") Cc: stable@vger.kernel.org Reported-by: Zhang Tao <kontais@zoho.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm space map metadata: fix missing store of apply_bops() return valueZhangXiaoxu2019-08-291-1/+1
| | | | | | | | | | | | | | | | | commit ae148243d3f0816b37477106c05a2ec7d5f32614 upstream. In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize"), we refactor the commit logic to a new function 'apply_bops'. But when that logic was replaced in out() the return value was not stored. This may lead out() returning a wrong value to the caller. Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm raid: add missing cleanup in raid_ctr()Wenwen Wang2019-08-291-1/+1
| | | | | | | | | | | | | | | | commit dc1a3e8e0cc6b2293b48c044710e63395aeb4fb4 upstream. If rs_prepare_reshape() fails, no cleanup is executed, leading to leak of the raid_set structure allocated at the beginning of raid_ctr(). To fix this issue, go to the label 'bad' if the error occurs. Fixes: 11e4723206683 ("dm raid: stop keeping raid set frozen altogether") Cc: stable@vger.kernel.org Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm integrity: fix a crash due to BUG_ON in __journal_read_write()Mikulas Patocka2019-08-291-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5729b6e5a1bcb0bbc28abe82d749c7392f66d2c7 upstream. Fix a crash that was introduced by the commit 724376a04d1a. The crash is reported here: https://gitlab.com/cryptsetup/cryptsetup/issues/468 When reading from the integrity device, the function dm_integrity_map_continue calls find_journal_node to find out if the location to read is present in the journal. Then, it calculates how many sectors are consecutively stored in the journal. Then, it locks the range with add_new_range and wait_and_add_new_range. The problem is that during wait_and_add_new_range, we hold no locks (we don't hold ic->endio_wait.lock and we don't hold a range lock), so the journal may change arbitrarily while wait_and_add_new_range sleeps. The code then goes to __journal_read_write and hits BUG_ON(journal_entry_get_sector(je) != logical_sector); because the journal has changed. In order to fix this bug, we need to re-check the journal location after wait_and_add_new_range. We restrict the length to one block in order to not complicate the code too much. Fixes: 724376a04d1a ("dm integrity: implement fair range locks") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm btree: fix order of block initialization in btree_split_beneathZhangXiaoxu2019-08-291-15/+16
| | | | | | | | | | | | | | | | | | | | | | commit e4f9d6013820d1eba1432d51dd1c5795759aa77f upstream. When btree_split_beneath() splits a node to two new children, it will allocate two blocks: left and right. If right block's allocation failed, the left block will be unlocked and marked dirty. If this happened, the left block'ss content is zero, because it wasn't initialized with the btree struct before the attempot to allocate the right block. Upon return, when flushing the left block to disk, the validator will fail when check this block. Then a BUG_ON is raised. Fix this by completely initializing the left block before allocating and initializing the right block. Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm dust: use dust block size for badblocklist indexBryan Gurney2019-08-291-3/+8
| | | | | | | | | | | | | | | | | | | | commit 08c04c84a5cde3af9baac0645a7496d6dcd76822 upstream. Change the "frontend" dust_remove_block, dust_add_block, and dust_query_block functions to store the "dust block number", instead of the sector number corresponding to the "dust block number". For the "backend" functions dust_map_read and dust_map_write, right-shift by sect_per_block_shift. This fixes the inability to emulate failure beyond the first sector of each "dust block" (for devices with a "dust block size" larger than 512 bytes). Fixes: e4f3fabd67480bf ("dm: add dust target") Cc: stable@vger.kernel.org Signed-off-by: Bryan Gurney <bgurney@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dm kcopyd: always complete failed jobsDmitry Fomichev2019-08-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d1fef41465f0e8cae0693fb184caa6bfafb6cd16 upstream. This patch fixes a problem in dm-kcopyd that may leave jobs in complete queue indefinitely in the event of backing storage failure. This behavior has been observed while running 100% write file fio workload against an XFS volume created on top of a dm-zoned target device. If the underlying storage of dm-zoned goes to offline state under I/O, kcopyd sometimes never issues the end copy callback and dm-zoned reclaim work hangs indefinitely waiting for that completion. This behavior was traced down to the error handling code in process_jobs() function that places the failed job to complete_jobs queue, but doesn't wake up the job handler. In case of backing device failure, all outstanding jobs may end up going to complete_jobs queue via this code path and then stay there forever because there are no more successful I/O jobs to wake up the job handler. This patch adds a wake() call to always wake up kcopyd job wait queue for all I/O jobs that fail before dm_io() gets called for that job. The patch also sets the write error status in all sub jobs that are failed because their master job has failed. Fixes: b73c67c2cbb00 ("dm kcopyd: add sequential write feature") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IB/hfi1: Drop stale TID RDMA packets that cause TIDErrKaike Wan2019-08-291-44/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit d9d1f5e7bb82415591e8b62b222cbb88c4797ef3 upstream. In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order. A stale TID RDMA data packet could lead to TidErr if the TID entries have been released by duplicate data packets generated from retries, and subsequently erroneously force the qp into error state in the current implementation. Since the payload has already been dropped by hardware, the packet can be simply dropped and it is no longer necessary to put the qp into error state. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packetKaike Wan2019-08-291-0/+7
| | | | | | | | | | | | | | | | | | | | commit 90fdae66e72bf0381d168f12dca0259617927895 upstream. In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA WRITE DATA packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: d72fe7d5008b ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IB/hfi1: Add additional checks when handling TID RDMA READ RESP packetKaike Wan2019-08-291-1/+4
| | | | | | | | | | | | | | | | | | | | commit a8adbf7d0d0a6e3bf7f99da461a06039364e028b upstream. In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA READ RESP packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packetKaike Wan2019-08-291-2/+2
| | | | | | | | | | | | | | | | | | | | commit 35d5c8b82e2c32e8e29ca195bb4dac60ba7d97fc upstream. When processing a TID RDMA READ RESP packet that causes KDETH EFLAGS errors, the packet's IB PSN is checked against qp->s_last_psn and qp->s_psn without the protection of qp->s_lock, which is not safe. This patch fixes the issue by acquiring qp->s_lock first. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192039.105923.7852.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAEDexuan Cui2019-08-291-1/+1
| | | | | | | | | | | | | | | | | | | commit a9fc4340aee041dd186d1fb8f1b5d1e9caf28212 upstream. In the case of X86_PAE, unsigned long is u32, but the physical address type should be u64. Due to the bug here, the netvsc driver can not load successfully, and sometimes the VM can panic due to memory corruption (the hypervisor writes data to the wrong location). Fixes: 6ba34171bcbd ("Drivers: hv: vmbus: Remove use of slow_virt_to_phys()") Cc: stable@vger.kernel.org Cc: Michael Kelley <mikelley@microsoft.com> Reported-and-tested-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* gpiolib: never report open-drain/source lines as 'input' to user-spaceBartosz Golaszewski2019-08-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 2c60e6b5c9241b24b8b523fefd3e44fb85622cda upstream. If the driver doesn't support open-drain/source config options, we emulate this behavior when setting the direction by calling gpiod_direction_input() if the default value is 0 (open-source) or 1 (open-drain), thus not actively driving the line in those cases. This however clears the FLAG_IS_OUT bit for the GPIO line descriptor and makes the LINEINFO ioctl() incorrectly report this line's mode as 'input' to user-space. This commit modifies the ioctl() to always set the GPIOLINE_FLAG_IS_OUT bit in the lineinfo structure's flags field. Since it's impossible to use the input mode and open-drain/source options at the same time, we can be sure the reported information will be correct. Fixes: 521a2ad6f862 ("gpio: add userspace ABI for GPIO line information") Cc: stable <stable@vger.kernel.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Link: https://lore.kernel.org/r/20190806114151.17652-1-brgl@bgdev.pl Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()Adrian Hunter2019-08-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7c7cfdcf7f1777c7376fc9a239980de04b6b5ea1 upstream. Fix the following BUG: [ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c [ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core] [ 187.065938] Call Trace: [ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core] [ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core] [ 187.065993] ? pci_pm_restore+0xb0/0xb0 [ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci] [ 187.066017] pci_pm_thaw+0x4c/0x90 [ 187.066030] dpm_run_callback+0x5b/0x150 [ 187.066043] device_resume+0x11b/0x220 Voltage regulators are optional, so functions must check they exist before dereferencing. Note this issue is hidden if CONFIG_REGULATORS is not set, because the offending code is optimised away. Notes for stable: The issue first appears in commit 57d104c153d3 ("ufs: add UFS power management support") but is inadvertently fixed in commit 60f0187031c0 ("scsi: ufs: disable vccq if it's not needed by UFS device") which in turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and v5.1+ Fixes: 57d104c153d3 ("ufs: add UFS power management support") Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUXLyude Paul2019-08-291-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c358ebf59634f06d8ed176da651ec150df3c8686 upstream. While I had thought I had fixed this issue in: commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after ->fini()") It turns out that while I did fix the error messages I was seeing on my P50 when trying to access i2c busses with the GPU in runtime suspend, I accidentally had missed one important detail that was mentioned on the bug report this commit was supposed to fix: that the CPU would only lock up when trying to access i2c busses _on connected devices_ _while the GPU is not in runtime suspend_. Whoops. That definitely explains why I was not able to get my machine to hang with i2c bus interactions until now, as plugging my P50 into it's dock with an HDMI monitor connected allowed me to finally reproduce this locally. Now that I have managed to reproduce this issue properly, it looks like the problem is much simpler then it looks. It turns out that some connected devices, such as MST laptop docks, will actually ACK i2c reads even if no data was actually read: [ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1 [ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000 [ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001 [ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 Because we don't handle the situation of i2c ack without any data, we end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the value of cnt always remains at 0. This finally properly explains how this could result in a CPU hang like the ones observed in the aforementioned commit. So, fix this by retrying transactions if no data is written or received, and give up and fail the transaction if we continue to not write or receive any data after 32 retries. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possibleAlex Deucher2019-08-292-5/+4
| | | | | | | | | | | | | | | | | | | | commit 98f58ada2d37e68125c056f1fc005748251879c2 upstream. We need to set certain power gating flags after we determine if the firmware version is sufficient to support gfxoff. Previously we set the pg flags in early init, but we later we might have disabled gfxoff if the firmware versions didn't support it. Move adding the additional pg flags after we determine whether or not to support gfxoff. Fixes: 005440066f92 ("drm/amdgpu: enable gfxoff again on raven series (v2)") Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Tom St Denis <tom.stdenis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* clk: socfpga: stratix10: fix rate caclulationg for cnt_clksDinh Nguyen2019-08-291-1/+1
| | | | | | | | | | | | | | | commit c7ec75ea4d5316518adc87224e3cff47192579e7 upstream. Checking bypass_reg is incorrect for calculating the cnt_clk rates. Instead we should be checking that there is a proper hardware register that holds the clock divider. Cc: stable@vger.kernel.org Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lkml.kernel.org/r/20190814153014.12962-1-dinguyen@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "dm bufio: fix deadlock with loop device"Mikulas Patocka2019-08-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream. Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper fix has been made available with commit d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread"). Note that the fix offered by commit bd293d071ffe doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* HID: wacom: Correct distance scale for 2nd-gen Intuos devicesJason Gerecke2019-08-291-0/+2
| | | | | | | | | | | | | | | | | commit b72fb1dcd2ea9d29417711cb302cef3006fa8d5a upstream. Distance values reported by 2nd-gen Intuos tablets are on an inverted scale (0 == far, 63 == near). We need to change them over to a normal scale before reporting to userspace or else userspace drivers and applications can get confused. Ref: https://github.com/linuxwacom/input-wacom/issues/98 Fixes: eda01dab53 ("HID: wacom: Add four new Intuos devices") Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* HID: wacom: correct misreported EKR ring valuesAaron Armstrong Skomra2019-08-291-1/+1
| | | | | | | | | | | | | | | | | commit fcf887e7caaa813eea821d11bf2b7619a37df37a upstream. The EKR ring claims a range of 0 to 71 but actually reports values 1 to 72. The ring is used in relative mode so this change should not affect users. Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com> Fixes: 72b236d60218f ("HID: wacom: Add support for Express Key Remote.") Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by: Ping Cheng <ping.cheng@wacom.com> Reviewed-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iwlwifi: mvm: disable TX-AMSDU on older NICsJohannes Berg2019-08-291-1/+13
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cfb21b11b891b08b79be07be57c40a85bb926668 ] On older NICs, we occasionally see issues with A-MSDU support, where the commands in the FIFO get confused and then we see an assert EDC because the next command in the FIFO isn't TX. We've tried to isolate this issue and understand where it comes from, but haven't found any errors in building the A-MSDU in software. At least for now, disable A-MSDU support on older hardware so that users can use it again without fearing the assert. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=203315. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* block: aoe: Fix kernel crash due to atomic sleep when exitingHe Zhe2019-08-291-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 430380b4637aec646996b4aef67ad417593923b2 ] Since commit 3582dd291788 ("aoe: convert aoeblk to blk-mq"), aoedev_downdev has had the possibility of sleeping and causing the following crash. BUG: scheduling while atomic: rmmod/2242/0x00000003 Modules linked in: aoe Preemption disabled at: [<ffffffffc01d95e5>] flush+0x95/0x4a0 [aoe] CPU: 7 PID: 2242 Comm: rmmod Tainted: G I 5.2.3 #1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009 Call Trace: dump_stack+0x4f/0x6a ? flush+0x95/0x4a0 [aoe] __schedule_bug.cold+0x44/0x54 __schedule+0x44f/0x680 schedule+0x44/0xd0 blk_mq_freeze_queue_wait+0x46/0xb0 ? wait_woken+0x80/0x80 blk_mq_freeze_queue+0x1b/0x20 aoedev_downdev+0x111/0x160 [aoe] flush+0xff/0x4a0 [aoe] aoedev_exit+0x23/0x30 [aoe] aoe_exit+0x35/0x948 [aoe] __se_sys_delete_module+0x183/0x210 __x64_sys_delete_module+0x16/0x20 do_syscall_64+0x4d/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f24e0043b07 Code: 73 01 c3 48 8b 0d 89 73 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 73 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe18f7f1e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f24e0043b07 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555c3ecf87c8 RBP: 00007ffe18f7f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f24e00b4ac0 R11: 0000000000000206 R12: 00007ffe18f7f238 R13: 00007ffe18f7f410 R14: 00007ffe18f80e73 R15: 0000555c3ecf8760 This patch, handling in the same way of pass two, unlocks the locks and restart pass one after aoedev_downdev is done. Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq") Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm/vmwgfx: fix memory leak when too many retries have occurredColin Ian King2019-08-291-1/+3
| | | | | | | | | | | | | | | | [ Upstream commit 6b7c3b86f0b63134b2ab56508921a0853ffa687a ] Currently when too many retries have occurred there is a memory leak on the allocation for reply on the error return path. Fix this by kfree'ing reply before returning. Addresses-Coverity: ("Resource leak") Fixes: a9cd9c044aa9 ("drm/vmwgfx: Add a check to handle host message failure") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Deepak Rawat <drawat@vmware.com> Signed-off-by: Deepak Rawat <drawat@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* libata: add SG safety checks in SFF pio transfersJens Axboe2019-08-291-0/+6
| | | | | | | | | | | | | [ Upstream commit 752ead44491e8c91e14d7079625c5916b30921c5 ] Abort processing of a command if we run out of mapped data in the SG list. This should never happen, but a previous bug caused it to be possible. Play it safe and attempt to abort nicely if we don't have more SG segments left. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>