From 88ce2a530cc9865a894454b2e40eba5957a60e1a Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Tue, 8 Sep 2020 16:15:06 +0200
Subject: block: restore a specific error code in bdev_del_partition

mdadm relies on the fact that deleting an invalid partition returns
-ENXIO or -ENOTTY to detect if a block device is a partition or a
whole device.
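For context, the userspace contract looks roughly like the following
probe (a minimal sketch assuming the BLKPG ioctl with a deliberately
invalid partition number; mdadm's actual source may differ):

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/blkpg.h>

/* Try to delete a partition that cannot exist and classify the device
 * by the resulting errno. */
static int is_whole_device(int fd)
{
	struct blkpg_partition part = { .pno = 0x7fffffff };
	struct blkpg_ioctl_arg arg = {
		.op = BLKPG_DEL_PARTITION,
		.datalen = sizeof(part),
		.data = &part,
	};

	if (ioctl(fd, BLKPG, &arg) == 0)
		return 1;	/* should not happen for a bogus partno */
	/* Per the commit message: ENXIO means a whole device with no such
	 * partition; ENOTTY means the device is not partitionable. */
	return errno == ENXIO;
}

Returning -ENOMEM here broke that classification, which is why the
exact error code is effectively part of the ABI.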
Fixes: 08fc1ab6d748 ("block: fix locking in bdev_del_partition")
Reported-by: kernel test robot
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 block/partitions/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5b4869c08fb3..722406b841df 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -537,7 +537,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
 
 	bdevp = bdget_disk(bdev->bd_disk, partno);
 	if (!bdevp)
-		return -ENOMEM;
+		return -ENXIO;
 
 	mutex_lock(&bdevp->bd_mutex);
 	mutex_lock_nested(&bdev->bd_mutex, 1);
--
cgit v1.2.3

From b63de8400a6e1001b5732286cf6f5ec27799b7b4 Mon Sep 17 00:00:00 2001
From: James Smart
Date: Fri, 28 Aug 2020 12:01:50 -0700
Subject: nvme: Revert: Fix controller creation races with teardown flow

The indicated patch introduced a barrier in the sysfs_delete attribute
for the controller that rejects the request if the controller isn't
created. "Created" is defined as at least 1 call to nvme_start_ctrl().

This is problematic in error-injection testing. If an error occurs on
the initial attempt to create an association and the controller enters
reconnect(s) attempts, the admin cannot delete the controller until
either there is a successful association created or ctrl_loss_tmo
times out.

This issue is particularly hurtful when the "admin" is the nvme-cli,
it is performing a connection to a discovery controller, and it is
initiated via auto-connect scripts. With the FC transport, if the
first connection attempt fails, the controller enters a normal
reconnect state but returns control to the cli thread that created the
controller. In this scenario, the cli attempts to read the discovery
log via ioctl, which fails, causing the cli to see it as an empty log
and then proceed to delete the discovery controller. The delete is
rejected and the controller is left live. If the discovery controller
reconnect then succeeds, there is no action to delete it, and it sits
live doing nothing.

Cc: # v5.7+
Fixes: ce1518139e69 ("nvme: Fix controller creation races with teardown flow")
Signed-off-by: James Smart
CC: Israel Rukshin
CC: Max Gurtovoy
CC: Christoph Hellwig
CC: Keith Busch
CC: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/core.c | 5 -----
 drivers/nvme/host/nvme.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5702a3843746..8b75f6ca0b61 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3525,10 +3525,6 @@ static ssize_t nvme_sysfs_delete(struct device *dev,
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	/* Can't delete non-created controllers */
-	if (!ctrl->created)
-		return -EBUSY;
-
 	if (device_remove_file_self(dev, attr))
 		nvme_delete_ctrl_sync(ctrl);
 	return count;
@@ -4403,7 +4399,6 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 		nvme_queue_scan(ctrl);
 		nvme_start_queues(ctrl);
 	}
-	ctrl->created = true;
 }
 EXPORT_SYMBOL_GPL(nvme_start_ctrl);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2910f6caab7d..9fd45ff656da 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -307,7 +307,6 @@ struct nvme_ctrl {
 	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 	unsigned long events;
-	bool created;
 
 #ifdef CONFIG_NVME_MULTIPATH
 	/* asymmetric namespace access: */
--
cgit v1.2.3

From e126e8210e950bb83414c4f57b3120ddb8450742 Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:54 -0500
Subject: nvme-fc: cancel async events before freeing event struct

Cancel the async event work in case an async event has been queued up;
otherwise nvme_fc_submit_async_event() may run after the event struct
has been freed.
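The same ordering rule applies to all three transports patched in this
series: deferred work must be flushed before the state its handler
dereferences is freed. A driver-agnostic sketch of the pattern (the
demo_* names are hypothetical, not nvme-fc code):

#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_ctx {
	struct work_struct work;
	void *event;		/* freed during teardown */
};

static void demo_submit_event(void *event)
{
	/* hypothetical stand-in for the transport's submission routine */
}

static void demo_work_fn(struct work_struct *w)
{
	struct demo_ctx *ctx = container_of(w, struct demo_ctx, work);

	/* Use-after-free if this runs after teardown freed ctx->event. */
	demo_submit_event(ctx->event);
}

static void demo_teardown(struct demo_ctx *ctx)
{
	/* Wait for, or cancel, any queued instance of the work before
	 * freeing what its handler dereferences. */
	cancel_work_sync(&ctx->work);
	kfree(ctx->event);
	ctx->event = NULL;
}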
Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/fc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a7f474ddfff7..e8ef42b9d50c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2160,6 +2160,7 @@ nvme_fc_term_aen_ops(struct nvme_fc_ctrl *ctrl)
 	struct nvme_fc_fcp_op *aen_op;
 	int i;
 
+	cancel_work_sync(&ctrl->ctrl.async_event_work);
 	aen_op = ctrl->aen_ops;
 	for (i = 0; i < NVME_NR_AEN_COMMANDS; i++, aen_op++) {
 		__nvme_fc_exit_request(ctrl, aen_op);
--
cgit v1.2.3

From 925dd04c1f9825194b9e444c12478084813b2b5d Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:52 -0500
Subject: nvme-rdma: cancel async events before freeing event struct

Cancel the async event work in case an async event has been queued up;
otherwise nvme_rdma_submit_async_event() may run after the event struct
has been freed.

Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/rdma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 1d8ea5305d60..3247d4ee46b1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -835,6 +835,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		blk_mq_free_tag_set(ctrl->ctrl.admin_tagset);
 	}
 	if (ctrl->async_event_sqe.data) {
+		cancel_work_sync(&ctrl->ctrl.async_event_work);
 		nvme_rdma_free_qe(ctrl->device->dev, &ctrl->async_event_sqe,
 				sizeof(struct nvme_command), DMA_TO_DEVICE);
 		ctrl->async_event_sqe.data = NULL;
--
cgit v1.2.3

From ceb1e0874dba5cbfc4e0b4145796a4bfb3716e6a Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:53 -0500
Subject: nvme-tcp: cancel async events before freeing event struct

Cancel the async event work in case an async event has been queued up;
otherwise nvme_tcp_submit_async_event() may run after the event struct
has been freed.

Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/tcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 873d6afcbc9e..245a73d0a1ca 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1597,6 +1597,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 static void nvme_tcp_free_admin_queue(struct nvme_ctrl *ctrl)
 {
 	if (to_tcp_ctrl(ctrl)->async_req.pdu) {
+		cancel_work_sync(&ctrl->async_event_work);
 		nvme_tcp_free_async_req(to_tcp_ctrl(ctrl));
 		to_tcp_ctrl(ctrl)->async_req.pdu = NULL;
 	}
--
cgit v1.2.3

From e8a8a185051a460e3eb0617dca33f996f4e31516 Mon Sep 17 00:00:00 2001
From: Omar Sandoval
Date: Tue, 8 Sep 2020 13:46:37 -0700
Subject: block: only call sched requeue_request() for scheduled requests

Yang Yang reported the following crash caused by requeueing a flush
request in Kyber:

[    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[    2.517468] pc : clear_bit+0x18/0x2c
[    2.517502] lr : sbitmap_queue_clear+0x40/0x228
[    2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[    2.517602] Call trace:
[    2.517606]  clear_bit+0x18/0x2c
[    2.517619]  kyber_finish_request+0x74/0x80
[    2.517627]  blk_mq_requeue_request+0x3c/0xc0
[    2.517637]  __scsi_queue_insert+0x11c/0x148
[    2.517640]  scsi_softirq_done+0x114/0x130
[    2.517643]  blk_done_softirq+0x7c/0xb0
[    2.517651]  __do_softirq+0x208/0x3bc
[    2.517657]  run_ksoftirqd+0x34/0x60
[    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
[    2.517667]  kthread+0x110/0x120
[    2.517669]  ret_from_fork+0x10/0x18

This happens because Kyber doesn't track flush requests, so
kyber_finish_request() reads a garbage domain token. Only call the
scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do
for the finish_request() hook in blk_mq_free_request()). Now that
we're handling it in blk-mq, also remove the check from BFQ.
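The guard mirrors what blk_mq_free_request() already does for the
finish_request() hook: elevator hooks are only safe on requests the
elevator actually prepared, which is what RQF_ELVPRIV records, and
flush requests bypass the scheduler entirely. A simplified sketch of
the resulting hook dispatch (an illustrative paraphrase, not the
verbatim blk-mq source):

#include <linux/blkdev.h>
#include <linux/elevator.h>

static inline void demo_sched_requeue_request(struct request *rq)
{
	struct elevator_queue *e = rq->q->elevator;

	/* RQF_ELVPRIV means the elevator attached private data (e.g. a
	 * Kyber domain token) at allocation time; without it there is
	 * nothing for the scheduler hook to tear down. */
	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
		e->type->ops.requeue_request(rq);
}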
Reported-by: Yang Yang
Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe
---
 block/bfq-iosched.c  | 12 ------------
 block/blk-mq-sched.h |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a4c0bec920cb..ee767fa000e4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5895,18 +5895,6 @@ static void bfq_finish_requeue_request(struct request *rq)
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
 
-	/*
-	 * Requeue and finish hooks are invoked in blk-mq without
-	 * checking whether the involved request is actually still
-	 * referenced in the scheduler. To handle this fact, the
-	 * following two checks make this function exit in case of
-	 * spurious invocations, for which there is nothing to do.
-	 *
-	 * First, check whether rq has nothing to do with an elevator.
-	 */
-	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
-		return;
-
 	/*
 	 * rq either is not associated with any icq, or is an already
 	 * requeued request that has not (yet) been re-inserted into
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..e81ca1bf6e10 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -66,7 +66,7 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
 	struct request_queue *q = rq->q;
 	struct elevator_queue *e = q->elevator;
 
-	if (e && e->type->ops.requeue_request)
+	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
 		e->type->ops.requeue_request(rq);
 }
 
--
cgit v1.2.3

From 73a5379937ec89b91e907bb315e2434ee9696a2c Mon Sep 17 00:00:00 2001
From: Sagi Grimberg
Date: Tue, 8 Sep 2020 12:56:08 -0700
Subject: nvme-fabrics: allow to queue requests for live queues

Right now we are failing requests based on the controller state (which
is checked inline in nvmf_check_ready); however, we should definitely
accept requests if the queue is live.

When entering controller reset, we transition the controller into
NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
requests (which have blk_noretry_request set). This is also the case
for NVME_REQ_USERCMD, for the wrong reason: there shouldn't be any
reason for us to reject this I/O in a controller reset. We do want to
prevent passthru commands on the admin queue because we need the
controller to fully initialize first before we let user passthru admin
commands be issued.

In a non-mpath setup, this means that the requests will simply be
requeued over and over forever, not allowing the q_usage_counter to
drop its final reference, causing controller reset to hang if running
concurrently with heavy I/O.
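Putting the pieces together, the readiness check after this patch
reduces to roughly the following decision ladder (a paraphrase using
the driver-internal helpers from drivers/nvme/host/nvme.h; not the
verbatim source):

static bool demo_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
		bool queue_live)
{
	struct nvme_request *req = nvme_req(rq);

	/* Hold user passthru on the admin queue until the controller is
	 * LIVE, so nothing jumps ahead of the initialization sequence. */
	if (rq->q == ctrl->admin_q && (req->flags & NVME_REQ_USERCMD))
		return false;

	switch (ctrl->state) {
	case NVME_CTRL_CONNECTING:
		/* Only the fabrics connect command may pass while the
		 * association is being (re)established. */
		if (blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
			return true;
		break;
	default:
		break;
	}

	/* Everything else is accepted as long as the queue is live. */
	return queue_live;
}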
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Reviewed-by: James Smart
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/fabrics.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 32f61fc5f4c5..8575724734e0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -565,10 +565,14 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	struct nvme_request *req = nvme_req(rq);
 
 	/*
-	 * If we are in some state of setup or teardown only allow
-	 * internally generated commands.
+	 * currently we have a problem sending passthru commands
+	 * on the admin_q if the controller is not LIVE because we can't
+	 * make sure that they are going out after the admin connect,
+	 * controller enable and/or other commands in the initialization
+	 * sequence. until the controller will be LIVE, fail with
+	 * BLK_STS_RESOURCE so that they will be rescheduled.
 	 */
-	if (!blk_rq_is_passthrough(rq) || (req->flags & NVME_REQ_USERCMD))
+	if (rq->q == ctrl->admin_q && (req->flags & NVME_REQ_USERCMD))
 		return false;
 
 	/*
@@ -577,7 +581,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	 */
 	switch (ctrl->state) {
 	case NVME_CTRL_CONNECTING:
-		if (nvme_is_fabrics(req->cmd) &&
+		if (blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
 		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
 			return true;
 		break;
--
cgit v1.2.3

From 2cd896a5e86fc326bda8614b96c0401dcc145868 Mon Sep 17 00:00:00 2001
From: Ritesh Harjani
Date: Wed, 9 Sep 2020 08:44:25 +0530
Subject: block: Set same_page to false in __bio_try_merge_page if ret is false

If we hit the UINT_MAX limit of bio->bi_iter.bi_size, and so are not
going to merge this page into this bio anyway, it makes sense to also
set same_page to false before returning.

Without this patch, we hit the WARNING below in iomap. This mostly
happens on systems with very large memory, and/or after tweaking the
vm dirty threshold params to delay writeback of dirty data.

WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G        W         5.8.0-rc3 #6
Call Trace:
 __remove_mapping+0x154/0x320 (unreliable)
 iomap_releasepage+0x80/0x180
 try_to_release_page+0x94/0xe0
 invalidate_inode_page+0xc8/0x110
 invalidate_mapping_pages+0x1dc/0x540
 generic_fadvise+0x3c8/0x450
 xfs_file_fadvise+0x2c/0xe0 [xfs]
 vfs_fadvise+0x3c/0x60
 ksys_fadvise64_64+0x68/0xe0
 sys_fadvise64+0x28/0x40
 system_call_exception+0xf8/0x1c0
 system_call_common+0xf0/0x278

Fixes: cc90bc68422 ("block: fix "check bi_size overflow before merge"")
Reported-by: Shivaprasad G Bhat
Suggested-by: Christoph Hellwig
Signed-off-by: Anju T Sudhakar
Signed-off-by: Ritesh Harjani
Reviewed-by: Ming Lei
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 block/bio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..e865ea55b9f9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -879,8 +879,10 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
 		if (page_is_mergeable(bv, page, len, off, same_page)) {
-			if (bio->bi_iter.bi_size > UINT_MAX - len)
+			if (bio->bi_iter.bi_size > UINT_MAX - len) {
+				*same_page = false;
 				return false;
+			}
 			bv->bv_len += len;
 			bio->bi_iter.bi_size += len;
 			return true;
--
cgit v1.2.3
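As a closing illustration of the contract this last patch tightens:
callers pair __bio_try_merge_page() with per-page accounting that must
happen exactly once, so a stale *same_page left over from a failed
merge skips an increment and later trips the iomap_page_release()
WARNING above. A simplified caller in the spirit of iomap's writeback
path (a sketch, not the verbatim source):

#include <linux/atomic.h>
#include <linux/bio.h>

/* Add one page to an in-flight bio and account it exactly once;
 * error handling is elided for brevity. */
static void demo_add_to_ioend(struct bio *bio, struct page *page,
		unsigned int len, unsigned int poff, atomic_t *write_count)
{
	bool same_page = false;

	if (!__bio_try_merge_page(bio, page, len, poff, &same_page))
		bio_add_page(bio, page, len, poff);

	/* Relies on same_page being false whenever no merge happened,
	 * including the bi_size overflow bail-out fixed above. */
	if (!same_page)
		atomic_inc(write_count);
}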