From 0cc61e64e21cfc24fa0d938fd148aba4a595163b Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 19 Jun 2018 18:40:14 +0200 Subject: block: fix timeout changes for legacy request drivers blk_mq_complete_request can only be called for blk-mq drivers, but when removing the BLK_EH_HANDLED return value, two legacy request timeout methods incorrectly got switched to call blk_mq_complete_request. Call __blk_complete_request instead to reinstate the previous behavior. For that, __blk_complete_request needs to be exported. Fixes: 1fc2b62e ("scsi_transport_fc: complete requests from ->timeout") Fixes: 0df0bb08 ("null_blk: complete requests from ->timeout") Reported-by: Jianchao Wang Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- block/blk-softirq.c | 1 + 1 file changed, 1 insertion(+) (limited to 'block') diff --git a/block/blk-softirq.c b/block/blk-softirq.c index 01e2b353a2b9..15c1f5e12eb8 100644 --- a/block/blk-softirq.c +++ b/block/blk-softirq.c @@ -144,6 +144,7 @@ do_local: local_irq_restore(flags); } +EXPORT_SYMBOL(__blk_complete_request); /** * blk_complete_request - end I/O on a request -- cgit v1.2.3 From 9c24c10a2c1e1bb478b6bb70612d9e885aee044f Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 19 Jun 2018 10:26:40 -0700 Subject: Revert "block: Add warning for bi_next not NULL in bio_endio()" Commit 0ba99ca4838b ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit 0ba99ca4838b clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit 0ba99ca4838b. 
This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 
mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet Fixes: 0ba99ca4838b ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by: Mike Snitzer Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe --- block/bio.c | 3 --- block/blk-core.c | 8 +------- 2 files changed, 1 insertion(+), 10 deletions(-) (limited to 'block') diff --git a/block/bio.c b/block/bio.c index 
db9a40e9a136..f7e3d88bd0b6 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1807,9 +1807,6 @@ again: if (!bio_integrity_endio(bio)) return; - if (WARN_ONCE(bio->bi_next, "driver left bi_next not NULL")) - bio->bi_next = NULL; - /* * Need to have a real endio function for chained bios, otherwise * various corner cases will break (like stacking block devices that diff --git a/block/blk-core.c b/block/blk-core.c index cf0ee764b908..afd2596ea3d3 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -273,10 +273,6 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_advance(bio, nbytes); /* don't actually finish bio if it's part of flush sequence */ - /* - * XXX this code looks suspicious - it's not consistent with advancing - * req->bio in caller - */ if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ)) bio_endio(bio); } @@ -3081,10 +3077,8 @@ bool blk_update_request(struct request *req, blk_status_t error, struct bio *bio = req->bio; unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes); - if (bio_bytes == bio->bi_iter.bi_size) { + if (bio_bytes == bio->bi_iter.bi_size) req->bio = bio->bi_next; - bio->bi_next = NULL; - } /* Completion has already been traced */ bio_clear_flag(bio, BIO_TRACE_COMPLETION); -- cgit v1.2.3 From a1e79188628580465ac6d7a93a313336ee3364f1 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 20 Jun 2018 13:45:05 +0300 Subject: blk-mq-debugfs: Off by one in blk_mq_rq_state_name() If rq_state == ARRAY_SIZE() then we read one element beyond the end of the blk_mq_rq_state_name_array[] array. 
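To illustrate the bound being fixed, here is a minimal user-space sketch, not the kernel code. The names state_names and state_name() are hypothetical stand-ins for blk_mq_rq_state_name_array[] and blk_mq_rq_state_name(), the table contents are made up, and ARRAY_SIZE() is redefined locally. Valid indices run from 0 to ARRAY_SIZE() - 1, so the guard has to reject an index equal to ARRAY_SIZE() as well:

```c
#include <stddef.h>

/* Local stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table; the real array lives in block/blk-mq-debugfs.c. */
static const char *const state_names[] = { "idle", "in_flight", "complete" };

/* Return the name for @state, or "(?)" for any out-of-range index. */
static const char *state_name(unsigned int state)
{
	/* ">=" rather than ">": state == ARRAY_SIZE() is already one past the end. */
	if (state >= ARRAY_SIZE(state_names))
		return "(?)";
	return state_names[state];
}
```

With a three-entry table, state_name(3) takes the guard and returns "(?)"; with the old ">" comparison the same call would have read state_names[3], one element past the end.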
Fixes: ec6dcf63c55c ("blk-mq-debugfs: Show more request state information") Reviewed-by: Bart Van Assche Signed-off-by: Dan Carpenter Signed-off-by: Jens Axboe --- block/blk-mq-debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index ffa622366922..1c4532e92938 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -356,7 +356,7 @@ static const char *const blk_mq_rq_state_name_array[] = { static const char *blk_mq_rq_state_name(enum mq_rq_state rq_state) { - if (WARN_ON_ONCE((unsigned int)rq_state > + if (WARN_ON_ONCE((unsigned int)rq_state >= ARRAY_SIZE(blk_mq_rq_state_name_array))) return "(?)"; return blk_mq_rq_state_name_array[rq_state]; -- cgit v1.2.3 From ce042c183bcb94eb2919e8036473a1fc203420f9 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 20 Jun 2018 13:41:51 +0300 Subject: block: sed-opal: Fix a couple off by one bugs resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data. Fixes: 455a7b238cd6 ("block: Add Sed-opal library") Reviewed-by: Scott Bauer Tested-by: Scott Bauer Signed-off-by: Dan Carpenter Signed-off-by: Jens Axboe --- block/sed-opal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/sed-opal.c b/block/sed-opal.c index 945f4b8610e0..e0de4dd448b3 100644 --- a/block/sed-opal.c +++ b/block/sed-opal.c @@ -877,7 +877,7 @@ static size_t response_get_string(const struct parsed_resp *resp, int n, return 0; } - if (n > resp->num) { + if (n >= resp->num) { pr_debug("Response has %d tokens. Can't access %d\n", resp->num, n); return 0; @@ -916,7 +916,7 @@ static u64 response_get_u64(const struct parsed_resp *resp, int n) return 0; } - if (n > resp->num) { + if (n >= resp->num) { pr_debug("Response has %d tokens. 
Can't access %d\n", resp->num, n); return 0; -- cgit v1.2.3 From f5e350f021e04ea41d2e5d58487c33b05ba3d25b Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 22 Jun 2018 13:18:09 -0700 Subject: blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE Make sure that RQF_TIMED_OUT is cleared when a request is reused after a block driver timeout handler has returned BLK_EH_DONE. Fixes: da6612673988 ("blk-mq: don't time out requests again that are in the timeout handler") Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Jianchao Wang Cc: Andrew Randrianasulu Signed-off-by: Jens Axboe --- block/blk-mq.c | 1 - block/blk-timeout.c | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-mq.c b/block/blk-mq.c index 8e57b84e50e9..b6888ff556cf 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -781,7 +781,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) WARN_ON_ONCE(ret != BLK_EH_RESET_TIMER); } - req->rq_flags &= ~RQF_TIMED_OUT; blk_add_timer(req); } diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 4b8a48d48ba1..f2cfd56e1606 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -210,6 +210,7 @@ void blk_add_timer(struct request *req) if (!req->timeout) req->timeout = q->rq_timeout; + req->rq_flags &= ~RQF_TIMED_OUT; blk_rq_set_deadline(req, jiffies + req->timeout); /* -- cgit v1.2.3 From 297ba57dcdec7ea37e702bcf1a577ac32a034e21 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 27 Jun 2018 12:55:18 -0700 Subject: block: Fix cloning of requests with a special payload This patch avoids that removing a path controlled by the dm-mpath driver while mkfs is running triggers the following kernel bug: kernel BUG at block/blk-core.c:3347! 
invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2 RIP: 0010:blk_end_request_all+0x68/0x70 Call Trace: dm_softirq_done+0x326/0x3d0 [dm_mod] blk_done_softirq+0x19b/0x1e0 __do_softirq+0x128/0x60d irq_exit+0x100/0x110 smp_call_function_single_interrupt+0x90/0x330 call_function_single_interrupt+0xf/0x20 Fixes: f9d03f96b988 ("block: improve handling of the magic discard payload") Reviewed-by: Ming Lei Reviewed-by: Christoph Hellwig Acked-by: Mike Snitzer Signed-off-by: Bart Van Assche Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: Signed-off-by: Jens Axboe --- block/blk-core.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index afd2596ea3d3..f84a9b7b6f5a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -3473,6 +3473,10 @@ static void __blk_rq_prep_clone(struct request *dst, struct request *src) dst->cpu = src->cpu; dst->__sector = blk_rq_pos(src); dst->__data_len = blk_rq_bytes(src); + if (src->rq_flags & RQF_SPECIAL_PAYLOAD) { + dst->rq_flags |= RQF_SPECIAL_PAYLOAD; + dst->special_vec = src->special_vec; + } dst->nr_phys_segments = src->nr_phys_segments; dst->ioprio = src->ioprio; dst->extra_len = src->extra_len; -- cgit v1.2.3 From 1f57f8d442f8017587eeebd8617913bfc3661d3d Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 28 Jun 2018 11:54:01 -0600 Subject: blk-mq: don't queue more if we get a busy return Some devices have different queue limits depending on the type of IO. A classic case is SATA NCQ, where some commands can queue, but others cannot. If we have NCQ commands inflight and encounter a non-queueable command, the driver returns busy. Currently we attempt to dispatch more from the scheduler, if we were able to queue some commands. But for the case where we ended up stopping due to BUSY, we should not attempt to retrieve more from the scheduler. 
If we do, we can get into a situation where we attempt to queue a non-queueable command, get BUSY, then successfully retrieve more commands from that scheduler and queue those. This can repeat forever, starving the non-queueable command indefinitely. Fix this by NOT attempting to pull more commands from the scheduler, if we get a BUSY return. This should also be more optimal in terms of letting requests stay in the scheduler for as long as possible, if we get a BUSY due to the regular out-of-tags condition. Reviewed-by: Omar Sandoval Reviewed-by: Ming Lei Signed-off-by: Jens Axboe --- block/blk-mq.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'block') diff --git a/block/blk-mq.c b/block/blk-mq.c index b6888ff556cf..d394cdd8d8c6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1075,6 +1075,9 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx, #define BLK_MQ_RESOURCE_DELAY 3 /* ms units */ +/* + * Returns true if we did some work AND can potentially do more. + */ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, bool got_budget) { @@ -1205,8 +1208,17 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, blk_mq_run_hw_queue(hctx, true); else if (needs_restart && (ret == BLK_STS_RESOURCE)) blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY); + + return false; } + /* + * If the host/device is unable to accept more work, inform the + * caller of that. + */ + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) + return false; + return (queued + errors) != 0; } -- cgit v1.2.3 From 70dbcc2254fa2a9add74a122b9dac954c4736e01 Mon Sep 17 00:00:00 2001 From: Tony Battersby Date: Wed, 11 Jul 2018 10:46:03 -0400 Subject: bsg: fix bogus EINVAL on non-data commands Fix a regression introduced in Linux kernel 4.17 where sending a SCSI command that does not transfer data (such as TEST UNIT READY) via /dev/bsg/* results in EINVAL. 
Fixes: 17cb960f29c2 ("bsg: split handling of SCSI CDBs vs transport requeues") Cc: # 4.17+ Reviewed-by: Christoph Hellwig Signed-off-by: Tony Battersby Signed-off-by: Jens Axboe --- block/bsg.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'block') diff --git a/block/bsg.c b/block/bsg.c index 66602c489956..3da540faf673 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -267,8 +267,6 @@ bsg_map_hdr(struct request_queue *q, struct sg_io_v4 *hdr, fmode_t mode) } else if (hdr->din_xfer_len) { ret = blk_rq_map_user(q, rq, NULL, uptr64(hdr->din_xferp), hdr->din_xfer_len, GFP_KERNEL); - } else { - ret = blk_rq_map_user(q, rq, NULL, NULL, 0, GFP_KERNEL); } if (ret) -- cgit v1.2.3 From 0fc09f920983f61be625658c62cc40ac25a7b3a5 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Mon, 23 Jul 2018 08:37:50 -0600 Subject: blk-mq: export setting request completion state This is preparing for drivers that want to directly alter the state of their requests. No functional change here. Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Jens Axboe --- block/blk-mq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-mq.c b/block/blk-mq.c index d394cdd8d8c6..5291a95ba362 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -558,10 +558,8 @@ static void __blk_mq_complete_request(struct request *rq) bool shared = false; int cpu; - if (cmpxchg(&rq->state, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE) != - MQ_RQ_IN_FLIGHT) + if (!blk_mq_mark_complete(rq)) return; - if (rq->internal_tag != -1) blk_mq_sched_completed_request(rq); -- cgit v1.2.3 From b403ea2404889e1227812fa9657667a1deb9c694 Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Wed, 25 Jul 2018 23:15:07 +0200 Subject: block: bio_iov_iter_get_pages: fix size of last iovec If the last page of the bio is not "full", the length of the last vector slot needs to be corrected. This slot has the index (bio->bi_vcnt - 1), but only in bio->bi_io_vec. 
In the "bv" helper array, which is shifted by the value of bio->bi_vcnt at function invocation, the correct index is (nr_pages - 1). v2: improved readability following suggestions from Ming Lei. v3: followed a formatting suggestion from Christoph Hellwig. Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()") Reviewed-by: Hannes Reinecke Reviewed-by: Ming Lei Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Signed-off-by: Martin Wilck Signed-off-by: Jens Axboe --- block/bio.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) (limited to 'block') diff --git a/block/bio.c b/block/bio.c index f7e3d88bd0b6..cd55ea6bd47c 100644 --- a/block/bio.c +++ b/block/bio.c @@ -912,16 +912,16 @@ EXPORT_SYMBOL(bio_add_page); */ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { - unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt; + unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt, idx; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; struct page **pages = (struct page **)bv; - size_t offset, diff; + size_t offset; ssize_t size; size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset); if (unlikely(size <= 0)) return size ? 
size : -EFAULT; - nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE; + idx = nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE; /* * Deep magic below: We need to walk the pinned pages backwards @@ -934,17 +934,15 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) bio->bi_iter.bi_size += size; bio->bi_vcnt += nr_pages; - diff = (nr_pages * PAGE_SIZE - offset) - size; - while (nr_pages--) { - bv[nr_pages].bv_page = pages[nr_pages]; - bv[nr_pages].bv_len = PAGE_SIZE; - bv[nr_pages].bv_offset = 0; + while (idx--) { + bv[idx].bv_page = pages[idx]; + bv[idx].bv_len = PAGE_SIZE; + bv[idx].bv_offset = 0; } bv[0].bv_offset += offset; bv[0].bv_len -= offset; - if (diff) - bv[bio->bi_vcnt - 1].bv_len -= diff; + bv[nr_pages - 1].bv_len -= nr_pages * PAGE_SIZE - offset - size; iov_iter_advance(iter, size); return 0; -- cgit v1.2.3 From 17d51b10d7773e4618bcac64648f30f12d4078fb Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Wed, 25 Jul 2018 23:15:09 +0200 Subject: block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs bio_iov_iter_get_pages() currently only adds pages for the next non-zero segment from the iov_iter to the bio. That's suboptimal for callers, which typically try to pin as many pages as fit into the bio. This patch converts the current bio_iov_iter_get_pages() into a static helper, and introduces a new helper that allocates as many pages as 1) fit into the bio, 2) are present in the iov_iter, 3) and can be pinned by MM. Error is returned only if zero pages could be pinned. Because of 3), a zero return value doesn't necessarily mean all pages have been pinned. Callers that have to pin every page in the iov_iter must still call this function in a loop (this is currently the case). This change matters most for __blkdev_direct_IO_simple(), which calls bio_iov_iter_get_pages() only once. 
If it obtains fewer pages than requested, it returns a "short write" or "short read", and __generic_file_write_iter() falls back to buffered writes, which may lead to data corruption. Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Reviewed-by: Christoph Hellwig Signed-off-by: Martin Wilck Signed-off-by: Jens Axboe --- block/bio.c | 35 ++++++++++++++++++++++++++++++++--- 1 file changed, 32 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/bio.c b/block/bio.c index cd55ea6bd47c..dc07a427e782 100644 --- a/block/bio.c +++ b/block/bio.c @@ -903,14 +903,16 @@ int bio_add_page(struct bio *bio, struct page *page, EXPORT_SYMBOL(bio_add_page); /** - * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio + * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio * @bio: bio to add pages to * @iter: iov iterator describing the region to be mapped * - * Pins as many pages from *iter and appends them to @bio's bvec array. The + * Pins pages from *iter and appends them to @bio's bvec array. The * pages will have to be released using put_page() when done. + * For multi-segment *iter, this function only adds pages from the + * next non-empty segment of the iov iterator. */ -int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) +static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt, idx; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; @@ -947,6 +949,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) iov_iter_advance(iter, size); return 0; } + +/** + * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio + * @bio: bio to add pages to + * @iter: iov iterator describing the region to be mapped + * + * Pins pages from *iter and appends them to @bio's bvec array. The + * pages will have to be released using put_page() when done. 
+ * The function tries, but does not guarantee, to pin as many pages as + * fit into the bio, or are requested in *iter, whatever is smaller. + * If MM encounters an error pinning the requested pages, it stops. + * Error is returned only if 0 pages could be pinned. + */ +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) +{ + unsigned short orig_vcnt = bio->bi_vcnt; + + do { + int ret = __bio_iov_iter_get_pages(bio, iter); + + if (unlikely(ret)) + return bio->bi_vcnt > orig_vcnt ? 0 : ret; + + } while (iov_iter_count(iter) && !bio_full(bio)); + + return 0; +} EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); static void submit_bio_wait_endio(struct bio *bio) -- cgit v1.2.3 From 5151842b9d8732d4cbfa8400b40bff894f501b2f Mon Sep 17 00:00:00 2001 From: Greg Edwards Date: Thu, 26 Jul 2018 14:39:37 -0400 Subject: block: reset bi_iter.bi_done after splitting bio After the bio has been updated to represent the remaining sectors, reset bi_done so bio_rewind_iter() does not rewind further than it should. This resolves a bio_integrity_process() failure on reads where the original request was split. 
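The effect can be seen in a toy model, offered here only as a sketch under loud assumptions: toy_iter, toy_advance(), toy_rewind() and toy_split() are hypothetical simplifications of struct bvec_iter, bio_advance(), bio_rewind_iter() and bio_split() that track byte counts only. It shows why a stale bi_done lets a later rewind-by-bi_done move past the start of the remainder bio:

```c
#include <stdbool.h>

/* Hypothetical, byte-count-only stand-in for struct bvec_iter. */
struct toy_iter {
	unsigned int pos;   /* absolute offset into the original buffer */
	unsigned int size;  /* bytes remaining (like bi_size) */
	unsigned int done;  /* bytes consumed since the last reset (like bi_done) */
};

/* Consume @bytes from the front of the iterator. */
static void toy_advance(struct toy_iter *it, unsigned int bytes)
{
	it->pos += bytes;
	it->size -= bytes;
	it->done += bytes;
}

/* Rewind by @bytes; refuse to rewind more than was recorded as done. */
static bool toy_rewind(struct toy_iter *it, unsigned int bytes)
{
	if (bytes > it->done)
		return false;
	it->pos -= bytes;
	it->size += bytes;
	it->done -= bytes;
	return true;
}

/*
 * Split off the first @bytes, leaving @it to describe the remainder.
 * Returns the offset at which the remainder starts.
 */
static unsigned int toy_split(struct toy_iter *it, unsigned int bytes,
			      bool reset_done)
{
	toy_advance(it, bytes);
	if (reset_done)
		it->done = 0;   /* the one-line fix, in toy form */
	return it->pos;
}
```

In this model, splitting 4096 bytes off an 8192-byte iterator without the reset, completing the remainder, and then rewinding by done lands at offset 0, inside the part that was split off. With the reset, the same sequence lands at offset 4096, the start of the remainder.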
Fixes: 63573e359d05 ("bio-integrity: Restore original iterator on verify stage") Signed-off-by: Greg Edwards Signed-off-by: Jens Axboe --- block/bio.c | 1 + 1 file changed, 1 insertion(+) (limited to 'block') diff --git a/block/bio.c b/block/bio.c index dc07a427e782..05d81912870b 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1893,6 +1893,7 @@ struct bio *bio_split(struct bio *bio, int sectors, bio_integrity_trim(split); bio_advance(bio, split->bi_iter.bi_size); + bio->bi_iter.bi_done = 0; if (bio_flagged(bio, BIO_TRACE_COMPLETION)) bio_set_flag(split, BIO_TRACE_COMPLETION); -- cgit v1.2.3 From 2d5ba0e2de24ec87636244a01d4e78d095cc1b20 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 3 Aug 2018 01:49:37 +0800 Subject: blk-mq: fix blk_mq_tagset_busy_iter Commit d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter") replaced 'blk_mq_request_started(req)' with 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'. This is wrong and causes many test systems to hang during boot. Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter(). Fixes: d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter") Cc: Josef Bacik Cc: Christoph Hellwig Cc: Guenter Roeck Cc: Mark Brown Cc: Matt Hart Cc: Johannes Thumshirn Cc: John Garry Cc: Hannes Reinecke Cc: "Martin K. Petersen" Cc: James Bottomley Cc: linux-scsi@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Bart Van Assche Tested-by: Guenter Roeck Reported-by: Mark Brown Reported-by: Guenter Roeck Signed-off-by: Ming Lei Signed-off-by: Jens Axboe --- block/blk-mq-tag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 09b2ee6694fb..3de0836163c2 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -271,7 +271,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) * test and set the bit before assining ->rqs[]. 
*/ rq = tags->rqs[bitnr]; - if (rq && blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) + if (rq && blk_mq_request_started(rq)) iter_data->fn(rq, iter_data->data, reserved); return true; -- cgit v1.2.3 From a32e236eb93e62a0f692e79b7c3c9636689559b9 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 3 Aug 2018 12:22:09 -0700 Subject: Partially revert "block: fail op_is_write() requests to read-only partitions" It turns out that commit 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions"), while obviously correct, causes problems for some older lvm2 installations. The reason is that the lvm snapshotting will continue to write to the snapshot COW volume, even after the volume has been marked read-only. End result: snapshot failure. This has actually been fixed in newer versions of the lvm2 tool, but the old tools still exist, and the breakage was reported both in the kernel bugzilla and in the Debian bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200439 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442 The lvm2 fix is here https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3 but until everybody has updated to recent versions, we'll have to weaken the "never write to read-only partitions" check. It now allows the write to happen, but causes a warning, something like this: generic_make_request: Trying to write to read-only block-device dm-3 (partno X) Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3 Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016 Workqueue: ksnaphd do_metadata RIP: 0010:generic_make_request_checks+0x4ac/0x600 ... 
Call Trace: generic_make_request+0x64/0x400 submit_bio+0x6c/0x140 dispatch_io+0x287/0x430 sync_io+0xc3/0x120 dm_io+0x1f8/0x220 do_metadata+0x1d/0x30 process_one_work+0x1b9/0x3e0 worker_thread+0x2b/0x3c0 kthread+0x113/0x130 ret_from_fork+0x35/0x40 Note that this is a "revert" in behavior only. I'm leaving alone the actual code cleanups in commit 721c7fc701c7, but letting the previously uncaught request go through with a warning instead of stopping it. Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions") Reported-and-tested-by: WGH Acked-by: Mike Snitzer Cc: Sagi Grimberg Cc: Ilya Dryomov Cc: Jens Axboe Cc: Zdenek Kabelac Signed-off-by: Linus Torvalds --- block/blk-core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/blk-core.c b/block/blk-core.c index f84a9b7b6f5a..ee33590f54eb 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2155,11 +2155,12 @@ static inline bool bio_check_ro(struct bio *bio, struct hd_struct *part) if (part->policy && op_is_write(bio_op(bio))) { char b[BDEVNAME_SIZE]; - printk(KERN_ERR + WARN_ONCE(1, "generic_make_request: Trying to write " "to read-only block-device %s (partno %d)\n", bio_devname(bio, b), part->partno); - return true; + /* Older lvm-tools actually trigger this */ + return false; } return false; -- cgit v1.2.3