summaryrefslogtreecommitdiffstats
path: root/block
Commit message (Collapse)AuthorAgeFilesLines
* blk-mq: fix filesystem I/O request allocationMing Lei2021-11-122-20/+45
| | | | | | | | | | | | | | | | | submit_bio_checks() may update bio->bi_opf, so we have to initialize blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks() returns when allocating new request. In case of using cached request, fallback to allocate new request if cached rq isn't compatible with the incoming bio, otherwise change rq->cmd_flags with incoming bio->bi_opf. Fixes: 900e080752025f00 ("block: move queue enter logic into blk_mq_submit_bio()") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blkcg: Remove extra blkcg_bio_issue_initLaibin Qiu2021-11-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN reports a use-after-free report when doing block test: ================================================================== [10050.967049] BUG: KASAN: use-after-free in submit_bio_checks+0x1539/0x1550 [10050.977638] Call Trace: [10050.978190] dump_stack+0x9b/0xce [10050.979674] print_address_description.constprop.6+0x3e/0x60 [10050.983510] kasan_report.cold.9+0x22/0x3a [10050.986089] submit_bio_checks+0x1539/0x1550 [10050.989576] submit_bio_noacct+0x83/0xc80 [10050.993714] submit_bio+0xa7/0x330 [10050.994435] mpage_readahead+0x380/0x500 [10050.998009] read_pages+0x1c1/0xbf0 [10051.002057] page_cache_ra_unbounded+0x4c2/0x6f0 [10051.007413] do_page_cache_ra+0xda/0x110 [10051.008207] force_page_cache_ra+0x23d/0x3d0 [10051.009087] page_cache_sync_ra+0xca/0x300 [10051.009970] generic_file_buffered_read+0xbea/0x2130 [10051.012685] generic_file_read_iter+0x315/0x490 [10051.014472] blkdev_read_iter+0x113/0x1b0 [10051.015300] aio_read+0x2ad/0x450 [10051.023786] io_submit_one+0xc8e/0x1d60 [10051.029855] __se_sys_io_submit+0x125/0x350 [10051.033442] do_syscall_64+0x2d/0x40 [10051.034156] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [10051.048733] Allocated by task 18598: [10051.049482] kasan_save_stack+0x19/0x40 [10051.050263] __kasan_kmalloc.constprop.1+0xc1/0xd0 [10051.051230] kmem_cache_alloc+0x146/0x440 [10051.052060] mempool_alloc+0x125/0x2f0 [10051.052818] bio_alloc_bioset+0x353/0x590 [10051.053658] mpage_alloc+0x3b/0x240 [10051.054382] do_mpage_readpage+0xddf/0x1ef0 [10051.055250] mpage_readahead+0x264/0x500 [10051.056060] read_pages+0x1c1/0xbf0 [10051.056758] page_cache_ra_unbounded+0x4c2/0x6f0 [10051.057702] do_page_cache_ra+0xda/0x110 [10051.058511] force_page_cache_ra+0x23d/0x3d0 [10051.059373] page_cache_sync_ra+0xca/0x300 [10051.060198] generic_file_buffered_read+0xbea/0x2130 [10051.061195] generic_file_read_iter+0x315/0x490 [10051.062189] blkdev_read_iter+0x113/0x1b0 [10051.063015] aio_read+0x2ad/0x450 [10051.063686] io_submit_one+0xc8e/0x1d60 [10051.064467] __se_sys_io_submit+0x125/0x350 [10051.065318] do_syscall_64+0x2d/0x40 [10051.066082] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [10051.067455] Freed by task 13307: [10051.068136] kasan_save_stack+0x19/0x40 [10051.068931] kasan_set_track+0x1c/0x30 [10051.069726] kasan_set_free_info+0x1b/0x30 [10051.070621] __kasan_slab_free+0x111/0x160 [10051.071480] kmem_cache_free+0x94/0x460 [10051.072256] mempool_free+0xd6/0x320 [10051.072985] bio_free+0xe0/0x130 [10051.073630] bio_put+0xab/0xe0 [10051.074252] bio_endio+0x3a6/0x5d0 [10051.074984] blk_update_request+0x590/0x1370 [10051.075870] scsi_end_request+0x7d/0x400 [10051.076667] scsi_io_completion+0x1aa/0xe50 [10051.077503] scsi_softirq_done+0x11b/0x240 [10051.078344] blk_mq_complete_request+0xd4/0x120 [10051.079275] scsi_mq_done+0xf0/0x200 [10051.080036] virtscsi_vq_done+0xbc/0x150 [10051.080850] vring_interrupt+0x179/0x390 [10051.081650] __handle_irq_event_percpu+0xf7/0x490 [10051.082626] handle_irq_event_percpu+0x7b/0x160 [10051.083527] handle_irq_event+0xcc/0x170 [10051.084297] handle_edge_irq+0x215/0xb20 [10051.085122] asm_call_irq_on_stack+0xf/0x20 [10051.085986] common_interrupt+0xae/0x120 [10051.086830] asm_common_interrupt+0x1e/0x40 ================================================================== Bio will be checked at beginning of submit_bio_noacct(). If bio needs to be throttled, it will start the timer and stop submit bio directly. Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires. But in the current process, if bio is throttled, it will still set bio issue->value by blkcg_bio_issue_init(). This is redundant and may cause the above use-after-free. CPU0 CPU1 submit_bio submit_bio_noacct submit_bio_checks blk_throtl_bio() <=mod_timer(&sq->pending_timer blk_throtl_dispatch_work_fn submit_bio_noacct() <= bio have throttle tag, will throw directly and bio issue->value will be set here bio_endio() bio_put() bio_free() <= free this bio blkcg_bio_issue_init(bio) <= bio has been freed and will lead to UAF return BLK_QC_T_NONE Fix this by remove extra blkcg_bio_issue_init. Fixes: e439bedf6b24 (blkcg: consolidate bio_issue_init() to be a part of core) Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Hold invalidate_lock in BLKRESETZONE ioctlShin'ichiro Kawasaki2021-11-111-10/+5
| | | | | | | | | | | | | | | | | When BLKRESETZONE ioctl and data read race, the data read leaves stale page cache. The commit e5113505904e ("block: Discard page cache of zone reset target range") added page cache truncation to avoid stale page cache after the ioctl. However, the stale page cache still can be read during the reset zone operation for the ioctl. To avoid the stale page cache completely, hold invalidate_lock of the block device file mapping. Fixes: e5113505904e ("block: Discard page cache of zone reset target range") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: rename blk_attempt_bio_mergeMing Lei2021-11-111-4/+6
| | | | | | | | | | | | It is very annoying to have two block layer functions which share same name, so rename blk_attempt_bio_merge in blk-mq.c as blk_mq_attempt_bio_merge. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_mergeMing Lei2021-11-111-4/+0
| | | | | | | | | | | | | | | | | | | blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(), which is called when queue usage counter is grabbed already: 1) blk_mq_get_new_requests() 2) blk_mq_get_request() - cached request in current plug owns one queue usage counter So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(), and more importantly this nest way causes hang in blk_mq_freeze_queue_wait(). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: fix kerneldoc for disk_register_independent_access__ranges()Jens Axboe2021-11-111-2/+2
| | | | | | | | | The naming got changed as part of a revision of the patchset, but the kerneldoc apparently never got updated. Fix it. Reported-by: kernel test robot <lkp@intel.com> Fixes: a2247f19ee1c ("block: Add independent access ranges support") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add __must_check for *add_disk*() callersLuis Chamberlain2021-11-091-3/+3
| | | | | | | | | | | | Now that we have done a spring cleaning on all drivers and added error checking / handling, let's keep it that way and ensure no new drivers fail to stick with it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20211110002949.999380-1-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: use enum type for blk_mq_alloc_data->rq_flagsJens Axboe2021-11-091-1/+1
| | | | | | | | | | | | | | | | | | kernel test robot reports that we now trigger some sparse warnings: block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer which is due to ->rq_flags being an unsigned int, rather than the stronger type req_flags_t enum. Change the type to req_flags_t to silence this warning. Fixes: 56f8da642bd8 ("block: add rq_flags to struct blk_mq_alloc_data") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Hold invalidate_lock in BLKZEROOUT ioctlShin'ichiro Kawasaki2021-11-091-3/+9
| | | | | | | | | | | | | | | | | | When BLKZEROOUT ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is modified to call "blkdiscard -z" command and repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y. Rework is required for older stable kernels. Fixes: 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Hold invalidate_lock in BLKDISCARD ioctlShin'ichiro Kawasaki2021-11-091-3/+9
| | | | | | | | | | | | | | | | | When BLKDISCARD ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y with slight patch edit. Rework is required for older stable kernels. Fixes: 351499a172c0 ("block: Invalidate cache on discard v2") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-091-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block driver updates from Jens Axboe: - Last series adding error handling support for add_disk() in drivers. After this one, and once the SCSI side has been merged, we can finally annotate add_disk() as must_check. (Luis) - bcache fixes (Coly) - zram fixes (Ming) - ataflop locking fix (Tetsuo) - nbd fixes (Ye, Yu) - MD merge via Song - Cleanup (Yang) - sysfs fix (Guoqing) - Misc fixes (Geert, Wu, luo) * tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits) bcache: Revert "bcache: use bvec_virt" ataflop: Add missing semicolon to return statement floppy: address add_disk() error handling on probe ataflop: address add_disk() error handling on probe block: update __register_blkdev() probe documentation ataflop: remove ataflop_probe_lock mutex mtd/ubi/block: add error handling support for add_disk() block/sunvdc: add error handling support for add_disk() z2ram: add error handling support for add_disk() nvdimm/pmem: use add_disk() error handling nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned nvdimm/blk: add error handling support for add_disk() nvdimm/blk: avoid calling del_gendisk() on early failures nvdimm/btt: add error handling support for add_disk() nvdimm/btt: use goto error labels on btt_blk_init() loop: Remove duplicate assignments drbd: Fix double free problem in drbd_create_device nvdimm/btt: do not call del_gendisk() if not needed bcache: fix use-after-free problem in bcache_device_free() zram: replace fsync_bdev with sync_blockdev ...
| * block: update __register_blkdev() probe documentationLuis Chamberlain2021-11-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __register_blkdev() is used to register a probe callback, and that callback is typically used to call add_disk(). Now that we are able to capture errors for add_disk(), we need to fix those probe calls where add_disk() fails and clean up resources. We don't extend the probe call to return the error given: 1) we'd have to always special-case the case where the disk was already present, as otherwise concurrent requests to open an existing block device would fail, and this would be a userspace visible change 2) the error from ilookup() on blkdev_get_no_open() is sufficient 3) The only thing the probe call is used for is to support pre-devtmpfs, pre-udev semantics that want to create disks when their pre-created device node is accessed, and so we don't care for failures on probe there. Expand documentation for the probe callback to ensure users cleanup resources if add_disk() is used and to clarify this interface may be removed in the future. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-12-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-098-108/+217
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - Set of fixes for the batched tag allocation (Ming, me) - add_disk() error handling fix (Luis) - Nested queue quiesce fixes (Ming) - Shared tags init error handling fix (Ye) - Misc cleanups (Jean, Ming, me) * tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block: nvme: wait until quiesce is done scsi: make sure that request queue queiesce and unquiesce balanced scsi: avoid to quiesce sdev->request_queue two times blk-mq: add one API for waiting until quiesce is done blk-mq: don't free tags if the tag_set is used by other device in queue initialztion block: fix device_add_disk() kobject_create_and_add() error handling block: ensure cached plug request matches the current queue block: move queue enter logic into blk_mq_submit_bio() block: make bio_queue_enter() fast-path available inline block: split request allocation components into helpers block: have plug stored requests hold references to the queue blk-mq: update hctx->nr_active in blk_mq_end_request_batch() blk-mq: add RQF_ELV debug entry blk-mq: only try to run plug merge if request has same queue with incoming bio block: move RQF_ELV setting into allocators dm: don't stop request queue after the dm device is suspended block: replace always false argument with 'false' block: assign correct tag before doing prefetch of request blk-mq: fix redundant check of !e expression
| * | blk-mq: add one API for waiting until quiesce is doneMing Lei2021-11-091-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it is hard to switch to this style, so these drivers need one atomic flag for helping to balance quiesce and unquiesce. When quiesce is in-progress, the driver still needs to wait until the quiesce is done, so add API of blk_mq_wait_quiesce_done() for these drivers. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: don't free tags if the tag_set is used by other device in queue ↵Ye Bin2021-11-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | initialztion We got UAF report on v5.10 as follows: [ 1446.674930] ================================================================== [ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348 [ 1446.677851] [ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2 [ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn [ 1446.681448] Call Trace: [ 1446.681800] dump_stack+0x9b/0xce [ 1446.682916] print_address_description.constprop.6+0x3e/0x60 [ 1446.685999] kasan_report.cold.9+0x22/0x3a [ 1446.687186] blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.687785] blk_mq_dispatch_rq_list+0x21a/0x1d40 [ 1446.692576] __blk_mq_do_dispatch_sched+0x394/0x830 [ 1446.695758] __blk_mq_sched_dispatch_requests+0x398/0x4f0 [ 1446.698279] blk_mq_sched_dispatch_requests+0xdf/0x140 [ 1446.698967] __blk_mq_run_hw_queue+0xc0/0x270 [ 1446.699561] __blk_mq_delay_run_hw_queue+0x4cc/0x550 [ 1446.701407] blk_mq_run_hw_queue+0x13b/0x2b0 [ 1446.702593] blk_mq_sched_insert_requests+0x1de/0x390 [ 1446.703309] blk_mq_flush_plug_list+0x4b4/0x760 [ 1446.705408] blk_flush_plug_list+0x2c5/0x480 [ 1446.708471] blk_finish_plug+0x55/0xa0 [ 1446.708980] blk_throtl_dispatch_work_fn+0x23b/0x2e0 [ 1446.711236] process_one_work+0x6d4/0xfe0 [ 1446.711778] worker_thread+0x91/0xc80 [ 1446.713400] kthread+0x32d/0x3f0 [ 1446.714362] ret_from_fork+0x1f/0x30 [ 1446.714846] [ 1446.715062] Allocated by task 1: [ 1446.715509] kasan_save_stack+0x19/0x40 [ 1446.716026] __kasan_kmalloc.constprop.1+0xc1/0xd0 [ 1446.716673] blk_mq_init_tags+0x6d/0x330 [ 1446.717207] blk_mq_alloc_rq_map+0x50/0x1c0 [ 1446.717769] __blk_mq_alloc_map_and_request+0xe5/0x320 [ 1446.718459] blk_mq_alloc_tag_set+0x679/0xdc0 [ 1446.719050] scsi_add_host_with_dma.cold.3+0xa0/0x5db [ 1446.719736] virtscsi_probe+0x7bf/0xbd0 [ 1446.720265] virtio_dev_probe+0x402/0x6c0 [ 1446.720808] really_probe+0x276/0xde0 [ 1446.721320] driver_probe_device+0x267/0x3d0 [ 1446.721892] device_driver_attach+0xfe/0x140 [ 1446.722491] __driver_attach+0x13a/0x2c0 [ 1446.723037] bus_for_each_dev+0x146/0x1c0 [ 1446.723603] bus_add_driver+0x3fc/0x680 [ 1446.724145] driver_register+0x1c0/0x400 [ 1446.724693] init+0xa2/0xe8 [ 1446.725091] do_one_initcall+0x9e/0x310 [ 1446.725626] kernel_init_freeable+0xc56/0xcb9 [ 1446.726231] kernel_init+0x11/0x198 [ 1446.726714] ret_from_fork+0x1f/0x30 [ 1446.727212] [ 1446.727433] Freed by task 26992: [ 1446.727882] kasan_save_stack+0x19/0x40 [ 1446.728420] kasan_set_track+0x1c/0x30 [ 1446.728943] kasan_set_free_info+0x1b/0x30 [ 1446.729517] __kasan_slab_free+0x111/0x160 [ 1446.730084] kfree+0xb8/0x520 [ 1446.730507] blk_mq_free_map_and_requests+0x10b/0x1b0 [ 1446.731206] blk_mq_realloc_hw_ctxs+0x8cb/0x15b0 [ 1446.731844] blk_mq_init_allocated_queue+0x374/0x1380 [ 1446.732540] blk_mq_init_queue_data+0x7f/0xd0 [ 1446.733155] scsi_mq_alloc_queue+0x45/0x170 [ 1446.733730] scsi_alloc_sdev+0x73c/0xb20 [ 1446.734281] scsi_probe_and_add_lun+0x9a6/0x2d90 [ 1446.734916] __scsi_scan_target+0x208/0xc50 [ 1446.735500] scsi_scan_channel.part.3+0x113/0x170 [ 1446.736149] scsi_scan_host_selected+0x25a/0x360 [ 1446.736783] store_scan+0x290/0x2d0 [ 1446.737275] dev_attr_store+0x55/0x80 [ 1446.737782] sysfs_kf_write+0x132/0x190 [ 1446.738313] kernfs_fop_write_iter+0x319/0x4b0 [ 1446.738921] new_sync_write+0x40e/0x5c0 [ 1446.739429] vfs_write+0x519/0x720 [ 1446.739877] ksys_write+0xf8/0x1f0 [ 1446.740332] do_syscall_64+0x2d/0x40 [ 1446.740802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1446.741462] [ 1446.741670] The buggy address belongs to the object at ffff8880185afd00 [ 1446.741670] which belongs to the cache kmalloc-256 of size 256 [ 1446.743276] The buggy address is located 16 bytes inside of [ 1446.743276] 256-byte region [ffff8880185afd00, ffff8880185afe00) [ 1446.744765] The buggy address belongs to the page: [ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac [ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0 [ 1446.747719] flags: 0x1fffff80010200(slab|head) [ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240 [ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 1446.750455] page dumped because: kasan: bad access detected [ 1446.751227] [ 1446.751445] Memory state around the buggy address: [ 1446.752102] ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.753090] ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.755065] ^ [ 1446.755589] ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.756574] ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.757566] ================================================================== Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the same host initializes it's queue successfully. However, if the second device failed to allocate memory in blk_mq_alloc_and_init_hctx() from blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(), __blk_mq_free_map_and_rqs() will be called on error path, and if 'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed while it's still used by the first device. To fix this issue we move release newly allocated hardware context from blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to release hardware context in blk_mq_init_allocated_queue. Fixes: 868f2f0b7206 ("blk-mq: dynamic h/w context count") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: fix device_add_disk() kobject_create_and_add() error handlingLuis Chamberlain2021-11-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 83cbce957446 ("block: add error handling for device_add_disk / add_disk") added error handling to device_add_disk(), however the goto label for the kobject_create_and_add() failure did not set the return value correctly, and so we can end up in a situation where kobject_create_and_add() fails but we report success. Fixes: 83cbce957446 ("block: add error handling for device_add_disk / add_disk") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211103164023.1384821-1-mcgrof@kernel.org [axboe: fold in followup fix from Wu Bo <wubo40@huawei.com>] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: ensure cached plug request matches the current queueJens Axboe2021-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're driving multiple devices, we could have pre-populated the cache for a different device. Ensure that the empty request matches the current queue. Fixes: 47c122e35d7e ("block: pre-allocate requests if plug is started and is a batch") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: move queue enter logic into blk_mq_submit_bio()Jens Axboe2021-11-044-34/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Retain the old logic for the fops based submit, but for our internal blk_mq_submit_bio(), move the queue entering logic into the core function itself. We need to be a bit careful if going into the scheduler, as a scheduler or queue mappings can arbitrarily change before we have entered the queue. Have the bio scheduler mapping do that separately, it's a very cheap operation compared to actually doing merging locking and lookups. Reviewed-by: Christoph Hellwig <hch@lst.de> [axboe: update to check merge post submit_bio_checks() doing remap...] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: make bio_queue_enter() fast-path available inlineJens Axboe2021-11-042-27/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just a prep patch for shifting the queue enter logic. This moves the expected fast path inline, and leaves __bio_queue_enter() as an out-of-line function call. We don't want to inline the latter, as it's mostly slow path code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: split request allocation components into helpersJens Axboe2021-11-041-23/+48
| | | | | | | | | | | | | | | | | | | | | | | | This is in preparation for a fix, but serves as a cleanup as well moving the cached vs regular alloc logic out of blk_mq_submit_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: have plug stored requests hold references to the queueJens Axboe2021-11-042-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requests that were stored in the cache deliberately didn't hold an enter reference to the queue, instead we grabbed one every time we pulled a request out of there. That made for awkward logic on freeing the remainder of the cached list, if needed, where we had to artificially raise the queue usage count before each free. Grab references up front for cached plug requests. That's safer, and also more efficient. Fixes: 47c122e35d7e ("block: pre-allocate requests if plug is started and is a batch") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: update hctx->nr_active in blk_mq_end_request_batch()Ming Lei2021-11-032-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of shared tags and none io sched, batched completion still may be run into, and hctx->nr_active is accounted when getting driver tag, so it has to be updated in blk_mq_end_request_batch(). Otherwise, hctx->nr_active may become same with queue depth, then hctx_may_queue() always return false, then io hang is caused. Fixes the issue by updating the counter in batched way. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102153619.3627505-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: add RQF_ELV debug entryMing Lei2021-11-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Looks it is missed so add it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: only try to run plug merge if request has same queue with incoming bioMing Lei2021-11-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | It is obvious that io merge can't be done between two different queues, so just try to run io merge in case of same queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: move RQF_ELV setting into allocatorsJens Axboe2021-11-031-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not safe to do this before blk_queue_enter(), as the scheduler state could have changed in between. Hence move the RQF_ELV setting into the allocators, where we know the queue is already entered. Suggested-by: Ming Lei <ming.lei@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: replace always false argument with 'false'Jens Axboe2021-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A previous commit fixed up the condition for doing direct issue, but that left the 'from_schedule' argument dead inside the branch. Replace it with 'false'. Fixes: ff1552232b36 ("blk-mq: don't issue request directly in case that current is to be blocked") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: assign correct tag before doing prefetch of requestJens Axboe2021-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that current tag is correctly assigned before attempting to prefetch the first cacheline of the request. Fixes: 92aff191cc5b ("block: prefetch request to be initialized") Reported-and-tested-by: syzbot+cd20829ac44b92bf6ed0@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: fix redundant check of !e expressionJean Sacren2021-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the if branch, e is checked. In the else branch, ->dispatch_busy is merely a number and has no effect on !e. We should remove the check of !e since it is always true. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211029202945.3052-1-sakiwit@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-091-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more bdev size updates from Jens Axboe: "Two followup changes for the bdev-size series from this merge window: - Add loff_t cast to bdev_nr_bytes() (Christoph) - Use bdev_nr_bytes() consistently for the block parts at least (me)" * tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-block: block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter() block: add a loff_t cast to bdev_nr_bytes
| * | | block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter()Jens Axboe2021-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have new helpers for this, use them rather than the slower inode size reads. This makes the read/write path consistent with most of the rest of block as well. Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/a72767cd-3c6d-47f7-80f4-aa025a17b2cb@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-011-8/+20
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block inode sync updates from Jens Axboe: "This contains improvements to how bdev inode syncing is handled, unifying the API" * tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block: block: simplify the block device syncing code ntfs3: use sync_blockdev_nowait fat: use sync_blockdev_nowait btrfs: use sync_blockdev xen-blkback: use sync_blockdev block: remove __sync_blockdev fs: remove __sync_filesystem
| * | | | block: simplify the block device syncing codeChristoph Hellwig2021-10-221-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of the indirections and just provide a sync_bdevs helper for the generic sync code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: remove __sync_blockdevChristoph Hellwig2021-10-221-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead offer a new sync_blockdev_nowait helper for the !wait case. This new helper is exported as it will grow modular callers in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | Merge tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-011-2/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kiocb->ki_complete() cleanup from Jens Axboe: "This removes the res2 argument from kiocb->ki_complete(). Only the USB gadget code used it, everybody else passes 0. The USB guys checked the user gadget code they could find, and everybody just uses res as expected for the async interface" * tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block: fs: get rid of the res2 iocb->ki_complete argument usb: remove res2 argument from gadget code completions
| * | | | | fs: get rid of the res2 iocb->ki_complete argumentJens Axboe2021-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'for-5.16/passthrough-flag-2021-10-29' of ↵Linus Torvalds2021-11-013-48/+13
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.dk/linux-block Pull QUEUE_FLAG_SCSI_PASSTHROUGH removal from Jens Axboe: "This contains a series leading to the removal of the QUEUE_FLAG_SCSI_PASSTHROUGH queue flag" * tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block: block: remove blk_{get,put}_request block: remove QUEUE_FLAG_SCSI_PASSTHROUGH block: remove the initialize_rq_fn blk_mq_ops method scsi: add a scsi_alloc_request helper bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands sd: implement ->get_unique_id block: add a ->get_unique_id method
| * | | | | | block: remove blk_{get,put}_requestChristoph Hellwig2021-10-291-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are now pointless wrappers around blk_mq_{alloc,free}_request, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | block: remove QUEUE_FLAG_SCSI_PASSTHROUGHChristoph Hellwig2021-10-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | block: remove the initialize_rq_fn blk_mq_ops methodChristoph Hellwig2021-10-221-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entirely unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fnChristoph Hellwig2021-10-221-19/+13
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Directly initialize the bsg_job structure instead of relying on the ->.initialize_rq_fn indirection. This also removes the superflous initialization of the second request used for BIDI requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-016-23/+22
|\ \ \ \ \ \ | | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...
| * | | | | partitions/ibm: use bdev_nr_sectors instead of open coding itChristoph Hellwig2021-10-191-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the proper helper to read the block device size and switch various places to pass the size in terms of sectors which is more practical. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de [axboe: fix comment typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | partitions/efi: use bdev_nr_bytes instead of open coding itChristoph Hellwig2021-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block/ioctl: use bdev_nr_sectors and bdev_nr_bytesChristoph Hellwig2021-10-191-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: cache inode size in bdevJens Axboe2021-10-182-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading the inode size brings in a new cacheline for IO submit, and it's in the hot path being checked for every single IO. When doing millions of IOs per core per second, this is noticeable overhead. Cache the nr_sectors in the bdev itself. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: use bdev_nr_bytes instead of open coding it in blkdev_fallocateChristoph Hellwig2021-10-181-1/+1
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-0147-2272/+3127
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
| * | | | blk-mq-debugfs: Show active requests per queue for shared tagsJohn Garry2021-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we show the hctx.active value for the per-hctx "active" file. However this is not maintained for shared tags, and we instead keep a record of the number active requests per request queue - see commit f1b49fdc1c64 ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap). Change for the case of shared tags to show the active requests per request queue by using __blk_mq_active_requests() helper. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1635496823-33515-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: improve readability of blk_mq_end_request_batch()Jens Axboe2021-10-281-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's faster and easier to read if we tolerate cur_hctx being NULL in the "when to flush" condition. Rename last_hctx to cur_hctx while at it, as it better describes the role of that variable. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: re-flow blk_mq_rq_ctx_init()Jens Axboe2021-10-271-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have flags passed in, we can do a final re-arrange of the flow of blk_mq_rq_ctx_init() so we're always writing request in the order in which it is laid out. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>