summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
* xen/blkfront: harden blkfront against event channel stormsJuergen Gross2021-12-161-3/+12
| | | | | | | | | | | The Xen blkfront driver is still vulnerable for an attack via excessive number of events sent by the backend. Fix that by using lateeoi event channels. This is part of XSA-391 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Merge tag 'block-5.16-2021-12-03' of git://git.kernel.dk/linux-blockLinus Torvalds2021-12-041-1/+1
|\ | | | | | | | | | | | | | | Pull block fix from Jens Axboe: "A single fix for repeated printk spam from loop" * tag 'block-5.16-2021-12-03' of git://git.kernel.dk/linux-block: loop: Use pr_warn_once() for loop_control_remove() warning
| * loop: Use pr_warn_once() for loop_control_remove() warningTetsuo Handa2021-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | kernel test robot reported that RCU stall via printk() flooding is possible [1] when stress testing. Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1] Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2021-11-281-2/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin: "Misc fixes all over the place. Revert of virtio used length validation series: the approach taken does not seem to work, breaking too many guests in the process. We'll need to do length validation using some other approach" [ This merge also ends up reverting commit f7a36b03a732 ("vsock/virtio: suppress used length validation"), which came in through the networking tree in the meantime, and was part of that whole used length validation series - Linus ] * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim: avoid putting an uninitialized iova_domain vhost-vdpa: clean irqs before reseting vdpa device virtio-blk: modify the value type of num in virtio_queue_rq() vhost/vsock: cleanup removing `len` variable vhost/vsock: fix incorrect used length reported to the guest Revert "virtio_ring: validate used buffer length" Revert "virtio-net: don't let virtio core to validate used length" Revert "virtio-blk: don't let virtio core to validate used length" Revert "virtio-scsi: don't let virtio core to validate used buffer length"
| * virtio-blk: modify the value type of num in virtio_queue_rq()Ye Guojin2021-11-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was found by coccicheck: ./drivers/block/virtio_blk.c, 334, 14-17, WARNING Unsigned expression compared with zero num < 0 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Link: https://lore.kernel.org/r/20211117063955.160777-1-ye.guojin@zte.com.cn Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 02746e26c39e ("virtio-blk: avoid preallocating big SGL for data") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| * Revert "virtio-blk: don't let virtio core to validate used length"Michael S. Tsirkin2021-11-241-1/+0
| | | | | | | | | | | | | | | | | | This reverts commit a40392edf1b2c7822bc0ce68413106661a9d4232. Attempts to validate length in the core did not work out. We'll drop them, so revert the dependent changes in drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACKJens Axboe2021-11-261-0/+2
|/ | | | | | | | | | | | | | If writeback isn't configured, then we get the following warning when compiling zram: drivers/block/zram/zram_drv.c:1824:45: warning: unused variable 'zram_wb_devops' [-Wunused-const-variable] Make sure we only define the block_device_operations if that option is enabled. Link: https://lore.kernel.org/lkml/202111261614.gCJMqcyh-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-0911-74/+143
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block driver updates from Jens Axboe: - Last series adding error handling support for add_disk() in drivers. After this one, and once the SCSI side has been merged, we can finally annotate add_disk() as must_check. (Luis) - bcache fixes (Coly) - zram fixes (Ming) - ataflop locking fix (Tetsuo) - nbd fixes (Ye, Yu) - MD merge via Song - Cleanup (Yang) - sysfs fix (Guoqing) - Misc fixes (Geert, Wu, luo) * tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits) bcache: Revert "bcache: use bvec_virt" ataflop: Add missing semicolon to return statement floppy: address add_disk() error handling on probe ataflop: address add_disk() error handling on probe block: update __register_blkdev() probe documentation ataflop: remove ataflop_probe_lock mutex mtd/ubi/block: add error handling support for add_disk() block/sunvdc: add error handling support for add_disk() z2ram: add error handling support for add_disk() nvdimm/pmem: use add_disk() error handling nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned nvdimm/blk: add error handling support for add_disk() nvdimm/blk: avoid calling del_gendisk() on early failures nvdimm/btt: add error handling support for add_disk() nvdimm/btt: use goto error labels on btt_blk_init() loop: Remove duplicate assignments drbd: Fix double free problem in drbd_create_device nvdimm/btt: do not call del_gendisk() if not needed bcache: fix use-after-free problem in bcache_device_free() zram: replace fsync_bdev with sync_blockdev ...
| * ataflop: Add missing semicolon to return statementGeert Uytterhoeven2021-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/block/ataflop.c: In function ‘ataflop_probe’: drivers/block/ataflop.c:2023:2: error: expected expression before ‘if’ 2023 | if (ataflop_alloc_disk(drive, type)) | ^~ drivers/block/ataflop.c:2023:2: error: ‘return’ with a value, in function returning void [-Werror=return-type] drivers/block/ataflop.c:2011:13: note: declared here 2011 | static void ataflop_probe(dev_t dev) | ^~~~~~~~~~~~~ Fixes: 46a7db492e7a2740 ("ataflop: address add_disk() error handling on probe") Reported-by: noreply@ellerman.id.au Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20211106185549.1578444-1-geert@linux-m68k.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * floppy: address add_disk() error handling on probeLuis Chamberlain2021-11-041-4/+13
| | | | | | | | | | | | | | | | | | | | | | We need to cleanup resources on the probe() callback registered with __register_blkdev(), now that add_disk() error handling is supported. Address this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-14-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ataflop: address add_disk() error handling on probeLuis Chamberlain2021-11-041-6/+12
| | | | | | | | | | | | | | | | | | | | | | We need to cleanup resources on the probe() callback registered with __register_blkdev(), now that add_disk() error handling is supported. Address this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-13-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ataflop: remove ataflop_probe_lock mutexTetsuo Handa2021-11-041-20/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bf9c0538e485b591 ("ataflop: use a separate gendisk for each media format") introduced ataflop_probe_lock mutex, but forgot to unlock the mutex when atari_floppy_init() (i.e. module loading) succeeded. This will result in double lock deadlock if ataflop_probe() is called. Also, unregister_blkdev() must not be called from atari_floppy_init() with ataflop_probe_lock held when atari_floppy_init() failed, for ataflop_probe() waits for ataflop_probe_lock with major_names_lock held (i.e. AB-BA deadlock). __register_blkdev() needs to be called last in order to avoid calling ataflop_probe() when atari_floppy_init() is about to fail, for memory for completing already-started ataflop_probe() safely will be released as soon as atari_floppy_init() released ataflop_probe_lock mutex. As with commit 8b52d8be86d72308 ("loop: reorder loop_exit"), unregister_blkdev() needs to be called first in order to avoid calling ataflop_alloc_disk() from ataflop_probe() after del_gendisk() from atari_floppy_exit(). By relocating __register_blkdev() / unregister_blkdev() as explained above, we can remove ataflop_probe_lock mutex, for probe function and __exit function are serialized by major_names_lock mutex. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: bf9c0538e485b591 ("ataflop: use a separate gendisk for each media format") Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20211103230437.1639990-11-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block/sunvdc: add error handling support for add_disk()Luis Chamberlain2021-11-041-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. We re-use the same free tag call, so we also add a label for that as well. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-9-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * z2ram: add error handling support for add_disk()Luis Chamberlain2021-11-041-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Only the disk is cleaned up inside z2ram_register_disk() as the caller deals with the rest. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-8-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * loop: Remove duplicate assignmentsluo penghao2021-11-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignment and operation there will be overwritten later, so it should be deleted. The clang_analyzer complains as follows: drivers/block/loop.c:2330:2 warning: Value stored to 'err' is never read change in v2: Repair the sending email box Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211104064546.3074-1-luo.penghao@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * drbd: Fix double free problem in drbd_create_deviceWu Bo2021-11-041-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In drbd_create_device(), the 'out_no_io_page' lable has called blk_cleanup_disk() when return failed. So remove the 'out_cleanup_disk' lable to avoid double free the disk pointer. Fixes: e92ab4eda516 ("drbd: add error handling support for add_disk()") Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1636013229-26309-1-git-send-email-wubo40@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * zram: replace fsync_bdev with sync_blockdevMing Lei2021-11-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | When calling fsync_bdev(), zram driver guarantees that the bdev won't be opened by anyone, then there can't be one active fs/superblock over the zram bdev, so replace fsync_bdev with sync_blockdev. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * zram: avoid race between zram_remove and disksize_storeMing Lei2021-11-021-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After resetting device in zram_remove(), disksize_store still may come and allocate resources again before deleting gendisk, fix the race by resetting zram after del_gendisk() returns. At that time, disksize_store can't come any more. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * zram: don't fail to remove zram during unloading moduleMing Lei2021-11-021-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the zram module is being unloaded, no one should be using the zram disks. However even while being unloaded the zram module's sysfs attributes might be poked at to re-configure zram devices. This is expected, and kernfs ensures that these operations complete before device_del() completes. But reset_store() may set ->claim which will fail zram_remove(), when this happens, zram_reset_device() is bypassed, and zram->comp can't be destroyed, so the warning of 'Error: Removing state 63 which has instances left.' is triggered during unloading module, together with memory leak and sort of thing. Fixes the issue by not failing zram_remove() if ->claim is set, and we actually need to do nothing in case that zram_reset() is running since del_gendisk() will wait until zram_reset() is done. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * zram: fix race between zram_reset_device() and disksize_store()Ming Lei2021-11-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the ->init_lock is released in zram_reset_device(), disksize_store() can come in and try to allocate meta, but zram_reset_device() is freeing free meta, so cause races. Link: https://lore.kernel.org/linux-block/20210927163805.808907-1-mcgrof@kernel.org/T/#mc617f865a3fa2778e40f317ddf48f6447c20c073 Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nbd: error out if socket index doesn't match in nbd_handle_reply()Yu Kuai2021-11-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | commit fcf3d633d8e1 ("nbd: check sock index in nbd_read_stat()") just add error message when socket index doesn't match. Since the request and reply must be transmitted over the same socket, it's ok to error out in such situation. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20211101092538.1155842-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nbd: Fix hungtask when nbd_config_putYe Bin2021-11-021-20/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got follow issue: [ 247.381177] INFO: task kworker/u10:0:47 blocked for more than 120 seconds. [ 247.382644] Not tainted 4.19.90-dirty #140 [ 247.383502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 247.385027] Call Trace: [ 247.388384] schedule+0xb8/0x3c0 [ 247.388966] schedule_timeout+0x2b4/0x380 [ 247.392815] wait_for_completion+0x367/0x510 [ 247.397713] flush_workqueue+0x32b/0x1340 [ 247.402700] drain_workqueue+0xda/0x3c0 [ 247.403442] destroy_workqueue+0x7b/0x690 [ 247.405014] nbd_config_put.cold+0x2f9/0x5b6 [ 247.405823] recv_work+0x1fd/0x2b0 [ 247.406485] process_one_work+0x70b/0x1610 [ 247.407262] worker_thread+0x5a9/0x1060 [ 247.408699] kthread+0x35e/0x430 [ 247.410918] ret_from_fork+0x1f/0x30 We can reproduce issue as follows: 1. Inject memory fault in nbd_start_device -1244,10 +1248,18 @@ static int nbd_start_device(struct nbd_device *nbd) nbd_dev_dbg_init(nbd); for (i = 0; i < num_connections; i++) { struct recv_thread_args *args; - - args = kzalloc(sizeof(*args), GFP_KERNEL); + + if (i == 1) { + args = NULL; + printk("%s: inject malloc error\n", __func__); + } + else + args = kzalloc(sizeof(*args), GFP_KERNEL); 2. Inject delay in recv_work -757,6 +760,8 @@ static void recv_work(struct work_struct *work) blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); } + printk("%s: comm=%s pid=%d\n", __func__, current->comm, current->pid); + mdelay(5 * 1000); nbd_config_put(nbd); atomic_dec(&config->recv_threads); wake_up(&config->recv_wq); 3. Create nbd server nbd-server 8000 /tmp/disk 4. Create nbd client nbd-client localhost 8000 /dev/nbd1 Then will trigger above issue. Reason is when add delay in recv_work, lead to release the last reference of 'nbd->config_refs'. nbd_config_put will call flush_workqueue to make all work finish. Obviously, it will lead to deadloop. To solve this issue, according to Josef's suggestion move 'recv_work' init from start device to nbd_dev_add, then destroy 'recv_work'when nbd device teardown. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20211102015237.2309763-5-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_addYe Bin2021-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | If first_minor is illegal will goto out_free_idr label, this will miss cleanup disk. Fixes: b1a811633f73 ("block: nbd: add sanity check for first_minor") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20211102015237.2309763-4-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nbd: fix possible overflow for 'first_minor' in nbd_dev_add()Yu Kuai2021-11-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | If 'part_shift' is not zero, then 'index << part_shift' might overflow to a value that is not greater than '0xfffff', then sysfs might complains about duplicate creation. Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20211102015237.2309763-3-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nbd: fix max value for 'first_minor'Yu Kuai2021-11-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b1a811633f73 ("block: nbd: add sanity check for first_minor") checks that 'first_minor' should not be greater than 0xff, which is wrong. Whitout the commit, the details that when user pass 0x100000, it ends up create sysfs dir "/sys/block/43:0" are as follows: nbd_dev_add disk->first_minor = index << part_shift -> default part_shift is 5, first_minor is 0x2000000 device_add_disk ddev->devt = MKDEV(disk->major, disk->first_minor) -> (0x2b << 20) | (0x2000000) = 0x2b00000 device_add device_create_sys_dev_entry format_dev_t sprintf(buffer, "%u:%u", MAJOR(dev), MINOR(dev)); -> got 43:0 sysfs_create_link -> /sys/block/43:0 By the way, with the wrong fix, when part_shift is the default value, only 8 ndb devices can be created since 8 << 5 is greater than 0xff. Since the max bits for 'first_minor' should be the same as what MKDEV() does, which is 20. Change the upper bound of 'first_minor' from 0xff to 0xfffff. Fixes: b1a811633f73 ("block: nbd: add sanity check for first_minor") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20211102015237.2309763-2-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block/brd: add error handling support for add_disk()Luis Chamberlain2021-10-301-2/+7
| | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211015235219.2191207-2-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ps3vram: add error handling support for add_disk()Luis Chamberlain2021-10-301-1/+6
| | | | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/20211015235219.2191207-12-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ps3disk: add error handling support for add_disk()Luis Chamberlain2021-10-301-2/+6
| | | | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/20211015235219.2191207-11-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * zram: add error handling support for add_disk()Luis Chamberlain2021-10-301-1/+5
| | | | | | | | | | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211015235219.2191207-9-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-11-061-18/+48
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
| * | zram: introduce an aged idle interfaceBrian Geffon2021-11-061-16/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change introduces an aged idle interface to the existing idle sysfs file for zram. When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also accepts an integer argument. This integer is the age (in seconds) of pages to mark as idle. The idle file still supports 'all' as it always has. This new approach allows for much more control over which pages get marked as idle. [bgeffon@google.com: use IS_ENABLED and cleanup comment] Link: https://lkml.kernel.org/r/20210924161128.1508015-1-bgeffon@google.com [bgeffon@google.com: Sergey's cleanup suggestions] Link: https://lkml.kernel.org/r/20210929143056.13067-1-bgeffon@google.com Link: https://lkml.kernel.org/r/20210923130115.1344361-1-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | zram: off by one in read_block_state()Dan Carpenter2021-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | snprintf() returns the number of bytes it would have printed if there were space. But it does not count the NUL terminator. So that means that if "count == copied" then this has already overflowed by one character. This bug likely isn't super harmful in real life. Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili Fixes: c0265342bff4 ("zram: introduce zram memory tracking") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | zram_drv: allow reclaim on bio_allocJaewon Kim2021-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The read_from_bdev_async is not called on atomic context. So GFP_NOIO is available rather than GFP_ATOMIC. If there were reclaimable pages with GFP_NOIO, we can avoid allocation failure and page fault failure. Link: https://lkml.kernel.org/r/20210908005241.28062-1-jaewon31.kim@samsung.com Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2021-11-032-59/+120
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "vhost and virtio fixes and features: - Hardening work by Jason - vdpa driver for Alibaba ENI - Performance tweaks for virtio blk - virtio rng rework using an internal buffer - mac/mtu programming for mlx5 vdpa - Misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (45 commits) vdpa/mlx5: Forward only packets with allowed MAC address vdpa/mlx5: Support configuration of MAC vdpa/mlx5: Fix clearing of VIRTIO_NET_F_MAC feature bit vdpa_sim_net: Enable user to set mac address and mtu vdpa: Enable user to set mac and mtu of vdpa device vdpa: Use kernel coding style for structure comments vdpa: Introduce query of device config layout vdpa: Introduce and use vdpa device get, set config helpers virtio-scsi: don't let virtio core to validate used buffer length virtio-blk: don't let virtio core to validate used length virtio-net: don't let virtio core to validate used length virtio_ring: validate used buffer length virtio_blk: correct types for status handling virtio_blk: allow 0 as num_request_queues i2c: virtio: Add support for zero-length requests virtio-blk: fixup coccinelle warnings virtio_ring: fix typos in vring_desc_extra virtio-pci: harden INTX interrupts virtio_pci: harden MSI-X interrupts virtio_config: introduce a new .enable_cbs method ...
| * | | virtio-blk: don't let virtio core to validate used lengthJason Wang2021-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We never tries to use used length, so the patch prevents the virtio core from validating used length. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | virtio_blk: correct types for status handlingMichael S. Tsirkin2021-11-011-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtblk_setup_cmd returns blk_status_t in an int, callers then assign it back to a blk_status_t variable. blk_status_t is either u32 or (more typically) u8 so it works, but is inelegant and causes sparse warnings. Pass the status in blk_status_t in a consistent way. Reported-by: kernel test robot <lkp@intel.com> Fixes: b2c5221fd074 ("virtio-blk: avoid preallocating big SGL for data") Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * | | virtio_blk: allow 0 as num_request_queuesMichael S. Tsirkin2021-11-011-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default value is 0 meaning "no limit". However if 0 is specified on the command line it is instead silently converted to 1. Further, the value is already validated at point of use, there's no point in duplicating code validating the value when it is set. Simplify the code while making the behaviour more consistent by using plain module_param. Fixes: 1a662cf6cb9a ("virtio-blk: add num_request_queues module parameter") Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | virtio-blk: fixup coccinelle warningsYe Guojin2021-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | coccicheck complains about the use of snprintf() in sysfs show functions: WARNING use scnprintf or sprintf Use sysfs_emit instead of scnprintf or sprintf makes more sense. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Link: https://lore.kernel.org/r/20211021065111.1047824-1-ye.guojin@zte.com.cn Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
| * | | virtio_blk: Fix spelling mistake: "advertisted" -> "advertised"Colin Ian King2021-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a spelling mistake in a dev_err error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211025102240.22801-1-colin.i.king@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * | | virtio-blk: validate num_queues during probeJason Wang2021-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an untrusted device neogitates BLK_F_MQ but advertises a zero num_queues, the driver may end up trying to allocating zero size buffers where ZERO_SIZE_PTR is returned which may pass the checking against the NULL. This will lead unexpected results. Fixing this by failing the probe in this case. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| * | | virtio-blk: add num_request_queues module parameterMax Gurtovoy2021-11-011-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes a user would like to control the amount of request queues to be created for a block device. For example, for limiting the memory footprint of virtio-blk devices. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210902204622.54354-1-mgurtovoy@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | virtio-blk: avoid preallocating big SGL for dataMax Gurtovoy2021-11-012-56/+101
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to pre-allocate a big buffer for the IO SGL anymore. If a device has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For HW virtio-blk device, nr_hw_queues can be 64 or 128 and each queue's depth might be 128. This means the resulting preallocation for the data SGLs is big. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe drivers so it should be reasonable for virtio block as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't support SG_CHAIN, use only runtime allocation for the SGL. Re-organize the setup of the IO request to fit the new sg chain mechanism. No performance degradation was seen (fio libaio engine with 16 jobs and 128 iodepth): IO size IOPs Rand Read (before/after) IOPs Rand Write (before/after) -------- --------------------------------- ---------------------------------- 512B 318K/316K 329K/325K 4KB 323K/321K 353K/349K 16KB 199K/208K 250K/275K 128KB 36K/36.1K 39.2K/41.7K Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Link: https://lore.kernel.org/r/20210901131434.31158-1-mgurtovoy@nvidia.com Reviewed-by: Feng Li <lifeng1519@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # kconfig fixups Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | Merge tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-011-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block inode sync updates from Jens Axboe: "This contains improvements to how bdev inode syncing is handled, unifying the API" * tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block: block: simplify the block device syncing code ntfs3: use sync_blockdev_nowait fat: use sync_blockdev_nowait btrfs: use sync_blockdev xen-blkback: use sync_blockdev block: remove __sync_blockdev fs: remove __sync_filesystem
| * | | xen-blkback: use sync_blockdevChristoph Hellwig2021-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use sync_blockdev instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211019062530.2174626-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-11-011-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kiocb->ki_complete() cleanup from Jens Axboe: "This removes the res2 argument from kiocb->ki_complete(). Only the USB gadget code used it, everybody else passes 0. The USB guys checked the user gadget code they could find, and everybody just uses res as expected for the async interface" * tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block: fs: get rid of the res2 iocb->ki_complete argument usb: remove res2 argument from gadget code completions
| * | | | fs: get rid of the res2 iocb->ki_complete argumentJens Axboe2021-10-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | Merge tag 'for-5.16/passthrough-flag-2021-10-29' of ↵Linus Torvalds2021-11-014-8/+11
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.dk/linux-block Pull QUEUE_FLAG_SCSI_PASSTHROUGH removal from Jens Axboe: "This contains a series leading to the removal of the QUEUE_FLAG_SCSI_PASSTHROUGH queue flag" * tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block: block: remove blk_{get,put}_request block: remove QUEUE_FLAG_SCSI_PASSTHROUGH block: remove the initialize_rq_fn blk_mq_ops method scsi: add a scsi_alloc_request helper bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands sd: implement ->get_unique_id block: add a ->get_unique_id method
| * | | | | block: remove blk_{get,put}_requestChristoph Hellwig2021-10-293-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are now pointless wrappers around blk_mq_{alloc,free}_request, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: remove QUEUE_FLAG_SCSI_PASSTHROUGHChristoph Hellwig2021-10-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | scsi: add a scsi_alloc_request helperChristoph Hellwig2021-10-222-2/+2
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new helper that calls blk_get_request and initializes the scsi_request to avoid the indirect call through ->.initialize_rq_fn. Note that this makes the pktcdvd driver depend on the SCSI core, but given that only SCSI devices support SCSI passthrough requests that is not a functional change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>