summaryrefslogtreecommitdiffstats
path: root/drivers/block/virtio_blk.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch '6.5/scsi-staging' into 6.5/scsi-fixesMartin K. Petersen2023-07-111-19/+15
|\ | | | | | | | | | | Pull in the currently staged SCSI fixes for 6.5. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: block: virtio_blk: Set zone limits before revalidating zonesDamien Le Moal2023-07-051-19/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In virtblk_probe_zoned_device(), execute blk_queue_chunk_sectors() and blk_queue_max_zone_append_sectors() to respectively set the zoned device zone size and maximum zone append sector limit before executing blk_revalidate_disk_zones(). This is to allow the block layer zone reavlidation to check these device characteristics prior to checking all zones of the device. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230703024812.76778-5-dlemoal@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Revert "virtio-blk: support completion batching for the IRQ path"Michael S. Tsirkin2023-06-211-45/+37
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. This change appears to have broken things... We now see applications hanging during disk accesses. e.g. multi-port virtio-blk device running in h/w (FPGA) Host running a simple 'fio' test. [global] thread=1 direct=1 ioengine=libaio norandommap=1 group_reporting=1 bs=4K rw=read iodepth=128 runtime=1 numjobs=4 time_based [job0] filename=/dev/vda [job1] filename=/dev/vdb [job2] filename=/dev/vdc ... [job15] filename=/dev/vdp i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads This is repeatedly run in a loop. After a few, normally <10 seconds, fio hangs. With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. Last message: fio-3.19 Starting 8 threads Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] I think this means at the end of the run 1 queue was left incomplete. 'diskstats' (run while fio is hung) shows no outstanding transactions. e.g. $ cat /proc/diskstats ... 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 ... Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. e.g. PF= 0 vq=0 1 2 3 [a]request_count - 839416590 813148916 105586179 84988123 [b]completion1_count - 839416590 813148916 105586179 84988123 [c]completion2_count - 0 0 0 0 PF= 1 vq=0 1 2 3 [a]request_count - 823335887 812516140 104582672 75856549 [b]completion1_count - 823335887 812516140 104582672 75856549 [c]completion2_count - 0 0 0 0 i.e. the issue is after the virtio-blk driver. This change was introduced in kernel 6.3.0. I am seeing this using 6.3.3. If I run with an earlier kernel (5.15), it does not occur. If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. e.g. kernel 5.15 - this is OK virtio_blk.c,virtblk_done() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { blk_mq_complete_request(req); } kernel 6.3.3 - this fails virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } If I do, kernel 6.3.3 - this is OK virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { virtblk_request_done(req); //force this here... if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } Perhaps you might like to fix/test/revert this change... Martin Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ Cc: Suwan Kim <suwan.kim027@gmail.com> Tested-by: edliaw@google.com Reported-by: "Roberts, Martin" <martin.roberts@intel.com> Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio-blk: fix ZBD probe in kernels without ZBD supportDmitry Fomichev2023-04-041-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the kernel is built without support for zoned block devices, virtio-blk probe needs to error out any host-managed device scans to prevent such devices from appearing in the system as non-zoned. The current virtio-blk code simply bypasses all ZBD checks if CONFIG_BLK_DEV_ZONED is not defined and this leads to host-managed block devices being presented as non-zoned in the OS. This is one of the main problems this patch series is aimed to fix. In this patch, make VIRTIO_BLK_F_ZONED feature defined even when CONFIG_BLK_DEV_ZONED is not. This change makes the code compliant with the voted revision of virtio-blk ZBD spec. Modify the probe code to look at the situation when VIRTIO_BLK_F_ZONED is negotiated in a kernel that is built without ZBD support. In this case, the code checks the zoned model of the device and fails the probe is the device is host-managed. The patch also adds the comment to clarify that the call to perform the zoned device probe is correctly placed after virtio_device ready(). Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Message-Id: <20230330214953.1088216-3-dmitry.fomichev@wdc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio-blk: fix to match virtio specDmitry Fomichev2023-04-041-81/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The merged patch series to support zoned block devices in virtio-blk is not the most up to date version. The merged patch can be found at https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomichev@wdc.com/ but the latest and reviewed version is https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomichev@wdc.com/ The reason is apparently that the correct mailing lists and maintainers were not copied. The differences between the two are mostly cleanups, but there is one change that is very important in terms of compatibility with the approved virtio-zbd specification. Before it was approved, the OASIS virtio spec had a change in VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the current virtio-blk driver code. In the running code, the status is the first byte of the in-header that is followed by some pad bytes and the u64 that carries the sector at which the data has been written to the zone back to the driver, aka the append sector. This layout turned out to be problematic for implementing in QEMU and the request status byte has been eventually made the last byte of the in-header. The current code doesn't expect that and this causes the append sector value always come as zero to the block layer. This needs to be fixed ASAP. Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Message-Id: <20230330214953.1088216-2-dmitry.fomichev@wdc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2023-02-251-54/+414
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
| * virtio-blk: support completion batching for the IRQ pathSuwan Kim2023-02-201-37/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds completion batching to the IRQ path. It reuses batch completion code of virtblk_poll(). It collects requests to io_comp_batch and processes them all at once. It can boost up the performance by 2%. To validate the performance improvement and stabilty, I did fio test with 4 vCPU VM and 12 vCPU VM respectively. Both VMs have 8GB ram and the same number of HW queues as vCPU. The fio cammad is as follows and I ran the fio 5 times and got IOPS average. (io_uring, randread, direct=1, bs=512, iodepth=64 numjobs=2,4) Test result shows about 2% improvement. 4 vcpu VM | numjobs=2 | numjobs=4 ----------------------------------------------------------- fio without patch | 367.2K IOPS | 397.6K IOPS ----------------------------------------------------------- fio with patch | 372.8K IOPS | 407.7K IOPS 12 vcpu VM | numjobs=2 | numjobs=4 ----------------------------------------------------------- fio without patch | 363.6K IOPS | 374.8K IOPS ----------------------------------------------------------- fio with patch | 373.8K IOPS | 385.3K IOPS Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20221221145456.281218-3-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-blk: set req->state to MQ_RQ_COMPLETE after polling I/O is finishedSuwan Kim2023-02-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver should set req->state to MQ_RQ_COMPLETE after it finishes to process req. But virtio-blk doesn't set MQ_RQ_COMPLETE after virtblk_poll() handles req and req->state still remains MQ_RQ_IN_FLIGHT. Fortunately so far there is no issue about it because blk_mq_end_request_batch() sets req->state to MQ_RQ_IDLE. In this patch, virblk_poll() calls blk_mq_complete_request_remote() to set req->state to MQ_RQ_COMPLETE before it adds req to a batch completion list. So it properly sets req->state after polling I/O is finished. Fixes: 4e0400525691 ("virtio-blk: support polling I/O") Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20221221145456.281218-2-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio_blk: zone append in header type tweakMichael S. Tsirkin2023-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | virtio blk returns a 64 bit append_sector in an input buffer, in LE format. This field is not tagged as LE correctly, so even though the generated code is ok, we get warnings from sparse: drivers/block/virtio_blk.c:332:33: sparse: sparse: cast to restricted __le64 Make sparse happy by using the correct type. Message-Id: <20221220125154.564265-1-mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio_blk: temporary variable type tweakMichael S. Tsirkin2023-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtblk_result returns blk_status_t which is a bitwise restricted type, so we are not supposed to stuff it in a plain int temporary variable. All we do with it is pass it on to a function expecting blk_status_t so the generated code is ok, but we get warnings from sparse: drivers/block/virtio_blk.c:326:36: sparse: sparse: incorrect type in initializer (different base types) @@ expected int status @@ +got restricted blk_status_t @@ drivers/block/virtio_blk.c:334:33: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted +blk_status_t [usertype] error @@ got int status @@ Make sparse happy by using the correct type. Message-Id: <20221220124152.523531-1-mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
| * virtio-blk: add support for zoned block devicesDmitry Fomichev2023-02-151-18/+369
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for Zoned Block Devices (ZBDs) to the kernel virtio-blk driver. The patch accompanies the virtio-blk ZBD support draft that is now being proposed for standardization. The latest version of the draft is linked at https://github.com/oasis-tcs/virtio-spec/issues/143 . The QEMU zoned device code that implements these protocol extensions has been developed by Sam Li and it is currently in review at the QEMU mailing list. A number of virtblk request structure changes has been introduced to accommodate the functionality that is specific to zoned block devices and, most importantly, make room for carrying the Zoned Append sector value from the device back to the driver along with the request status. The zone-specific code in the patch is heavily influenced by NVMe ZNS code in drivers/nvme/host/zns.c, but it is simpler because the proposed virtio ZBD draft only covers the zoned device features that are relevant to the zoned functionality provided by Linux block layer. includes the following fixup: virtio-blk: fix probe without CONFIG_BLK_DEV_ZONED When building without CONFIG_BLK_DEV_ZONED, VIRTIO_BLK_F_ZONED is excluded from array of driver features. As a result virtio_has_feature panics in virtio_check_driver_offered_feature since that by design verifies that a feature we are checking for is listed in the feature array. To fix, replace the call to virtio_has_feature with a stub. Message-Id: <20221016034127.330942-3-dmitry.fomichev@wdc.com> Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20221220112340.518841-1-mst@redhat.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Debugged-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* | virtio_blk: use bvec_set_virt to initialize special_vecChristoph Hellwig2023-02-031-3/+1
|/ | | | | | | | | | | Use the bvec_set_virt helper to initialize the special_vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20230203150634.3199647-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* virtio_blk: Fix signedness bug in virtblk_prep_rq()Rafael Mendonca2022-12-281-2/+4
| | | | | | | | | | | | | | | | The virtblk_map_data() function returns negative error codes, however, the 'nents' field of vbr->sg_table is an unsigned int, which causes the error handling not to work correctly. Cc: stable@vger.kernel.org Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Message-Id: <20221021204126.927603-1-rafaelmendsr@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Suwan Kim <suwan.kim027@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* virtio_blk: use UINT_MAX instead of -1UAngus Chen2022-12-281-1/+1
| | | | | | | | | | | | We use UINT_MAX to limit max_discard_sectors in virtblk_probe, we can use UINT_MAX to limit max_hw_sectors for consistencies. No functional change intended. Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com> Message-Id: <20221110030124.1986-1-angus.chen@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* virtio-blk: use a helper to handle request queuing errorsDmitry Fomichev2022-12-281-13/+16
| | | | | | | | | Define a new helper function, virtblk_fail_to_queue(), to clean up the error handling code in virtio_queue_rq(). Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20221016034127.330942-2-dmitry.fomichev@wdc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio-blk: replace ida_simple[get|remove] with ida_[alloc_range|free]Pankaj Raghav2022-11-301-4/+4
| | | | | | | | | | | | | ida_simple[get|remove] are deprecated, and are just wrappers to ida_[alloc_range|free]. Replace ida_simple[get|remove] with their corresponding counterparts. No functional changes. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-10-101-18/+92
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - 9k mtu perf improvements - vdpa feature provisioning - virtio blk SECURE ERASE support - fixes and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: don't try to use intxif pin is zero vDPA: conditionally read MTU and MAC in dev cfg space vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fill vDPA: check virtio device features to detect MQ vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presence vDPA: only report driver features if FEATURES_OK is set vDPA: allow userspace to query features of a vDPA device virtio_blk: add SECURE ERASE command support vp_vdpa: support feature provisioning vdpa_sim_net: support feature provisioning vdpa: device feature provisioning virtio-net: use mtu size as buffer length for big packets virtio-net: introduce and use helper function for guest gso support checks virtio: drop vp_legacy_set_queue_size virtio_ring: make vring_alloc_queue_packed prettier virtio_ring: split: Operators use unified style vhost: add __init/__exit annotations to module init/exit funcs
| * virtio_blk: add SECURE ERASE command supportAlvaro Karsz2022-10-071-18/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support for the VIRTIO_BLK_F_SECURE_ERASE VirtIO feature. A device that offers this feature can receive VIRTIO_BLK_T_SECURE_ERASE commands. A device which supports this feature has the following fields in the virtio config: - max_secure_erase_sectors - max_secure_erase_seg - secure_erase_sector_alignment max_secure_erase_sectors and secure_erase_sector_alignment are expressed in 512-byte units. Every secure erase command has the following fields: - sectors: The starting offset in 512-byte units. - num_sectors: The number of sectors. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20220921082729.2516779-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* | Merge tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linuxLinus Torvalds2022-10-071-3/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - handle number of queue changes in the TCP and RDMA drivers (Daniel Wagner) - allow changing the number of queues in nvmet (Daniel Wagner) - also consider host_iface when checking ip options (Daniel Wagner) - don't map pages which can't come from HIGHMEM (Fabio M. De Francesco) - avoid unnecessary flush bios in nvmet (Guixin Liu) - shrink and better pack the nvme_iod structure (Keith Busch) - add comment for unaligned "fake" nqn (Linjun Bao) - print actual source IP address through sysfs "address" attr (Martin Belanger) - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang) - handle effects after freeing the request (Keith Busch) - copy firmware_rev on each init (Keith Busch) - restrict management ioctls to admin (Keith Busch) - ensure subsystem reset is single threaded (Keith Busch) - report the actual number of tagset maps in nvme-pci (Keith Busch) - small fabrics authentication fixups (Christoph Hellwig) - add common code for tagset allocation and freeing (Christoph Hellwig) - stop using the request_queue in nvmet (Christoph Hellwig) - set min_align_mask before calculating max_hw_sectors (Rishabh Bhatnagar) - send a rediscover uevent when a persistent discovery controller reconnects (Sagi Grimberg) - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi) - MD pull request via Song: - Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan. - Raid10 performance optimization, by Yu Kuai. - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu) - IO scheduler switching quisce fix (Keith) - s390/dasd block driver updates (Stefan) - support for recovery for the ublk driver (ZiyangZhang) - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph) - blk-mq and null_blk map fixes (Bart) - various bcache fixes (Coly, Jilin, Jules) - nbd signal hang fix (Shigeru) - block writeback throttling fix (Yu) - optimize the passthrough mapping handling (me) - prepare block cgroups to being gendisk based (Christoph) - get rid of an old PSI hack in the block layer, moving it to the callers instead where it belongs (Christoph) - blk-throttle fixes and cleanups (Yu) - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj, Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming, Miaohe, Bart, Coly, Gaosheng * tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits) sbitmap: fix lockup while swapping block: add rationale for not using blk_mq_plug() when applicable block: adapt blk_mq_plug() to not plug for writes that require a zone lock s390/dasd: use blk_mq_alloc_disk blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep nvmet: don't look at the request_queue in nvmet_bdev_set_limits nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all blk-mq: use quiesced elevator switch when reinitializing queues block: replace blk_queue_nowait with bdev_nowait nvme: remove nvme_ctrl_init_connect_q nvme-loop: use the tagset alloc/free helpers nvme-loop: store the generic nvme_ctrl in set->driver_data nvme-loop: initialize sqsize later nvme-fc: use the tagset alloc/free helpers nvme-fc: store the generic nvme_ctrl in set->driver_data nvme-fc: keep ctrl->sqsize in sync with opts->queue_size nvme-rdma: use the tagset alloc/free helpers nvme-rdma: store the generic nvme_ctrl in set->driver_data nvme-tcp: use the tagset alloc/free helpers nvme-tcp: store the generic nvme_ctrl in set->driver_data ...
| * block: Change the return type of blk_mq_map_queues() into voidBart Van Assche2022-08-221-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | virtio-blk: Fix WARN_ON_ONCE in virtio_queue_rq()Suwan Kim2022-09-271-6/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a request fails at virtio_queue_rqs(), it is inserted to requeue_list and passed to virtio_queue_rq(). Then blk_mq_start_request() can be called again at virtio_queue_rq() and trigger WARN_ON_ONCE like below trace because request state was already set to MQ_RQ_IN_FLIGHT in virtio_queue_rqs() despite the failure. [ 1.890468] ------------[ cut here ]------------ [ 1.890776] WARNING: CPU: 2 PID: 122 at block/blk-mq.c:1143 blk_mq_start_request+0x8a/0xe0 [ 1.891045] Modules linked in: [ 1.891250] CPU: 2 PID: 122 Comm: journal-offline Not tainted 5.19.0+ #44 [ 1.891504] Hardware name: ChromiumOS crosvm, BIOS 0 [ 1.891739] RIP: 0010:blk_mq_start_request+0x8a/0xe0 [ 1.891961] Code: 12 80 74 22 48 8b 4b 10 8b 89 64 01 00 00 8b 53 20 83 fa ff 75 08 ba 00 00 00 80 0b 53 24 c1 e1 10 09 d1 89 48 34 5b 41 5e c3 <0f> 0b eb b8 65 8b 05 2b 39 b6 7e 89 c0 48 0f a3 05 39 77 5b 01 0f [ 1.892443] RSP: 0018:ffffc900002777b0 EFLAGS: 00010202 [ 1.892673] RAX: 0000000000000000 RBX: ffff888004bc0000 RCX: 0000000000000000 [ 1.892952] RDX: 0000000000000000 RSI: ffff888003d7c200 RDI: ffff888004bc0000 [ 1.893228] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888004bc0100 [ 1.893506] R10: ffffffffffffffff R11: ffffffff8185ca10 R12: ffff888004bc0000 [ 1.893797] R13: ffffc90000277900 R14: ffff888004ab2340 R15: ffff888003d86e00 [ 1.894060] FS: 00007ffa143a4640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 1.894412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.894682] CR2: 00005648577d9088 CR3: 00000000053da004 CR4: 0000000000170ee0 [ 1.894953] Call Trace: [ 1.895139] <TASK> [ 1.895303] virtblk_prep_rq+0x1e5/0x280 [ 1.895509] virtio_queue_rq+0x5c/0x310 [ 1.895710] ? virtqueue_add_sgs+0x95/0xb0 [ 1.895905] ? _raw_spin_unlock_irqrestore+0x16/0x30 [ 1.896133] ? virtio_queue_rqs+0x340/0x390 [ 1.896453] ? sbitmap_get+0xfa/0x220 [ 1.896678] __blk_mq_issue_directly+0x41/0x180 [ 1.896906] blk_mq_plug_issue_direct+0xd8/0x2c0 [ 1.897115] blk_mq_flush_plug_list+0x115/0x180 [ 1.897342] blk_add_rq_to_plug+0x51/0x130 [ 1.897543] blk_mq_submit_bio+0x3a1/0x570 [ 1.897750] submit_bio_noacct_nocheck+0x418/0x520 [ 1.897985] ? submit_bio_noacct+0x1e/0x260 [ 1.897989] ext4_bio_write_page+0x222/0x420 [ 1.898000] mpage_process_page_bufs+0x178/0x1c0 [ 1.899451] mpage_prepare_extent_to_map+0x2d2/0x440 [ 1.899603] ext4_writepages+0x495/0x1020 [ 1.899733] do_writepages+0xcb/0x220 [ 1.899871] ? __seccomp_filter+0x171/0x7e0 [ 1.900006] file_write_and_wait_range+0xcd/0xf0 [ 1.900167] ext4_sync_file+0x72/0x320 [ 1.900308] __x64_sys_fsync+0x66/0xa0 [ 1.900449] do_syscall_64+0x31/0x50 [ 1.900595] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1.900747] RIP: 0033:0x7ffa16ec96ea [ 1.900883] Code: b8 4a 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 02 f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 43 03 f8 ff 8b 44 24 [ 1.901302] RSP: 002b:00007ffa143a3ac0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 1.901499] RAX: ffffffffffffffda RBX: 0000560277ec6fe0 RCX: 00007ffa16ec96ea [ 1.901696] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000016 [ 1.901884] RBP: 0000560277ec5910 R08: 0000000000000000 R09: 00007ffa143a4640 [ 1.902082] R10: 00007ffa16e4d39e R11: 0000000000000293 R12: 00005602773f59e0 [ 1.902459] R13: 0000000000000000 R14: 00007fffbfc007ff R15: 00007ffa13ba4000 [ 1.902763] </TASK> [ 1.902877] ---[ end trace 0000000000000000 ]--- To avoid calling blk_mq_start_request() twice, This patch moves the execution of blk_mq_start_request() to the end of virtblk_prep_rq(). And instead of requeuing failed request to plug list in the error path of virtblk_add_req_batch(), it uses blk_mq_requeue_request() to change failed request state to MQ_RQ_IDLE. Then virtblk can safely handle the request on the next trial. Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()") Reported-by: Alexandre Courbot <acourbot@chromium.org> Tested-by: Alexandre Courbot <acourbot@chromium.org> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Message-Id: <20220830150153.12627-1-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-08-121-14/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
| * virtio-blk: Avoid use-after-free on suspend/resumeShigeru Yoshida2022-08-111-14/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hctx->user_data is set to vq in virtblk_init_hctx(). However, vq is freed on suspend and reallocated on resume. So, hctx->user_data is invalid after resume, and it will cause use-after-free accessing which will result in the kernel crash something like below: [ 22.428391] Call Trace: [ 22.428899] <TASK> [ 22.429339] virtqueue_add_split+0x3eb/0x620 [ 22.430035] ? __blk_mq_alloc_requests+0x17f/0x2d0 [ 22.430789] ? kvm_clock_get_cycles+0x14/0x30 [ 22.431496] virtqueue_add_sgs+0xad/0xd0 [ 22.432108] virtblk_add_req+0xe8/0x150 [ 22.432692] virtio_queue_rqs+0xeb/0x210 [ 22.433330] blk_mq_flush_plug_list+0x1b8/0x280 [ 22.434059] __blk_flush_plug+0xe1/0x140 [ 22.434853] blk_finish_plug+0x20/0x40 [ 22.435512] read_pages+0x20a/0x2e0 [ 22.436063] ? folio_add_lru+0x62/0xa0 [ 22.436652] page_cache_ra_unbounded+0x112/0x160 [ 22.437365] filemap_get_pages+0xe1/0x5b0 [ 22.437964] ? context_to_sid+0x70/0x100 [ 22.438580] ? sidtab_context_to_sid+0x32/0x400 [ 22.439979] filemap_read+0xcd/0x3d0 [ 22.440917] xfs_file_buffered_read+0x4a/0xc0 [ 22.441984] xfs_file_read_iter+0x65/0xd0 [ 22.442970] __kernel_read+0x160/0x2e0 [ 22.443921] bprm_execve+0x21b/0x640 [ 22.444809] do_execveat_common.isra.0+0x1a8/0x220 [ 22.446008] __x64_sys_execve+0x2d/0x40 [ 22.446920] do_syscall_64+0x37/0x90 [ 22.447773] entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes this issue by getting vq from vblk, and removes virtblk_init_hctx(). Fixes: 4e0400525691 ("virtio-blk: support polling I/O") Cc: "Suwan Kim" <suwan.kim027@gmail.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Message-Id: <20220810160948.959781-1-syoshida@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | block: remove blk_cleanup_diskChristoph Hellwig2022-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: simplify disk shutdownChristoph Hellwig2022-06-281-1/+0
|/ | | | | | | | | | | | | | | | | Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* virtio-blk: support mq_ops->queue_rqs()Suwan Kim2022-05-311-16/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports mq_ops->queue_rqs() hook. It has an advantage of batch submission to virtio-blk driver. It also helps polling I/O because polling uses batched completion of block layer. Batch submission in queue_rqs() can boost polling performance. In queue_rqs(), it iterates plug->mq_list, collects requests that belong to same HW queue until it encounters a request from other HW queue or sees the end of the list. Then, virtio-blk adds requests into virtqueue and kicks virtqueue to submit requests. If there is an error, it inserts error request to requeue_list and passes it to ordinary block layer path. For verification, I did fio test. (io_uring, randread, direct=1, bs=4K, iodepth=64 numjobs=N) I set 4 vcpu and 2 virtio-blk queues for VM and run fio test 5 times. It shows about 2% improvement. | numjobs=2 | numjobs=4 ----------------------------------------------------------- fio without queue_rqs() | 291K IOPS | 238K IOPS ----------------------------------------------------------- fio with queue_rqs() | 295K IOPS | 243K IOPS For polling I/O performance, I also did fio test as below. (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=4) I set 4 vcpu and 2 poll queues for VM. It shows about 2% improvement in polling I/O. | IOPS | avg latency ----------------------------------------------------------- fio poll without queue_rqs() | 424K | 613.05 usec ----------------------------------------------------------- fio poll with queue_rqs() | 435K | 601.01 usec Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Message-Id: <20220406153207.163134-3-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
* virtio-blk: support polling I/OSuwan Kim2022-05-311-3/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports polling I/O via virtio-blk driver. Polling feature is enabled by module parameter "poll_queues" and it sets dedicated polling queues for virtio-blk. This patch improves the polling I/O throughput and latency. The virtio-blk driver doesn't not have a poll function and a poll queue and it has been operating in interrupt driven method even if the polling function is called in the upper layer. virtio-blk polling is implemented upon 'batched completion' of block layer. virtblk_poll() queues completed request to io_comp_batch->req_list and later, virtblk_complete_batch() calls unmap function and ends the requests in batch. virtio-blk reads the number of poll queues from module parameter "poll_queues". If VM sets queue parameter as below, ("num-queues=N" [QEMU property], "poll_queues=M" [module parameter]) It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)] as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default queues, the poll queues have no callback function. Regarding HW-SW queue mapping, the default queue mapping uses the existing method that condsiders MSI irq vector. But the poll queue doesn't have an irq, so it uses the regular blk-mq cpu mapping. For verifying the improvement, I did Fio polling I/O performance test with io_uring engine with the options below. (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N) I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll queues for VM. As a result, IOPS and average latency improved about 10%. Test result: - Fio io_uring poll without virtio-blk poll support -- numjobs=1 : IOPS = 339K, avg latency = 188.33us -- numjobs=2 : IOPS = 367K, avg latency = 347.33us -- numjobs=4 : IOPS = 383K, avg latency = 682.06us - Fio io_uring poll with virtio-blk poll support -- numjobs=1 : IOPS = 385K, avg latency = 165.94us -- numjobs=2 : IOPS = 408K, avg latency = 313.28us -- numjobs=4 : IOPS = 424K, avg latency = 613.05us Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Message-Id: <20220406153207.163134-2-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
* virtio_blk: fix the discard_granularity and discard_alignment queue limitsChristoph Hellwig2022-05-031-3/+4
| | | | | | | | | | | | | | | | | | | | | | | The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. On the other hand the discard_sector_alignment from the virtio 1.1 looks similar to what Linux uses as discard granularity (even if not very well described): "discard_sector_alignment can be used by OS when splitting a request based on alignment. " And at least qemu does set it to the discard granularity. So stop setting the discard_alignment and use the virtio discard_sector_alignment to set the discard granularity. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove QUEUE_FLAG_DISCARDChristoph Hellwig2022-04-171-2/+0
| | | | | | | | | | | | | | | | | | | Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds2022-03-211-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: - NVMe updates via Christoph: - add vectored-io support for user-passthrough (Kanchan Joshi) - add verbose error logging (Alan Adamson) - support buffered I/O on block devices in nvmet (Chaitanya Kulkarni) - central discovery controller support (Martin Belanger) - fix and extended the globally unique idenfier validation (Christoph) - move away from the deprecated IDA APIs (Sagi Grimberg) - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin, Chaitanya Kulkarni) - add lockdep annotations for in-kernel sockets (Chris Leech) - use vmalloc for ANA log buffer (Hannes Reinecke) - kerneldoc fixes (Chaitanya Kulkarni) - cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph) - warn about shared namespaces without multipathing (Christoph) - MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul, Erik, Dirk) - loop cleanups and queue depth configuration (Chaitanya) - null_blk cleanups and fixes (Chaitanya) - Use descriptive init/exit names in virtio_blk (Randy) - Use bvec_kmap_local() in drivers (Christoph) - bcache fixes (Mingzhe) - xen blk-front persistent grant speedups (Juergen) - rnbd fix and cleanup (Gioh) - Misc fixes (Christophe, Colin) * tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits) virtio_blk: eliminate anonymous module_init & module_exit nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH nvme: remove nvme_alloc_request and nvme_alloc_request_qid nvme: cleanup how disk->disk_name is assigned nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate nvmet: use snprintf() with PAGE_SIZE in configfs nvmet: don't fold lines nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport nvme-tcp: lockdep: annotate in-kernel sockets nvme-tcp: don't fold the line nvme-tcp: don't initialize ret variable nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio nvme-multipath: use vmalloc for ANA log buffer xen/blkfront: speed up purge_persistent_grants() raid5: initialize the stripe_head embeeded bios as needed raid5-cache: statically allocate the recovery ra bio raid5-cache: fully initialize flush_bio when needed raid5-ppl: fully initialize the bio in ppl_new_iounit ...
| * virtio_blk: eliminate anonymous module_init & module_exitRandy Dunlap2022-03-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log. Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names. Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs Fixes: e467cde23818 ("Block driver using virtio.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds2022-03-211-52/+14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
| * virtio_blk: simplify refcountingChristoph Hellwig2022-02-161-52/+14
| | | | | | | | | | | | | | | | | | | | | | | | Implement the ->free_disk method to free the virtio_blk structure only once the last gendisk reference goes away instead of keeping a local refcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | virtio-blk: Remove BUG_ON() in virtio_queue_rq()Xie Yongji2022-03-061-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we have a BUG_ON() to make sure the number of sg list does not exceed queue_max_segments() in virtio_queue_rq(). However, the block layer uses queue_max_discard_segments() instead of queue_max_segments() to limit the sg list for discard requests. So the BUG_ON() might be triggered if virtio-blk device reports a larger value for max discard segment than queue_max_segments(). To fix it, let's simply remove the BUG_ON() which has become unnecessary after commit 02746e26c39e("virtio-blk: avoid preallocating big SGL for data"). And the unused vblk->sg_elems can also be removed together. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zeroXie Yongji2022-03-061-2/+8
|/ | | | | | | | | | | | | Currently the value of max_discard_segment will be set to MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device set 0 to max_discard_seg in configuration space. It's incorrect since the device might not be able to handle such large descriptors. To fix it, let's follow max_segments restrictions in this case. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-01-181-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes. - partial support for < MAX_ORDER - 1 granularity for virtio-mem - driver_override for vdpa - sysfs ABI documentation for vdpa - multiqueue config support for mlx5 vdpa - and misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits) vdpa/mlx5: Fix tracking of current number of VQs vdpa/mlx5: Fix is_index_valid() to refer to features vdpa: Protect vdpa reset with cf_mutex vdpa: Avoid taking cf_mutex lock on get status vdpa/vdpa_sim_net: Report max device capabilities vdpa: Use BIT_ULL for bit operations vdpa/vdpa_sim: Configure max supported virtqueues vdpa/mlx5: Report max device capabilities vdpa: Support reporting max device capabilities vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps() vdpa: Add support for returning device configuration information vdpa/mlx5: Support configuring max data virtqueue vdpa/mlx5: Fix config_attr_mask assignment vdpa: Allow to configure max data virtqueues vdpa: Read device configuration only if FEATURES_OK vdpa: Sync calls set/get config/status with cf_mutex vdpa/mlx5: Distribute RX virtqueues in RQT object vdpa: Provide interface to read driver features vdpa: clean up get_config_size ret value handling virtio_ring: mark ring unused on error ...
| * virtio: wrap config->reset callsMichael S. Tsirkin2022-01-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | block: remove the gendisk argument to blk_execute_rqChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: remove GENHD_FL_EXT_DEVTChristoph Hellwig2021-11-291-1/+0
|/ | | | | | | | | | | | | | | All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* virtio-blk: modify the value type of num in virtio_queue_rq()Ye Guojin2021-11-241-1/+1
| | | | | | | | | | | | | | | This was found by coccicheck: ./drivers/block/virtio_blk.c, 334, 14-17, WARNING Unsigned expression compared with zero num < 0 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Link: https://lore.kernel.org/r/20211117063955.160777-1-ye.guojin@zte.com.cn Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 02746e26c39e ("virtio-blk: avoid preallocating big SGL for data") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Revert "virtio-blk: don't let virtio core to validate used length"Michael S. Tsirkin2021-11-241-1/+0
| | | | | | | | | This reverts commit a40392edf1b2c7822bc0ce68413106661a9d4232. Attempts to validate length in the core did not work out. We'll drop them, so revert the dependent changes in drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2021-11-031-59/+119
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "vhost and virtio fixes and features: - Hardening work by Jason - vdpa driver for Alibaba ENI - Performance tweaks for virtio blk - virtio rng rework using an internal buffer - mac/mtu programming for mlx5 vdpa - Misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (45 commits) vdpa/mlx5: Forward only packets with allowed MAC address vdpa/mlx5: Support configuration of MAC vdpa/mlx5: Fix clearing of VIRTIO_NET_F_MAC feature bit vdpa_sim_net: Enable user to set mac address and mtu vdpa: Enable user to set mac and mtu of vdpa device vdpa: Use kernel coding style for structure comments vdpa: Introduce query of device config layout vdpa: Introduce and use vdpa device get, set config helpers virtio-scsi: don't let virtio core to validate used buffer length virtio-blk: don't let virtio core to validate used length virtio-net: don't let virtio core to validate used length virtio_ring: validate used buffer length virtio_blk: correct types for status handling virtio_blk: allow 0 as num_request_queues i2c: virtio: Add support for zero-length requests virtio-blk: fixup coccinelle warnings virtio_ring: fix typos in vring_desc_extra virtio-pci: harden INTX interrupts virtio_pci: harden MSI-X interrupts virtio_config: introduce a new .enable_cbs method ...
| * virtio-blk: don't let virtio core to validate used lengthJason Wang2021-11-011-0/+1
| | | | | | | | | | | | | | | | | | We never tries to use used length, so the patch prevents the virtio core from validating used length. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio_blk: correct types for status handlingMichael S. Tsirkin2021-11-011-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | virtblk_setup_cmd returns blk_status_t in an int, callers then assign it back to a blk_status_t variable. blk_status_t is either u32 or (more typically) u8 so it works, but is inelegant and causes sparse warnings. Pass the status in blk_status_t in a consistent way. Reported-by: kernel test robot <lkp@intel.com> Fixes: b2c5221fd074 ("virtio-blk: avoid preallocating big SGL for data") Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * virtio_blk: allow 0 as num_request_queuesMichael S. Tsirkin2021-11-011-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default value is 0 meaning "no limit". However if 0 is specified on the command line it is instead silently converted to 1. Further, the value is already validated at point of use, there's no point in duplicating code validating the value when it is set. Simplify the code while making the behaviour more consistent by using plain module_param. Fixes: 1a662cf6cb9a ("virtio-blk: add num_request_queues module parameter") Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-blk: fixup coccinelle warningsYe Guojin2021-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | coccicheck complains about the use of snprintf() in sysfs show functions: WARNING use scnprintf or sprintf Use sysfs_emit instead of scnprintf or sprintf makes more sense. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Link: https://lore.kernel.org/r/20211021065111.1047824-1-ye.guojin@zte.com.cn Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
| * virtio_blk: Fix spelling mistake: "advertisted" -> "advertised"Colin Ian King2021-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | There is a spelling mistake in a dev_err error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211025102240.22801-1-colin.i.king@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * virtio-blk: validate num_queues during probeJason Wang2021-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an untrusted device neogitates BLK_F_MQ but advertises a zero num_queues, the driver may end up trying to allocating zero size buffers where ZERO_SIZE_PTR is returned which may pass the checking against the NULL. This will lead unexpected results. Fixing this by failing the probe in this case. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| * virtio-blk: add num_request_queues module parameterMax Gurtovoy2021-11-011-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | Sometimes a user would like to control the amount of request queues to be created for a block device. For example, for limiting the memory footprint of virtio-blk devices. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210902204622.54354-1-mgurtovoy@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-blk: avoid preallocating big SGL for dataMax Gurtovoy2021-11-011-56/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to pre-allocate a big buffer for the IO SGL anymore. If a device has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For HW virtio-blk device, nr_hw_queues can be 64 or 128 and each queue's depth might be 128. This means the resulting preallocation for the data SGLs is big. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe drivers so it should be reasonable for virtio block as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't support SG_CHAIN, use only runtime allocation for the SGL. Re-organize the setup of the IO request to fit the new sg chain mechanism. No performance degradation was seen (fio libaio engine with 16 jobs and 128 iodepth): IO size IOPs Rand Read (before/after) IOPs Rand Write (before/after) -------- --------------------------------- ---------------------------------- 512B 318K/316K 329K/325K 4KB 323K/321K 353K/349K 16KB 199K/208K 250K/275K 128KB 36K/36.1K 39.2K/41.7K Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Link: https://lore.kernel.org/r/20210901131434.31158-1-mgurtovoy@nvidia.com Reviewed-by: Feng Li <lifeng1519@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # kconfig fixups Signed-off-by: Michael S. Tsirkin <mst@redhat.com>