summaryrefslogtreecommitdiffstats
path: root/block/blk.h
Commit message (Collapse)AuthorAgeFilesLines
...
* | block: move disk_{block,unblock,flush}_events to blk.hChristoph Hellwig2022-02-021-0/+3
|/ | | | | | | | | | No need to have these declarations in a public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: only build the icq tracking code when neededChristoph Hellwig2021-12-161-0/+6
| | | | | | | | | Only bfq needs to code to track icq, so make it conditional. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: move srcu from blk_mq_hw_ctx to request_queueMing Lei2021-12-031-1/+9
| | | | | | | | | | | | | | | | | | | In case of BLK_MQ_F_BLOCKING, per-hctx srcu is used to protect dispatch critical area. However, this srcu instance stays at the end of hctx, and it often takes standalone cacheline, often cold. Inside srcu_read_lock() and srcu_read_unlock(), WRITE is always done on the indirect percpu variable which is allocated from heap instead of being embedded, srcu->srcu_idx is read only in srcu_read_lock(). It doesn't matter if srcu structure stays in hctx or request queue. So switch to per-request-queue srcu for protecting dispatch, and this way simplifies quiesce a lot, not mention quiesce is always done on the request queue wide. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: switch to atomic_t for request referencesJens Axboe2021-12-031-0/+31
| | | | | | | | | | | | refcount_t is not as expensive as it used to be, but it's still more expensive than the io_uring method of using atomic_t and just checking for potential over/underflow. This borrows that same implementation, which in turn is based on the mm implementation from Linus. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the ->rq_disk field in struct requestChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | | Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: simplify ioc_lookup_icqChristoph Hellwig2021-11-291-1/+1
| | | | | | | | Remove the ioc argument as it always points to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_mq_sched_assign_ioc to blk-ioc.cChristoph Hellwig2021-11-291-5/+1
| | | | | | | | | | Move blk_mq_sched_assign_ioc so that many interfaces from the file can be marked static. Rename the function to ioc_find_get_icq as well and return the icq to simplify the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't include <linux/part_stat.h> in blk.hChristoph Hellwig2021-11-291-1/+0
| | | | | | | | Not needed, shift it into the source files that need it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't include <linux/idr.h> in blk.hChristoph Hellwig2021-11-291-1/+0
| | | | | | | | Not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't include <linux/blk-mq.h> in blk.hChristoph Hellwig2021-11-291-1/+0
| | | | | | | | Not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't include blk-mq.h in blk.hChristoph Hellwig2021-11-291-1/+0
| | | | | | | | | No needed, shift a blk-stat.h include into the source file that needs it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't include blk-mq-sched.h in blk.hChristoph Hellwig2021-11-291-1/+0
| | | | | | | | No needed, shift it into the source files that need it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the e argument to elevator_exitChristoph Hellwig2021-11-291-1/+1
| | | | | | | | All callers pass q->elevator. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove elevator_exitChristoph Hellwig2021-11-291-10/+1
| | | | | | | | | Open code elevator_exit in it's only caller, and rename __elevator_exit to elevator_exit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_get_flush_queue to blk-flush.cChristoph Hellwig2021-11-291-6/+0
| | | | | | | | blk_get_flush_queue is only used in blk-flush.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: simplify the plug handling in blk_mq_submit_bioChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | blk_mq_submit_bio has two different plug cases, one that uses full plugging and a limited plugging one. The limited plugging case is only used for a corner case that does not matter in real life: - no ->commit_rqs (so not NVMe) - no shared tags (so not SCSI) - not rotational (so no old disk or floppy driver) - must have multiple queues (so no eMMC) Remove the limited merging case and all the related junk to simplify blk_mq_submit_bio and the functions called from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: merge disk_scan_partitions and blkdev_reread_partChristoph Hellwig2021-11-291-0/+1
| | | | | | | | | Unify the functionality that implements a partition rescan for a gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_print_req_error to blk-mq.cChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | | | This function is only used by the request completion path. Factor out a blk_status_to_str to keep blk_errors private in blk-core.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_account_io_{start,done} to blk-mq.cChristoph Hellwig2021-11-291-20/+1
| | | | | | | | | | These are only used for request based I/O, so move them where they are used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move request based cloning helpers to blk-mq.cChristoph Hellwig2021-11-291-0/+10
| | | | | | | | | Keep all the request based code together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: don't insert FUA request with data into scheduler queueMing Lei2021-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | We never insert flush request into scheduler queue before. Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in blk_insert_flush") tries to handle FUA data request as normal request. This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in case of kyber since RQF_ELVPRIV isn't set for flush request, then ->finish_request won't be called. Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert() when the device supports FUA, just like what we did before. [1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/ Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: d92ca9d8348f ("blk-mq: don't handle non-flush requests in blk_insert_flush") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move queue enter logic into blk_mq_submit_bio()Jens Axboe2021-11-041-0/+1
| | | | | | | | | | | | | | | Retain the old logic for the fops based submit, but for our internal blk_mq_submit_bio(), move the queue entering logic into the core function itself. We need to be a bit careful if going into the scheduler, as a scheduler or queue mappings can arbitrarily change before we have entered the queue. Have the bio scheduler mapping do that separately, it's a very cheap operation compared to actually doing merging locking and lookups. Reviewed-by: Christoph Hellwig <hch@lst.de> [axboe: update to check merge post submit_bio_checks() doing remap...] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: make bio_queue_enter() fast-path available inlineJens Axboe2021-11-041-0/+34
| | | | | | | | | | Just a prep patch for shifting the queue enter logic. This moves the expected fast path inline, and leaves __bio_queue_enter() as an out-of-line function call. We don't want to inline the latter, as it's mostly slow path code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Add independent access ranges supportDamien Le Moal2021-10-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard-disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel. A dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device independent access ranges to the user through sysfs to allow optimizing device accesses to increase performance. To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDDs or table entries of a dm-linear device), The type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity. The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure. struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files. The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual actuator HDD, the user sees: $ tree /sys/block/sdk/queue/independent_access_ranges/ /sys/block/sdk/queue/independent_access_ranges/ |-- 0 | |-- nr_sectors | `-- sector `-- 1 |-- nr_sectors `-- sector For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq: don't handle non-flush requests in blk_insert_flushChristoph Hellwig2021-10-191-1/+1
| | | | | | | | | | | | | | | | Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: return whether or not to unplug through booleanJens Axboe2021-10-191-1/+1
| | | | | | | | Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move update request helpers into blk-mq.cJens Axboe2021-10-181-0/+1
| | | | | | | | | | | For some reason we still have them in blk-core, with the rest of the request completion being in blk-mq. That causes and out-of-line call for each completion. Move them into blk-mq.c instead, where they belong. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: handle fast path of bio splitting inlineJens Axboe2021-10-181-1/+26
| | | | | | | | | The fast path is no splitting needed. Separate the handling into a check part we can inline, and an out-of-line handling path if we do need to split. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: rename REQ_HIPRI to REQ_POLLEDChristoph Hellwig2021-10-181-2/+2
| | | | | | | | | | | | | Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: inline hot paths of blk_account_io_*()Pavel Begunkov2021-10-181-3/+21
| | | | | | | | | | Extract hot paths of __blk_account_io_start() and __blk_account_io_done() into inline functions, so we don't always pay for function calls. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: merge block_ioctl into blkdev_ioctlChristoph Hellwig2021-10-181-2/+1
| | | | | | | | Simplify the ioctl path and match the code structure on the compat side. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move the *blkdev_ioctl declarations out of blkdev.hChristoph Hellwig2021-10-181-0/+4
| | | | | | | | These are only used inside of block/. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: bump max plugged deferred size from 16 to 32Jens Axboe2021-10-181-0/+6
| | | | | | | | | | | | | | | | | Particularly for NVMe with efficient deferred submission for many requests, there are nice benefits to be seen by bumping the default max plug count from 16 to 32. This is especially true for virtualized setups, where the submit part is more expensive. But can be noticed even on native hardware. Reduce the multiple queue factor from 4 to 2, since we're changing the default size. While changing it, move the defines into the block layer private header. These aren't values that anyone outside of the block layer uses, or should use. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk-throtl fast path inlineJens Axboe2021-10-181-16/+0
| | | | | | | | Even if no policies are defined, we spend ~2% of the total IO time checking. Move the fast path inline. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()John Garry2021-10-181-1/+1
| | | | | | | | | | | To be more concise and consistent in naming, rename blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs(). Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move a few merge helpers out of <linux/blkdev.h>Christoph Hellwig2021-10-181-0/+38
| | | | | | | | | | These are block-layer internal helpers, so move them to block/blk.h and block/blk-merge.c. Also update a comment a bit to use better grammar. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move elevator.h to block/Christoph Hellwig2021-10-181-0/+2
| | | | | | | | | | | | Except for the features passed to blk_queue_required_elevator_features, elevator.h is only needed internally to the block layer. Move the ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to block/, dropping all the spurious includes outside of that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: keep q_usage_counter in atomic mode after del_gendiskChristoph Hellwig2021-10-151-0/+1
| | | | | | | | | | | | | Don't switch back to percpu mode to avoid the double RCU grace period when tearing down SCSI devices. After removing the disk only passthrough commands can be send anyway. Suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: drain file system I/O on del_gendiskChristoph Hellwig2021-10-151-0/+1
| | | | | | | | | | | | | | | | | Instead of delaying draining of file system I/O related items like the blk-qos queues, the integrity read workqueue and timeouts only when the request_queue is removed, do that when del_gendisk is called. This is important for SCSI where the upper level drivers that control the gendisk are separate entities, and the disk can be freed much earlier than the request_queue, or can even be unbound without tearing down the queue. Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: split out operations on block special filesChristoph Hellwig2021-09-071-0/+2
| | | | | | | | | | Add a new block/fops.c for all the file and address_space operations that provide the block special file support. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de [axboe: correct trailing whitespace while at it] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds2021-08-301-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull support for struct bio recycling from Jens Axboe: "This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS" * tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block: bio: improve kerneldoc documentation for bio_alloc_kiocb() block: provide bio_clear_hipri() helper block: use the percpu bio cache in __blkdev_direct_IO io_uring: enable use of bio alloc cache block: clear BIO_PERCPU_CACHE flag if polling isn't supported bio: add allocation cache abstraction fs: add kiocb alloc cache flag bio: optimize initialization of a bio
| * block: provide bio_clear_hipri() helperJens Axboe2021-08-231-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE, as non-polled IO may complete through hard/soft IRQ and hence isn't safe for our polled bio alloc cache. Provide a helper that does just that, and use it in the merging code as well if we split a bio and turn off polling. Fixes: be863b9e4348 ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds2021-08-301-9/+11
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien)o - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...
| * block: return errors from disk_alloc_eventsLuis Chamberlain2021-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | Prepare for proper error handling in add_disk. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: return errors from blk_integrity_addLuis Chamberlain2021-08-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | Prepare for proper error handling in add_disk. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blk-throtl: optimize IOPS throttle for large IO scenariosChunguang Xu2021-08-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After patch 54efd50 (block: make generic_make_request handle arbitrarily sized bios), the IO through io-throttle may be larger, and these IOs may be further split into more small IOs. However, IOPS throttle does not seem to be aware of this change, which makes the calculation of IOPS of large IOs incomplete, resulting in disk-side IOPS that does not meet expectations. Maybe we should fix this problem. We can reproduce it by set max_sectors_kb of disk to 128, set blkio.write_iops_throttle to 100, run a dd instance inside blkio and use iostat to watch IOPS: dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct As a result, without this change the average IOPS is 1995, with this change the IOPS is 98. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/65869aaad05475797d63b4c3fed4f529febe3c26.1627876014.git.brookxu@tencent.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: pass a gendisk to bdev_resize_partitionChristoph Hellwig2021-08-121-2/+2
| | | | | | | | | | | | | | | | | | bdev_resize_partition can only operate on the whole device. Make that clear by passing a gendisk instead of a block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210810154512.1809898-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: pass a gendisk to bdev_del_partitionChristoph Hellwig2021-08-121-1/+1
| | | | | | | | | | | | | | | | | | bdev_del_partition can only operate on the whole device. Make that clear by passing a gendisk instead of a block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210810154512.1809898-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: pass a gendisk to bdev_add_partitionChristoph Hellwig2021-08-121-2/+2
| | | | | | | | | | | | | | | | | | bdev_add_partition can only operate on the whole device. Make that clear by passing a gendisk instead of a block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210810154512.1809898-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: remove disk_name()Christoph Hellwig2021-08-021-1/+0
| | | | | | | | | | | | | | | | | | Remove the disk_name function now that all users are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210727062518.122108-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>