summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* block: introduce holder opsChristoph Hellwig2023-06-0534-56/+90
| | | | | | | | | | | | | | Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove blk_drop_partitionsChristoph Hellwig2023-06-051-12/+4
| | | | | | | | | | | There is only a single caller left, so fold the loop into that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: delete partitions later in del_gendiskChristoph Hellwig2023-06-053-13/+32
| | | | | | | | | | | | | | | | | | | Delay dropping the block_devices for partitions in del_gendisk until after the call to blk_mark_disk_dead, so that we can implementat notification of removed devices in blk_mark_disk_dead. This requires splitting a lower-level drop_partition helper out of delete_partition and using that from del_gendisk, while having a common loop for the whole device and partitions that calls remove_inode_hash, fsync_bdev and __invalidate_device before the call to blk_mark_disk_dead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: unhash the inode earlier in delete_partitionChristoph Hellwig2023-06-051-6/+6
| | | | | | | | | | | | | Move the call to remove_inode_hash to the beginning of delete_partition, as we want to prevent opening a block_device that is about to be removed ASAP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: avoid repeated work in blk_mark_disk_deadChristoph Hellwig2023-06-051-1/+3
| | | | | | | | | | | | | Check if GD_DEAD is already set in blk_mark_disk_dead, and don't duplicate the work already done. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendiskChristoph Hellwig2023-06-051-14/+12
| | | | | | | | | | | | | | | | | | | | | | blk_mark_disk_dead does very similar work a a section of del_gendisk: - set the GD_DEAD flag - set the capacity to zero - start a queue drain but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by the disk, sets the capacity to zero before starting the drain, and both with sending a uevent and kernel message for this fake capacity change. Move the exact logic from the more heavily used del_gendisk into blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: turn bdev_lock into a mutexChristoph Hellwig2023-06-051-14/+13
| | | | | | | | | | | | | There is no reason for this lock to spin, and being able to sleep under it will come in handy soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: refactor bd_may_claimChristoph Hellwig2023-06-051-18/+22
| | | | | | | | | | | | | | The long if/else chain obsfucates the actual logic. Tidy it up to be more structured. Also drop the whole argument, as it can be trivially derived from bdev using bdev_whole, and having the bdev_whole in the function makes it easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: factor out a bd_end_claim helper from blkdev_putChristoph Hellwig2023-06-051-30/+33
| | | | | | | | | | | | Move all the logic to release an exclusive claim into a helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: stop defining __KERNEL_SYSCALLS__Christoph Hellwig2023-06-052-2/+0
| | | | | | | | | | __KERNEL_SYSCALLS__ hasn't been needed since Linux 2.6.19 so stop defining it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230601151646.1386867-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei2023-06-042-0/+29
| | | | | | | | | | | | | | | | | | | | Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Replace all non-returning strlcpy with strscpyAzeem Shaikh2023-06-013-3/+3
| | | | | | | | | | | | | | | | | | strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230530155608.272266-1-azeemshaikh38@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-ioc: protect ioc_destroy_icq() by 'queue_lock'Yu Kuai2023-06-011-17/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, icq is tracked by both request_queue(icq->q_node) and task(icq->ioc_node), and ioc_clear_queue() from elevator exit is not safe because it can access the list without protection: ioc_clear_queue ioc_release_fn lock queue_lock list_splice /* move queue list to a local list */ unlock queue_lock /* * lock is released, the local list * can be accessed through task exit. */ lock ioc->lock while (!hlist_empty) icq = hlist_entry lock queue_lock ioc_destroy_icq delete icq->ioc_node while (!list_empty) icq = list_entry() list_del icq->q_node /* * This is not protected by any lock, * list_entry concurrent with list_del * is not safe. */ unlock queue_lock unlock ioc->lock Fix this problem by protecting list 'icq->q_node' by queue_lock from ioc_clear_queue(). Reported-and-tested-by: Pradeep Pragallapati <quic_pragalla@quicinc.com> Link: https://lore.kernel.org/lkml/20230517084434.18932-1-quic_pragalla@quicinc.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531073435.2923422-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: mark bio_add_folio as __must_checkJohannes Thumshirn2023-06-011-1/+2
| | | | | | | | | | | Now that all callers of bio_add_folio() check the return value, mark it as __must_check. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/381360a45ac3684120cfbe1e07685e9c36256e47.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* fs: iomap: use bio_add_folio_nofail where possibleJohannes Thumshirn2023-06-011-3/+3
| | | | | | | | | | | | | | | When the iomap buffered-io code can't add a folio to a bio, it allocates a new bio and adds the folio to that one. This is done using bio_add_folio(), but doesn't check for errors. As adding a folio to a newly created bio can't fail, use the newly introduced bio_add_folio_nofail() function. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/58fa893c24c67340a63323f09a179fefdca07f2a.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add bio_add_folio_nofailJohannes Thumshirn2023-06-012-0/+10
| | | | | | | | | | Just like for bio_add_pages() add a no-fail variant for bio_add_folio(). Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/924dff4077812804398ef84128fb920507fa4be1.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: mark bio_add_page as __must_checkJohannes Thumshirn2023-06-011-1/+2
| | | | | | | | | | | Now that all users of bio_add_page check for the return value, mark bio_add_page as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/7ae4a902e08fe2e90c012ee07aeb35d4aae28373.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm-crypt: use __bio_add_page to add single page to clone bioJohannes Thumshirn2023-06-011-2/+1
| | | | | | | | | | crypt_alloc_buffer() already allocates enough entries in the clone bio's vector, so adding a page to the bio can't fail. Use __bio_add_page() to reflect this. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/f9a4dee5e81389fd70ffc442da01006538e55aca.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: raid1: check if adding pages to resync bio failsJohannes Thumshirn2023-06-012-15/+16
| | | | | | | | | | | | | | | | Check if adding pages to resync bio fails and if bail out. As the comment above suggests this cannot happen, WARN if it actually happens. Technically __bio_add_pages() would be sufficient here, but asserting the pages actually get added to the bio is preferred. This way we can mark bio_add_pages as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/33aea4c271220dc9bcab58c4b7bec478c1511142.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: raid1: use __bio_add_page for adding single page to bioJohannes Thumshirn2023-06-011-1/+1
| | | | | | | | | | | | | | | | | The sync request code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/6cf7f66c6e646231200d025dfd5f2d3ae75c8fe5.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: check for failure when adding pages in alloc_behind_master_bioJohannes Thumshirn2023-06-011-1/+4
| | | | | | | | | | | | | | | | alloc_behind_master_bio() can possibly add multiple pages to a bio, but it is not checking for the return value of bio_add_page() if adding really succeeded. Check if the page adding succeeded and if not bail out. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/827aa12d44ebf3f50b41b47f5cedc0f80179f2c1.1685532726.git.johannes.thumshirn@wdc.com [axboe: fold in s/free_page/put_page fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: use __bio_add_page for adding single page to bioJohannes Thumshirn2023-05-311-1/+1
| | | | | | | | | | | | | | | | The floppy code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/33c445a3b431270c72d9be03d5da1b08ae983920.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* zram: use __bio_add_page for adding single page to bioJohannes Thumshirn2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | The zram writeback code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/cfd141dd7773315879a126f2aa81b7f698bc0e10.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* zonefs: use __bio_add_page for adding single page to bioJohannes Thumshirn2023-05-311-1/+1
| | | | | | | | | | | | | | | | | The zonefs superblock reading code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/04c9978ccaa0fc9871cd4248356638d98daccf0c.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* gfs2: use __bio_add_page for adding single page to bioJohannes Thumshirn2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | The GFS2 superblock reading code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/087c67d4e4973f949d3519c1e4822784ce583c5a.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* jfs: logmgr: use __bio_add_page to add single page to bioJohannes Thumshirn2023-05-311-2/+2
| | | | | | | | | | | | | | | | | The JFS IO code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/9fb5ed86d19f6e0b6f64dfc4109a48ff8ff24497.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: raid5: use __bio_add_page to add single page to new bioJohannes Thumshirn2023-05-311-2/+2
| | | | | | | | | | | | | | | | | | | The raid5-ppl submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. For adding consecutive pages, the return is actually checked and a new bio is allocated if adding the page fails. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/27e6bcd762354bff74602e89159cdd12ae3d1fa9.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: raid5-log: use __bio_add_page to add single pageJohannes Thumshirn2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | The raid5 log metadata submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/832a810d6c9e71f88b0a39cb076a8c70e8bcb821.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: use __bio_add_page to add single pageJohannes Thumshirn2023-05-311-2/+2
| | | | | | | | | | | | | | | | | | | The md-raid superblock writing code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Signed-of_-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/ca196f5e650e318106dbb4496eb6cbac4bc800bd.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* fs: buffer: use __bio_add_page to add single page to bioJohannes Thumshirn2023-05-311-2/+1
| | | | | | | | | | | | | | | | | | The buffer_head submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Gou Hao <gouhao@uniontech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/84ff2dcbe81b258a73ad900adb5266e208b61a4d.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm: dm-zoned: use __bio_add_page for adding single metadata pageJohannes Thumshirn2023-05-311-3/+3
| | | | | | | | | | | | | | | dm-zoned uses bio_add_page() for adding a single page to a freshly created metadata bio. Use __bio_add_page() instead as adding a single page to a new bio is always guaranteed to succeed. This brings us a step closer to marking bio_add_page() __must_check Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/55a0c8dad7550379647873b579dc7cfbe0191f96.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: use __bio_add_page to add page to bioJohannes Thumshirn2023-05-311-3/+1
| | | | | | | | | | | | | The drbd code only adds a single page to a newly created bio. So use __bio_add_page() to add the page which is guaranteed to succeed in this case. This brings us closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/435007afac14f3766455559059d21843771fae53.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* swap: use __bio_add_page to add page to bioJohannes Thumshirn2023-05-311-4/+4
| | | | | | | | | | | | | | The swap code only adds a single page to a newly created bio. So use __bio_add_page() to add the page which is guaranteed to succeed in this case. This brings us closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/5bdafd9de806b2dab92302b30eb7a3a5f10c37d9.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Use iov_iter_extract_pages() and page pinning in direct-io.cDavid Howells2023-05-311-29/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the old block-based direct-I/O code to use iov_iter_extract_pages() to pin user pages or leave kernel pages unpinned rather than taking refs when submitting bios. This makes use of the preceding patches to not take pins on the zero page (thereby allowing insertion of zero pages in with pinned pages) and to get additional pins on pages, allowing an extracted page to be used in multiple bios without having to re-extract it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@infradead.org> cc: David Hildenbrand <david@redhat.com> cc: Lorenzo Stoakes <lstoakes@gmail.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Jason Gunthorpe <jgg@nvidia.com> cc: Logan Gunthorpe <logang@deltatee.com> cc: Hillf Danton <hdanton@sina.com> cc: Christian Brauner <brauner@kernel.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-mm@kvack.org Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230526214142.958751-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* mm: Provide a function to get an additional pin on a pageDavid Howells2023-05-312-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a function to get an additional pin on a page that we already have a pin on. This will be used in fs/direct-io.c when dispatching multiple bios to a page we've extracted from a user-backed iter rather than redoing the extraction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@infradead.org> cc: David Hildenbrand <david@redhat.com> cc: Lorenzo Stoakes <lstoakes@gmail.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Jason Gunthorpe <jgg@nvidia.com> cc: Logan Gunthorpe <logang@deltatee.com> cc: Hillf Danton <hdanton@sina.com> cc: Christian Brauner <brauner@kernel.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-mm@kvack.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20230526214142.958751-3-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* mm: Don't pin ZERO_PAGE in pin_user_pages()David Howells2023-05-313-3/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer to it from the page tables and make unpin_user_page*() correspondingly ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a zero page's refcount as we're only allowed ~2 million pins on it - something that userspace can conceivably trigger. Add a pair of functions to test whether a page or a folio is a ZERO_PAGE. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@infradead.org> cc: David Hildenbrand <david@redhat.com> cc: Lorenzo Stoakes <lstoakes@gmail.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Jason Gunthorpe <jgg@nvidia.com> cc: Logan Gunthorpe <logang@deltatee.com> cc: Hillf Danton <hdanton@sina.com> cc: Christian Brauner <brauner@kernel.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-mm@kvack.org Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20230526214142.958751-2-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: constify the whole_disk device_attributeThomas Weißschuh2023-05-301-1/+1
| | | | | | | | The struct is never modified so it can be const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230419-const-partition-v3-4-4e14e48be367@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: constify struct part_attr_groupThomas Weißschuh2023-05-301-1/+1
| | | | | | | | The struct is never modified so it can be const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230419-const-partition-v3-3-4e14e48be367@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: constify struct part_type part_typeThomas Weißschuh2023-05-302-2/+2
| | | | | | | | The struct is never modified so it can be const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230419-const-partition-v3-2-4e14e48be367@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: constify partition prober arrayThomas Weißschuh2023-05-301-1/+1
| | | | | | | | | | The array is never modified so it can be const. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202304191640.SkNk7kVN-lkp@intel.com/ Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230419-const-partition-v3-1-4e14e48be367@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: convert bio_map_user_iov to use iov_iter_extract_pagesDavid Howells2023-05-241-12/+11
| | | | | | | | | | | | | | | | | | | | | | | This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O could otherwise end up being visible to/affected by the child process). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-7-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pagesDavid Howells2023-05-241-11/+12
| | | | | | | | | | | | | | | | | | | | | | | This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O could otherwise end up being affected by/visible to the child process). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-6-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Add BIO_PAGE_PINNED and associated infrastructureDavid Howells2023-05-244-4/+18
| | | | | | | | | | | | | | | | | | Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-5-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logicChristoph Hellwig2023-05-246-4/+6
| | | | | | | | | | | | | | | | | | | Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Fix bio_flagged() so that gcc can better optimise itDavid Howells2023-05-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-3-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroingDavid Howells2023-05-241-1/+1
| | | | | | | | | | | | | | | | ZERO_PAGE can't go away, no need to hold an extra reference. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-2-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'for-6.5/splice' into for-6.5/blockJens Axboe2023-05-2468-621/+694
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
| * splice: kdoc for filemap_splice_read() and copy_splice_read()David Howells2023-05-242-5/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide kerneldoc comments for filemap_splice_read() and copy_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Steve French <smfrench@gmail.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-32-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * iov_iter: Kill ITER_PIPEDavid Howells2023-05-243-444/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * splice: Remove generic_file_splice_read()David Howells2023-05-242-45/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Steve French <smfrench@gmail.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-30-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>