path: root/lib/iov_iter.c
Commit message    Author    Age    Files    Lines
...
* iov_iter_npages(): don't bother with iterate_all_kinds()    Al Viro    2021-06-10    1    -34/+54
    note that in bvec case pages can be compound ones - we can't just assume that each segment is covered by one (sub)page

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
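The remark about compound pages can be made concrete with a small user-space sketch. It is not the kernel's iov_iter_npages(); the names sketch_seg and sketch_seg_npages and the 4 KiB page size are assumptions made for illustration. It only shows why a segment's page count has to be derived from its offset and length rather than assumed to be one.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for one bvec segment. The offset may exceed one
 * page when the segment points into a compound page. */
struct sketch_seg {
	size_t offset;	/* byte offset from the start of the head page */
	size_t len;	/* segment length in bytes */
};

/* Number of page-sized subpages that one segment touches. */
static size_t sketch_seg_npages(const struct sketch_seg *s)
{
	size_t first = s->offset % SKETCH_PAGE_SIZE;

	if (!s->len)
		return 0;
	/* Round up the span measured from the start of the first subpage. */
	return (first + s->len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A 200-byte segment that starts 4000 bytes into a page already touches two subpages, which is the case a one-page-per-segment assumption would miscount.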
* get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc()    Al Viro    2021-06-10    1    -56/+91
    Here iterate_all_kinds() is used just to find the first (non-empty, in case of iovec) segment. Which can be easily done explicitly.

    Note that in bvec case we now can get more than PAGE_SIZE worth of them, in case when we have a compound page in bvec and a range that crosses a subpage boundary. Older behaviour had been to stop on that boundary; we used to get the right first page (for_each_bvec() took care of that), but that was all we'd got.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter_gap_alignment(): get rid of iterate_all_kinds()    Al Viro    2021-06-10    1    -13/+14
    For one thing, it's only used for iovec (and makes sense only for those). For another, here we don't care about iov_offset, since the beginning of the first segment and the end of the last one are ignored. So it makes a lot more sense to just walk through the iovec array...

    We need to deal with the case of truncated iov_iter, but unlike the situation with iov_iter_alignment() we don't care where the last segment ends - just which segment is the last one.

    [fixed a braino spotted by Qian Cai <quic_qiancai@quicinc.com>]

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
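The walk described above can be sketched in user space roughly as follows. This is an illustration under assumptions, not the kernel function: sketch_gap_alignment is a hypothetical name, and the iterator is reduced to an iovec array, a segment count and the remaining byte count.

```c
#include <stddef.h>
#include <sys/uio.h>

/* OR together every inner boundary: the start of each segment except the
 * first, and the end of each segment except the last. */
static unsigned long sketch_gap_alignment(const struct iovec *iov,
					  size_t nr_segs, size_t count)
{
	unsigned long res = 0;
	unsigned long prev_end = 0;
	size_t k;

	for (k = 0; k < nr_segs; k++) {
		unsigned long base = (unsigned long)iov[k].iov_base;

		if (!iov[k].iov_len)
			continue;		/* empty segments contribute nothing */
		if (prev_end)			/* not the first non-empty segment */
			res |= base | prev_end;	/* this start | previous end */
		prev_end = base + iov[k].iov_len;
		if (count <= iov[k].iov_len)
			break;			/* truncated: this segment is the last */
		count -= iov[k].iov_len;
	}
	return res;
}
```

Note that the truncation check only decides which segment is last; where inside that segment the data ends never enters the result.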
* iov_iter_alignment(): don't bother with iterate_all_kinds()    Al Viro    2021-06-10    1    -10/+53
    It's easier to go over the array manually. We need to watch out for truncated iov_iter, though - iovec array might cover more than i->count.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
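A manual walk that respects a truncated iterator might look like the sketch below. It is a user-space approximation under assumptions: sketch_alignment is a hypothetical name, and iov_offset and count are passed as plain arguments instead of living in an iterator structure.

```c
#include <stddef.h>
#include <sys/uio.h>

/* OR together the start address and length of every chunk that is
 * actually covered by the remaining count. */
static unsigned long sketch_alignment(const struct iovec *iov, size_t nr_segs,
				      size_t iov_offset, size_t count)
{
	unsigned long res = 0;
	size_t skip = iov_offset;	/* applies to the first segment only */
	size_t k;

	for (k = 0; k < nr_segs && count; k++, skip = 0) {
		size_t len = iov[k].iov_len - skip;

		if (!len)
			continue;
		res |= (unsigned long)iov[k].iov_base + skip;
		if (len > count)
			len = count;	/* the array may cover more than count */
		res |= len;
		count -= len;
	}
	return res;
}
```

Unlike the gap-alignment case, the clamped length of the last chunk matters here, so the loop has to track how much of count is left.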
* sanitize iov_iter_fault_in_readable()    Al Viro    2021-06-10    1    -10/+16
    1) constify iov_iter argument; we are not advancing it in this primitive.

    2) cap the amount requested by the amount of data in iov_iter. All existing callers should've been safe, but the check is really cheap and doing it here makes for easier analysis, as well as more consistent semantics among the primitives.

    3) don't bother with iterate_iovec(). Explicit loop is not any harder to follow, and we get rid of standalone iterate_iovec() users - it's only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: optimize iov_iter_advance() for iovec and kvec    Al Viro    2021-06-10    1    -14/+28
    We can do better than generic iterate_and_advance() for this one; inspired by bvec_iter_advance() (and massaged into that form by equivalent transformations).

    [fixed a braino caught by kernel test robot <oliver.sang@intel.com>]

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: separate direction from flavour    Al Viro    2021-06-10    1    -37/+48
    Instead of having them mixed in iter->type, use separate ->iter_type and ->data_source (u8 and bool resp.) And don't bother with (pseudo-) bitmap for the former - microoptimizations from being able to check if the flavour is one of two values are not worth the confusion for optimizer. It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE, so we end up with extra headache.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
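The split can be pictured with a minimal sketch. The structure and helper names below are hypothetical and carry only the two fields named in the message; the reading of data_source as "the iterator is where the data comes from" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flavours as a plain enumeration rather than a (pseudo-)bitmap, so a
 * combination such as "iovec and pipe at once" cannot be expressed. */
enum sketch_iter_type {
	SKETCH_ITER_IOVEC,
	SKETCH_ITER_KVEC,
	SKETCH_ITER_BVEC,
	SKETCH_ITER_PIPE,
	SKETCH_ITER_XARRAY,
	SKETCH_ITER_DISCARD,
};

struct sketch_typed_iter {
	uint8_t iter_type;	/* flavour: exactly one enum value */
	bool data_source;	/* direction (assumed: true if data is taken from the iterator) */
};

/* Flavour test: a single equality comparison. */
static bool sketch_iter_is_iovec(const struct sketch_typed_iter *i)
{
	return i->iter_type == SKETCH_ITER_IOVEC;
}

/* Direction test: independent of the flavour. */
static bool sketch_iter_is_source(const struct sketch_typed_iter *i)
{
	return i->data_source;
}
```

With the flavour held as one value, each check is a single comparison and the compiler no longer has to consider mixed-flag states.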
* iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD    Al Viro    2021-06-10    1    -2/+0
    the field is not used for that flavour

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: reorder handling of flavours in primitives    Al Viro    2021-06-10    1    -46/+45
    iovec is the most common one; test it first and test explicitly, rather than "not anything else". Replace all flavour checks with use of iov_iter_is_...() helpers.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert()    Al Viro    2021-06-10    1    -104/+0
    Use corresponding plain variants, revert on short copy. That's the way it should've been done from the very beginning, except that we didn't have iov_iter_revert() back then...

    [fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>]

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
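The "plain variant plus revert" pattern can be sketched with a toy flat-buffer iterator. All names here are hypothetical, and the short copy is triggered by running out of data rather than by a fault; only the shape of the all-or-nothing wrapper is the point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy iterator over one flat buffer. */
struct sketch_flat_iter {
	const char *data;
	size_t pos;	/* bytes already consumed */
	size_t count;	/* bytes remaining */
};

/* Plain variant: copies what it can, advances, may come up short. */
static size_t sketch_copy_from_iter(void *dst, size_t bytes,
				    struct sketch_flat_iter *i)
{
	if (bytes > i->count)
		bytes = i->count;
	memcpy(dst, i->data + i->pos, bytes);
	i->pos += bytes;
	i->count -= bytes;
	return bytes;
}

/* Undo an advance of 'unroll' bytes. */
static void sketch_revert(struct sketch_flat_iter *i, size_t unroll)
{
	i->pos -= unroll;
	i->count += unroll;
}

/* _full variant: all or nothing, built from the two helpers above. */
static bool sketch_copy_from_iter_full(void *dst, size_t bytes,
				       struct sketch_flat_iter *i)
{
	size_t copied = sketch_copy_from_iter(dst, bytes, i);

	if (copied == bytes)
		return true;
	sketch_revert(i, copied);	/* short copy: leave the iterator untouched */
	return false;
}
```

The wrapper needs no copying logic of its own, which is why separate _full implementations become unnecessary once a revert primitive exists.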
* iov_iter_advance(): use consistent semantics for move past the end    Al Viro    2021-06-03    1    -3/+2
    asking to advance by more than we have left in the iov_iter should move to the very end; it should *not* leave negative i->count and it should not spew into syslog, etc. - it's a legitimate operation.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
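The intended semantics can be sketched for an iovec-style iterator as below. This is a user-space approximation with hypothetical names (sketch_vec_iter, sketch_advance), not the kernel code; the key line is the clamp of the request to the remaining count.

```c
#include <stddef.h>
#include <sys/uio.h>

struct sketch_vec_iter {
	const struct iovec *iov;	/* current segment */
	size_t nr_segs;			/* segments left, including the current one */
	size_t iov_offset;		/* offset into the current segment */
	size_t count;			/* bytes remaining */
};

static void sketch_advance(struct sketch_vec_iter *i, size_t size)
{
	const struct iovec *iov, *end;

	if (!i->count)
		return;
	if (size > i->count)
		size = i->count;	/* moving past the end lands exactly at the end */
	i->count -= size;

	size += i->iov_offset;		/* measure from the start of the current segment */
	for (iov = i->iov, end = iov + i->nr_segs; iov < end; iov++) {
		if (size < iov->iov_len)
			break;
		size -= iov->iov_len;
	}
	i->iov_offset = size;
	i->nr_segs -= (size_t)(iov - i->iov);
	i->iov = iov;
}
```

Because the request is clamped first, count can never underflow and an oversized advance is simply treated as "go to the end".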
* [xarray] iov_iter_fault_in_readable() should do nothing in xarray case    Al Viro    2021-06-03    1    -1/+1
    ... and actually should just check it's given an iovec-backed iterator in the first place.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* copy_page_to_iter(): fix ITER_DISCARD case    Al Viro    2021-06-03    1    -2/+5
    we need to advance the iterator...

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* teach copy_page_to_iter() to handle compound pages    Al Viro    2021-06-03    1    -3/+25
    In situation when copy_page_to_iter() got a compound page the current code would only work on systems with no CONFIG_HIGHMEM. It *is* the majority of real-world setups, or we would've drowned in bug reports by now. Still needs fixing.

    Current variant works for solitary page; rename that to __copy_page_to_iter() and turn the handling of compound pages into a loop over subpages.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
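The loop over subpages can be illustrated without any kernel types. In the sketch below the single-page copy is replaced by a caller-supplied callback, and every name (sketch_copy_compound, sketch_record, sketch_log) and the 4 KiB page size are assumptions made for the example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Stand-in for the single-page copy: returns how many bytes it handled. */
typedef size_t (*sketch_copy_fn)(size_t subpage, size_t offset,
				 size_t bytes, void *ctx);

/* Split (offset, bytes) within a compound page into per-subpage chunks. */
static size_t sketch_copy_compound(size_t offset, size_t bytes,
				   sketch_copy_fn copy_one, void *ctx)
{
	size_t done = 0;
	size_t subpage = offset / SKETCH_PAGE_SIZE;	/* first subpage */

	offset %= SKETCH_PAGE_SIZE;
	while (bytes) {
		size_t n = SKETCH_PAGE_SIZE - offset;
		size_t copied;

		if (n > bytes)
			n = bytes;
		copied = copy_one(subpage, offset, n, ctx);
		done += copied;
		if (copied != n)
			break;		/* short copy: stop here */
		bytes -= n;
		offset = 0;
		subpage++;
	}
	return done;
}

/* Example callback that only records what it was asked to do. */
struct sketch_log {
	size_t calls;
	size_t last_subpage;
};

static size_t sketch_record(size_t subpage, size_t offset, size_t bytes,
			    void *ctx)
{
	struct sketch_log *log = ctx;

	(void)offset;
	log->calls++;
	log->last_subpage = subpage;
	return bytes;
}
```

Each chunk stays inside one subpage, so the per-page helper never has to map more than a single page at a time, which is what matters on highmem configurations.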
* iov_iter: Remove iov_iter_for_each_range()    David Howells    2021-06-03    1    -27/+0
    Remove iov_iter_for_each_range() as it's no longer used with the removal of lustre.

    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: lift memzero_page() to highmem.h    Ira Weiny    2021-05-05    1    -7/+1
    Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()".

    Lifting memzero_user(), convert it to kmap_local_page() and then use it in btrfs.

    This patch (of 3):

    memzero_page() can replace the kmap/memset/kunmap pattern in other places in the code. While zero_user() has the same interface it is not the same call and its use should be limited and some of those calls may be better converted from zero_user() to memzero_page().[1] But that is not addressed in this series.

    Lift memzero_page() to highmem.

    [1] https://lore.kernel.org/lkml/CAHk-=wijdojzo56FzYqE5TOYw2Vws7ik3LEMGj9SPQaJJ+Z73Q@mail.gmail.com/

    Link: https://lkml.kernel.org/r/20210309212137.2610186-1-ira.weiny@intel.com
    Link: https://lkml.kernel.org/r/20210309212137.2610186-2-ira.weiny@intel.com
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: David Sterba <dsterba@suse.com>
    Cc: Chris Mason <clm@fb.com>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* iov_iter: Four fixes for ITER_XARRAY    David Howells    2021-04-26    1    -0/+5
    Fix four things[1] in the patch that adds ITER_XARRAY[2]:

    (1) Remove the address_space struct predeclaration. This is a holdover from when it was ITER_MAPPING.

    (2) Fix _copy_mc_to_iter() so that the xarray segment updates count and iov_offset in the iterator before returning.

    (3) Fix iov_iter_alignment() to not loop in the xarray case. Because the middle pages are all whole pages, only the end pages need be considered - and this can be reduced to just looking at the start position in the xarray and the iteration size.

    (4) Fix iov_iter_advance() to limit the size of the advance to no more than the remaining iteration size.

    Reported-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Tested-by: Jeff Layton <jlayton@redhat.com>
    Tested-by: Dave Wysochanski <dwysocha@redhat.com>
    Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1]
    Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]
* iov_iter: Add ITER_XARRAY    David Howells    2021-04-23    1    -23/+290
    Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified amount of bytes. The iterator supports transparent huge pages.

    The iterate_xarray() macro calls the helper function with the RCU read lock held. I think that this is only a problem for iov_iter_for_each_range() - and that returns an error for ITER_XARRAY (also, this function does not appear to be called).

    The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed.

    This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O.

    Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages.

    Note that older versions of this patch implemented an ITER_MAPPING instead, which was almost the same.

    Changes:
    v7:
    - Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].

    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org>
    Tested-by: Dave Wysochanski <dwysocha@redhat.com>
    Tested-By: Marc Dionne <marc.dionne@auristor.com>
    cc: Alexander Viro <viro@zeniv.linux.org.uk>
    cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    cc: Christoph Hellwig <hch@lst.de>
    cc: linux-mm@kvack.org
    cc: linux-cachefs@redhat.com
    cc: linux-afs@lists.infradead.org
    cc: linux-nfs@vger.kernel.org
    cc: linux-cifs@vger.kernel.org
    cc: ceph-devel@vger.kernel.org
    cc: v9fs-developer@lists.sourceforge.net
    cc: linux-fsdevel@vger.kernel.org
    Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc
    Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc
    Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/
    Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc
    Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc
    Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2
    Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3
    Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/ # v4
    Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/ # v5
    Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/ # v6
    Link: https://lore.kernel.org/r/27c369a8f42bb8a617672b2dc0126a5c6df5a050.camel@kernel.org [1]
* Merge branch 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux    Linus Torvalds    2021-03-01    1    -14/+0
|\
    Pull kmap conversion updates from David Sterba:

     "This contains changes regarding kmap API use and eg conversion from kmap_atomic to kmap_local_page.

      The API belongs to memory management but to save cross-tree dependency headaches we've agreed to take it through the btrfs tree because there are some trivial conversions possible, while the rest will need some time and getting the easy cases out of the way would be convenient.

      The changes can be grouped:

       - function exports, new helpers

       - new VM_BUG_ON for additional verification; it's been discussed if it should be VM_BUG_ON or BUG_ON, the former was chosen due to performance reasons

       - code replaced by relevant helpers"

    [ This is an updated version of a request that originally came in during the merge window, but I asked for some updates:

        https://lore.kernel.org/lkml/cover.1614090658.git.dsterba@suse.com/

      which is why this got merged after the merge window closed. - Linus ]

    * 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: use copy_highpage() instead of 2 kmaps()
      btrfs: use memcpy_[to|from]_page() and kmap_local_page()
      mm/highmem: Add VM_BUG_ON() to mem*_page() calls
      mm/highmem: Introduce memcpy_page(), memmove_page(), and memset_page()
      mm/highmem: Convert memcpy_[to|from]_page() to kmap_local_page()
      mm/highmem: Lift memcpy_[to|from]_page to core
| * mm/highmem: Lift memcpy_[to|from]_page to core    Ira Weiny    2021-02-11    1    -14/+0
      Working through a conversion to call kmap_local_page() instead of kmap() revealed many places where the pattern kmap/memcpy/kunmap occurred.

      Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al Viro all suggested putting this code into helper functions. Al Viro further pointed out that these functions already existed in the iov_iter code.[1]

      Various locations for the lifted functions were considered.

      Headers like mm.h or string.h seem ok but don't really portray the functionality well. pagemap.h made some sense but is for page cache functionality.[2]

      Another alternative would be to create a new header for the promoted memcpy functions, but it masks the fact that these are designed to copy to/from pages using the kernel direct mappings and complicates matters with a new header.

      Placing these functions in 'highmem.h' is suboptimal especially with the changes being proposed in the functionality of kmap. From a caller perspective including/using 'highmem.h' implies that the functions defined in that header are only required when highmem is in use which is increasingly not the case with modern processors. However, highmem.h is where all the current functions like this reside (zero_user(), clear_highpage(), clear_user_highpage(), copy_user_highpage(), and copy_highpage()). So it makes the most sense even though it is distasteful for some.[3]

      Lift memcpy_to_page() and memcpy_from_page() to highmem.h.

      [1] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/
          https://lore.kernel.org/lkml/20201013112544.GA5249@infradead.org/

      [2] https://lore.kernel.org/lkml/20201208122316.GH7338@casper.infradead.org/

      [3] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/#t
          https://lore.kernel.org/lkml/20201208163814.GN1563847@iweiny-DESK2.sc.intel.com/

      Cc: Boris Pismenny <borisp@mellanox.com>
      Cc: Or Gerlitz <gerlitz.or@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Suggested-by: Eric Biggers <ebiggers@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
* | Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block    Linus Torvalds    2021-02-21    1    -2/+19
|\ \
    Pull core block updates from Jens Axboe:

     "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups.

      This pull request contains:

       - Two series of BFQ improvements (Paolo, Jan, Jia)

       - Block iov_iter improvements (Pavel)

       - bsg error path fix (Pan)

       - blk-mq scheduler improvements (Jan)

       - -EBUSY discard fix (Jan)

       - bvec allocation improvements (Ming, Christoph)

       - bio allocation and init improvements (Christoph)

       - Store bdev pointer in bio instead of gendisk + partno (Christoph)

       - Block trace point cleanups (Christoph)

       - hard read-only vs read-only split (Christoph)

       - Block based swap cleanups (Christoph)

       - Zoned write granularity support (Damien)

       - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"

    * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
      mm: simplify swapdev_block
      sd_zbc: clear zone resources for non-zoned case
      block: introduce blk_queue_clear_zone_settings()
      zonefs: use zone write granularity as block size
      block: introduce zone_write_granularity limit
      block: use blk_queue_set_zoned in add_partition()
      nullb: use blk_queue_set_zoned() to setup zoned devices
      nvme: cleanup zone information initialization
      block: document zone_append_max_bytes attribute
      block: use bi_max_vecs to find the bvec pool
      md/raid10: remove dead code in reshape_request
      block: mark the bio as cloned in bio_iov_bvec_set
      block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
      block: remove a layer of indentation in bio_iov_iter_get_pages
      block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
      block: remove the 1 and 4 vec bvec_slabs entries
      block: streamline bvec_alloc
      block: factor out a bvec_alloc_gfp helper
      block: move struct biovec_slab to bio.c
      block: reuse BIO_INLINE_VECS for integrity bvecs
      ...
| * | iov_iter: optimise bvec iov_iter_advance()    Pavel Begunkov    2021-01-25    1    -0/+19
        iov_iter_advance() is heavily used, but implemented through generic means. For bvecs there is a specifically crafted function for that, so use bvec_iter_advance() instead, it's faster and slimmer.

        Reviewed-by: Christoph Hellwig <hch@lst.de>
        Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
        Reviewed-by: Ming Lei <ming.lei@redhat.com>
        Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | bvec/iter: disallow zero-length segment bvecs    Pavel Begunkov    2021-01-25    1    -2/+0
| |/
        zero-length bvec segments are allowed in general, but not handled by bio and down the block layer so filtered out. This inconsistency may be confusing and prevent from optimisations. As zero-length segments are useless and places that were generating them are patched, declare them not allowed.

        Reviewed-by: Christoph Hellwig <hch@lst.de>
        Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
        Reviewed-by: Ming Lei <ming.lei@redhat.com>
        Signed-off-by: Jens Axboe <axboe@kernel.dk>
* / udp: fix skb_copy_and_csum_datagram with odd segment sizes    Willem de Bruijn    2021-02-04    1    -10/+14
|/
    When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd.

    The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call.

    Bring back the pos by passing it along with the csum to the callback.

    Changes v1->v2
      - pass csum value, instead of csump pointer (Alexander Duyck)

    Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
    Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
    Reported-by: Oliver Graute <oliver.graute@gmail.com>
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
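Why the running offset matters can be shown with a toy one's-complement checksum. The helpers below are user-space stand-ins with hypothetical sketch_ names, and the byte pairing (even positions as the high byte of a 16-bit word) is an assumption of the sketch; only the rotate-on-odd-offset step mirrors what the message describes.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit add with end-around carry. */
static uint32_t sketch_csum_add(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < a);
}

static uint32_t sketch_ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));
}

/* Fold a 32-bit partial sum down to 16 bits. */
static uint16_t sketch_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Partial sum of one block, pairing bytes from the block's own start. */
static uint32_t sketch_csum_partial(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t k;

	for (k = 0; k < len; k++)
		sum = sketch_csum_add(sum, (k & 1) ? p[k] : (uint32_t)p[k] << 8);
	return sum;
}

/* Append the sum of a block that begins at byte 'offset' of the whole
 * message. At an odd offset the block's byte lanes are swapped relative
 * to the message, which a rotation by 8 bits compensates for. */
static uint32_t sketch_csum_block_add(uint32_t csum, uint32_t csum2,
				      size_t offset)
{
	if (offset & 1)
		csum2 = sketch_ror32(csum2, 8);
	return sketch_csum_add(csum, csum2);
}
```

If a 7-byte message is split after 3 bytes, combining the two partial sums with offset 3 reproduces the checksum of the whole message, whereas combining them with offset 0 (the effect of resetting pos on each call) does not.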
* iov_iter: fix the uaccess area in copy_compat_iovec_from_user    Christoph Hellwig    2021-01-15    1    -1/+1
    sizeof needs to be called on the compat pointer, not the native one.

    Fixes: 89cd35c58bc2 ("iov_iter: transparently handle compat iovecs in import_iovec")
    Reported-by: David Laight <David.Laight@ACULAB.COM>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
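The difference between the two sizeof operands can be seen in a small user-space sketch. The compat structure here is a hypothetical mirror of a 32-bit iovec layout (two 32-bit fields), and both helper names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical mirror of a 32-bit iovec: two 32-bit fields. */
struct sketch_compat_iovec {
	uint32_t iov_base;
	uint32_t iov_len;
};

/* Area size when the array really holds native entries. */
static size_t sketch_native_area(const struct iovec *uvec, size_t nr_segs)
{
	return nr_segs * sizeof(*uvec);
}

/* Area size when the same address really holds compat entries: the
 * sizeof operand has to be the compat-typed pointer. */
static size_t sketch_compat_area(const struct iovec *uvec, size_t nr_segs)
{
	const struct sketch_compat_iovec *uiov =
		(const struct sketch_compat_iovec *)uvec;

	return nr_segs * sizeof(*uiov);
}
```

On a 64-bit build the native entry is twice the size of the compat one, so sizing the access window from the native pointer would cover twice as much user memory as the compat array occupies.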
* lib, uaccess: add failure injection to usercopy functions    Albert van der Linde    2020-10-16    1    -0/+5
    To test fault-tolerance of user memory access functions, introduce fault injection to usercopy functions.

    If a failure is expected return either -EFAULT or the total amount of bytes that were not copied.

    Signed-off-by: Albert van der Linde <alinde@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
    Reviewed-by: Alexander Potapenko <glider@google.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Andrey Konovalov <andreyknvl@google.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Marco Elver <elver@google.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Link: http://lkml.kernel.org/r/20200831171733.955393-3-alinde@google.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs    Linus Torvalds    2020-10-12    1    -42/+136
|\
    Pull compat iovec cleanups from Al Viro:
     "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      security/keys: remove compat_keyctl_instantiate_key_iov
      mm: remove compat_process_vm_{readv,writev}
      fs: remove compat_sys_vmsplice
      fs: remove the compat readv/writev syscalls
      fs: remove various compat readv/writev helpers
      iov_iter: transparently handle compat iovecs in import_iovec
      iov_iter: refactor rw_copy_check_uvector and import_iovec
      iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
      compat.h: fix a spelling error in <linux/compat.h>
| * iov_iter: transparently handle compat iovecs in import_iovec    Christoph Hellwig    2020-10-03    1    -12/+2
      Use in_compat_syscall() to import either native or the compat iovecs, and remove the now superfluous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * iov_iter: refactor rw_copy_check_uvector and import_iovec    Christoph Hellwig    2020-10-03    1    -186/+114
      Split rw_copy_check_uvector into two new helpers with more sensible calling conventions:

       - iovec_from_user copies an iovec from userspace either into the provided stack buffer if it fits, or allocates a new buffer for it. Returns the actually used iovec. It also verifies that iov_len does fit a signed type, and handles compat iovecs if the compat flag is set.
       - __import_iovec consolidates the native and compat versions of import_iovec. It calls iovec_from_user, then validates each iovec actually points to user addresses, and ensures the total length doesn't overflow.

      This has two major implications:

       - the access_process_vm case loses the total length checking, which wasn't required anyway, given that each call receives two iovecs for the local and remote side of the operation, and it verifies the total length on the local side already.
       - instead of a single loop there now are two loops over the iovecs. Given that the iovecs are cache hot this doesn't make a major difference

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
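The two-helper split can be approximated in user space as below. This is a sketch under assumptions, not the kernel code: the "user" array is ordinary memory, the function names and both limits are hypothetical, errors are reported as NULL or -1, and an overlong total is rejected here rather than handled in any other way.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SKETCH_FAST_SEGS 8			/* assumed size of the caller's stack array */
#define SKETCH_MAX_SEGS 1024			/* assumed cap on the segment count */
#define SKETCH_SSIZE_MAX ((size_t)-1 >> 1)	/* largest value a signed size can hold */

/* First loop: copy the array (stack buffer if it fits, heap otherwise)
 * and reject any length that does not fit a signed type. */
static struct iovec *sketch_iovec_copy(const struct iovec *src, size_t nr_segs,
				       struct iovec *fast, size_t fast_segs)
{
	struct iovec *iov = fast;
	size_t k;

	if (nr_segs > SKETCH_MAX_SEGS)
		return NULL;
	if (nr_segs > fast_segs) {
		iov = malloc(nr_segs * sizeof(*iov));
		if (!iov)
			return NULL;
	}
	if (nr_segs)
		memcpy(iov, src, nr_segs * sizeof(*iov));
	for (k = 0; k < nr_segs; k++) {
		if (iov[k].iov_len > SKETCH_SSIZE_MAX) {
			if (iov != fast)
				free(iov);
			return NULL;
		}
	}
	return iov;
}

/* Second loop: make sure the total length cannot overflow. */
static ssize_t sketch_total_len(const struct iovec *iov, size_t nr_segs)
{
	size_t total = 0;
	size_t k;

	for (k = 0; k < nr_segs; k++) {
		if (iov[k].iov_len > SKETCH_SSIZE_MAX - total)
			return -1;
		total += iov[k].iov_len;
	}
	return (ssize_t)total;
}
```

The caller frees the returned array only when it differs from the stack buffer it passed in, which is what "returns the actually used iovec" makes possible.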
| * iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c    David Laight    2020-09-25    1    -0/+176
      This lets the compiler inline it into import_iovec() generating much better code.

      Signed-off-by: David Laight <david.laight@aculab.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs    Linus Torvalds    2020-10-12    1    -12/+9
|\ \
    Pull copy_and_csum cleanups from Al Viro:
     "Saner calling conventions for csum_and_copy_..._user() and friends"

    [ Removing 800+ lines of code and cleaning stuff up is good - Linus ]

    * 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      ppc: propagate the calling conventions change down to csum_partial_copy_generic()
      amd64: switch csum_partial_copy_generic() to new calling conventions
      sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
      xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
      mips: propagate the calling convention change down into __csum_partial_copy_..._user()
      mips: __csum_partial_copy_kernel() has no users left
      mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
      sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
      i386: propagate the calling conventions change down to csum_partial_copy_generic()
      sh: propage the calling conventions change down to csum_partial_copy_generic()
      m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
      arm: propagate the calling convention changes down to csum_partial_copy_from_user()
      alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
      saner calling conventions for csum_and_copy_..._user()
      csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
      csum_partial_copy_nocheck(): drop the last argument
      unify generic instances of csum_partial_copy_nocheck()
      icmp_push_reply(): reorder adding the checksum up
      skb_copy_and_csum_bits(): don't bother with the last argument
| * | saner calling conventions for csum_and_copy_..._user()    Al Viro    2020-08-20    1    -11/+8
        All callers of these primitives will
          * discard anything we might've copied in case of error
          * ignore the csum value in case of error
          * always pass 0xffffffff as the initial sum, so the resulting csum value (in case of success, that is) will never be 0.

        That suggests the following calling conventions:
          * don't pass err_ptr - just return 0 on error.
          * don't bother with zeroing destination, etc. in case of error
          * don't pass the initial sum - just use 0xffffffff.

        This commit does the minimal conversion in the instances of csum_and_copy_...(); the changes of actual asm code behind them are done later in the series.

        Note that this asm code is often shared with csum_partial_copy_nocheck(); the difference is that csum_partial_copy_nocheck() passes 0 for initial sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are free to pass 0xffffffff in all cases and subsequent patches will use that freedom without any special comments.

        A part that could be split off: parisc and uml/i386 claimed to have csum_and_copy_to_user() instances of their own, but those were identical to the generic one, so we simply drop them. Not sure if it's worth a separate commit...

        Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum    Al Viro    2020-08-20    1    -3/+3
        Preparation for the change of calling conventions; right now all callers pass 0 as initial sum. Passing 0xffffffff instead yields the values comparable mod 0xffff and guarantees that 0 will not be returned on success.

        Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
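The claim that an initial sum of 0xffffffff can never produce 0 follows from end-around-carry arithmetic, which the small sketch below demonstrates. Both function names are hypothetical; the addition is the usual 32-bit one's-complement add.

```c
#include <stdint.h>

/* 32-bit add with end-around carry (one's-complement addition). */
static uint32_t sketch_add_carry(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < a);
}

/* Start from 0xffffffff instead of 0. For data_sum == 0 the result is
 * 0xffffffff; for any other value the add wraps and the carry restores
 * data_sum itself. Either way the result is non-zero. */
static uint32_t sketch_sum_with_init(uint32_t data_sum)
{
	return sketch_add_carry(0xffffffffu, data_sum);
}
```

Since 0xffffffff is a multiple of 0xffff, the result stays equivalent to the plain sum once folded to 16 bits, and a return value of 0 becomes free to signal an error.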
| * | csum_partial_copy_nocheck(): drop the last argument    Al Viro    2020-08-20    1    -1/+1
| |/
        It's always 0. Note that we theoretically could use ~0U as well - result will be the same modulo 0xffff, _if_ the damn thing did the right thing for any value of initial sum; later we'll make use of that when convenient. However, unlike csum_and_copy_..._user(), there are instances that did not work for arbitrary initial sums; c6x is one such.

        Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* / x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()    Dan Williams    2020-10-06    1    -25/+23
|/
    In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled.

    Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case:

      On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
      >
      > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
      > >
      > > However now I see that copy_user_generic() works for the wrong reason.
      > > It works because the exception on the source address due to poison
      > > looks no different than a write fault on the user address to the
      > > caller, it's still just a short copy. So it makes copy_to_user() work
      > > for the wrong reason relative to the name.
      >
      > Right.
      >
      > And it won't work that way on other architectures. On x86, we have a
      > generic function that can take faults on either side, and we use it
      > for both cases (and for the "in_user" case too), but that's an
      > artifact of the architecture oddity.
      >
      > In fact, it's probably wrong even on x86 - because it can hide bugs -
      > but writing those things is painful enough that everybody prefers
      > having just one function.

    Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks.

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Tony Luck <tony.luck@intel.com>
    Acked-by: Michael Ellerman <mpe@ellerman.id.au>
    Cc: <stable@vger.kernel.org>
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
* iov_iter: Move unnecessary inclusion of crypto/hash.hHerbert Xu2020-06-301-1/+2
    The header file linux/uio.h includes crypto/hash.h which pulls in
    most of the Crypto API. Since linux/uio.h is used throughout the
    kernel this means that every tiny bit of change to the Crypto API
    causes the entire kernel to get rebuilt.

    This patch fixes this by moving it into lib/iov_iter.c instead
    where it is actually used.

    This patch also fixes the ifdef to use CRYPTO_HASH instead of just
    CRYPTO which does not guarantee the existence of ahash.

    Unfortunately a number of drivers were relying on linux/uio.h to
    provide access to linux/slab.h. This patch adds inclusions of
    linux/slab.h as detected by build failures.

    Also skbuff.h was relying on this to provide a declaration for
    ahash_request. This patch adds a forward declaration instead.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: Use generic instrumented.hMarco Elver2020-03-211-3/+4
    This replaces the kasan instrumentation with generic instrumentation,
    implicitly adding KCSAN instrumentation support.

    For KASAN no functional change is intended.

    Suggested-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Marco Elver <elver@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
* pipe: Fix bogus dereference in iov_iter_alignment()Jan Kara2019-12-161-1/+2
    We cannot look at 'i->pipe' unless we know the iter is a pipe. Move the
    ring_size load to a branch in iov_iter_alignment() where we've already
    checked the iter is a pipe to avoid bogus dereference.

    Reported-by: syzbot+bea68382bae9490e7dd6@syzkaller.appspotmail.com
    Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds2019-12-011-0/+1
    git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

    Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
     "As part of the cleanup of some remaining y2038 issues, I came to
      fs/compat_ioctl.c, which still has a couple of commands that need
      support for time64_t.

      In completely unrelated work, I spent time on cleaning up parts of
      this file in the past, moving things out into drivers instead.

      After Al Viro reviewed an earlier version of this series and did a
      lot more of that cleanup, I decided to try to completely eliminate
      the rest of it and move it all into drivers.

      This series incorporates some of Al's work and many patches of my
      own, but in the end stops short of actually removing the last part,
      which is the scsi ioctl handlers. I have patches for those as well,
      but they need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
      scsi: sd: enable compat ioctls for sed-opal
      pktcdvd: add compat_ioctl handler
      compat_ioctl: move SG_GET_REQUEST_TABLE handling
      compat_ioctl: ppp: move simple commands into ppp_generic.c
      compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
      compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
      compat_ioctl: unify copy-in of ppp filters
      tty: handle compat PPP ioctls
      compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
      compat_ioctl: handle SIOCOUTQNSD
      af_unix: add compat_ioctl support
      compat_ioctl: reimplement SG_IO handling
      compat_ioctl: move WDIOC handling into wdt drivers
      fs: compat_ioctl: move FITRIM emulation into file systems
      gfs2: add compat_ioctl support
      compat_ioctl: remove unused convert_in_user macro
      compat_ioctl: remove last RAID handling code
      compat_ioctl: remove /dev/raw ioctl translation
      compat_ioctl: remove PCI ioctl translation
      compat_ioctl: remove joystick ioctl translation
      ...
| * compat_ioctl: reimplement SG_IO handlingArnd Bergmann2019-10-231-0/+1
    There are two code locations that implement the SG_IO ioctl: the old
    sg.c driver, and the generic scsi_ioctl helper that is in turn used by
    multiple drivers.

    To eradicate the old compat_ioctl conversion handler for the SG_IO
    command, I implement a readable pair of put_sg_io_hdr() /
    get_sg_io_hdr() helper functions that can be used for both compat and
    native mode, and then I call this from both drivers.

    For the iovec handling, there is already a compat_import_iovec()
    function that can simply be called in place of import_iovec().

    To avoid having to pass the compat/native state through multiple
    indirections, I mark the SG_IO command itself as compatible in
    fs/compat_ioctl.c and use in_compat_syscall() to figure out where we
    are called from.

    As a side-effect of this, the sg.c driver now also accepts the 32-bit
    sg_io_hdr format in compat mode using the read/write interface, not
    just ioctl. This should improve compatibility with old 32-bit binaries,
    but it would break if any application intentionally passes the 64-bit
    data structure in compat mode here.

    Steffen Maier helped debug an issue in an earlier version of this patch.

    Cc: Steffen Maier <maier@linux.ibm.com>
    Cc: linux-scsi@vger.kernel.org
    Cc: Doug Gilbert <dgilbert@interlog.com>
    Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | pipe: Allow pipes to have kernel-reserved slotsDavid Howells2019-11-151-2/+2
    Split pipe->ring_size into two numbers:

     (1) pipe->ring_size - indicates the hard size of the pipe ring.

     (2) pipe->max_usage - indicates the maximum number of pipe ring slots
         that userspace orchestrated events can fill.

    This allows for a pipe that is both writable by the general kernel
    notification facility and by userspace, allowing plenty of ring space
    for notifications to be added whilst preventing userspace from being
    able to pin too much unswappable kernel space.

    Signed-off-by: David Howells <dhowells@redhat.com>
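The split between the hard ring size and the userspace limit can be pictured with a small userspace sketch. This is only an illustration under stated assumptions: `struct pipe_sketch`, `full_against()`, `user_may_write()` and `kernel_may_post()` are hypothetical names invented here, not kernel interfaces; only the field names `ring_size` and `max_usage` come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a pipe ring with two limits. */
struct pipe_sketch {
	unsigned int head;      /* slots produced so far */
	unsigned int tail;      /* slots consumed so far */
	unsigned int ring_size; /* hard number of slots in the ring */
	unsigned int max_usage; /* slots that userspace-driven writes may fill */
};

/* The ring counts as full against a limit once occupancy reaches it. */
static bool full_against(const struct pipe_sketch *p, unsigned int limit)
{
	return p->head - p->tail >= limit;
}

/* Userspace writes are checked against the smaller limit... */
static bool user_may_write(const struct pipe_sketch *p)
{
	return !full_against(p, p->max_usage);
}

/* ...while kernel-originated notifications may use the whole ring. */
static bool kernel_may_post(const struct pipe_sketch *p)
{
	return !full_against(p, p->ring_size);
}
```

With `ring_size = 16` and `max_usage = 8`, a ring holding 8 entries refuses further userspace writes in this model but still has 8 slots left for kernel notifications.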
* | pipe: Use head and tail pointers for the ring, not cursor and lengthDavid Howells2019-10-311-118/+151
    Convert pipes to use head and tail pointers for the buffer ring rather
    than pointer and length as the latter requires two atomic ops to update
    (or a combined op) whereas the former only requires one.

     (1) The head pointer is the point at which production occurs and
         points to the slot in which the next buffer will be placed. This
         is equivalent to pipe->curbuf + pipe->nrbufs.

         The head pointer belongs to the write-side.

     (2) The tail pointer is the point at which consumption occurs. It
         points to the next slot to be consumed. This is equivalent to
         pipe->curbuf.

         The tail pointer belongs to the read-side.

     (3) head and tail are allowed to run to UINT_MAX and wrap naturally.
         They are only masked off when the array is being accessed, e.g.:

             pipe->bufs[head & mask]

         This means that it is not necessary to have a dead slot in the
         ring as head == tail isn't ambiguous.

     (4) The ring is empty if "head == tail".

         A helper, pipe_empty(), is provided for this.

     (5) The occupancy of the ring is "head - tail".

         A helper, pipe_occupancy(), is provided for this.

     (6) The number of free slots in the ring is "pipe->ring_size -
         occupancy".

         A helper, pipe_space_for_user() is provided to indicate how many
         slots userspace may use.

     (7) The ring is full if "head - tail >= pipe->ring_size".

         A helper, pipe_full(), is provided for this.

    Signed-off-by: David Howells <dhowells@redhat.com>
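The head/tail arithmetic described in this commit message can be tried out in plain userspace C. The following is a minimal sketch, not the kernel code: `struct ring`, `RING_SIZE` and the `ring_*()` helpers are hypothetical names chosen for illustration, and the sketch assumes a power-of-two ring size so that masking is valid.

```c
#include <stdbool.h>

#define RING_SIZE 8u /* assumed power of two, so (RING_SIZE - 1) is a mask */

/* Hypothetical ring; the counters run freely and wrap at UINT_MAX. */
struct ring {
	unsigned int head; /* next slot to fill; owned by the producer */
	unsigned int tail; /* next slot to drain; owned by the consumer */
	int bufs[RING_SIZE];
};

static bool ring_empty(unsigned int head, unsigned int tail)
{
	return head == tail;
}

/* Unsigned subtraction stays correct even after head has wrapped. */
static unsigned int ring_occupancy(unsigned int head, unsigned int tail)
{
	return head - tail;
}

static bool ring_full(unsigned int head, unsigned int tail, unsigned int limit)
{
	return ring_occupancy(head, tail) >= limit;
}

static bool ring_push(struct ring *r, int value)
{
	if (ring_full(r->head, r->tail, RING_SIZE))
		return false;
	/* Mask only at the point of access, as in item (3). */
	r->bufs[r->head & (RING_SIZE - 1)] = value;
	r->head++;
	return true;
}

static bool ring_pop(struct ring *r, int *value)
{
	if (ring_empty(r->head, r->tail))
		return false;
	*value = r->bufs[r->tail & (RING_SIZE - 1)];
	r->tail++;
	return true;
}
```

Because the counters are never reduced modulo the ring size, a full ring (`head - tail == RING_SIZE`) and an empty ring (`head == tail`) are distinct states, which is why no dead slot is needed. Starting both counters just below `UINT_MAX` is a convenient way to exercise the wrap-around case.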
* mm: introduce page_size()Matthew Wilcox (Oracle)2019-09-241-1/+1
    Patch series "Make working with compound pages easier", v2.

    These three patches add three helpers and convert the appropriate
    places to use them.

    This patch (of 3):

    It's unnecessarily hard to find out the size of a potentially huge page.
    Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).

    Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
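The replacement described here amounts to wrapping one shift expression in a named helper. A userspace sketch of that idea follows; it assumes a 4096-byte base page and uses a mock `struct mock_page` with an explicit `order` field, both of which are illustrative stand-ins rather than the kernel's definitions.

```c
#define MOCK_PAGE_SIZE 4096UL /* assumed base page size for this sketch */

/* Hypothetical stand-in for struct page: only the compound order matters. */
struct mock_page {
	unsigned int order; /* 0 for a base page, 9 for a 2 MiB huge page, ... */
};

static unsigned int mock_compound_order(const struct mock_page *page)
{
	return page->order;
}

/* Named helper replacing the open-coded 'PAGE_SIZE << compound_order(page)'. */
static unsigned long mock_page_size(const struct mock_page *page)
{
	return MOCK_PAGE_SIZE << mock_compound_order(page);
}
```

Call sites then read `mock_page_size(page)` instead of repeating the shift, which keeps the size computation in one place.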
* uio: make import_iovec()/compat_import_iovec() return bytes on successJens Axboe2019-05-311-7/+8
    Currently these functions return < 0 on error, and 0 for success.
    Change that so that we return < 0 on error, but number of bytes
    for success.

    Some callers already treat the return value that way, others need a
    slight tweak.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>
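The caller-side consequence of this return convention can be shown with a userspace sketch. `sketch_import()` and `sketch_caller()` are hypothetical functions written for this illustration; they are not the kernel's import_iovec() and do no user-memory access or overflow checking.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical import helper following the new convention:
 * negative errno on failure, total byte count on success.
 */
static ssize_t sketch_import(const struct iovec *iov, int nr_segs)
{
	ssize_t total = 0;

	if (nr_segs < 0 || (nr_segs > 0 && !iov))
		return -EINVAL;
	for (int i = 0; i < nr_segs; i++)
		total += (ssize_t)iov[i].iov_len;
	return total;
}

/* A caller adapted to the new convention: only negative values are errors. */
static int sketch_caller(const struct iovec *iov, int nr_segs, size_t *bytes)
{
	ssize_t ret = sketch_import(iov, nr_segs);

	if (ret < 0) /* an old-style "if (ret)" would treat success as failure */
		return (int)ret;
	*bytes = (size_t)ret;
	return 0;
}
```

A caller that previously tested `if (ret)` has to switch to `if (ret < 0)`, since a successful import now returns a positive byte count.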
* treewide: Add SPDX license identifier for missed filesThomas Gleixner2019-05-211-0/+1
    Add SPDX license identifiers to all files which:

     - Have no license information of any form

     - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
       initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

      GPL-2.0-only

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny2019-05-141-2/+5
    To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Mike Marshall <hubcap@omnibond.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Hogan <jhogan@kernel.org>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ralf Baechle <ralf@linux-mips.org>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* iov_iter: Fix build error without CONFIG_CRYPTOYueHaibing2019-04-031-0/+4
    If CONFIG_CRYPTO is not set or set to m, building with gcc reports:

    lib/iov_iter.o: In function `hash_and_copy_to_iter':
    iov_iter.c:(.text+0x9129): undefined reference to `crypto_stats_get'
    iov_iter.c:(.text+0x9152): undefined reference to `crypto_stats_ahash_update'

    Reported-by: Hulk Robot <hulkci@huawei.com>
    Fixes: d05f443554b3 ("iov_iter: introduce hash_and_copy_to_iter helper")
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: optimize page_copy_sane()Eric Dumazet2019-02-261-2/+15
    Avoid cache line miss dereferencing struct page if we can.

    page_copy_sane() mostly deals with order-0 pages.

    Extra cache line miss is visible on TCP recvmsg() calls dealing
    with GRO packets (typically 45 page frags are attached to one skb).

    Bringing the 45 struct pages into cpu cache while copying the data
    is not free, since the freeing of the skb (and associated
    page frags put_page()) can happen after cache lines have been evicted.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
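The optimization can be pictured as a fast path that answers the common order-0 case without touching the page descriptor at all. The sketch below is a simplified userspace model, not the kernel's page_copy_sane(): `struct frag_page`, `descriptor_reads`, `frag_page_size()` and `frag_copy_sane()` are hypothetical names, a 4096-byte base page is assumed, and the handling of offsets into compound tail pages is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAG_PAGE_SIZE 4096UL /* assumed base page size for this sketch */

/* Hypothetical page descriptor; reading it models the costly cache miss. */
struct frag_page {
	unsigned int order;
};

static unsigned long descriptor_reads; /* counts slow-path descriptor accesses */

static size_t frag_page_size(const struct frag_page *page)
{
	descriptor_reads++; /* in the kernel this is where the miss would occur */
	return FRAG_PAGE_SIZE << page->order;
}

static bool frag_copy_sane(const struct frag_page *page, size_t offset, size_t n)
{
	size_t end = n + offset;

	/*
	 * Fast path: a range ending within one base page is valid for every
	 * page order, so the descriptor need not be read. The "n <= end"
	 * test rejects ranges where n + offset wrapped around.
	 */
	if (n <= end && end <= FRAG_PAGE_SIZE)
		return true;

	/* Slow path: consult the descriptor for the real page size. */
	return n <= end && end <= frag_page_size(page);
}
```

In this model, copies that stay within the first 4096 bytes never increment `descriptor_reads`; only larger ranges pay for the lookup.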
* Merge branch 'for-linus' of ↵Linus Torvalds2019-01-051-27/+27
    git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

    Pull trivial vfs updates from Al Viro:
     "A few cleanups + Neil's namespace_unlock() optimization"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      exec: make prepare_bprm_creds static
      genheaders: %-<width>s had been there since v6; %-*s - since v7
      VFS: use synchronize_rcu_expedited() in namespace_unlock()
      iov_iter: reduce code duplication
| * iov_iter: reduce code duplicationAl Viro2018-11-271-27/+27
    The same combination of csum_partial_copy_nocheck() with csum_add_block()
    is used in a bunch of places. Add a helper doing just that and use it.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>