summaryrefslogtreecommitdiffstats
path: root/io_uring
Commit message (Collapse)AuthorAgeFilesLines
* io_uring/rw: fix short rw error handlingPavel Begunkov2022-09-091-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries. TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails. The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done(). The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/643 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: copy addr for zc on POLL_FIRSTPavel Begunkov2022-09-081-3/+4
| | | | | | | | | | | | Every time we return from an issue handler and expect the request to be retried we should also setup it for async exec ourselves. Do that when we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll re-read the address, which might be a surprise for the userspace. Fixes: 092aeedb750a9 ("io_uring: allow to pass addr into sendzc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: recycle kbuf recycle on tw requeuePavel Begunkov2022-09-071-0/+1
| | | | | | | | | | | | When we queue a request via tw for execution it's not going to be executed immediately, so when io_queue_async() hits IO_APOLL_READY and queues a tw but doesn't try to recycle/consume the buffer some other request may try to use the the buffer. Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a19bc9e211e3184215a58e129b62f440180e9212.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/kbuf: fix not advancing READV kbuf ringPavel Begunkov2022-09-071-2/+6
| | | | | | | | | | | | When we don't recycle a selected ring buffer we should advance the head of the ring, so don't just skip io_kbuf_recycle() for IORING_OP_READV but adjust the ring. Fixes: 934447a603b22 ("io_uring: do not recycle buffer in READV") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/a6d85e2611471bcb5d5dcd63a8342077ddc2d73d.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/notif: Remove the unused function io_notif_complete()Jiapeng Chong2022-09-051-8/+0
| | | | | | | | | | | | | | The function io_notif_complete() is defined in the notif.c file, but not called elsewhere, so delete this unused function. io_uring/notif.c:24:20: warning: unused function 'io_notif_complete' [-Wunused-function]. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2047 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220905020436.51894-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: simplify zerocopy send user APIPavel Begunkov2022-09-016-74/+42
| | | | | | | | | | | | | | | | | | | | | | | | Following user feedback, this patch simplifies zerocopy send API. One of the main complaints is that the current API is difficult with the userspace managing notification slots, and then send retries with error handling make it even worse. Instead of keeping notification slots change it to the per-request notifications model, which posts both completion and notification CQEs for each request when any data has been sent, and only one CQE if it fails. All notification CQEs will have IORING_CQE_F_NOTIF set and IORING_CQE_F_MORE in completion CQEs indicates whether to wait a notification or not. IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now. This is less flexible, but greatly simplifies the user API and also the kernel implementation. We reuse notif helpers in this patch, but in the future there won't be need for keeping two requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/notif: remove notif registrationPavel Begunkov2022-09-014-95/+1
| | | | | | | | | We're going to remove the userspace exposed zerocopy notification API, remove notification registration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Revert "io_uring: rename IORING_OP_FILES_UPDATE"Pavel Begunkov2022-09-013-22/+8
| | | | | | | | | | | This reverts commit 4379d5f15b3fd4224c37841029178aa8082a242e. We removed notification flushing, also cleanup uapi preparation changes to not pollute it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Revert "io_uring: add zc notification flush requests"Pavel Begunkov2022-09-011-38/+0
| | | | | | | | | | | This reverts commit 492dddb4f6e3a5839c27d41ff1fecdbe6c3ab851. Soon we won't have the very notion of notification flushing, so remove notification flushing requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fix overexcessive retriesPavel Begunkov2022-08-261-1/+1
| | | | | | | | | | | | | Length parameter of io_sg_from_iter() can be smaller than the iterator's size, as it's with TCP, so when we set from->count at the end of the function we truncate the iterator forcing TCP to return preliminary with a short send. It affects zerocopy sends with large payload sizes and leads to retries and possible request failures. Fixes: 3ff1a0d395c00 ("io_uring: enable managed frags with register buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: save address for sendzc async executionPavel Begunkov2022-08-253-8/+50
| | | | | | | | | | | | | We usually copy all bits that a request needs from the userspace for async execution, so the userspace can keep them on the stack. However, send zerocopy violates this pattern for addresses and may reloads it e.g. from io-wq. Save the address if any in ->async_data as usual. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com [axboe: fold in incremental fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: conditional ->async_data allocationPavel Begunkov2022-08-242-3/+6
| | | | | | | | | | | | | There are opcodes that need ->async_data only in some cases and allocation it unconditionally may hurt performance. Add an option to opdef to make move the allocation part from the core io_uring to opcode specific code. Note, we can't just set opdef->async_size to zero because there are other helpers that rely on it, e.g. io_alloc_async_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/notif: order notif vs send CQEsPavel Begunkov2022-08-241-2/+4
| | | | | | | | | | | | | | | | | | | Currently, there is no ordering between notification CQEs and completions of the send flushing it, this quite complicates the userspace, especially since we don't flush notification when the send(+flush) request fails, i.e. there will be only one CQE. What we can do is to make sure that notification completions come only after sends. The easiest way to achieve this is to not try to complete a notification inline from io_sendzc() but defer it to task_work, considering that io-wq sendzc is disallowed CQEs will be naturally ordered because task_works will only be executed after we're done with submission and so inline completion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fix indentationPavel Begunkov2022-08-241-1/+1
| | | | | | | | Fix up indentation before we get complaints from tooling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fix zc send link failingPavel Begunkov2022-08-241-0/+1
| | | | | | | | | | | | Failed requests should be marked with req_set_fail(), so links and cqe skipping work correctly, which is missing in io_sendzc(). Note, io_sendzc() return IOU_OK on failure, so the core code won't do the cleanup for us. Fixes: 06a5464be84e4 ("io_uring: wire send zc request type") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fix must_hold annotationPavel Begunkov2022-08-241-1/+1
| | | | | | | | | | Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx argument there but should get it from the slot instead. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix submission-failure handling for uring-cmdKanchan Joshi2022-08-231-1/+1
| | | | | | | | | | | If ->uring_cmd returned an error value different from -EAGAIN or -EIOCBQUEUED, it gets overridden with IOU_OK. This invites trouble as caller (io_uring core code) handles IOU_OK differently than other error codes. Fix this by returning the actual error code. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix off-by-one in sync cancelation file checkJens Axboe2022-08-231-1/+1
| | | | | | | | | | The passed in index should be validated against the number of registered files we have, it needs to be smaller than the index value to avoid going one beyond the end. Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()") Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: use right helpers for async_dataPavel Begunkov2022-08-181-2/+2
| | | | | | | | | | There is another spot where we check ->async_data directly instead of using req_has_async_data(), which is the way to do it, fix it up. Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/notif: raise limit on notification slotsPavel Begunkov2022-08-151-1/+1
| | | | | | | | | 1024 notification slots is rather an arbitrary value, raise it up, everything is accounted to memcg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eb78a0a5f2fa5941f8e845cdae5fb399bf7ba0be.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: improve zc addr import error handlingPavel Begunkov2022-08-151-8/+8
| | | | | | | | | | We may account memory to a memcg of a request that didn't even got to the network layer. It's not a bug as it'll be routinely cleaned up on flush, but it might be confusing for the userspace. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: use right helpers for async recyclePavel Begunkov2022-08-151-1/+1
| | | | | | | | | | | We have a helper that checks for whether a request contains anything in ->async_data or not, namely req_has_async_data(). It's better to use it as it might have some extra considerations. Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds2022-08-1322-147/+170
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: - Regression fix for this merge window, fixing a wrong order of arguments for io_req_set_res() for passthru (Dylan) - Fix for the audit code leaking context memory (Peilin) - Ensure that provided buffers are memcg accounted (Pavel) - Correctly handle short zero-copy sends (Pavel) - Sparse warning fixes for the recvmsg multishot command (Dylan) - Error handling fix for passthru (Anuj) - Remove randomization of struct kiocb fields, to avoid it growing in size if re-arranged in such a fashion that it grows more holes or padding (Keith, Linus) - Small series improving type safety of the sqe fields (Stefan) * tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block: io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields io_uring: make io_kiocb_to_cmd() typesafe fs: don't randomize struct kiocb fields io_uring: consistently make use of io_notif_to_data() io_uring: fix error handling for io_uring_cmd io_uring: fix io_recvmsg_prep_multishot sparse warnings io_uring/net: send retry for zerocopy io_uring: mem-account pbuf buckets audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() io_uring: pass correct parameters to io_req_set_res
| * io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fieldsStefan Metzmacher2022-08-122-3/+19
| | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/ffcaf8dc4778db4af673822df60dbda6efdd3065.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: make io_kiocb_to_cmd() typesafeStefan Metzmacher2022-08-1219-128/+126
| | | | | | | | | | | | | | | | | | We need to make sure (at build time) that struct io_cmd_data is not casted to a structure that's larger. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: consistently make use of io_notif_to_data()Stefan Metzmacher2022-08-111-1/+1
| | | | | | | | | | | | | | | | | | This makes the assignment typesafe. It prepares changing io_kiocb_to_cmd() in the next commit. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/8da6e9d12cf95ad4bc73274406d12bca7aabf72e.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fix error handling for io_uring_cmdAnuj Gupta2022-08-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 97b388d70b53 ("io_uring: handle completions in the core") moved the error handling from handler to core. But for io_uring_cmd handler we end up completing more than once (both in handler and in core) leading to use_after_free. Change io_uring_cmd handler to avoid calling io_uring_cmd_done in case of error. Fixes: 97b388d70b53 ("io_uring: handle completions in the core") Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220811091459.6929-1-anuj20.g@samsung.com [axboe: fix ret vs req typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fix io_recvmsg_prep_multishot sparse warningsDylan Yudaken2022-08-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Fix casts missing the __user parts. This seemed to only cause errors on the alpha build, or if checked with sparse, but it was definitely an oversight. Reported-by: kernel test robot <lkp@intel.com> Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: send retry for zerocopyPavel Begunkov2022-08-041-3/+17
| | | | | | | | | | | | | | | | | | | | io_uring handles short sends/recvs for stream sockets when MSG_WAITALL is set, however new zerocopy send is inconsistent in this regard, which might be confusing. Handle short sends. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: mem-account pbuf bucketsPavel Begunkov2022-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Potentially, someone may create as many pbuf bucket as there are indexes in an xarray without any other restrictions bounding our memory usage, put memory needed for the buckets under memory accounting. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()Peilin Ye2022-08-042-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently @audit_context is allocated twice for io_uring workers: 1. copy_process() calls audit_alloc(); 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which is effectively audit_alloc()) and overwrites @audit_context, causing: BUG: memory leak unreferenced object 0xffff888144547400 (size 1024): <...> hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 [<ffffffff81239e63>] copy_process+0xcd3/0x2340 [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 [<ffffffff81686604>] create_io_worker+0xb4/0x230 [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 [<ffffffff816768b3>] io_queue_async+0x113/0x180 [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 [<ffffffff8125a688>] get_signal+0xc18/0xf10 [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 Then, 3. io_sq_thread() or io_wqe_worker() frees @audit_context using audit_free(); 4. do_exit() eventually calls audit_free() again, which is okay because audit_free() does a NULL check. As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and redundant audit_free() calls. Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Suggested-by: Paul Moore <paul@paul-moore.com> Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: pass correct parameters to io_req_set_resMing Lei2022-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | The two parameters of 'res' and 'cflags' are swapped, so fix it. Without this fix, 'ublk del' hangs forever. Cc: Pavel Begunkov <asml.silence@gmail.com> Fixes: de23077eda61f ("io_uring: set completion results upfront") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220803120757.1668278-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'pull-work.iov_iter-base' of ↵Linus Torvalds2022-08-031-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()
* Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of ↵Linus Torvalds2022-08-0211-92/+601
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.dk/linux-block Pull io_uring zerocopy support from Jens Axboe: "This adds support for efficient support for zerocopy sends through io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and UDP. The core network changes to support this is in a stable branch from Jakub that both io_uring and net-next has pulled in, and the io_uring changes are layered on top of that. All of the work has been done by Pavel" * tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits) io_uring: notification completion optimisation io_uring: export req alloc from core io_uring/net: use unsigned for flags io_uring/net: make page accounting more consistent io_uring/net: checks errors of zc mem accounting io_uring/net: improve io_get_notif_slot types selftests/io_uring: test zerocopy send io_uring: enable managed frags with register buffers io_uring: add zc notification flush requests io_uring: rename IORING_OP_FILES_UPDATE io_uring: flush notifiers after sendzc io_uring: sendzc with fixed buffers io_uring: allow to pass addr into sendzc io_uring: account locked pages for non-fixed zc io_uring: wire send zc request type io_uring: add notification slot registration io_uring: add rsrc referencing for notifiers io_uring: complete notifiers in tw io_uring: cache struct io_notif io_uring: add zc notification infrastructure ...
| * io_uring: notification completion optimisationPavel Begunkov2022-07-274-141/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to use all optimisations that we have for io_uring requests like completion batching, memory caching and more but for zc notifications. Fortunately, notification perfectly fit the request model so we can overlay them onto struct io_kiocb and use all the infratructure. Most of the fields of struct io_notif natively fits into io_kiocb, so we replace struct io_notif with struct io_kiocb carrying struct io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand coded caching. __io_notif_complete_tw() is converted to use io_uring's tw infra. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: export req alloc from corePavel Begunkov2022-07-272-21/+22
| | | | | | | | | | | | | | | | | | We want to do request allocation out of the core io_uring code, make the allocation functions public for other io_uring parts. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: use unsigned for flagsPavel Begunkov2022-07-251-2/+2
| | | | | | | | | | | | | | | | Use unsigned int type for msg flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: make page accounting more consistentPavel Begunkov2022-07-255-14/+37
| | | | | | | | | | | | | | | | | | Make network page accounting more consistent with how buffer registration is working, i.e. account all memory to ctx->user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: checks errors of zc mem accountingPavel Begunkov2022-07-251-1/+3
| | | | | | | | | | | | | | | | | | mm_account_pinned_pages() may fail, don't ignore the return value. Fixes: e29e3bd4b968 ("io_uring: account locked pages for non-fixed zc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dae0542ed8e6706071bb83ad3e7ad6a70d207fd9.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: improve io_get_notif_slot typesPavel Begunkov2022-07-251-1/+1
| | | | | | | | | | | | | | | | Don't use signed int for slot indexing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4d15aefebb5e55729dd9b5ec01ab16b70033343.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: enable managed frags with register buffersPavel Begunkov2022-07-241-1/+55
| | | | | | | | | | | | | | | | | | | | io_uring's registered buffers infra has a good performant way of pinning pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely register buffer backed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: add zc notification flush requestsPavel Begunkov2022-07-241-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlay notification control onto IORING_OP_RSRC_UPDATE (former IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is not zero, it also copies sqe->arg as a new tag for all flushed notifications. Note, it doesn't flush a notification of a slot if there was no requests attached to it (since last flush or registration). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df13e2363400682a73dd9e71c3b990b8d1ff0333.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: rename IORING_OP_FILES_UPDATEPavel Begunkov2022-07-243-8/+22
| | | | | | | | | | | | | | | | | | | | IORING_OP_FILES_UPDATE will be a more generic opcode serving different resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: flush notifiers after sendzcPavel Begunkov2022-07-245-12/+27
| | | | | | | | | | | | | | | | | | | | Allow to flush notifiers as a part of sendzc request by setting IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will flush the used [active] notifier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: sendzc with fixed buffersPavel Begunkov2022-07-241-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | Allow zerocopy sends to use fixed buffers. There is an optimisation for this case, the network layer don't need to reference the pages, see SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed buffers until the notifier is released. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com [axboe: fold in 32-bit pointer cast warning fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: allow to pass addr into sendzcPavel Begunkov2022-07-241-2/+16
| | | | | | | | | | | | | | | | | | Allow to specify an address to zerocopy sends making it more like sendto(2). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: account locked pages for non-fixed zcPavel Begunkov2022-07-242-0/+7
| | | | | | | | | | | | | | | | | | Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec based zerocopy sends. Do the accounting on the io_uring side. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: wire send zc request typePavel Begunkov2022-07-243-0/+112
| | | | | | | | | | | | | | | | | | | | | | Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx and the buffers are safe to reuse only when the used notification is flushed and completes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: add notification slot registrationPavel Begunkov2022-07-243-0/+55
| | | | | | | | | | | | | | | | Let the userspace to register and unregister notification slots. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a0aa8161fe3ebb2a4cc6e5dbd0cffb96e6881cf5.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: add rsrc referencing for notifiersPavel Begunkov2022-07-243-3/+15
| | | | | | | | | | | | | | | | | | | | | | In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cd7a01d26837945b6982fa9cf15a63230f2ed4f.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>