summaryrefslogtreecommitdiffstats
path: root/io_uring
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'io_uring-6.0-2022-09-23' of git://git.kernel.dk/linuxLinus Torvalds2022-09-241-0/+3
|\ | | | | | | | | | | | | | | | | Pull io_uring fix from Jens Axboe: "Just a single fix for an issue with un-reaped IOPOLL requests on ring exit" * tag 'io_uring-6.0-2022-09-23' of git://git.kernel.dk/linux: io_uring: ensure that cached task references are always put on exit
| * io_uring: ensure that cached task references are always put on exitJens Axboe2022-09-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | io_uring caches task references to avoid doing atomics for each of them per request. If a request is put from the same task that allocated it, then we can maintain a per-ctx cache of them. This obviously relies on io_uring always pruning caches in a reliable way, and there's currently a case off io_uring fd release where we can miss that. One example is a ring setup with IOPOLL, which relies on the task polling for completions, which will free them. However, if such a task submits a request and then exits or closes the ring without reaping the completion, then ring release will reap and put. If release happens from that very same task, the completed request task refs will get put back into the cache pool. This is problematic, as we're now beyond the point of pruning caches. Manually drop these caches after doing an IOPOLL reap. This releases references from the current task, which is enough. If another task happens to be doing the release, then the caching will not be triggered and there's no issue. Cc: stable@vger.kernel.org Fixes: e98e49b2bbf7 ("io_uring: extend task put optimisations") Reported-by: Homin Rhee <hominlab@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'io_uring-6.0-2022-09-18' of git://git.kernel.dk/linuxLinus Torvalds2022-09-182-9/+9
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "Nothing really major here, but figured it'd be nicer to just get these flushed out for -rc6 so that the 6.1 branch will have them as well. That'll make our lives easier going forward in terms of development, and avoid trivial conflicts in this area. - Simple trace rename so that the returned opcode name is consistent with the enum definition (Stefan) - Send zc rsrc request vs notification lifetime fix (Pavel)" * tag 'io_uring-6.0-2022-09-18' of git://git.kernel.dk/linux: io_uring/opdef: rename SENDZC_NOTIF to SEND_ZC io_uring/net: fix zc fixed buf lifetime
| * io_uring/opdef: rename SENDZC_NOTIF to SEND_ZCStefan Metzmacher2022-09-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's confusing to see the string SENDZC_NOTIF in ftrace output when using IORING_OP_SEND_ZC. Fixes: b48c312be05e8 ("io_uring/net: simplify zerocopy send user API") Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: io-uring@vger.kernel.org Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8e5cd8616919c92b6c3c7b6ea419fdffd5b97f3c.1663363798.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: fix zc fixed buf lifetimePavel Begunkov2022-09-181-8/+8
| | | | | | | | | | | | | | | | | | | | | | Notifications usually outlive requests, so we need to pin buffers with it by assigning a rsrc to it instead of the request. Fixed: b48c312be05e8 ("io_uring/net: simplify zerocopy send user API") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dd6406ff8a90887f2b36ed6205dac9fda17c1f35.1663366886.git.asml.silence@gmail.com Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-blockLinus Torvalds2022-09-162-2/+3
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "Two small patches: - Fix using an unsigned type for the return value, introduced in this release (Pavel) - Stable fix for a missing check for a fixed file on put (me)" * tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block: io_uring/msg_ring: check file type before putting io_uring/rw: fix error'ed retry return values
| * io_uring/msg_ring: check file type before puttingJens Axboe2022-09-151-1/+2
| | | | | | | | | | | | | | | | | | | | If we're invoked with a fixed file, follow the normal rules of not calling io_fput_file(). Fixed files are permanently registered to the ring, and do not need putting separately. Cc: stable@vger.kernel.org Fixes: aa184e8671f0 ("io_uring: don't attempt to IOPOLL for MSG_RING requests") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/rw: fix error'ed retry return valuesPavel Begunkov2022-09-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Kernel test robot reports that we test negativity of an unsigned in io_fixup_rw_res() after a recent change, which masks error codes and messes up the return value in case I/O is re-retried and failed with an error. Fixes: 4d9cb92ca41dd ("io_uring/rw: fix short rw error handling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9754a0970af1861e7865f9014f735c70dc60bf79.1663071587.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-blockLinus Torvalds2022-09-095-25/+29
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: - Removed function that became unused after last week's merge (Jiapeng) - Two small fixes for kbuf recycling (Pavel) - Include address copy for zc send for POLLFIRST (Pavel) - Fix for short IO handling in the normal read/write path (Pavel) * tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block: io_uring/rw: fix short rw error handling io_uring/net: copy addr for zc on POLL_FIRST io_uring: recycle kbuf recycle on tw requeue io_uring/kbuf: fix not advancing READV kbuf ring io_uring/notif: Remove the unused function io_notif_complete()
| * io_uring/rw: fix short rw error handlingPavel Begunkov2022-09-091-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries. TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails. The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done(). The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/643 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: copy addr for zc on POLL_FIRSTPavel Begunkov2022-09-081-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Every time we return from an issue handler and expect the request to be retried we should also setup it for async exec ourselves. Do that when we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll re-read the address, which might be a surprise for the userspace. Fixes: 092aeedb750a9 ("io_uring: allow to pass addr into sendzc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: recycle kbuf recycle on tw requeuePavel Begunkov2022-09-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | When we queue a request via tw for execution it's not going to be executed immediately, so when io_queue_async() hits IO_APOLL_READY and queues a tw but doesn't try to recycle/consume the buffer some other request may try to use the the buffer. Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a19bc9e211e3184215a58e129b62f440180e9212.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/kbuf: fix not advancing READV kbuf ringPavel Begunkov2022-09-071-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | When we don't recycle a selected ring buffer we should advance the head of the ring, so don't just skip io_kbuf_recycle() for IORING_OP_READV but adjust the ring. Fixes: 934447a603b22 ("io_uring: do not recycle buffer in READV") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/a6d85e2611471bcb5d5dcd63a8342077ddc2d73d.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/notif: Remove the unused function io_notif_complete()Jiapeng Chong2022-09-051-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The function io_notif_complete() is defined in the notif.c file, but not called elsewhere, so delete this unused function. io_uring/notif.c:24:20: warning: unused function 'io_notif_complete' [-Wunused-function]. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2047 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220905020436.51894-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-blockLinus Torvalds2022-09-028-230/+52
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: - A single fix for over-eager retries for networking (Pavel) - Revert the notification slot support for zerocopy sends. It turns out that even after more than a year or development and testing, there's not full agreement on whether just using plain ordered notifications is Good Enough to avoid the complexity of using the notifications slots. Because of that, we decided that it's best left to a future final decision. We can always bring back this feature, but we can't really change it or remove it once we've released 6.0 with it enabled. The reverts leave the usual CQE notifications as the primary interface for knowing when data was sent, and when it was acked. (Pavel) * tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block: selftests/net: return back io_uring zc send tests io_uring/net: simplify zerocopy send user API io_uring/notif: remove notif registration Revert "io_uring: rename IORING_OP_FILES_UPDATE" Revert "io_uring: add zc notification flush requests" selftests/net: temporarily disable io_uring zc test io_uring/net: fix overexcessive retries
| * io_uring/net: simplify zerocopy send user APIPavel Begunkov2022-09-016-74/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following user feedback, this patch simplifies zerocopy send API. One of the main complaints is that the current API is difficult with the userspace managing notification slots, and then send retries with error handling make it even worse. Instead of keeping notification slots change it to the per-request notifications model, which posts both completion and notification CQEs for each request when any data has been sent, and only one CQE if it fails. All notification CQEs will have IORING_CQE_F_NOTIF set and IORING_CQE_F_MORE in completion CQEs indicates whether to wait a notification or not. IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now. This is less flexible, but greatly simplifies the user API and also the kernel implementation. We reuse notif helpers in this patch, but in the future there won't be need for keeping two requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/notif: remove notif registrationPavel Begunkov2022-09-014-95/+1
| | | | | | | | | | | | | | | | | | We're going to remove the userspace exposed zerocopy notification API, remove notification registration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * Revert "io_uring: rename IORING_OP_FILES_UPDATE"Pavel Begunkov2022-09-013-22/+8
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 4379d5f15b3fd4224c37841029178aa8082a242e. We removed notification flushing, also cleanup uapi preparation changes to not pollute it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * Revert "io_uring: add zc notification flush requests"Pavel Begunkov2022-09-011-38/+0
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 492dddb4f6e3a5839c27d41ff1fecdbe6c3ab851. Soon we won't have the very notion of notification flushing, so remove notification flushing requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: fix overexcessive retriesPavel Begunkov2022-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Length parameter of io_sg_from_iter() can be smaller than the iterator's size, as it's with TCP, so when we set from->count at the end of the function we truncate the iterator forcing TCP to return preliminary with a short send. It affects zerocopy sends with large payload sizes and leads to retries and possible request failures. Fixes: 3ff1a0d395c00 ("io_uring: enable managed frags with register buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'lsm-pr-20220829' of ↵Linus Torvalds2022-08-311-0/+5
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM support for IORING_OP_URING_CMD from Paul Moore: "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD. These are necessary as without them the IORING_OP_URING_CMD remains outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch, and my SELinux patch). They have been discussed at length with the io_uring folks, and Jens has given his thumbs-up on the relevant patches (see the commit descriptions). There is one patch that is not strictly necessary, but it makes testing much easier and is very trivial: the /dev/null IORING_OP_URING_CMD patch." * tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: Smack: Provide read control for io_uring_cmd /dev/null: add IORING_OP_URING_CMD support selinux: implement the security_uring_cmd() LSM hook lsm,io_uring: add LSM hooks for the new uring_cmd file op
| * lsm,io_uring: add LSM hooks for the new uring_cmd file opLuis Chamberlain2022-08-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | io-uring cmd support was added through ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Moore <paul@paul-moore.com>
* | io_uring/net: save address for sendzc async executionPavel Begunkov2022-08-253-8/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | We usually copy all bits that a request needs from the userspace for async execution, so the userspace can keep them on the stack. However, send zerocopy violates this pattern for addresses and may reloads it e.g. from io-wq. Save the address if any in ->async_data as usual. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com [axboe: fold in incremental fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring: conditional ->async_data allocationPavel Begunkov2022-08-242-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | There are opcodes that need ->async_data only in some cases and allocation it unconditionally may hurt performance. Add an option to opdef to make move the allocation part from the core io_uring to opcode specific code. Note, we can't just set opdef->async_size to zero because there are other helpers that rely on it, e.g. io_alloc_async_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/notif: order notif vs send CQEsPavel Begunkov2022-08-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is no ordering between notification CQEs and completions of the send flushing it, this quite complicates the userspace, especially since we don't flush notification when the send(+flush) request fails, i.e. there will be only one CQE. What we can do is to make sure that notification completions come only after sends. The easiest way to achieve this is to not try to complete a notification inline from io_sendzc() but defer it to task_work, considering that io-wq sendzc is disallowed CQEs will be naturally ordered because task_works will only be executed after we're done with submission and so inline completion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: fix indentationPavel Begunkov2022-08-241-1/+1
| | | | | | | | | | | | | | | | Fix up indentation before we get complaints from tooling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: fix zc send link failingPavel Begunkov2022-08-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Failed requests should be marked with req_set_fail(), so links and cqe skipping work correctly, which is missing in io_sendzc(). Note, io_sendzc() return IOU_OK on failure, so the core code won't do the cleanup for us. Fixes: 06a5464be84e4 ("io_uring: wire send zc request type") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: fix must_hold annotationPavel Begunkov2022-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx argument there but should get it from the slot instead. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring: fix submission-failure handling for uring-cmdKanchan Joshi2022-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | If ->uring_cmd returned an error value different from -EAGAIN or -EIOCBQUEUED, it gets overridden with IOU_OK. This invites trouble as caller (io_uring core code) handles IOU_OK differently than other error codes. Fix this by returning the actual error code. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring: fix off-by-one in sync cancelation file checkJens Axboe2022-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | The passed in index should be validated against the number of registered files we have, it needs to be smaller than the index value to avoid going one beyond the end. Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()") Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: use right helpers for async_dataPavel Begunkov2022-08-181-2/+2
| | | | | | | | | | | | | | | | | | | | There is another spot where we check ->async_data directly instead of using req_has_async_data(), which is the way to do it, fix it up. Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/notif: raise limit on notification slotsPavel Begunkov2022-08-151-1/+1
| | | | | | | | | | | | | | | | | | 1024 notification slots is rather an arbitrary value, raise it up, everything is accounted to memcg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eb78a0a5f2fa5941f8e845cdae5fb399bf7ba0be.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: improve zc addr import error handlingPavel Begunkov2022-08-151-8/+8
| | | | | | | | | | | | | | | | | | | | We may account memory to a memcg of a request that didn't even got to the network layer. It's not a bug as it'll be routinely cleaned up on flush, but it might be confusing for the userspace. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | io_uring/net: use right helpers for async recyclePavel Begunkov2022-08-151-1/+1
|/ | | | | | | | | | | We have a helper that checks for whether a request contains anything in ->async_data or not, namely req_has_async_data(). It's better to use it as it might have some extra considerations. Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds2022-08-1322-147/+170
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: - Regression fix for this merge window, fixing a wrong order of arguments for io_req_set_res() for passthru (Dylan) - Fix for the audit code leaking context memory (Peilin) - Ensure that provided buffers are memcg accounted (Pavel) - Correctly handle short zero-copy sends (Pavel) - Sparse warning fixes for the recvmsg multishot command (Dylan) - Error handling fix for passthru (Anuj) - Remove randomization of struct kiocb fields, to avoid it growing in size if re-arranged in such a fashion that it grows more holes or padding (Keith, Linus) - Small series improving type safety of the sqe fields (Stefan) * tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block: io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields io_uring: make io_kiocb_to_cmd() typesafe fs: don't randomize struct kiocb fields io_uring: consistently make use of io_notif_to_data() io_uring: fix error handling for io_uring_cmd io_uring: fix io_recvmsg_prep_multishot sparse warnings io_uring/net: send retry for zerocopy io_uring: mem-account pbuf buckets audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() io_uring: pass correct parameters to io_req_set_res
| * io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fieldsStefan Metzmacher2022-08-122-3/+19
| | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/ffcaf8dc4778db4af673822df60dbda6efdd3065.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: make io_kiocb_to_cmd() typesafeStefan Metzmacher2022-08-1219-128/+126
| | | | | | | | | | | | | | | | | | We need to make sure (at build time) that struct io_cmd_data is not casted to a structure that's larger. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: consistently make use of io_notif_to_data()Stefan Metzmacher2022-08-111-1/+1
| | | | | | | | | | | | | | | | | | This makes the assignment typesafe. It prepares changing io_kiocb_to_cmd() in the next commit. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/8da6e9d12cf95ad4bc73274406d12bca7aabf72e.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fix error handling for io_uring_cmdAnuj Gupta2022-08-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 97b388d70b53 ("io_uring: handle completions in the core") moved the error handling from handler to core. But for io_uring_cmd handler we end up completing more than once (both in handler and in core) leading to use_after_free. Change io_uring_cmd handler to avoid calling io_uring_cmd_done in case of error. Fixes: 97b388d70b53 ("io_uring: handle completions in the core") Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220811091459.6929-1-anuj20.g@samsung.com [axboe: fix ret vs req typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fix io_recvmsg_prep_multishot sparse warningsDylan Yudaken2022-08-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Fix casts missing the __user parts. This seemed to only cause errors on the alpha build, or if checked with sparse, but it was definitely an oversight. Reported-by: kernel test robot <lkp@intel.com> Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: send retry for zerocopyPavel Begunkov2022-08-041-3/+17
| | | | | | | | | | | | | | | | | | | | io_uring handles short sends/recvs for stream sockets when MSG_WAITALL is set, however new zerocopy send is inconsistent in this regard, which might be confusing. Handle short sends. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: mem-account pbuf bucketsPavel Begunkov2022-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Potentially, someone may create as many pbuf bucket as there are indexes in an xarray without any other restrictions bounding our memory usage, put memory needed for the buckets under memory accounting. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()Peilin Ye2022-08-042-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently @audit_context is allocated twice for io_uring workers: 1. copy_process() calls audit_alloc(); 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which is effectively audit_alloc()) and overwrites @audit_context, causing: BUG: memory leak unreferenced object 0xffff888144547400 (size 1024): <...> hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 [<ffffffff81239e63>] copy_process+0xcd3/0x2340 [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 [<ffffffff81686604>] create_io_worker+0xb4/0x230 [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 [<ffffffff816768b3>] io_queue_async+0x113/0x180 [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 [<ffffffff8125a688>] get_signal+0xc18/0xf10 [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 Then, 3. io_sq_thread() or io_wqe_worker() frees @audit_context using audit_free(); 4. do_exit() eventually calls audit_free() again, which is okay because audit_free() does a NULL check. As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and redundant audit_free() calls. Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Suggested-by: Paul Moore <paul@paul-moore.com> Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: pass correct parameters to io_req_set_resMing Lei2022-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | The two parameters of 'res' and 'cflags' are swapped, so fix it. Without this fix, 'ublk del' hangs forever. Cc: Pavel Begunkov <asml.silence@gmail.com> Fixes: de23077eda61f ("io_uring: set completion results upfront") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220803120757.1668278-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'pull-work.iov_iter-base' of ↵Linus Torvalds2022-08-031-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()
* Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of ↵Linus Torvalds2022-08-0211-92/+601
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.dk/linux-block Pull io_uring zerocopy support from Jens Axboe: "This adds support for efficient support for zerocopy sends through io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and UDP. The core network changes to support this is in a stable branch from Jakub that both io_uring and net-next has pulled in, and the io_uring changes are layered on top of that. All of the work has been done by Pavel" * tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits) io_uring: notification completion optimisation io_uring: export req alloc from core io_uring/net: use unsigned for flags io_uring/net: make page accounting more consistent io_uring/net: checks errors of zc mem accounting io_uring/net: improve io_get_notif_slot types selftests/io_uring: test zerocopy send io_uring: enable managed frags with register buffers io_uring: add zc notification flush requests io_uring: rename IORING_OP_FILES_UPDATE io_uring: flush notifiers after sendzc io_uring: sendzc with fixed buffers io_uring: allow to pass addr into sendzc io_uring: account locked pages for non-fixed zc io_uring: wire send zc request type io_uring: add notification slot registration io_uring: add rsrc referencing for notifiers io_uring: complete notifiers in tw io_uring: cache struct io_notif io_uring: add zc notification infrastructure ...
| * io_uring: notification completion optimisationPavel Begunkov2022-07-274-141/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to use all optimisations that we have for io_uring requests like completion batching, memory caching and more but for zc notifications. Fortunately, notification perfectly fit the request model so we can overlay them onto struct io_kiocb and use all the infratructure. Most of the fields of struct io_notif natively fits into io_kiocb, so we replace struct io_notif with struct io_kiocb carrying struct io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand coded caching. __io_notif_complete_tw() is converted to use io_uring's tw infra. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: export req alloc from corePavel Begunkov2022-07-272-21/+22
| | | | | | | | | | | | | | | | | | We want to do request allocation out of the core io_uring code, make the allocation functions public for other io_uring parts. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: use unsigned for flagsPavel Begunkov2022-07-251-2/+2
| | | | | | | | | | | | | | | | Use unsigned int type for msg flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring/net: make page accounting more consistentPavel Begunkov2022-07-255-14/+37
| | | | | | | | | | | | | | | | | | Make network page accounting more consistent with how buffer registration is working, i.e. account all memory to ctx->user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>