Linux - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	io_uring: fix 'sync' handling of io_fallback_tw()	Jens Axboe	3 days	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \|	A previous commit added a 'sync' parameter to io_fallback_tw(), which if true, means the caller wants to wait on the fallback thread handling it. But the logic is somewhat messed up, ensure that ctxs are swapped and flushed appropriately. Cc: stable@vger.kernel.org Fixes: dfbe5561ae93 ("io_uring: flush offloaded and delayed task_work on exit") Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: don't duplicate flushing in io_req_post_cqe	Pavel Begunkov	3 days	1	-3/+8
\| \| \| \| \| \| \| \| \| \|	io_req_post_cqe() sets submit_state.cq_flush so that *flush_completions() can take care of batch commiting CQEs. Don't commit it twice by using __io_cq_unlock_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/41c416660c509cee676b6cad96081274bcb459f3.1745493861.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: fix late dma unmap for a dead dev	Pavel Begunkov	9 days	2	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a problem with page pools not dma-unmapping immediately when the device is going down, and delaying it until the page pool is destroyed, which is not allowed (see links). That just got fixed for normal page pools, and we need to address memory providers as well. Unmap pages in the memory provider uninstall callback, and protect it with a new lock. There is also a gap between when a dma mapping is created and the mp is installed, so if the device is killed in between, io_uring would be holding on to dma mappings to a dead device with no one to call ->uninstall. Move it to page pool init and rely on ->is_mapped to make sure it's only done once. Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/ Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/ Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: ensure segments counts are correct on kbuf buffers	Jens Axboe	10 days	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \|	kbuf imports have the front offset adjusted and segments removed, but the tail segments are still included in the segment count that gets passed in the iov_iter. As the segments aren't necessarily all the same size, move importing to a separate helper and iterate the mapped length to get an exact count. Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: send exact nr_segs for fixed buffer	Nitesh Shetty	10 days	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sending exact nr_segs, avoids bio split check and processing in block layer, which takes around 5%[1] of overall CPU utilization. In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2] and 5% less CPU utilization. [1] 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split [2] sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1 Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> [Pavel: fixed for kbuf, rebased and reworked on top of cleanups] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7a1a49a8d053bd617c244291d63dbfbc07afde36.1744882081.git.asml.silence@gmail.com [axboe: fold in fix factoring in buf reg offset] Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: refactor io_import_fixed	Pavel Begunkov	10 days	1	-17/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	io_import_fixed is a mess. Even though we know the final len of the iterator, we still assign offset + len and do some magic after to correct for that. Do offset calculation first and finalise it with iov_iter_bvec at the end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2d5107fed24f8b23245ef2ede9a5a7f7c426df61.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: separate kbuf offset adjustments	Pavel Begunkov	10 days	1	-12/+7
\| \| \| \| \| \| \| \| \| \|	Kernel registered buffers are special because segments are not uniform in size, and we have a bunch of optimisations based on that uniformity for normal buffers. Handle kbuf separately, it'll be cleaner this way. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4e9e5990b0ab5aee723c0be5cd9b5bcf810375f9.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: don't skip offset calculation	Pavel Begunkov	10 days	1	-38/+37
\| \| \| \| \| \| \| \| \| \| \|	Don't optimise for requests with offset=0. Large registered buffers are the preference and hence the user is likely to pass an offset, and the adjustments are not expensive and will be made even cheaper in following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1c2beb20470ee3c886a363d4d8340d3790db19f3.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: add pp to ifq conversion helper	Pavel Begunkov	12 days	1	-4/+9
\| \| \| \| \| \| \| \| \| \|	It'll likely change how page pools store memory providers, so in preparation for that, keep accesses in one place in io_uring by introducing a helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3522eb8fa9b4e21bcf32e7e9ae656c616b282210.1744722526.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: return ifq id to the user	Pavel Begunkov	12 days	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	IORING_OP_RECV_ZC requests take a zcrx object id via sqe::zcrx_ifq_idx, which binds it to the corresponding if / queue. However, we don't return that id back to the user. It's fine as currently there can be only one zcrx and the user assumes that its id should be 0, but as we'll need multiple zcrx objects in the future let's explicitly pass it back on registration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8714667d370651962f7d1a169032e5f02682a73e.1744722517.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/kbuf: reject zero sized provided buffers	Jens Axboe	2025-04-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This isn't fixing a real issue, but there's also zero point in going through group and buffer setup, when the buffers are going to be rejected once attempted to get used. Cc: stable@vger.kernel.org Reported-by: syzbot+58928048fd1416f1457c@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: separate niov number from pages	Pavel Begunkov	2025-04-07	2	-9/+11
\| \| \| \| \| \| \| \| \| \|	A preparation patch that separates the number of pages / folios from the number of niovs. They will not match in the future to support huge pages, improved dma mapping and/or larger chunk sizes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0780ac966ee84200385737f45bb0f2ada052392b.1743848231.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: put refill data into separate cache line	Pavel Begunkov	2025-04-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refill queue lock and other bits are only used from the allocation path on the rx softirq side, but it shares the cache line with other fields like ctx that are used also in the "syscall" path, which causes cache bouncing when softirq runs on a different CPU. Separate them into different cache lines. The first one now contains constant fields used by both contextx, followed by a line responsible for refill queue data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d1f598e27d623c07fc49d6baee13089a9b1216c.1743848241.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: don't post tag CQEs on file/buffer registration failure	Pavel Begunkov	2025-04-04	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Buffer / file table registration is all or nothing, if it fails all resources we might have partially registered are dropped and the table is killed. If that happens, it doesn't make sense to post any rsrc tag CQEs. That would be confusing to the application, which should not need to handle that case. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 7029acd8a9503 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") Link: https://lore.kernel.org/r/c514446a8dcb0197cddd5d4ba8f6511da081cf1f.1743777957.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: always do atomic put from iowq	Pavel Begunkov	2025-04-03	2	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_uring always switches requests to atomic refcounting for iowq execution before there is any parallilism by setting REQ_F_REFCOUNT, and the flag is not cleared until the request completes. That should be fine as long as the compiler doesn't make up a non existing value for the flags, however KCSAN still complains when the request owner changes oter flag bits: BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work ... read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0: req_ref_put_and_test io_uring/refs.h:22 [inline] Skip REQ_F_REFCOUNT checks for iowq, we know it's set. Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: support vectored kernel fixed buffer	Ming Lei	2025-04-02	1	-3/+88
\| \| \| \| \| \| \| \| \| \| \| \|	io_uring has supported fixed kernel buffer via io_buffer_register_bvec() and io_buffer_unregister_bvec(). The vectored fixed buffer has been ready, so it is natural to support fixed kernel buffer, one use case is ublk. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250325135155.935398-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: add validate_fixed_range() for validate fixed buffer	Ming Lei	2025-04-02	1	-11/+22
\| \| \| \| \| \| \| \| \| \|	Add helper of validate_fixed_range() for validating fixed buffer range. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250325135155.935398-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0	David Wei	2025-04-01	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When readlen is set for a recvzc request, tcp_read_sock() will call io_zcrx_recv_skb() one final time with len == desc->count == 0. This is caused by the !desc->count check happening too late. The offset + 1 != skb->len happens earlier and causes the while loop to continue. Fix this in io_zcrx_recv_skb() instead of tcp_read_sock(). Return early if len is 0 i.e. the read is done. Fixes: 6699ec9a23f8 ("io_uring/zcrx: add a read limit to recvzc requests") Signed-off-by: David Wei <dw@davidwei.uk> Link: https://lore.kernel.org/r/20250401195355.1613813-1-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: avoid import_ubuf for regvec send	Pavel Begunkov	2025-03-31	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	With registered buffers we set up iterators in helpers like io_import_fixed(), and there is no need for a import_ubuf() before that. It was fine as we used real pointers for offset calculation, but that's not the case anymore since introduction of ublk kernel buffers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b2de1a50844f848f62c8de609b494971033a6b9.1743437358.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/rsrc: check size when importing reg buffer	Pavel Begunkov	2025-03-31	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	We're relying on callers to verify the IO size, do it inside of io_import_fixed() instead. It's safer, easier to deal with, and more consistent as now it's done close to the iter init site. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f9c2c75ec4d356a0c61289073f68d98e8a9db190.1743446271.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: cleanup {g,s]etsockopt sqe reading	Pavel Begunkov	2025-03-31	1	-8/+10
\| \| \| \| \| \| \| \|	Add a local variable for the sqe pointer to avoid repetition. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8dbac0f9acda2d3842534eeb7ce10d9276b021ae.1743357108.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: hide caches sqes from drivers	Pavel Begunkov	2025-03-31	2	-2/+3
\| \| \| \| \| \| \| \| \|	There is now an io_uring private part of cmd async_data, move saved sqe into it. Drivers are accessing it via struct io_uring_cmd::cmd. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ecbe078dd57acefdbc4366d083327086c0879378.1743357121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: make zcrx depend on CONFIG_IO_URING	Pavel Begunkov	2025-03-31	1	-0/+1
\| \| \| \| \| \| \| \| \|	Reflect in the kconfig that zcrx requires io_uring compiled. Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8047135a344e79dbd04ee36a7a69cc260aabc2ca.1743356260.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: add req flag invariant build assertion	Pavel Begunkov	2025-03-31	1	-0/+2
\| \| \| \| \| \| \| \| \|	We're caching some of file related request flags in a tricky way, put a build check to make sure flags don't get reshuffled. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9877577b83c25dd78224a8274f799187e7ec7639.1743407551.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: don't pass ctx to tw add remote helper	Pavel Begunkov	2025-03-28	3	-11/+8
\| \| \| \| \| \| \| \| \| \|	Unlike earlier versions, io_msg_remote_post() creates a valid request with a proper context, so don't pass a context to io_req_task_work_add_remote() explicitly but derive it from the request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/msg: initialise msg request opcode	Pavel Begunkov	2025-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	It's risky to have msg request opcode set to garbage, so at least initialise it to nop. Later we might want to add a user inaccessible opcode for such cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9afe650fcb348414a4529d89f52eb8969ba06efd.1743190078.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/msg: rename io_double_lock_ctx()	Pavel Begunkov	2025-03-28	1	-4/+4
\| \| \| \| \| \| \| \| \|	io_double_lock_ctx() doesn't lock both rings. Rename it to prevent any future confusion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e5defa000efd9b0f5e169cbb6bad4994d46ec5c.1743190078.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: import zc ubuf earlier	Pavel Begunkov	2025-03-28	1	-28/+16
\| \| \| \| \| \| \| \| \| \| \|	io_send_setup() already sets up the iterator for IORING_OP_SEND_ZC, we don't need repeating that at issue time. Move it all together with mem accounting at prep time, which is more consistent with how the non-zc version does that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eb54f007c493ad9f4ca89aa8e715baf30d83fb88.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: set sg_from_iter in advance	Pavel Begunkov	2025-03-28	1	-10/+15
\| \| \| \| \| \| \| \| \|	In preparation to the next patch, set ->sg_from_iter callback at request prep time. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5fe2972701df3bacdb3d760bce195fa640bee201.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: clusterise send vs msghdr branches	Pavel Begunkov	2025-03-28	1	-11/+4
\| \| \| \| \| \| \| \| \|	We have multiple branches at prep for send vs sendmsg handling, put them together so that the variant handling is more localised. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/33abf666d9ded74cba4da2f0d9fe58e88520dffe.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: unify sendmsg setup with zc	Pavel Begunkov	2025-03-28	1	-22/+6
\| \| \| \| \| \| \| \| \| \| \|	io_sendmsg_zc_setup() duplicates parts of io_sendmsg_setup(), and the only difference between them is that the former support vectored registered buffers with nothing zerocopy specific. Merge them together, we want regular sendmsg to eventually support fixed buffers either way. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e5ec40f9dc93355399dc6fa0cbc8b31f0b20ac5.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: combine sendzc flags writes	Pavel Begunkov	2025-03-28	1	-2/+1
\| \| \| \| \| \| \| \| \|	Save an instruction / trip to the cache and assign some of sendzc flags together. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c519d6f406776c3be3ef988a8339a88e45d1ffd9.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: open code io_net_vec_assign()	Pavel Begunkov	2025-03-28	1	-11/+5
\| \| \| \| \| \| \| \|	Get rid of io_net_vec_assign() by open coding it into its only caller. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19191c34b5cfe1161f7eeefa6e785418ea9ad56d.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: open code io_sendmsg_copy_hdr()	Pavel Begunkov	2025-03-28	1	-20/+10
\| \| \| \| \| \| \| \| \|	io_sendmsg_setup() is trivial and io_sendmsg_copy_hdr() doesn't add any good abstraction, open code one into another. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/565318ce585665e88053663eeee5178d2c15692f.1743202294.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring/net: account memory for zc sendmsg	Pavel Begunkov	2025-03-28	1	-1/+11
\| \| \| \| \| \| \| \| \| \|	Account pinned pages for IORING_OP_SENDMSG_ZC, just as we for IORING_OP_SEND_ZC and net/ does for MSG_ZEROCOPY. Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4f00f67ca6ac8e8ed62343ae92b5816b1e0c9c4b.1743086313.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	Merge tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux	Linus Torvalds	2025-03-28	13	-210/+534
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull more io_uring updates from Jens Axboe: "Final separate updates for io_uring. This started out as a series of cleanups improvements and improvements for registered buffers, but as the last series of the io_uring changes for 6.15, it also collected a few fixes for the other branches on top: - Add support for vectored fixed/registered buffers. Previously only single segments have been supported for commands, now vectored variants are supported as well. This series includes networking and file read/write support. - Small series unifying return codes across multi and single shot. - Small series cleaning up registerd buffer importing. - Adding support for vectored registered buffers for uring_cmd. - Fix for io-wq handling of command reissue. - Various little fixes and tweaks" * tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux: (25 commits) io_uring/net: fix io_req_post_cqe abuse by send bundle io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc io_uring: move min_events sanitisation io_uring: rename "min" arg in io_iopoll_check() io_uring: open code __io_post_aux_cqe() io_uring: defer iowq cqe overflow via task_work io_uring: fix retry handling off iowq io_uring/net: only import send_zc buffer once io_uring/cmd: introduce io_uring_cmd_import_fixed_vec io_uring/cmd: add iovec cache for commands io_uring/cmd: don't expose entire cmd async data io_uring: rename the data cmd cache io_uring: rely on io_prep_reg_vec for iovec placement io_uring: introduce io_prep_reg_iovec() io_uring: unify STOP_MULTISHOT with IOU_OK io_uring: return -EAGAIN to continue multishot io_uring: cap cached iovec/bvec size io_uring/net: implement vectored reg bufs for zctx io_uring/net: convert to struct iou_vec io_uring/net: pull vec alloc out of msghdr import ...
\| *	io_uring/net: fix io_req_post_cqe abuse by send bundle	Pavel Begunkov	2025-03-27	2	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ 114.987980][ T5313] WARNING: CPU: 6 PID: 5313 at io_uring/io_uring.c:872 io_req_post_cqe+0x12e/0x4f0 [ 114.991597][ T5313] RIP: 0010:io_req_post_cqe+0x12e/0x4f0 [ 115.001880][ T5313] Call Trace: [ 115.002222][ T5313] <TASK> [ 115.007813][ T5313] io_send+0x4fe/0x10f0 [ 115.009317][ T5313] io_issue_sqe+0x1a6/0x1740 [ 115.012094][ T5313] io_wq_submit_work+0x38b/0xed0 [ 115.013223][ T5313] io_worker_handle_work+0x62a/0x1600 [ 115.013876][ T5313] io_wq_worker+0x34f/0xdf0 As the comment states, io_req_post_cqe() should only be used by multishot requests, i.e. REQ_F_APOLL_MULTISHOT, which bundled sends are not. Add a flag signifying whether a request wants to post multiple CQEs. Eventually REQ_F_APOLL_MULTISHOT should imply the new flag, but that's left out for simplicity. Cc: stable@vger.kernel.org Fixes: a05d1f625c7aa ("io_uring/net: support bundles for send") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b611dbb54d1cd47a88681f5d38c84d0c02bc563.1743067183.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc	Caleb Sander Mateos	2025-03-26	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of a bool field in struct io_sr_msg, use REQ_F_IMPORT_BUFFER to track whether io_send_zc() has already imported the buffer. This flag already serves a similar purpose for sendmsg_zc and {read,write}v_fixed. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20250325143943.1226467-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: move min_events sanitisation	Pavel Begunkov	2025-03-25	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	iopoll and normal waiting already duplicate min_completion truncation, so move them inside the corresponding routines. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/254adb289cc04638f25d746a7499260fa89a179e.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: rename "min" arg in io_iopoll_check()	Pavel Begunkov	2025-03-25	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't name arguments "min", it shadows the namesake function. min_events is also more consistent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f52ce9d88d3bca5732a218b0da14924aa6968909.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: open code __io_post_aux_cqe()	Pavel Begunkov	2025-03-25	1	-12/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no reason to keep __io_post_aux_cqe() separately from io_post_aux_cqe(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2c4c1f68d694deea25a212fc09bbb11f330cd82e.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: defer iowq cqe overflow via task_work	Pavel Begunkov	2025-03-25	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't handle CQE overflows in io_req_complete_post() and defer it to flush_completions. It cuts some duplication, and I also want to limit the number of places directly overflowing completions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9046410ac27e18f2baa6f7cdb363ec921cbc3b79.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: fix retry handling off iowq	Pavel Begunkov	2025-03-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_req_complete_post() doesn't handle reissue and if called with a REQ_F_REISSUE request it might post extra unexpected completions. Fix it by pushing into flush_completion via task work. Fixes: d803d123948fe ("io_uring/rw: handle -EAGAIN retry at IO completion time") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/badb3d7e462881e7edbfcc2be6301090b07dbe53.1742829388.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring/net: only import send_zc buffer once	Caleb Sander Mateos	2025-03-21	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_send_zc() guards its call to io_send_zc_import() with if (!done_io) in an attempt to avoid calling it redundantly on the same req. However, if the initial non-blocking issue returns -EAGAIN, done_io will stay 0. This causes the subsequent issue to unnecessarily re-import the buffer. Add an explicit flag "imported" to io_sr_msg to track if its buffer has already been imported. Clear the flag in io_send_zc_prep(). Call io_send_zc_import() and set the flag in io_send_zc() if it is unset. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 54cdcca05abd ("io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr") Link: https://lore.kernel.org/r/20250321184819.3847386-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring/cmd: introduce io_uring_cmd_import_fixed_vec	Pavel Begunkov	2025-03-21	2	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_uring_cmd_import_fixed_vec() is a cmd helper around vectored registered buffer import functions, which caches the memory under the hood. The lifetime of the vectore and hence the iterator is bound to the request. Furthermore, the user is not allowed to call it multiple times for a single request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/97487a80dec3fb8cf8aeedf1f9026ef6d503fe4b.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring/cmd: add iovec cache for commands	Pavel Begunkov	2025-03-21	4	-3/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add iou_vec to commands and wire caching for it, but don't expose it to users just yet. We need the vec cleared on initial alloc, but since we can't place it at the beginning at the moment, zero the entire async_data. It's cached, and the performance effects only the initial allocation, and it might be not a bad idea since we're exposing those bits to outside drivers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c0f2145b75791bc6106eb4e72add2cf6a2c72a7a.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring/cmd: don't expose entire cmd async data	Pavel Begunkov	2025-03-19	4	-9/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	io_uring needs private bits in cmd's ->async_data, and they should never be exposed to drivers as it'd certainly be abused. Leave struct io_uring_cmd_data for the drivers but wrap it into a structure. It's a prep patch and doesn't do anything useful yet. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20250319061251.21452-3-sidong.yang@furiosa.ai Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: rename the data cmd cache	Pavel Begunkov	2025-03-19	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pick a more descriptive name for the cmd async data cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20250319061251.21452-2-sidong.yang@furiosa.ai Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: rely on io_prep_reg_vec for iovec placement	Pavel Begunkov	2025-03-10	4	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All vectored reg buffer users should use io_import_reg_vec() for iovec imports, since iovec placement is the function's responsibility and callers shouldn't know much about it, drop the offset parameter from io_prep_reg_vec() and calculate it inside. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/08ed87ca4bbc06724373b6ce06f36b703fe60c4e.1741457480.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| *	io_uring: introduce io_prep_reg_iovec()	Pavel Begunkov	2025-03-10	4	-40/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	iovecs that are turned into registered buffers are imported in a special way with an offset, so that later we can do an in place translation. Add a helper function taking care of it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7de2ecb9ed5efc3c5cf320232236966da5ad4ccc.1741457480.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>