linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	io_uring: add support for futex wake and wait	Jens Axboe	2023-09-29	1	-0/+1
	Add support for FUTEX_WAKE/WAIT primitives.
	IORING_OP_FUTEX_WAKE is a mix of FUTEX_WAKE and FUTEX_WAKE_BITSET, as it does support passing in a bitset.
	Similarly, IORING_OP_FUTEX_WAIT is a mix of FUTEX_WAIT and FUTEX_WAIT_BITSET.
	Both of them use the futex2 interface.
	FUTEX_WAKE is straightforward, as those can always be done directly from the io_uring submission without needing async handling. For FUTEX_WAIT, things are a bit more complicated. If the futex isn't ready, then we rely on a callback via futex_queue->wake() when someone wakes up the futex. From that callback, we queue up task_work with the original task, which will post a CQE and wake it, if necessary.
	Cancelations are supported, both from the application point of view and to be able to cancel pending waits if the ring exits before all events have occurred. The return value of futex_unqueue() is used to gate who wins the potential race between cancelation and futex wakeups. Whoever gets a 'ret == 1' return from that claims ownership of the io_uring futex request.
	This is just the barebones wait/wake support. PI or REQUEUE support is not added at this point; it is unclear if we might look into that later.
	Likewise, explicit timeouts are not supported either. Users that need timeouts are expected to use the usual io_uring mechanism for that, linked timeouts.
	The SQE format is as follows:
	`addr`		Address of futex
	`fd`		futex2(2) FUTEX2_* flags
	`futex_flags`	io_uring specific command flags. None valid now.
	`addr2`		Value of futex
	`addr3`		Mask to wake/wait
	Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
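The SQE field assignment and the ownership rule described in the commit message above can be illustrated with a small userspace sketch. Everything named `sketch_*` or `*_sketch` here is hypothetical: the struct only mirrors the five fields listed in the message and is not the layout of the real `struct io_uring_sqe`, and the atomic exchange is merely an analogy for the 'ret == 1' result of futex_unqueue(), not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in holding only the fields named in the commit
 * message. This is NOT the real struct io_uring_sqe layout. */
struct futex_sqe_sketch {
	uint64_t addr;        /* address of the futex word */
	uint32_t fd;          /* futex2(2) FUTEX2_* flags */
	uint32_t futex_flags; /* io_uring specific flags, none valid: keep 0 */
	uint64_t addr2;       /* value of futex */
	uint64_t addr3;       /* mask to wake/wait */
};

/* Fill the sketch SQE for a wait on *futex with the given value and mask. */
static void sketch_prep_futex_wait(struct futex_sqe_sketch *sqe,
				   const uint32_t *futex, uint64_t val,
				   uint64_t mask, uint32_t futex2_flags)
{
	assert(sqe != NULL && futex != NULL);
	memset(sqe, 0, sizeof(*sqe));
	sqe->addr = (uint64_t)(uintptr_t)futex;
	sqe->fd = futex2_flags;
	sqe->futex_flags = 0;
	sqe->addr2 = val;
	sqe->addr3 = mask;
}

/* Analogy for the cancel vs. wakeup race: 'queued' is 1 while the request
 * still sits on the futex queue. */
struct sketch_req {
	atomic_int queued;
};

/* Returns 1 for exactly one caller (the owner), 0 for everyone after it. */
static int sketch_unqueue(struct sketch_req *req)
{
	return atomic_exchange(&req->queued, 0);
}
```

In this model both the cancelation path and the wakeup path would call `sketch_unqueue()`; only the caller that sees 1 goes on to complete the request.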
*	io_uring: add IORING_OP_WAITID support	Jens Axboe	2023-09-21	1	-1/+2
	This adds support for a fully async version of waitid(2). If an event isn't immediately available, wait for a callback to trigger a retry.
	The format of the sqe is as follows:
	sqe->len		The 'which', the idtype being queried/waited for.
	sqe->fd			The 'pid' (or id) being waited for.
	sqe->file_index		The 'options' being set.
	sqe->addr2		A pointer to siginfo_t, if any, being filled in.
	buf_index, addr3, and waitid_flags are reserved/unused for now. waitid_flags will be used for options for this request type. One interesting use case may be to add multi-shot support, so that the request stays armed and posts a notification every time a monitored process state change occurs.
	Note that this does not support rusage, on Arnd's recommendation.
	See the waitid(2) man page for details on the arguments.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
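As a rough illustration of how the waitid(2) arguments map onto the SQE fields listed above, here is a minimal sketch. `struct waitid_sqe_sketch` and `sketch_prep_waitid()` are hypothetical names; the struct carries only the four fields from the commit message and is not the real `struct io_uring_sqe`.

```c
#define _POSIX_C_SOURCE 200809L /* for idtype_t, siginfo_t, WEXITED */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-in with only the fields named in the commit message. */
struct waitid_sqe_sketch {
	uint32_t len;        /* 'which': the idtype being waited for */
	int32_t  fd;         /* 'pid' (or id) being waited for */
	uint32_t file_index; /* 'options' */
	uint64_t addr2;      /* pointer to siginfo_t, or 0 if none */
};

/* Map waitid(2)-style arguments onto the sketch SQE. Fields the message
 * calls reserved are not modeled here; the real SQE would leave them zero. */
static void sketch_prep_waitid(struct waitid_sqe_sketch *sqe, idtype_t which,
			       id_t id, int options, siginfo_t *info)
{
	assert(sqe != NULL);
	memset(sqe, 0, sizeof(*sqe));
	sqe->len = (uint32_t)which;
	sqe->fd = (int32_t)id;
	sqe->file_index = (uint32_t)options;
	sqe->addr2 = (uint64_t)(uintptr_t)info;
}
```

A caller waiting for a specific child to exit would pass `P_PID`, the child's pid, and `WEXITED`, mirroring a direct waitid(2) call without the rusage argument.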
*	io_uring: add zc notification infrastructure	Pavel Begunkov	2022-07-24	1	-1/+1
	Add the internal part of send zerocopy notifications. There are two main concepts. The first is struct io_notif, which carries a struct ubuf_info inside and maps 1:1 to it. io_uring will bind a number of zerocopy send requests to it and ask to complete (aka flush) it. When flushed and all attached requests and skbs complete, it'll generate one and only one CQE. These are intended to be passed into the network layer as struct msghdr::msg_ubuf.
	The second concept is notification slots. The userspace will be able to register an array of slots and subsequently address them by index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called the active notifier) but many notifiers during its lifetime. When active, a notifier is not going to post any completion, but the userspace can attach requests to it by specifying the corresponding slot while issuing send zc requests. Eventually, the userspace will want to "flush" the notifier, losing any way to attach new requests to it; however, it can use the next automatically added notifier of this slot or of any other slot.
	When the network layer is done with all enqueued skbs attached to a notifier and no longer needs the user data specified in them, the flushed notifier will post a CQE.
	Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
	Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
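The slot and notifier lifecycle described above can be modeled in a few lines of userspace C. This is a toy model under stated assumptions, not the kernel implementation: all `sketch_*` names are hypothetical, notifiers come from a fixed pool instead of being allocated, and a posted CQE is represented by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_NOTIFS 16

struct sketch_notif {
	int  refs;    /* attached requests/skbs not yet completed */
	bool flushed; /* once set, no new requests can attach */
	bool posted;  /* the single CQE has been generated */
};

struct sketch_slot {
	struct sketch_notif *active; /* one active notifier at a time */
};

struct sketch_ctx {
	struct sketch_notif pool[SKETCH_MAX_NOTIFS];
	size_t next_free;
	int cqes_posted;
};

static struct sketch_notif *sketch_notif_alloc(struct sketch_ctx *ctx)
{
	assert(ctx->next_free < SKETCH_MAX_NOTIFS);
	struct sketch_notif *n = &ctx->pool[ctx->next_free++];
	n->refs = 0;
	n->flushed = false;
	n->posted = false;
	return n;
}

static void sketch_slot_init(struct sketch_ctx *ctx, struct sketch_slot *slot)
{
	slot->active = sketch_notif_alloc(ctx);
}

/* Attach one send request to the slot's active notifier. */
static struct sketch_notif *sketch_attach(struct sketch_slot *slot)
{
	slot->active->refs++;
	return slot->active;
}

/* Post exactly one CQE once the notifier is flushed and fully completed. */
static void sketch_maybe_post(struct sketch_ctx *ctx, struct sketch_notif *n)
{
	if (n->flushed && n->refs == 0 && !n->posted) {
		n->posted = true;
		ctx->cqes_posted++;
	}
}

/* Flush: the old notifier accepts no more requests, and a fresh notifier
 * automatically becomes the slot's active one. */
static struct sketch_notif *sketch_flush(struct sketch_ctx *ctx,
					 struct sketch_slot *slot)
{
	struct sketch_notif *old = slot->active;
	old->flushed = true;
	slot->active = sketch_notif_alloc(ctx);
	sketch_maybe_post(ctx, old);
	return old;
}

/* The network layer is done with one attached request. */
static void sketch_complete(struct sketch_ctx *ctx, struct sketch_notif *n)
{
	assert(n->refs > 0);
	n->refs--;
	sketch_maybe_post(ctx, n);
}
```

The model keeps the two conditions from the commit message separate: an active notifier never posts, and a flushed one posts only after its last attached request completes.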
*	io_uring: move opcode table to opdef.c	Jens Axboe	2022-07-24	1	-1/+1
	We already have the declarations in opdef.h, move the rest into its own file rather than in the main io_uring.c file.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move read/write related opcodes to its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move rsrc related data, core, and commands	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split provided buffers handling into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Move both the opcodes related to it and the internal code dealing with it.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move cancelation into its own file	Jens Axboe	2022-07-24	1	-1/+2
	This also helps clean up the cancel parts of io_uring.h, as we can make most things static in the cancel.c file.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move poll handling into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Add an io_poll_issue() rather than export the general task_work locking and io_issue_sqe(), and put the io_op_defs definition and structure into a separate header file so that poll can use it.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move io_uring_task (tctx) helpers into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move fdinfo helpers to its own file	Jens Axboe	2022-07-24	1	-1/+1
	This also means moving a bit more of the fixed file handling to the filetable side, which makes sense separately too.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move SQPOLL related handling into its own file	Jens Axboe	2022-07-24	1	-1/+2
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move timeout opcodes and handling into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move msg_ring into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split network related opcodes into its own file	Jens Axboe	2022-07-24	1	-1/+1
	While at it, convert the handlers to just use io_eopnotsupp_prep() if CONFIG_NET isn't set.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move statx handling to its own file	Jens Axboe	2022-07-24	1	-1/+2
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move epoll handler to its own file	Jens Axboe	2022-07-24	1	-1/+1
	It would be nice to sort out Kconfig for this and not even compile epoll.c if we don't have epoll configured.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move uring_cmd handling to its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split out open/close operations	Jens Axboe	2022-07-24	1	-1/+2
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: separate out file table handling code	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split out fadvise/madvise operations	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split out fs related sync/fallocate functions	Jens Axboe	2022-07-24	1	-1/+2
	This splits out sync_file_range, fsync, and fallocate.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split out splice related operations	Jens Axboe	2022-07-24	1	-1/+1
	This splits out splice and tee support.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: split out filesystem related operations	Jens Axboe	2022-07-24	1	-1/+1
	This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move nop into its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move xattr related opcodes to its own file	Jens Axboe	2022-07-24	1	-1/+1
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
*	io_uring: move to separate directory	Jens Axboe	2022-07-24	1	-0/+6
	In preparation for splitting io_uring up a bit, move it into its own top level directory. It didn't really belong in fs/ anyway, as it's not a file system only API.
	This adds io_uring/ and moves the core files in there, and updates the MAINTAINERS file for the new location.
	Signed-off-by: Jens Axboe <axboe@kernel.dk>