path: root/io_uring/poll.c
Commit log (newest first): subject, author, date, files and lines changed
* io_uring: disallow self-propelled ring polling (Pavel Begunkov, 2022-11-18, 1 file changed, +2/-0)

  When we post a CQE we wake all ring pollers, as it normally should be. However, if a CQE was generated by a multishot poll request targeting its own ring, it'll wake that request up, which will make it post a new CQE, which will wake the request again, and so on until it exhausts all CQ entries.

  Don't allow multishot polling of io_uring files; downgrade them to oneshots, which was always stated as the correct behaviour that the userspace should check for.

  Cc: stable@vger.kernel.org
  Fixes: aa43477b04025 ("io_uring: poll rework")
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
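  A minimal sketch of the downgrade idea described above; the exact placement and form are assumed here, not the verbatim upstream diff:

      /* sketch: a poll request aimed at an io_uring file must not stay armed
       * as multishot, or its own CQEs would keep re-triggering it */
      if (io_is_uring_fops(req->file))
              poll->events |= EPOLLONESHOT;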
* io_uring: fix tw losing poll events (Pavel Begunkov, 2022-11-17, 1 file changed, +7/-0)

  We may never try to process a poll wake and its mask if there were multiple wake ups racing to queue up a tw. Force io_poll_check_events() to update the mask via vfs_poll().

  Cc: stable@vger.kernel.org
  Fixes: aa43477b04025 ("io_uring: poll rework")
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
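  A sketch of the forced re-poll, assuming it sits at the start of the io_poll_check_events() handling (shape inferred from the description above, not the verbatim diff):

      if (!req->cqe.res) {
              struct poll_table_struct pt = { ._key = req->apoll_events };

              /* racing wake-ups may have left no cached mask; re-evaluate it */
              req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events;
      }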
* io_uring: update res mask in io_poll_check_events (Pavel Begunkov, 2022-11-17, 1 file changed, +3/-0)

  When io_poll_check_events() collides with someone attempting to queue a task work, it'll spin for one more time. However, it'll continue to use the mask from the first iteration instead of updating it. For example, if the first wake up was an EPOLLIN and the second an EPOLLOUT, the userspace will not get EPOLLOUT in time.

  Clear the mask for all subsequent iterations to force vfs_poll().

  Cc: stable@vger.kernel.org
  Fixes: aa43477b04025 ("io_uring: poll rework")
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
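  A sketch of the fix, assuming it lives at the bottom of the io_poll_check_events() retry loop:

      /* clear the cached mask so the next iteration calls vfs_poll()
       * again instead of reusing a stale result */
      req->cqe.res = 0;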
* io_uring/poll: lockdep annotate io_poll_req_insert_locked (Pavel Begunkov, 2022-11-11, 1 file changed, +2/-0)

  Add a lockdep annotation in io_poll_req_insert_locked().

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/8115d8e702733754d0aea119e9b5bb63d1eb8b24.1668184658.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
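  A sketch of the annotated helper; field names follow the hash-table rework later in this log and are assumed rather than quoted:

      static void io_poll_req_insert_locked(struct io_kiocb *req)
      {
              struct io_hash_table *table = &req->ctx->cancel_table_locked;
              u32 index = hash_long(req->cqe.user_data, table->hash_bits);

              /* this table is only ever touched with ->uring_lock held */
              lockdep_assert_held(&req->ctx->uring_lock);

              hlist_add_head(&req->hash_node, &table->hbs[index].list);
      }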
* io_uring/poll: fix double poll req->flags races (Pavel Begunkov, 2022-11-11, 1 file changed, +17/-12)

      io_poll_double_prepare()            |  io_poll_wake()
                                          |  poll->head = NULL
      smp_load(&poll->head); /* NULL */   |
      flags = req->flags;                 |
                                          |  req->flags &= ~SINGLE_POLL;
      req->flags = flags | DOUBLE_POLL    |

  The idea behind io_poll_double_prepare() is to serialise with the first poll entry by taking the wq lock. However, it's not safe to assume that io_poll_wake() is not running when we can't grab the lock, and so we may race modifying req->flags.

  Skip double poll setup if that happens. It's ok because the first poll entry will only be removed when it's definitely completing, e.g. pollfree or oneshot with a valid mask.

  Fixes: 49f1c68e048f1 ("io_uring: optimise submission side poll_refs")
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/b7fab2d502f6121a7d7b199fe4d914a43ca9cdfd.1668184658.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
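  A sketch of the serialisation described above: req->flags is only touched under the first entry's waitqueue lock, and the double-poll setup is skipped when that entry is already gone. Names follow the commits further down this log; the exact body is assumed:

      static bool io_poll_double_prepare(struct io_kiocb *req)
      {
              struct wait_queue_head *head;
              struct io_poll *poll = io_poll_get_single(req);

              /* head is RCU protected, see io_poll_remove_entries() comments */
              rcu_read_lock();
              head = smp_load_acquire(&poll->head);
              /*
               * poll arm might not hold ownership and so can race with
               * io_poll_wake() for req->flags. There is only one poll entry
               * queued, so serialise with it by taking its head lock. If the
               * entry is already gone, skip the double-poll setup entirely.
               */
              if (head) {
                      spin_lock_irq(&head->lock);
                      req->flags |= REQ_F_DOUBLE_POLL;
                      if (req->opcode == IORING_OP_POLL_ADD)
                              req->flags |= REQ_F_ASYNC_DATA;
                      spin_unlock_irq(&head->lock);
              }
              rcu_read_unlock();
              return !!head;
      }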
* io_uring/poll: disable level triggered poll (Jens Axboe, 2022-09-28, 1 file changed, +1/-1)

  Stefan reports that there are issues with the level triggered notification. Since we're late in the cycle, and it was introduced for the 6.0 release, just disable it at prep time and we can bring this back when Samba is happy with it.

  Reported-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Stefan Metzmacher <metze@samba.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
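  A sketch of the prep-time disable, assuming the edge-triggered bit is simply forced while parsing the poll events so IORING_POLL_ADD_LEVEL has no effect:

      /* sketch: ignore IORING_POLL_ADD_LEVEL for now and always behave
       * edge triggered; level triggered mode can come back later */
      events |= EPOLLET;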
* io_uring: make io_kiocb_to_cmd() typesafe (Stefan Metzmacher, 2022-08-12, 1 file changed, +8/-8)

  We need to make sure (at build time) that struct io_cmd_data is not cast to a structure that's larger.

  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add abstraction around apoll cache (Jens Axboe, 2022-07-24, 1 file changed, +5/-13)

  In preparation for adding limits, and one more user, abstract out the core bits of the allocation+free cache.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: move apoll cache to poll.c (Jens Axboe, 2022-07-24, 1 file changed, +12/-0)

  This is where it's used, move the flush handler in there.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: consolidate hash_locked io-wq handling (Pavel Begunkov, 2022-07-24, 1 file changed, +7/-5)

  Don't duplicate code disabling REQ_F_HASH_LOCKED for IO_URING_F_UNLOCKED (i.e. io-wq), move the handling into __io_arm_poll_handler().

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/0ff0ffdfaa65b3d536131535c3dad3c63d9b7bb0.1657203020.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: clear REQ_F_HASH_LOCKED on hash removal (Pavel Begunkov, 2022-07-24, 1 file changed, +2/-5)

  Instead of clearing REQ_F_HASH_LOCKED while arming a poll, unset the bit when we're removing the entry from the table in io_poll_tw_hash_eject().

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/02e48bb88d6f1480c94ac2924c43ad1fbd48e92a.1657203020.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: don't race double poll setting REQ_F_ASYNC_DATA (Pavel Begunkov, 2022-07-24, 1 file changed, +3/-3)

  Just as with io_poll_double_prepare() setting REQ_F_DOUBLE_POLL, we can race with the first poll entry when setting REQ_F_ASYNC_DATA. Move it under io_poll_double_prepare().

  Fixes: a18427bb2d9b ("io_uring: optimise submission side poll_refs")
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/df6920f509c11115aa2bce8b34dc5fdb0eb98920.1657203020.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: don't miss setting REQ_F_DOUBLE_POLL (Pavel Begunkov, 2022-07-24, 1 file changed, +10/-8)

  When adding a second poll entry we should set REQ_F_DOUBLE_POLL unconditionally. We might race with the first entry removal but that doesn't change the rule.

  Fixes: a18427bb2d9b ("io_uring: optimise submission side poll_refs")
  Reported-and-tested-by: syzbot+49950ba66096b1f0209b@syzkaller.appspotmail.com
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/8b680d83ded07424db83e8745585e7a6d72826ef.1657203020.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix multishot poll on overflow (Dylan Yudaken, 2022-07-24, 1 file changed, +4/-2)

  On overflow, multishot poll can still complete with the IORING_CQE_F_MORE flag set. If in the meantime the user clears a CQE and the poll is cancelled, then the poll will post a CQE without IORING_CQE_F_MORE (and likely with result -ECANCELED).

  However, when processing, the application will encounter the non-overflow CQE first, which indicates that there will be no more events posted. Typical userspace applications would free memory associated with the poll in this case. They will then subsequently receive the earlier CQE which had overflowed, which breaks the contract given by the IORING_CQE_F_MORE flag.

  Signed-off-by: Dylan Yudaken <dylany@fb.com>
  Link: https://lore.kernel.org/r/20220630091231.1456789-9-dylany@fb.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add allow_overflow to io_post_aux_cqe (Dylan Yudaken, 2022-07-24, 1 file changed, +1/-1)

  Some use cases of io_post_aux_cqe would not want to overflow as-is, but might want to change the flags/result. For example, multishot receive requires in-order CQEs, and so if there is an overflow it would need to stop receiving until the overflow is taken care of.

  Signed-off-by: Dylan Yudaken <dylany@fb.com>
  Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add IOU_STOP_MULTISHOT return code (Dylan Yudaken, 2022-07-24, 1 file changed, +9/-2)

  For multishot we want a way to signal the caller that multishot has ended, but this might not be an error return. For example, sockets return 0 when closed, which should end a multishot recv but still produce a CQE with result 0.

  Introduce IOU_STOP_MULTISHOT, which does this and indicates that the return code is stored inside req->cqe.

  Signed-off-by: Dylan Yudaken <dylany@fb.com>
  Link: https://lore.kernel.org/r/20220630091231.1456789-7-dylany@fb.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
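  A sketch of the new return code alongside the existing issue return values; the concrete values are assumed here:

      #define IOU_OK                  0
      #define IOU_ISSUE_SKIP_COMPLETE -EIOCBQUEUED
      /*
       * The request has ended its multishot sequence; the real CQE result
       * has already been stashed in req->cqe by the handler.
       */
      #define IOU_STOP_MULTISHOT      -ECANCELED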
* io_uring: clean up io_poll_check_events return values (Dylan Yudaken, 2022-07-24, 1 file changed, +16/-11)

  The values returned are a bit confusing, where 0 and 1 have implied meaning, so add some definitions for them.

  Signed-off-by: Dylan Yudaken <dylany@fb.com>
  Link: https://lore.kernel.org/r/20220630091231.1456789-6-dylany@fb.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
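  A sketch of the named return values replacing the bare 0 and 1; the definition names are assumed:

      /* io_poll_check_events() return values */
      #define IOU_POLL_DONE       0   /* poll is finished, complete the request */
      #define IOU_POLL_NO_ACTION  1   /* spurious wake-up, keep waiting */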
* io_uring: move POLLFREE handling to separate function (Jens Axboe, 2022-07-24, 1 file changed, +27/-23)

  We really don't care about this at all in terms of performance. Outside of having it already be marked unlikely(), shove it into a separate __cold function.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: optimise submission side poll_refs (Pavel Begunkov, 2022-07-24, 1 file changed, +67/-21)

  The final poll_refs put in __io_arm_poll_handler() takes quite some cycles. When we're arming from the original task context, task_work won't be run, so in this case we can assume that we won't race with task_works and so not take the initial ownership ref.

  One caveat is that after arming a poll we may race with it, so we have to add a bunch of io_poll_get_ownership() calls hidden inside io_poll_can_finish_inline() whenever we want to complete arming inline. For the same reason we can't just set REQ_F_DOUBLE_POLL in __io_queue_proc() and so need to sync with the first poll entry by taking its wq head lock.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/8825315d7f5e182ac1578a031e546f79b1c97d01.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
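  The ownership helper referenced above looks roughly like this (a sketch; the refcount mask name is assumed):

      /*
       * Try to claim ownership of a poll request: whoever bumps poll_refs
       * from zero owns it and is responsible for queueing task_work or
       * completing the request.
       */
      static inline bool io_poll_get_ownership(struct io_kiocb *req)
      {
              return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
      }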
* io_uring: refactor poll arm error handling (Pavel Begunkov, 2022-07-24, 1 file changed, +21/-23)

  __io_arm_poll_handler()'s error parsing is a horror: in case it failed it returns 0 and the caller is expected to look at ipt.error, which already led us to a number of problems before.

  When it returns a valid mask, leave it as it is now, i.e. return 1 and store the mask in ipt.result_mask. In case of a failure that can be handled inline return an error code (negative value), and return 0 if __io_arm_poll_handler() took ownership of the request and will complete it.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/018cacdaef5fe95d7dc56b32e85d752cab7607f6.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
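  A sketch of a caller fragment under the new convention (the exact shape and argument list are assumed):

      ret = __io_arm_poll_handler(req, poll, &ipt, poll->events, issue_flags);
      if (ret > 0) {
              /* completed inline, the mask is in ipt.result_mask */
              io_req_set_res(req, ipt.result_mask, 0);
              return IOU_OK;
      }
      /* ret == 0: the handler owns the request; ret < 0: plain error */
      return ret ?: IOU_ISSUE_SKIP_COMPLETE;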
* io_uring: change arm poll return values (Pavel Begunkov, 2022-07-24, 1 file changed, +5/-2)

  The rules for __io_arm_poll_handler()'s result parsing are complicated; as a first step, don't return a mask but pass back a positive return code and fill ipt->result_mask.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/529e29e9f97f2e6e383ccd44234d8b576a83a921.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add a helper for apoll alloc (Pavel Begunkov, 2022-07-24, 1 file changed, +28/-16)

  Extract a helper function for apoll allocation, makes the code easier to read.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/2f93282b47dd678e805dd0d7097f66968ced495c.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: remove events caching atavisms (Pavel Begunkov, 2022-07-24, 1 file changed, +8/-10)

  Remove the events argument from *io_poll_execute(), it's not needed and not used.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/12efd4e15c6a90cf9e5b59807cfcb57852b51dc7.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: clean poll ->private flagging (Pavel Begunkov, 2022-07-24, 1 file changed, +17/-3)

  We store a req pointer in wqe->private but also take one bit to mark double poll entries. Replace macro helpers with inline functions for better type checking and also name the double flag.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/9a61240555c64ac0b7a9b0eb59a9efeb638a35a4.1655990418.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
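  A sketch of the typed helpers replacing the old macros, assumed to match the description above:

      #define IO_WQE_F_DOUBLE         1

      /* wqe->private packs a req pointer plus the double-poll marker bit */
      static inline struct io_kiocb *wqe_to_req(struct wait_queue_entry *wqe)
      {
              unsigned long priv = (unsigned long)wqe->private;

              return (struct io_kiocb *)(priv & ~IO_WQE_F_DOUBLE);
      }

      static inline bool wqe_is_double(struct wait_queue_entry *wqe)
      {
              unsigned long priv = (unsigned long)wqe->private;

              return priv & IO_WQE_F_DOUBLE;
      }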
* io_uring: add a warn_once for poll_find (Pavel Begunkov, 2022-07-24, 1 file changed, +5/-0)

  io_poll_remove() expects poll_find() to search only for poll requests and passes a flag for this. Just be a little bit extra cautious considering lots of recent poll/cancellation changes and add a WARN_ON_ONCE checking that we don't get an apoll'ed request.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/ec9a66f1e22f99dcd02288d4e42f3cc6bb357804.1655684496.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix io_poll_remove_all clang warnings (Pavel Begunkov, 2022-07-24, 1 file changed, +5/-2)

  clang complains about bitwise operations with bools; add a bit more verbosity to better show that we want to call io_poll_remove_all_table() twice but with different arguments.

  Reported-by: Nathan Chancellor <nathan@kernel.org>
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/f11d21dcdf9233e0eeb15fa13b858a05a78eb310.1655684496.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
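  A sketch of the more verbose form, assuming the two cancellation tables introduced earlier in this series:

      __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
                                     bool cancel_all)
      {
              bool ret;

              /* spell out both calls instead of OR-ing bool-returning calls
               * in one expression, which triggers clang's bitwise-vs-logical
               * warning */
              ret = io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all);
              ret |= io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
              return ret;
      }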
* io_uring: clean up tracing events (Pavel Begunkov, 2022-07-24, 1 file changed, +2/-3)

  We have lots of trace events accepting an io_uring request and wanting to print some of its fields like user_data, opcode, flags and so on. However, as trace points were unaware of io_uring structures, we had to pass all the fields as arguments. Teach trace/events/io_uring.h about struct io_kiocb and stop the misery of passing a horde of arguments to trace helpers.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: kill extra io_uring_types.h includes (Pavel Begunkov, 2022-07-24, 1 file changed, +0/-1)

  io_uring/io_uring.h already includes io_uring_types.h, no need to include it every time. Kill it in a bunch of places, it prepares us for following patches.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: don't expose io_fill_cqe_aux() (Pavel Begunkov, 2022-07-24, 1 file changed, +8/-16)

  Deduplicate some code and add a helper for filling an aux CQE, locking and notification.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: mutex locked poll hashing (Pavel Begunkov, 2022-07-24, 1 file changed, +96/-21)

  Currently we do two extra spin lock/unlock pairs to add a poll/apoll request to the cancellation hash table and remove it from there. On the submission side we often already hold ->uring_lock, and tw completion is likely to hold it as well.

  Add a second cancellation hash table protected by ->uring_lock. Out of concern for latency, because of the need to have the mutex locked on the completion side, use the new table only in the following cases:

  1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there is little to no contention and so the main tw handler will almost always end up grabbing it before calling callbacks.

  2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is a major user of ->uring_lock.

  3) apoll: we normally grab the lock on the completion side anyway to execute the request, so it's free.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
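  A sketch of how a plain poll request could opt into the ->uring_lock protected table; the condition and flag use follow the neighbouring commits and are assumed rather than quoted:

      /*
       * With a single issuer or SQPOLL there is essentially no contention on
       * ->uring_lock and tw handlers grab it anyway, so hash the request into
       * the mutex-protected table instead of the spinlock-protected one.
       */
      if (req->ctx->flags & (IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER))
              req->flags |= REQ_F_HASH_LOCKED;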
* io_uring: propagate locking state to poll cancel (Pavel Begunkov, 2022-07-24, 1 file changed, +2/-1)

  Poll cancellation will soon need to grab ->uring_lock inside, so pass the locking state, i.e. issue_flags, into the cancellation functions.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/b86781d047727c07163443b57551a3fa57c7c5e1.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: introduce a struct for hash table (Pavel Begunkov, 2022-07-24, 1 file changed, +22/-18)

  Instead of passing around a pointer to hash buckets, add a bit of type safety and wrap it into a structure.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/d65bc3faba537ec2aca9eabf334394936d44bd28.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
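  A sketch of the wrapper structure; the field names are assumed:

      struct io_hash_table {
              struct io_hash_bucket   *hbs;       /* array of buckets */
              unsigned                hash_bits;  /* log2 of the table size */
      };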
* io_uring: pass hash table into poll_find (Pavel Begunkov, 2022-07-24, 1 file changed, +14/-6)

  In preparation for having multiple cancellation hash tables, pass a table pointer into io_poll_find() and other poll cancel functions.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/a31c88502463dce09254240fa037352927d7ecc3.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: use state completion infra for poll reqs (Pavel Begunkov, 2022-07-24, 1 file changed, +2/-6)

  Use io_req_task_complete() for poll request completions, so it can utilise state completions and save lots of unnecessary locking.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/ced94cb5a728d8e386c640d052fd3da3f5d6891a.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: pass poll_find lock back (Pavel Begunkov, 2022-07-24, 1 file changed, +26/-20)

  Instead of using implicit knowledge of what is locked or not after io_poll_find() and co return, pass back a pointer to the locked bucket, if any. If set, the caller must unlock the spinlock.

  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
  Reviewed-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
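  A sketch of the changed lookup signature; the out-parameter name is assumed:

      /* on success the matching bucket is returned locked via @out_bucket and
       * the caller is responsible for spin_unlock()-ing it */
      static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
                                           struct io_cancel_data *cd,
                                           struct io_hash_bucket **out_bucket);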
* io_uring: switch cancel_hash to use per entry spinlock (Hao Xu, 2022-07-24, 1 file changed, +49/-31)

  Add a new io_hash_bucket structure so that each bucket in cancel_hash has its own spinlock. Use a per entry lock for cancel_hash; this removes some completion lock invocations and removes contention between different cancel_hash entries.

  Signed-off-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
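  A sketch of the per-bucket structure described above; the cacheline alignment attribute is assumed:

      struct io_hash_bucket {
              spinlock_t              lock;   /* protects just this bucket */
              struct hlist_head       list;
      } ____cacheline_aligned_in_smp;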
* io_uring: poll: remove unnecessary req->ref set (Hao Xu, 2022-07-24, 1 file changed, +0/-1)

  We now don't need to set req->refcount for poll requests since the reworked poll code ensures no request release race.

  Signed-off-by: Hao Xu <howeyxu@tencent.com>
  Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
  Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add support for level triggered poll (Jens Axboe, 2022-07-24, 1 file changed, +10/-5)

  By default, the POLL_ADD command does edge triggered poll - if we get a non-zero mask on the initial poll attempt, we complete the request successfully.

  Support level triggered by always waiting for a notification, regardless of whether or not the initial mask matches the file state.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>
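  A sketch of the prep-side event parsing, assuming a new IORING_POLL_ADD_LEVEL SQE flag selects level triggered mode:

      if (!(flags & IORING_POLL_ADD_MULTI))
              events |= EPOLLONESHOT;
      /* default to edge triggered; only keep waiting for a notification
       * when the level triggered flag is set */
      if (!(flags & IORING_POLL_ADD_LEVEL))
              events |= EPOLLET;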
* io_uring: split provided buffers handling into its own file (Jens Axboe, 2022-07-24, 1 file changed, +1/-0)

  Move both the opcodes related to it, and the internals code dealing with it.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: move poll handling into its own file (Jens Axboe, 2022-07-24, 1 file changed, +760/-0)

  Add an io_poll_issue() rather than export the general task_work locking and io_issue_sqe(), and put the io_op_defs definition and structure into a separate header file so that poll can use it.

  Signed-off-by: Jens Axboe <axboe@kernel.dk>