Commit message  Author  Age  Files  Lines
* aio: sanitize the limit checking in io_submit(2)  Al Viro  2018-05-29  1  -8/+6
| As it is, the logic in native io_submit(2) is "if asked for more than
| LONG_MAX/sizeof(pointer) iocbs to submit, don't bother with more than
| LONG_MAX/sizeof(pointer)" (i.e. 512M requests on 32bit and 1E requests
| on 64bit), while compat io_submit(2) goes with "stop after the first
| PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is
|   * inconsistent
|   * *way* too much in native case
|   * possibly too little in compat one
| and
|   * wrong anyway, since the natural point where we ought to stop
|     bothering is ctx->nr_events
|
| Reviewed-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
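The three limits discussed in this message can be compared with a small userspace sketch. This is not the kernel code: the function names are made up for illustration, `nr_events` stands in for ctx->nr_events, and the compat figure assumes 4 KiB pages with 4-byte compat pointers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Old native cap quoted in the message: LONG_MAX / sizeof(pointer). */
static long old_native_cap(void)
{
    return LONG_MAX / (long)sizeof(void *);
}

/* Old compat cap quoted in the message. Assumption of this sketch:
 * PAGE_SIZE is 4096 and a compat pointer is 4 bytes wide. */
static long old_compat_cap(void)
{
    return 4096 / 4;
}

/* The rule the message argues for: never look at more iocbs than the
 * completion ring can hold. Hypothetical helper; callers are assumed
 * to have rejected a negative nr already. */
static long clamp_to_ring(long nr, unsigned int nr_events)
{
    assert(nr >= 0);
    if (nr > (long)nr_events)
        nr = (long)nr_events;
    return nr;
}
```

With a ring of 128 events, a request for a million iocbs is cut to 128 in both the native and the compat path, which removes the inconsistency the message complains about.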
* aio: fold do_io_submit() into callers  Al Viro  2018-05-29  1  -54/+45
| get rid of insane "copy array of 32bit pointers into an array of
| native ones" glue.
|
| Reviewed-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: shift copyin of iocb into io_submit_one()  Al Viro  2018-05-29  1  -24/+22
| Reviewed-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio_read_events_ring(): make a bit more readable  Al Viro  2018-05-29  1  -4/+3
| The logic for 'avail' is
|   * not past the tail of cyclic buffer
|   * no more than asked
|   * not past the end of buffer
|   * not past the end of a page
| Unobfuscate the last part.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
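The four constraints on 'avail' listed above can be written out as a minimal sketch. The function and parameter names are hypothetical, and the sketch assumes events start at slot 0 of the first page, whereas a real ring also has to account for a header in front of the events.

```c
#include <assert.h>

/* How many events can be copied in one go, starting at `head`.
 * Simplified model, not the kernel function. */
static unsigned int avail_events(unsigned int head, unsigned int tail,
                                 unsigned int ring_size,
                                 unsigned int wanted,
                                 unsigned int per_page)
{
    unsigned int avail, to_page_end;

    assert(per_page > 0 && head < ring_size && tail < ring_size);

    /* Not past the tail; if the buffer has wrapped (tail < head),
     * not past the end of the buffer instead. */
    avail = (head <= tail ? tail : ring_size) - head;

    /* No more than asked. */
    if (avail > wanted)
        avail = wanted;

    /* Not past the end of the page that holds slot `head`. */
    to_page_end = per_page - head % per_page;
    if (avail > to_page_end)
        avail = to_page_end;

    return avail;
}
```

A caller would loop, advancing `head` by the returned count (modulo the ring size), until either the request is satisfied or the function returns 0.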
* aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way  Al Viro  2018-05-29  1  -14/+12
| ... so just make them return 0 when caller does not need to destroy
| iocb
|
| Reviewed-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: take list removal to (some) callers of aio_complete()  Al Viro  2018-05-29  1  -17/+21
| We really want iocb out of io_cancel(2) reach before we start tearing
| it down.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: add missing break for the IOCB_CMD_FDSYNC case  Christoph Hellwig  2018-05-28  1  -0/+1
| Looks like this got lost in a merge.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* random: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -13/+16
| The big change is that random_read_wait and random_write_wait are
| merged into a single waitqueue that uses keyed wakeups. Because
| wait_event_* doesn't know about that, this will lead to occasional
| spurious wakeups in _random_read and add_hwgenerator_randomness, but
| wait_event_* is designed to handle these and we are not in a hot path
| there.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Acked-by: Theodore Ts'o <tytso@mit.edu>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
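Why a merged, keyed waitqueue produces spurious wakeups for wait_event_*-style sleepers can be seen in a toy model. Everything here is hypothetical (names, flag values, the array standing in for a waitqueue); the one modelling assumption is that a waiter with key 0 does no filtering, as a wait_event_* sleeper would not.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IN  0x1u   /* hypothetical "readable" key */
#define TOY_OUT 0x4u   /* hypothetical "writable" key */

struct toy_waiter {
    unsigned int key;  /* events this waiter cares about; 0 = no filter */
    int woken;         /* how many times it has been woken */
};

/* Wake every waiter on the single queue whose key matches. Unkeyed
 * waiters are woken on every call and must recheck their condition. */
static int toy_wake_up_key(struct toy_waiter *w, size_t n, unsigned int key)
{
    int count = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (w[i].key == 0 || (w[i].key & key)) {
            w[i].woken++;
            count++;
        }
    }
    return count;
}
```

In this model a "readable" wakeup skips the waiter keyed on TOY_OUT but still wakes the unkeyed one, even if the condition it sleeps on has not changed. That is the harmless spurious wakeup the message describes.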
* timerfd: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -11/+11
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* eventfd: switch to ->poll_mask  Christoph Hellwig  2018-05-26  1  -4/+11
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* pipe: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -9/+13
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* crypto: af_alg: convert to ->poll_mask  Christoph Hellwig  2018-05-26  4  -16/+8
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/rxrpc: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -7/+3
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/iucv: convert to ->poll_mask  Christoph Hellwig  2018-05-26  2  -7/+2
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/phonet: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -5/+2
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/nfc: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -6/+3
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/caif: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -8/+4
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/bluetooth: convert to ->poll_mask  Christoph Hellwig  2018-05-26  5  -9/+6
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/sctp: convert to ->poll_mask  Christoph Hellwig  2018-05-26  4  -7/+4
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/tipc: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -9/+5
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/vmw_vsock: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -13/+6
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/atm: convert to ->poll_mask  Christoph Hellwig  2018-05-26  4  -11/+6
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/dccp: convert to ->poll_mask  Christoph Hellwig  2018-05-26  4  -15/+5
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net: convert datagram_poll users to ->poll_mask  Christoph Hellwig  2018-05-26  31  -59/+52
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/unix: convert to ->poll_mask  Christoph Hellwig  2018-05-26  1  -19/+11
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net/tcp: convert to ->poll_mask  Christoph Hellwig  2018-05-26  4  -21/+9
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net: remove sock_no_poll  Christoph Hellwig  2018-05-26  10  -17/+0
| Now that sock_poll handles a NULL ->poll or ->poll_mask there is no
| need for a stub.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net: add support for ->poll_mask in proto_ops  Christoph Hellwig  2018-05-26  2  -5/+44
| The socket file operations still implement ->poll until all protocols
| are switched over.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* net: refactor socket_poll  Christoph Hellwig  2018-05-26  2  -17/+19
| Factor out two busy poll related helpers for later reuse, and remove a
| comment that isn't very helpful, especially with the __poll_t
| annotations in place.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* aio: try to complete poll iocbs without context switch  Christoph Hellwig  2018-05-26  1  -3/+17
| If we can acquire ctx_lock without spinning we can just remove our
| iocb from the active_reqs list, and thus complete the iocbs from the
| wakeup context.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* aio: implement IOCB_CMD_POLL  Christoph Hellwig  2018-05-26  2  -5/+135
| Simple one-shot poll through the io_submit() interface. To poll for
| a file descriptor the application should submit an iocb of type
| IOCB_CMD_POLL. It will poll the fd for the events specified in the
| first 32 bits of the aio_buf field of the iocb.
|
| Unlike poll or epoll without EPOLLONESHOT this interface always works
| in one shot mode, that is once the iocb is completed, it will have to
| be resubmitted.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
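From userspace, preparing such a request might look like the sketch below. The helper names prep_poll_iocb() and submit_one() are made up for illustration; struct iocb, aio_context_t and the field names come from <linux/aio_abi.h>. The fallback definition of IOCB_CMD_POLL is only there for headers that predate this commit, and whether the running kernel accepts the opcode is a separate question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <poll.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef IOCB_CMD_POLL
#define IOCB_CMD_POLL 5   /* value used by linux/aio_abi.h; absent from older headers */
#endif

/* Fill in an iocb that asks for a one-shot poll on `fd`.
 * `events` is a poll(2)-style mask such as POLLIN; per the message
 * above it is carried in the low bits of aio_buf. */
static void prep_poll_iocb(struct iocb *cb, int fd, unsigned int events,
                           __u64 user_data)
{
    assert(fd >= 0);
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = (__u32)fd;
    cb->aio_lio_opcode = IOCB_CMD_POLL;
    cb->aio_buf = events;
    cb->aio_data = user_data;   /* echoed back in the completion event */
}

/* Thin wrapper around the raw io_submit(2) system call for one iocb.
 * Returns the number of iocbs submitted, or -1 with errno set. */
static long submit_one(aio_context_t ctx, struct iocb *cb)
{
    struct iocb *list[1] = { cb };

    return syscall(SYS_io_submit, ctx, 1L, list);
}
```

Because the interface is one-shot, a program that wants continuous notification would call submit_one() again each time it reaps the completion for this iocb.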
* aio: simplify cancellation  Christoph Hellwig  2018-05-26  1  -42/+6
| With the current aio code there is no need for the magic
| KIOCB_CANCELLED value, as a cancellation just kicks the driver to
| queue the completion ASAP, with all actual completion handling done
| in another thread. Given that both the completion path and
| cancellation take the context lock there is no need for magic cmpxchg
| loops either. If we remove iocbs from the active list after calling
| ->ki_cancel (but with ctx_lock still held), we can also rely on the
| invariant that anything found on the list has a ->ki_cancel callback
| and can be cancelled, further simplifying the code.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* aio: simplify KIOCB_KEY handling  Christoph Hellwig  2018-05-26  2  -9/+7
| No need to pass the key field to lookup_iocb to compare it with
| KIOCB_KEY, as we can do that right after retrieving it from
| userspace. Also move the KIOCB_KEY definition to aio.c as it is an
| internal value not used by any other place in the kernel.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* fs: introduce new ->get_poll_head and ->poll_mask methods  Christoph Hellwig  2018-05-26  5  -7/+50
| ->get_poll_head returns the waitqueue that the poll operation is
| going to sleep on. Note that this means we can only use a single
| waitqueue for the poll, unlike some current drivers that use two
| waitqueues for different events. But now that we have keyed wakeups
| and heavily use those for poll there aren't that many good reasons
| left to keep the multiple waitqueues, and if there are any, ->poll is
| still around; the driver just won't support aio poll.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
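The split described above, one method that names the single waitqueue and one that reports the current event mask, can be illustrated with a userspace toy. None of these types or functions are kernel APIs: every toy_* name is hypothetical, and the "waitqueue" is just a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_poll_t;      /* stand-in for __poll_t */
#define TOY_POLLIN  0x001u
#define TOY_POLLOUT 0x004u

struct toy_waitqueue { int nr_waiters; };

struct toy_file;

/* Two-method interface in the spirit of the message above. */
struct toy_file_ops {
    struct toy_waitqueue *(*get_poll_head)(struct toy_file *f, toy_poll_t events);
    toy_poll_t (*poll_mask)(struct toy_file *f, toy_poll_t events);
};

struct toy_file {
    const struct toy_file_ops *ops;
    struct toy_waitqueue wq;          /* the one and only queue */
    int readable, writable;
};

static struct toy_waitqueue *toy_get_poll_head(struct toy_file *f, toy_poll_t events)
{
    (void)events;                     /* same queue for every event */
    return &f->wq;
}

static toy_poll_t toy_poll_mask(struct toy_file *f, toy_poll_t events)
{
    toy_poll_t mask = 0;

    if (f->readable)
        mask |= TOY_POLLIN;
    if (f->writable)
        mask |= TOY_POLLOUT;
    return mask & events;
}

static const struct toy_file_ops toy_ops = {
    .get_poll_head = toy_get_poll_head,
    .poll_mask = toy_poll_mask,
};

/* Generic caller: register on the queue first, then sample the mask. */
static toy_poll_t toy_vfs_poll(struct toy_file *f, toy_poll_t events)
{
    struct toy_waitqueue *head = f->ops->get_poll_head(f, events);

    assert(head != NULL);
    head->nr_waiters++;
    return f->ops->poll_mask(f, events);
}
```

The point of the split is that the generic caller, not the driver, decides how to wait on the queue, which is what lets an asynchronous consumer attach its own wakeup callback.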
* fs: add new vfs_poll and file_can_poll helpers  Christoph Hellwig  2018-05-26  9  -38/+32
| These abstract out calls to the poll method in preparation for
| changes in how we poll.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
* fs: update documentation to mention __poll_t and match the code  Christoph Hellwig  2018-05-26  2  -2/+2
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs: cleanup do_pollfd  Christoph Hellwig  2018-05-26  1  -25/+23
| Use straightline code with failure handling gotos instead of a lot
| of nested conditionals.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
* fs: unexport poll_schedule_timeout  Christoph Hellwig  2018-05-26  2  -4/+1
| No users outside of select.c.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
* uapi: turn __poll_t sparse checks on by default  Christoph Hellwig  2018-05-26  1  -4/+0
| Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base  Christoph Hellwig  2018-05-26  26  -120/+153
|\
| * fix io_destroy()/aio_complete() race  Al Viro  2018-05-23  1  -2/+1
| | If io_destroy() gets to cancelling everything that can be cancelled
| | and gets to kiocb_cancel() calling the function driver has left in
| | ->ki_cancel, it becomes vulnerable to a race with IO completion. At
| | that point req is already taken off the list and aio_complete() does
| | *NOT* spin until we (in free_ioctx_users()) release ->ctx_lock. As a
| | result, it proceeds to kiocb_free(), freeing req just as it gets
| | passed to ->ki_cancel().
| |
| | Fix is simple - remove from the list after the call of
| | kiocb_cancel(). All instances of ->ki_cancel() already have to cope
| | with being called with iocb still on the list - that's what happens
| | in io_cancel(2).
| |
| | Cc: stable@kernel.org
| | Fixes: 0460fef2a921 "aio: use cancellation list lazily"
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * aio: fix io_destroy(2) vs. lookup_ioctx() race  Al Viro  2018-05-21  1  -2/+2
| | kill_ioctx() used to have an explicit RCU delay between removing the
| | reference from ->ioctx_table and percpu_ref_kill() dropping the
| | refcount. At some point that delay had been removed, on the theory
| | that percpu_ref_kill() itself contained an RCU delay. Unfortunately,
| | that was the wrong kind of RCU delay and it didn't care about
| | rcu_read_lock() used by lookup_ioctx(). As a result, we could get
| | ctx freed right under lookup_ioctx(). Tejun has fixed that in
| | a6d7cff472e ("fs/aio: Add explicit RCU grace period when freeing
| | kioctx"); however, that fix is not enough.
| |
| | Suppose io_destroy() from one thread races with e.g. io_setup() from
| | another; CPU1 removes the reference from
| | current->mm->ioctx_table[...] just as CPU2 has picked it (under
| | rcu_read_lock()). Then CPU1 proceeds to drop the refcount, getting it
| | to 0 and triggering a call of free_ioctx_users(), which proceeds to
| | drop the secondary refcount and once that reaches zero calls
| | free_ioctx_reqs(). That does
| |         INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
| |         queue_rcu_work(system_wq, &ctx->free_rwork);
| | and schedules freeing the whole thing after RCU delay.
| |
| | In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping
| | the refcount from 0 to 1 and returned the reference to io_setup().
| |
| | Tejun's fix (that queue_rcu_work() in there) guarantees that ctx
| | won't get freed until after percpu_ref_get(). Sure, we'd increment
| | the counter before ctx can be freed. Now we are out of
| | rcu_read_lock() and there's nothing to stop freeing of the whole
| | thing. Unfortunately, CPU2 assumes that since it has grabbed the
| | reference, ctx is *NOT* going away until it gets around to dropping
| | that reference.
| |
| | The fix is obvious - use percpu_ref_tryget_live() and treat failure
| | as miss. It's not costlier than what we currently do in normal case,
| | it's safe to call since freeing *is* delayed and it closes the race
| | window - either lookup_ioctx() comes before percpu_ref_kill() (in
| | which case ctx->users won't reach 0 until the caller of
| | lookup_ioctx() drops it) or lookup_ioctx() fails, ctx->users is
| | unaffected and caller of lookup_ioctx() doesn't see the object in
| | question at all.
| |
| | Cc: stable@kernel.org
| | Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
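The difference between an unconditional get and a "tryget live" can be shown with a deliberately tiny, single-threaded model. This is not the percpu_ref implementation: the toy_ref type and its helpers are hypothetical, there is no RCU and no per-CPU counting, and the model only captures the state transitions the message walks through.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcount with a "dead" flag. Not thread-safe; illustration only. */
struct toy_ref {
    long count;
    bool dead;
};

static void toy_ref_init(struct toy_ref *r)
{
    r->count = 1;          /* the initial reference held by the table */
    r->dead = false;
}

/* Models the kill: mark dead and drop the initial reference. */
static void toy_ref_kill(struct toy_ref *r)
{
    assert(!r->dead);
    r->dead = true;
    r->count--;
}

/* Unconditional get: happily revives a counter that already hit 0. */
static void toy_ref_get(struct toy_ref *r)
{
    r->count++;
}

/* Get that refuses once the ref has been killed. */
static bool toy_ref_tryget_live(struct toy_ref *r)
{
    if (r->dead)
        return false;
    r->count++;
    return true;
}
```

After toy_ref_kill() the count is 0 and teardown is under way. toy_ref_get() at that point moves it from 0 to 1, which is the bogus "reference" CPU2 ends up holding in the scenario above, while toy_ref_tryget_live() fails and leaves the count untouched, so the lookup is treated as a miss.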
| * ext2: fix a block leak  Al Viro  2018-05-21  1  -10/+0
| | open file, unlink it, then use ioctl(2) to make it immutable or
| | append only. Now close it and watch the blocks *not* freed...
| |
| | Immutable/append-only checks belong in ->setattr().
| | Note: the bug is old and backport to anything prior to
| | 737f2e93b972 ("ext2: convert to use the new truncate convention")
| | will need these checks lifted into ext2_setattr().
| |
| | Cc: stable@kernel.org
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed  Al Viro  2018-05-21  1  -0/+22
| | That can (and does, on some filesystems) happen - ->mkdir() (and
| | thus vfs_mkdir()) can legitimately leave its argument negative and
| | just unhash it, counting upon the lookup to pick the object we'd
| | created next time we try to look at that name.
| |
| | Some vfs_mkdir() callers forget about that possibility...
| |
| | Acked-by: J. Bruce Fields <bfields@redhat.com>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed  Al Viro  2018-05-21  1  -0/+10
| | That can (and does, on some filesystems) happen - ->mkdir() (and
| | thus vfs_mkdir()) can legitimately leave its argument negative and
| | just unhash it, counting upon the lookup to pick the object we'd
| | created next time we try to look at that name.
| |
| | Some vfs_mkdir() callers forget about that possibility...
| |
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * unfuck sysfs_mount()  Al Viro  2018-05-21  1  -3/+3
| | new_sb is left uninitialized in case of early failures in
| | kernfs_mount_ns(), and while IS_ERR(root) is true in all such cases,
| | using IS_ERR(root) || !new_sb is not a solution - IS_ERR(root) is
| | true in some cases when new_sb is true.
| |
| | Make sure new_sb is initialized (and matches the reality) in all
| | cases and fix the condition for dropping kobj reference - we want it
| | done precisely in those situations where the reference has not been
| | transferred into a new super_block instance.
| |
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * kernfs: deal with kernfs_fill_super() failures  Al Viro  2018-05-21  1  -0/+1
| | make sure that info->node is initialized early, so that
| | kernfs_kill_sb() can list_del() it safely.
| |
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * cramfs: Fix IS_ENABLED typo  Joe Perches  2018-05-21  1  -1/+1
| | There's an extra C here...
| |
| | Fixes: 99c18ce580c6 ("cramfs: direct memory access support")
| | Acked-by: Nicolas Pitre <nico@linaro.org>
| | Signed-off-by: Joe Perches <joe@perches.com>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * befs_lookup(): use d_splice_alias()  Al Viro  2018-05-21  1  -12/+5
| | RTFS(Documentation/filesystems/nfs/Exporting) if you try to make
| | something exportable.
| |
| | Fixes: ac632f5b6301 "befs: add NFS export support"
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * affs_lookup: switch to d_splice_alias()  Al Viro  2018-05-21  1  -6/+5
| | Making something exportable takes more than providing
| | ->s_export_ops. In particular, ->lookup() *MUST* use
| | d_splice_alias() instead of d_add().
| |
| | Reading Documentation/filesystems/nfs/Exporting would've been a good
| | idea; as it is, exporting AFFS is badly (and exploitably) broken.
| |
| | Partially-Fixes: ed4433d72394 "fs/affs: make affs exportable"
| | Acked-by: David Sterba <dsterba@suse.com>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>