summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* procfs: new helper - PDE_DATA(inode)Al Viro2013-04-096-9/+9
| | | | | | | | | | The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* procfs: kill ->write_proc()Al Viro2013-04-091-25/+0
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: single_open_size()Al Viro2013-04-091-0/+18
| | | | | | | | | | | | | | | Same as single_open(), but preallocates the buffer of given size. Doesn't make any sense for sizes up to PAGE_SIZE and doesn't make sense if output of show() exceeds PAGE_SIZE only rarely - seq_read() will take care of growing the buffer and redoing show(). If you _know_ that it will be large, it might make more sense to look into saner iterator, rather than go with single-shot one. If that's impossible, single_open_size() might be for you. Again, don't use that without a good reason; occasionally that's really the best way to go, but very often there are better solutions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* procfs: don't allow to use proc_create, create_proc_entry, etc. for directoriesAl Viro2013-04-092-34/+27
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: use proc_remove_subtree()Al Viro2013-04-091-21/+9
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* procfs: switch /proc/self away from proc_dir_entryAl Viro2013-04-095-12/+55
| | | | | | | Just have it pinned in dcache all along and let procfs ->kill_sb() drop it before kill_anon_super(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mode_t, whack-a-mole at 11...Al Viro2013-04-093-3/+3
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of the last free_pipe_info() callersAl Viro2013-04-091-12/+6
| | | | | | and rename __free_pipe_info() to free_pipe_info() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of alloc_pipe_info() argumentAl Viro2013-04-092-4/+4
| | | | | | not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of pipe->inodeAl Viro2013-04-093-6/+5
| | | | | | | | it's used only as a flag to distinguish normal pipes/FIFOs from the internal per-task one used by file-to-file splice. And pipe->files would work just as well for that purpose... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* introduce variants of pipe_lock/pipe_unlock for real pipes/FIFOsAl Viro2013-04-091-15/+25
| | | | | | | fs/pipe.c file_operations methods *know* that pipe is not an internal one; no need to check pipe->inode for those callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: set file->private_data to ->i_pipeAl Viro2013-04-092-11/+10
| | | | | | simplify get_pipe_info(), while we are at it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: don't use ->i_mutexAl Viro2013-04-092-6/+5
| | | | | | | now it can be done - put mutex into pipe_inode_info, use it instead of ->i_mutex Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: take allocation and freeing of pipe_inode_info out of ->i_mutexAl Viro2013-04-091-21/+51
| | | | | | | | | | | | | | | | | | * new field - pipe->files; number of struct file over that pipe (all sharing the same inode, of course); protected by inode->i_lock. * pipe_release() decrements pipe->files, clears inode->i_pipe when if the counter has reached 0 (all under ->i_lock) and, in that case, frees pipe after having done pipe_unlock() * fifo_open() starts with grabbing ->i_lock, and either bumps pipe->files if ->i_pipe was non-NULL or allocates a new pipe (dropping and regaining ->i_lock) and rechecks ->i_pipe; if it's still NULL, inserts new pipe there, otherwise bumps ->i_pipe->files and frees the one we'd allocated. At that point we know that ->i_pipe is non-NULL and won't go away, so we can do pipe_lock() on it and proceed as we used to. If we end up failing, decrement pipe->files and if it reaches 0 clear ->i_pipe and free the sucker after pipe_unlock(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: preparation to new locking rulesAl Viro2013-04-091-23/+15
| | | | | | | | * use the fact that file_inode(file)->i_pipe doesn't change while the file is opened - no locks needed to access that. * switch to pipe_lock/pipe_unlock where it's easy to do Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: switch wait_for_partner() and wake_up_partner() to pipe_inode_infoAl Viro2013-04-091-9/+9
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: fold file_operations instances in oneAl Viro2013-04-093-190/+38
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold fifo.c into pipe.cAl Viro2013-04-093-154/+139
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write out of ->splice_write()Al Viro2013-04-091-6/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write into default_file_splice_write()Al Viro2013-04-091-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write() out of ->write()Al Viro2013-04-095-10/+22
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch compat readv/writev variants to COMPAT_SYSCALL_DEFINEAl Viro2013-04-093-192/+195
| | | | | | ... and take to fs/read_write.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* f2fs: use mnt_want_write_file() in ioctlAl Viro2013-04-091-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write/sb_end_write out of ->aio_write()Al Viro2013-04-099-20/+14
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* hpfs: move setting hpfs-private i_dirty to ->write_end()Al Viro2013-04-091-16/+20
| | | | | | ... so that writev(2) doesn't miss it. Get rid of hpfs_file_write(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: don't wank with EFBIG before calling do_sync_write()Al Viro2013-04-091-60/+1
| | | | | | look for file_capable() in there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold release_mounts() into namespace_unlock()Al Viro2013-04-091-23/+30
| | | | | | | | | ... and provide namespace_lock() as a trivial wrapper; switch to those two consistently. Result is patterned after rtnl_lock/rtnl_unlock pair. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch unlock_mount() to namespace_unlock(), convert all umount_tree() callersAl Viro2013-04-093-24/+16
| | | | | | | which allows to kill the last argument of umount_tree() and make release_mounts() static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* more conversions to namespace_unlock()Al Viro2013-04-091-14/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of the second argument of shrink_submounts()Al Viro2013-04-091-4/+4
| | | | | | ... it's always &unmounted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* saner umount_tree()/release_mounts(), part 1Al Viro2013-04-091-4/+13
| | | | | | | | | global list of release_mounts() fodder, protected by namespace_sem; eventually, all umount_tree() callers will use it as kill list. Helper picking the contents of that list, releasing namespace_sem and doing release_mounts() on what it got. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of full-hash scan on detaching vfsmountsAl Viro2013-04-094-97/+149
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mnt: release locks on error path in do_loopbackAndrey Vagin2013-04-091-1/+1
| | | | | | | | | | | | | | | | | do_loopback calls lock_mount(path) and forget to unlock_mount if clone_mnt or copy_mnt fails. [ 77.661566] ================================================ [ 77.662939] [ BUG: lock held when returning to user space! ] [ 77.664104] 3.9.0-rc5+ #17 Not tainted [ 77.664982] ------------------------------------------------ [ 77.666488] mount/514 is leaving the kernel with locks still held! [ 77.668027] 2 locks held by mount/514: [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0 [ 77.671755] #1: (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* procfs: add proc_remove_subtree()Al Viro2013-04-091-30/+89
| | | | | | | | | just what it sounds like; do that only to procfs subtrees you've created - doing that to something shared with another driver is not only antisocial, but might cause interesting races with proc_create() and its ilk. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ecryptfs: close rmmod raceAl Viro2013-04-091-12/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-03-264-6/+44
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "stable fodder; assorted deadlock fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vt: synchronize_rcu() under spinlock is not nice... Nest rename_lock inside vfsmount_lock Don't bother with redoing rw_verify_area() from default_file_splice_from()
| * Nest rename_lock inside vfsmount_lockAl Viro2013-03-261-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... lest we get livelocks between path_is_under() and d_path() and friends. The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks; it is possible to have thread B spin on attempt to take lock shared while thread A is already holding it shared, if B is on lower-numbered CPU than A and there's a thread C spinning on attempt to take the same lock exclusive. As the result, we need consistent ordering between vfsmount_lock (lglock) and rename_lock (seq_lock), even though everything that takes both is going to take vfsmount_lock only shared. Spotted-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Don't bother with redoing rw_verify_area() from default_file_splice_from()Al Viro2013-03-213-1/+33
| | | | | | | | | | | | | | | | | | | | default_file_splice_from() ends up calling vfs_write() (via very convoluted callchain). It's an overkill, since we already have done rw_verify_area() in the caller by the time we call vfs_write() we are under set_fs(KERNEL_DS), so access_ok() is also pointless. Add a new helper (__kernel_write()), use it instead of kernel_write() in there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2013-03-266-32/+89
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: - Fix an NFSv4 idmapper regression - Fix an Oops in the pNFS blocks client - Fix up various issues with pNFS layoutcommit - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked * tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked NFSv4.1: Add a helper pnfs_commit_and_return_layout NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn NFSv4.1: Fix a race in pNFS layoutcommit pnfs-block: removing DM device maybe cause oops when call dev_remove NFSv4: Fix the string length returned by the idmapper
| * | NFSv4.1: Add a helper pnfs_commit_and_return_layoutTrond Myklebust2013-03-213-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to safely return the layout in nfs4_proc_setattr, we need to block new uses of the layout, wait for all outstanding users of the layout to complete, commit the layout and then return it. This patch adds a helper in order to do all this safely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Boaz Harrosh <bharrosh@panasas.com>
| * | NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturnTrond Myklebust2013-03-212-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout segments. The only two sites that need to do this are the ones that call pnfs_return_layout() without first doing a layout commit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org
| * | NFSv4.1: Fix a race in pNFS layoutcommitTrond Myklebust2013-03-212-15/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations where the two are out of sync. The first half of the problem is to ensure that pnfs_layoutcommit_inode clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg. We still need to keep the reference to those segments until the RPC call is finished, so in order to make it clear _where_ those references come from, we add a helper pnfs_list_write_lseg_done() that cleans up after pnfs_list_write_lseg. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org
| * | pnfs-block: removing DM device maybe cause oops when call dev_removefanchaoting2013-03-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when pnfs block using device mapper,if umounting later,it maybe cause oops. we apply "1 + sizeof(bl_umount_request)" memory for msg->data, the memory maybe overflow when we do "memcpy(&dataptr [sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))", because the size of bl_msg is more than 1 byte. Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFSv4: Fix the string length returned by the idmapperTrond Myklebust2013-03-201-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functions like nfs_map_uid_to_name() and nfs_map_gid_to_group() are expected to return a string without any terminating NUL character. Regression introduced by commit 57e62324e469e092ecc6c94a7a86fe4bd6ac5172 (NFS: Store the legacy idmapper result in the keyring). Reported-by: Dave Chiluk <dave.chiluk@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org [>=3.4]
* | | Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2013-03-252-6/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd bugfixes from J Bruce Fields: "Fixes for a couple mistakes in the new DRC code. And thanks to Kent Overstreet for noticing we've been sync'ing the wrong range on stable writes since 3.8." * 'for-3.9' of git://linux-nfs.org/~bfields/linux: nfsd: fix bad offset use nfsd: fix startup order in nfsd_reply_cache_init nfsd: only unhash DRC entries that are in the hashtable
| * | | nfsd: fix bad offset useKent Overstreet2013-03-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vfs_writev() updates the offset argument - but the code then passes the offset to vfs_fsync_range(). Since offset now points to the offset after what was just written, this is probably not what was intended Introduced by face15025ffdf664de95e86ae831544154d26c9c "nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes". Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd: fix startup order in nfsd_reply_cache_initJeff Layton2013-03-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we end up doing "goto out_nomem" in this function, we'll call nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and free entries, but that list may not be initialized yet if the server is starting up for the first time. It's also possible for the shrinker to kick in before we've initialized the LRU list. Rearrange the initialization so that the LRU list_head and cache size are initialized before doing any of the allocations that might fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd: only unhash DRC entries that are in the hashtableJeff Layton2013-03-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not safe to call hlist_del() on a newly initialized hlist_node. That leads to a NULL pointer dereference. Only do that if the entry is hashed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | vfs,proc: guarantee unique inodes in /procLinus Torvalds2013-03-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Jones found another /proc issue with his Trinity tool: thanks to the namespace model, we can have multiple /proc dentries that point to the same inode, aliasing directories in /proc/<pid>/net/ for example. This ends up being a total disaster, because it acts like hardlinked directories, and causes locking problems. We rely on the topological sort of the inodes pointed to by dentries, and if we have aliased directories, that odering becomes unreliable. In short: don't do this. Multiple dentries with the same (directory) inode is just a bad idea, and the namespace code should never have exposed things this way. But we're kind of stuck with it. This solves things by just always allocating a new inode during /proc dentry lookup, instead of using "iget_locked()" to look up existing inodes by superblock and number. That actually simplies the code a bit, at the cost of potentially doing more inode [de]allocations. That said, the inode lookup wasn't free either (and did a lot of locking of inodes), so it is probably not that noticeable. We could easily keep the old lookup model for non-directory entries, but rather than try to be excessively clever this just implements the minimal and simplest workaround for the problem. Reported-and-tested-by: Dave Jones <davej@redhat.com> Analyzed-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2013-03-216-56/+43
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull CIFS fixes from Steve French: "Three small CIFS Fixes (the most important of the three fixes a recent problem authenticating to Windows 8 using cifs rather than SMB2)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: ignore everything in SPNEGO blob after mechTypes cifs: delay super block destruction until all cifsFileInfo objects are gone cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY