summaryrefslogtreecommitdiffstats
path: root/fs/namespace.c
Commit message (Collapse)AuthorAgeFilesLines
* Push BKL down into do_remount_sb()Al Viro2009-06-111-8/+2
| | | | | | [folded fix from Jiri Slaby] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Push BKL down beyond VFS-only parts of do_mount()Al Viro2009-06-111-3/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Push BKL into do_mount()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* dcache: extrace and use d_unlinked()Alexey Dobriyan2009-06-111-4/+4
| | | | | | | | | d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: introduce mnt_clone_writenpiggin@suse.de2009-06-111-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: mnt_want_write speedupnpiggin@suse.de2009-06-111-177/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it. A microbenchmark yes, but it exercises some important paths in the mm. Before: avg = 501.9 std = 14.7773 After: avg = 462.286 std = 5.46106 (50 runs of each, stddev gives a reasonable confidence, but there is quite a bit of variation there still) It does this by removing the complex per-cpu locking and counter-cache and replaces it with a percpu counter in struct vfsmount. This makes the code much simpler, and avoids spinlocks (although the msync is still pretty costly, unfortunately). It results in about 900 bytes smaller code too. It does increase the size of a vfsmount, however. It should also give a speedup on large systems if CPUs are frequently operating on different mounts (because the existing scheme has to operate on an atomic in the struct vfsmount when switching between mounts). But I'm most interested in the single threaded path performance for the moment. [AV: minor cleanup] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch lookup_mnt()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_down()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Switch collect_mounts() to struct pathAl Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't bother with check_mnt() in do_add_mount() on shrinkable onesAl Viro2009-06-111-1/+1
| | | | | | | | These guys are what we add as submounts; checks for "is that attached in our namespace" are simply irrelevant for those and counterproductive for use of private vfsmount trees a-la what NFS folks want. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Fix races around the access to ->s_optionsAl Viro2009-05-091-3/+18
| | | | | | | | | Put generic_show_options read access to s_options under rcu_read_lock, split save_mount_options() into "we are setting it the first time" (uses in foo_fill_super()) and "we are relacing and freeing the old one", synchronize_rcu() before kfree() in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: umount_begin BKL pushdownAlessio Igor Bogani2009-05-091-2/+0
| | | | | | | Push BKL down into ->umount_begin() Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Touch all affected namespaces on propagation of mountAl Viro2009-04-201-1/+1
| | | | | | | We shouldn't just touch the namespace of current process Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't set relatime when noatime is specifiedAndi Kleen2009-04-191-2/+3
| | | | | | | | | | | | | | | | | Since commit 0a1c01c9477602ee8b44548a9405b2c1d587b5a2 ("Make relatime default") when a file system is mounted explicitely with noatime it gets both the MNT_RELATIME and MNT_NOATIME bits set. This shows up like this in /proc/mounts: /dev/xxx /yyy ext3 rw,noatime,relatime,errors=continue,data=writeback 0 0 That looks strange. The VFS uses noatime in this case, but both flags are set. So it's more a cosmetic issue, but still better to fix. Cc: mjg@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Get rid of indirect include of fs_struct.hAl Viro2009-03-311-0/+1
| | | | | | | | Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Take fs_struct handling to new file (fs/fs_struct.c)Al Viro2009-03-311-68/+0
| | | | | | | | | | Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Get rid of bumping fs_struct refcount in pivot_root(2)Al Viro2009-03-311-9/+17
| | | | | | | | Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2009-03-271-2/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...
| * vfs: simple_set_mnt() should return voidSukadev Bhattiprolu2009-03-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Make relatime defaultMatthew Garrett2009-03-261-2/+3
| | | | | | | | | | | | | | | | | | Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Add a strictatime mount optionMatthew Garrett2009-03-261-1/+5
|/ | | | | | | | | Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix incomplete __mntput lockingAl Viro2009-02-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Getting this wrong caused WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() due to optimistically checking cpu_writer->mnt outside the spinlock. Here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now we just move the spin_lock() earlier and move the test into the properly locked region. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [CVE-2009-0029] System call wrappers part 14Heiko Carstens2009-01-141-2/+2
| | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* [CVE-2009-0029] System call wrappers part 10Heiko Carstens2009-01-141-5/+4
| | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* fs/namespace.c: drop code after returnJulia Lawall2008-12-311-1/+1
| | | | | | | | | | | The extra semicolon serves no purpose. Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'master' into nextJames Morris2008-11-141-2/+2
|\ | | | | | | | | | | | | | | | | | | | | Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
| * vfs: fix shrink_submountsEric W. Biederman2008-11-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the last refactoring of shrink_submounts a variable was not completely renamed. So finish the renaming of mnt to m now. Without this if you attempt to mount an nfs mount that has both automatic nfs sub mounts on it, and has normal mounts on it. The unmount will succeed when it should not. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | CRED: Wrap task credential accesses in the filesystem subsystemDavid Howells2008-11-141-1/+1
|/ | | | | | | | | | | | | | | | | Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
* [RFC PATCH] touch_mnt_namespace when the mount flags changeDan Williams2008-10-231-1/+6
| | | | | | | | | | Daemons that need to be launched while the rootfs is read-only can now poll /proc/mounts to be notified when their O_RDWR requests may no longer end in EROFS. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* [PATCH] no need for noinline stuff in fs/namespace.c anymoreAl Viro2008-10-231-12/+5
| | | | | | | Stack footprint from hell had been due to many struct nameidata in there. No more. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] finally get rid of nameidata in namespace.cAl Viro2008-10-231-60/+59
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] pass struct path * to do_add_mount()Al Viro2008-08-011-8/+8
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] sanitize __user_walk_fd() et.al.Al Viro2008-07-261-38/+36
| | | | | | | | | | | | | * do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] vfs: use kstrdup() and check failing allocationLi Zefan2008-07-261-11/+13
| | | | | | | | | | - use kstrdup() instead of kmalloc() + memcpy() - return NULL if allocating ->mnt_devname failed - mnt_devname should be const Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] kill altrootAl Viro2008-07-261-7/+1
| | | | | | long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Use WARN() in fs/Arjan van de Ven2008-07-261-2/+1
| | | | | | | | | Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* LSM/SELinux: show LSM mount options in /proc/mountsEric Paris2008-07-141-3/+11
| | | | | | | | | | | This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>
* fs: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-04-301-2/+2
| | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: remove lives_below_in_same_fs()Jan Blunck2008-04-291-12/+1
| | | | | | | | | | | | Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is providing the same functionality. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* quota: remove superfluous DQUOT_OFF() in fs/namespace.cJan Kara2008-04-281-2/+0
| | | | | | | | | We don't need to turn quotas off before remounting root ro, because do_remount_sb() already handles this. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] restore sane ->umount_begin() APIAl Viro2008-04-251-4/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 7/7] vfs: mountinfo: show dominating group idMiklos Szeredi2008-04-231-2/+7
| | | | | | | | Show peer group ID of nearest dominating group that has intersection with the mount's namespace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfoRam Pai2008-04-231-23/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [mszeredi@suse.cz] rewrite and split big patch into managable chunks /proc/mounts in its current form lacks important information: - propagation state - root of mount for bind mounts - the st_dev value used within the filesystem - identifier for each mount and it's parent It also suffers from the following problems: - not easily extendable - ambiguity of mountpoints within a chrooted environment - doesn't distinguish between filesystem dependent and independent options - doesn't distinguish between per mount and per super block options This patch introduces /proc/<pid>/mountinfo which attempts to address all these deficiencies. Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is extracted into separate functions. Thanks to Al Viro for the help in getting the design right. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 5/7] vfs: mountinfo: allow using process rootMiklos Szeredi2008-04-231-6/+8
| | | | | | | | | | | | | | | | | | Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate mountpoints. - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h> - add the process's namespace and root to this structure - pass a pointer to 'struct proc_mounts' into seq_operations In addition the following cleanups are made: - use a common open function for /proc/<pid>/{mounts,mountstat} - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS - make the seq_operations structures const Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 4/7] vfs: mountinfo: add mount peer group IDMiklos Szeredi2008-04-231-3/+90
| | | | | | | | | | | | | | | | | | | | | | | | Add a unique ID to each peer group using the IDR infrastructure. The identifiers are reused after the peer group dissolves. The IDR structures are protected by holding namepspace_sem for write while allocating or deallocating IDs. IDs are allocated when a previously unshared vfsmount becomes the first member of a peer group. When a new member is added to an existing group, the ID is copied from one of the old members. IDs are freed when the last member of a peer group is unshared. Setting the MNT_SHARED flag on members of a subtree is done as a separate step, after all the IDs have been allocated. This way an allocation failure can be cleaned up easilty, without affecting the propagation state. Based on design sketch by Al Viro. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 3/7] vfs: mountinfo: add mount IDMiklos Szeredi2008-04-231-0/+34
| | | | | | | | Add a unique ID to each vfsmount using the IDR infrastructure. The identifiers are reused after the vfsmount is freed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] get rid of more nameidata passing in namespace.cAl Viro2008-04-211-28/+25
| | | | | | | Further reduction of stack footprint (sys_pivot_root()); lose useless BKL in there, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] switch a bunch of LSM hooks from nameidata to pathAl Viro2008-04-211-5/+6
| | | | | | Namely, ones from namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] lock exclusively in collect_mounts() and drop_collected_mounts()Al Viro2008-04-211-4/+4
| | | | | | | | | | Taking namespace_sem shared there isn't worth the trouble, especially with vfsmount ID allocation about to be added. That way we know that umount_tree(), copy_tree() and clone_mnt() are _always_ serialized by namespace_sem. umount_tree() still needs vfsmount_lock (it manipulates hash chains, among other things), but that's a separate story. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] r/o bind mounts: honor mount writer counts at remountDave Hansen2008-04-191-7/+43
| | | | | | | | | | | | | | | | | | | | | Originally from: Herbert Poetzl <herbert@13thfloor.at> This is the core of the read-only bind mount patch set. Note that this does _not_ add a "ro" option directly to the bind mount operation. If you require such a mount, you must first do the bind, then follow it up with a 'mount -o remount,ro' operation: If you wish to have a r/o bind mount of /foo on bar: mount --bind /foo /bar mount -o remount,ro /bar Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>