summaryrefslogtreecommitdiffstats
path: root/include/linux/namei.h
Commit message (Collapse)AuthorAgeFilesLines
* audit: ignore fcaps on umountRichard Guy Briggs2019-01-301-0/+3
| | | | | | | | | | | | | | | | | | | | Don't fetch fcaps when umount2 is called to avoid a process hang while it waits for the missing resource to (possibly never) re-appear. Note the comment above user_path_mountpoint_at(): * A umount is a special case for path walking. We're not actually interested * in the inode in this situation, and ESTALE errors can be a problem. We * simply want track down the dentry and vfsmount attached at the mountpoint * and avoid revalidating the last component. This can happen on ceph, cifs, 9p, lustre, fuse (gluster) or NFS. Please see the github issue tracker https://github.com/linux-audit/audit-kernel/issues/100 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: merge fuzz in audit_log_fcaps()] Signed-off-by: Paul Moore <paul@paul-moore.com>
* afs: Display manually added cells in dynamic root mountDavid Howells2018-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Alter the dynroot mount so that cells created by manipulation of /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root cell as a module parameter will cause directories for those cells to be created in the dynamic root superblock for the network namespace[*]. To this end: (1) Only one dynamic root superblock is now created per network namespace and this is shared between all attempts to mount it. This makes it easier to find the superblock to modify. (2) When a dynamic root superblock is created, the list of cells is walked and directories created for each cell already defined. (3) When a new cell is added, if a dynamic root superblock exists, a directory is created for it. (4) When a cell is destroyed, the directory is removed. (5) These directories are created by calling lookup_one_len() on the root dir which automatically creates them if they don't exist. [*] Inasmuch as network namespaces are currently supported here. Signed-off-by: David Howells <dhowells@redhat.com>
* License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* make sure that mntns_install() doesn't end up with referral for rootAl Viro2017-04-211-0/+1
| | | | | | | | | | new flag: LOOKUP_DOWN. If the starting point is overmounted, cross into whatever's mounted on top, triggering referrals et.al. Use that instead of follow_down_one() loop in mntns_install(), handle errors properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Revert "vfs: add lookup_hash() helper"Linus Torvalds2016-07-291-2/+0
| | | | | | | | | | | | | | | | | | | | | This reverts commit 3c9fe8cdff1b889a059a30d22f130372f2b3885f. As Miklos points out in commit c1b2cc1a765a, the "lookup_hash()" helper is now unused, and in fact, with the hash salting changes, since the hash of a dentry name now depends on the directory dentry it is in, the helper function isn't even really likely to be useful. So rather than keep it around in case somebody else might end up finding a use for it, let's just remove the helper and not trick people into thinking it might be a useful thing. For example, I had obviously completely missed how the helper didn't follow the normal dentry hashing patterns, and how the hash salting patch broke overlayfs. Things would quietly build and look sane, but not work. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* devpts: Make each mount of devpts an independent filesystem.Eric W. Biederman2016-06-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The /dev/ptmx device node is changed to lookup the directory entry "pts" in the same directory as the /dev/ptmx device node was opened in. If there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx uses that filesystem. Otherwise the open of /dev/ptmx fails. The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that userspace can now safely depend on each mount of devpts creating a new instance of the filesystem. Each mount of devpts is now a separate and equal filesystem. Reserved ttys are now available to all instances of devpts where the mounter is in the initial mount namespace. A new vfs helper path_pts is introduced that finds a directory entry named "pts" in the directory of the passed in path, and changes the passed in path to point to it. The helper path_pts uses a function path_parent_directory that was factored out of follow_dotdot. In the implementation of devpts: - devpts_mnt is killed as it is no longer meaningful if all mounts of devpts are equal. - pts_sb_from_inode is replaced by just inode->i_sb as all cached inodes in the tty layer are now from the devpts filesystem. - devpts_add_ref is rolled into the new function devpts_ptmx. And the unnecessary inode hold is removed. - devpts_del_ref is renamed devpts_release and reduced to just a deacrivate_super. - The newinstance mount option continues to be accepted but is now ignored. In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as they are never used. Documentation/filesystems/devices.txt is updated to describe the current situation. This has been verified to work properly on openwrt-15.05, centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. With the caveat that on centos6 and on slackware-14.1 that there wind up being two instances of the devpts filesystem mounted on /dev/pts, the lower copy does not end up getting used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: add lookup_hash() helperMiklos Szeredi2016-05-101-0/+2
| | | | | | | | | | | Overlayfs needs lookup without inode_permission() and already has the name hash (in form of dentry->d_name on overlayfs dentry). It also doesn't support filesystems with d_op->d_hash() so basically it only needs the actual hashed lookup from lookup_one_len_unlocked() So add a new helper that does unlocked lookup of a hashed name. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* namei: teach lookup_slow() to skip revalidateAl Viro2016-03-141-0/+1
| | | | | | | | ... and make mountpoint_last() use it. That makes all candidates for lookup with parent locked shared go through lookup_slow(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nfsd: don't hold i_mutex over userspace upcallsNeilBrown2016-01-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* turn user_{path_at,path,lpath,path_dir}() into static inlinesAl Viro2015-05-151-10/+24
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of assorted nameidata-related debrisAl Viro2015-05-151-1/+0
| | | | | | pointless forward declarations, stale comments Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't pass nameidata to ->follow_link()Al Viro2015-05-101-1/+1
| | | | | | | its only use is getting passed to nd_jump_link(), which can obtain it from current->nameidata Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* namei: remove restrictions on nesting depthAl Viro2015-05-101-0/+2
| | | | | | | The only restriction is that on the total amount of symlinks crossed; how they are nested does not matter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new ->follow_link() and ->put_link() calling conventionsAl Viro2015-05-101-2/+0
| | | | | | | | | | | | | | | | | | | a) instead of storing the symlink body (via nd_set_link()) and returning an opaque pointer later passed to ->put_link(), ->follow_link() _stores_ that opaque pointer (into void * passed by address by caller) and returns the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic symlinks) and pointer to symlink body for normal symlinks. Stored pointer is ignored in all cases except the last one. Storing NULL for opaque pointer (or not storing it at all) means no call of ->put_link(). b) the body used to be passed to ->put_link() implicitly (via nameidata). Now only the opaque pointer is. In the cases when we used the symlink body to free stuff, ->follow_link() now should store it as opaque pointer in addition to returning it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make nameidata completely opaque outside of fs/namei.cAl Viro2014-12-101-22/+3
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* RCU'd vfsmountsAl Viro2013-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * RCU-delayed freeing of vfsmounts * vfsmount_lock replaced with a seqlock (mount_lock) * sequence number from mount_lock is stored in nameidata->m_seq and used when we exit RCU mode * new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its caller knows that vfsmount will have no surviving references. * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock() and doing pending mntput(). * new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence number against seq, then grabs reference to mnt. Then it rechecks mount_lock again to close the race and either returns success or drops the reference it has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can simply decrement the refcount and sod off - aforementioned synchronize_rcu() makes sure that final mntput() won't come until we leave RCU mode. We need that, since we don't want to end up with some lazy pathwalk racing with umount() and stealing the final mntput() from it - caller of umount() may expect it to return only once the fs is shut down and we don't want to break that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do full-blown mntput() in case of mount_lock sequence number mismatch happening just as we'd grabbed the reference, but in those cases we won't be stealing the final mntput() from anything that would care. * mntput_no_expire() doesn't lock anything on the fast path now. Incidentally, SMP and UP cases are handled the same way - no ifdefs there. * normal pathname resolution does *not* do any writes to mount_lock. It does, of course, bump the refcounts of vfsmount and dentry in the very end, but that's it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* introduce kern_path_mountpoint()Al Viro2013-09-081-0/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* rename user_path_umountat() to user_path_mountpoint_at()Al Viro2013-09-081-3/+0
| | | | | | | ... and move the extern from linux/namei.h to fs/internal.h, along with that of vfs_path_lookup(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: allow umount to handle mountpoints without revalidating themJeff Layton2013-09-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Christopher reported a regression where he was unable to unmount a NFS filesystem where the root had gone stale. The problem is that d_revalidate handles the root of the filesystem differently from other dentries, but d_weak_revalidate does not. We could simply fix this by making d_weak_revalidate return success on IS_ROOT dentries, but there are cases where we do want to revalidate the root of the fs. A umount is really a special case. We generally aren't interested in anything but the dentry and vfsmount that's attached at that point. If the inode turns out to be stale we just don't care since the intent is to stop using it anyway. Try to handle this situation better by treating umount as a special case in the lookup code. Have it resolve the parent using normal means, and then do a lookup of the final dentry without revalidating it. In most cases, the final lookup will come out of the dcache, but the case where there's a trailing symlink or !LAST_NORM entry on the end complicates things a bit. Cc: Neil Brown <neilb@suse.de> Reported-by: Christopher T Vogan <cvogan@us.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* namei.h: include errno.hStephen Warren2012-12-251-0/+1
| | | | | | | | | | | This solves: In file included from fs/ext3/symlink.c:20:0: include/linux/namei.h: In function 'retry_estale': include/linux/namei.h:114:19: error: 'ESTALE' undeclared (first use in this function) Signed-off-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: turn is_dir argument to kern_path_create into a lookup_flags argJeff Layton2012-12-201-2/+2
| | | | | | | | Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags passed in here are currently ignored. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add a retry_estale helper function to handle retries on ESTALEJeff Layton2012-12-201-0/+16
| | | | | | | | | | | | | This function is expected to be called from path-based syscalls to help them decide whether to try the lookup and call again in the event that they got an -ESTALE return back on an earier try. Currently, we only retry the call once on an ESTALE error, but in the event that we decide that that's not enough in the future, we should be able to change the logic in this helper without too much effort. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: done_path_create()Al Viro2012-07-291-0/+1
| | | | | | releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: add nd_jump_linkChristoph Hellwig2012-07-141-0/+2
| | | | | | | | | | Add a helper that abstracts out the jump to an already parsed struct path from ->follow_link operation from procfs. Not only does this clean up the code by moving the two sides of this game into a single helper, but it also prepares for making struct nameidata private to namei.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of kern_path_parent()Al Viro2012-07-141-1/+1
| | | | | | | | | all callers want the same thing, actually - a kinda-sorta analog of kern_path_create(). I.e. they want parent vfsmount/dentry (with ->i_mutex held, to make sure the child dentry is still their child) + the child dentry. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
* vfs: remove open intents from nameidataMiklos Szeredi2012-07-141-14/+0
| | | | | | | | | All users of open intents have been converted to use ->atomic_{open,create}. This patch gets rid of nd->intent.open and related infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* readlinkat: ensure we return ENOENT for the empty pathname for normal lookupsAndy Whitcroft2011-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistant with all other interfaces; and inconsistant with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is infact entry via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* vfs: remove LOOKUP_NO_AUTOMOUNT flagLinus Torvalds2011-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds2011-09-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* switch vfs_path_lookup() to struct pathAl Viro2011-07-201-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helpers: kern_path_create/user_path_createAl Viro2011-07-201-0/+2
| | | | | | | combination of kern_path_parent() and lookup_create(). Does *not* expose struct nameidata to caller. Syscalls converted to that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill LOOKUP_CONTINUEAl Viro2011-07-201-1/+0
| | | | | | LOOKUP_PARENT is equivalent to it now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't pass 'mounting_here' flag to follow_down()Al Viro2011-03-181-1/+1
| | | | | | it's always false now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* New AT_... flag: AT_EMPTY_PATHAl Viro2011-03-141-0/+1
| | | | | | | | | | For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reduce vfs_path_lookup() to do_path_lookup()Al Viro2011-03-141-0/+1
| | | | | | | | | | | | | | | New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() *and* gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of nd->fileAl Viro2011-03-141-1/+0
| | | | | | | Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* untangle the "need_reval_dot" messAl Viro2011-03-141-0/+2
| | | | | | | | | | | | | instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill path_lookup()Al Viro2011-03-141-1/+1
| | | | | | | all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Add an AT_NO_AUTOMOUNT flag to suppress terminal automountDavid Howells2011-01-151-0/+2
| | | | | | | | | | | Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount point directories. This can be used by fstatat() users to permit the gathering of attributes on an automount point and also prevent mass-automounting of a directory of automount points by ls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Add a dentry op to allow processes to be held during pathwalk transitDavid Howells2011-01-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: rcu-walk for path lookupNick Piggin2011-01-071-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* fs: dcache remove dcache_lockNick Piggin2011-01-071-1/+0
| | | | | | dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3)Al Viro2009-12-221-2/+0
| | | | | | | | | | | Just set f_flags when shoving struct file into nameidata; don't postpone that until __dentry_open(). do_filp_open() has correct value; lookup_instantiate_filp() doesn't - we lose the difference between O_RDWR and 3 by that point. We still set .intent.open.flags, so no fs code needs to be changed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* sysfs: Propagate renames to the vfs on demandEric W. Biederman2009-12-111-1/+0
| | | | | | | | | | | | | | | | | | By teaching sysfs_revalidate to hide a dentry for a sysfs_dirent if the sysfs_dirent has been renamed, and by teaching sysfs_lookup to return the original dentry if the sysfs dirent has been renamed. I can show the results of renames correctly without having to update the dcache during the directory rename. This massively simplifies the rename logic allowing a lot of weird sysfs special cases to be removed along with a lot of now unnecesary helper code. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* trivial: fix typo in namei.h commentOri Avtalion2009-09-211-1/+1
| | | | | Signed-off-by: Ori Avtalion <ori@avtalion.name> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* switch follow_down()Al Viro2009-06-111-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_up() to struct pathAl Viro2009-06-111-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Cache root in nameidataAl Viro2009-06-111-0/+1
| | | | | | | | | | | | New field: nd->root. When pathname resolution wants to know the root, check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise copy current->fs->root there. After path_walk() is finished, we check if we'd got a cached value in nd->root and drop it. Before calling path_walk() we should either set nd->root.mnt to NULL *or* copy (and pin down) some path to nd->root. In the latter case we won't be looking at current->fs->root at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Switch open_exec() and sys_uselib() to do_open_filp()Al Viro2009-05-091-1/+0
| | | | | | ... and make path_lookup_open() static Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: introduce helper function to safely NUL-terminate symlinksDuane Griffin2008-12-311-0/+5
| | | | | | | | | | | A number of filesystems were potentially triggering kernel bugs due to corrupted symlink names on disk. This function helps safely terminate the names. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>