summaryrefslogtreecommitdiffstats
path: root/fs/overlayfs/util.c
Commit message (Collapse)AuthorAgeFilesLines
* ovl: handle idmappings in layer open helpersChristian Brauner2022-04-281-2/+3
| | | | | | | | | | | | | In earlier patches we already passed down the relevant upper or lower path to ovl_open_realfile(). Now let the open helpers actually take the idmapping of the relevant mount into account when checking permissions. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use ovl_copy_{real,upper}attr() wrappersChristian Brauner2022-04-281-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | When copying inode attributes from the upper or lower layer to ovl inodes we need to take the upper or lower layer's mount's idmapping into account. In a lot of places we call ovl_copyattr() only on upper inodes and in some we call it on either upper or lower inodes. Split this into two separate helpers. The first one should only be called on upper inodes and is thus called ovl_copy_upperattr(). The second one can be called on upper or lower inodes. We add ovl_copy_realattr() for this task. The new helper makes use of the previously added ovl_i_path_real() helper. This is needed to support idmapped base layers with overlay. When overlay copies the inode information from an upper or lower layer to the relevant overlay inode it will apply the idmapping of the upper or lower layer when doing so. The ovl inode ownership will thus always correctly reflect the ownership of the idmapped upper or lower layer. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: store lower path in ovl_inodeAmir Goldstein2022-04-281-1/+14
| | | | | | | | | | | | | | | | | | Create some ovl_i_* helpers to get real path from ovl inode. Instead of just stashing struct inode for the lower layer we stash struct path for the lower layer. The helpers allow to retrieve a struct path for the relevant upper or lower layer. This will be used when retrieving information based on struct inode when copying up inode attributes from upper or lower inodes to ovl inodes and when checking permissions in ovl_permission() in following patches. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use ovl_path_getxattr() wrapperChristian Brauner2022-04-281-15/+14
| | | | | | | | | | | | | | | | | | | | | Add a helper that allows to retrieve ovl xattrs from either lower or upper layers. To stop passing mnt and dentry separately everywhere use struct path which more accurately reflects the tight coupling between mount and dentry in this helper. Swich over all places to pass a path argument that can operate on either upper or lower layers. This is needed to support idmapped base layers with overlayfs. Some helpers are always called with an upper dentry, which is now utilized by these helpers to create the path. Make this usage explicit by renaming the argument to "upperdentry" and by renaming the function as well in some cases. Also add a check in ovl_do_getxattr() to catch misuse of these functions. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use ovl_lookup_upper() wrapperChristian Brauner2022-04-281-1/+1
| | | | | | | | | | | | | Introduce ovl_lookup_upper() as a simple wrapper around lookup_one(). Make it clear in the helper's name that this only operates on the upper layer. The wrapper will take upper layer's idmapping into account when checking permission in lookup_one(). Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: pass layer mnt to ovl_open_realfile()Amir Goldstein2022-04-281-0/+14
| | | | | | | | | | | | | | Ensure that ovl_open_realfile() takes the mount's idmapping into account. We add a new helper ovl_path_realdata() that can be used to easily retrieve the relevant path which we can pass down. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: pass ofs to creation operationsChristian Brauner2022-04-281-1/+1
| | | | | | | | | | | | | Pass down struct ovl_fs to all creation helpers so we can ultimately retrieve the relevant upper mount and take the mount's idmapping into account when creating new filesystem objects. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use wrappers to all vfs_*xattr() callsAmir Goldstein2022-04-281-8/+8
| | | | | | | | | | | | | | | | | | | Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs and use helpers ovl_do_*xattr() to access generic xattrs. This is a preparatory patch for using idmapped base layers with overlay. Note that a few of those places called vfs_*xattr() calls directly to reduce the amount of debug output. But as Miklos pointed out since overlayfs has been stable for quite some time the debug output isn't all that relevant anymore and the additional debug in all locations was actually quite helpful when developing this patch series. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: consistent behavior for immutable/append-only inodesAmir Goldstein2021-08-171-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a lower file has immutable/append-only fileattr flags, the behavior of overlayfs post copy up is inconsistent. Immediattely after copy up, ovl inode still has the S_IMMUTABLE/S_APPEND inode flags copied from lower inode, so vfs code still treats the ovl inode as immutable/append-only. After ovl inode evict or mount cycle, the ovl inode does not have these inode flags anymore. We cannot copy up the immutable and append-only fileattr flags, because immutable/append-only inodes cannot be linked and because overlayfs will not be able to set overlay.* xattr on the upper inodes. Instead, if any of the fileattr flags of interest exist on the lower inode, we store them in overlay.protattr xattr on the upper inode and we read the flags from xattr on lookup and on fileattr_get(). This gives consistent behavior post copy up regardless of inode eviction from cache. When user sets new fileattr flags, we update or remove the overlay.protattr xattr. Storing immutable/append-only fileattr flags in an xattr instead of upper fileattr also solves other non-standard behavior issues - overlayfs can now copy up children of "ovl-immutable" directories and lower aliases of "ovl-immutable" hardlinks. Reported-by: Chengguang Xu <cgxu519@mykernel.net> Link: https://lore.kernel.org/linux-unionfs/20201226104618.239739-1-cgxu519@mykernel.net/ Link: https://lore.kernel.org/linux-unionfs/20210210190334.1212210-5-amir73il@gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: pass ovl_fs to ovl_check_setxattr()Amir Goldstein2021-08-171-4/+3
| | | | | | | Instead of passing the overlay dentry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: invalidate readdir cache on changes to dir with originAmir Goldstein2021-04-121-22/+9
| | | | | | | | | | | | | | | | The test in ovl_dentry_version_inc() was out-dated and did not include the case where readdir cache is used on a non-merge dir that has origin xattr, indicating that it may contain leftover whiteouts. To make the code more robust, use the same helper ovl_dir_is_real() to determine if readdir cache should be used and if readdir cache should be invalidated. Fixes: b79e05aaa166 ("ovl: no direct iteration for dir with origin xattr") Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxht70nODhNHNwGFMSqDyOKLXOKrY0H6g849os4BQ7cokA@mail.gmail.com/ Cc: Chris Murphy <lists@colorremedies.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix misspellings using codespell toolXiong Zhenwu2021-04-121-1/+1
| | | | | | | | | | | | A typo is found out by codespell tool: $ codespell ./fs/overlayfs/ ./fs/overlayfs/util.c:217: dependig ==> depending Fix a typo found by codespell. Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds2021-02-231-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
| * inode: make init and permission helpers idmapped mount awareChristian Brauner2021-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode_owner_or_capable() helper determines whether the caller is the owner of the inode or is capable with respect to that inode. Allow it to handle idmapped mounts. If the inode is accessed through an idmapped mount it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Similarly, allow the inode_init_owner() helper to handle idmapped mounts. It initializes a new inode on idmapped mounts by mapping the fsuid and fsgid of the caller from the mount's user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * namei: make permission helpers idmapped mount awareChristian Brauner2021-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | ovl: implement volatile-specific fsync error behaviourSargun Dhillon2021-01-281-0/+27
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs's volatile option allows the user to bypass all forced sync calls to the upperdir filesystem. This comes at the cost of safety. We can never ensure that the user's data is intact, but we can make a best effort to expose whether or not the data is likely to be in a bad state. The best way to handle this in the time being is that if an overlayfs's upperdir experiences an error after a volatile mount occurs, that error will be returned on fsync, fdatasync, sync, and syncfs. This is contradictory to the traditional behaviour of VFS which fails the call once, and only raises an error if a subsequent fsync error has occurred, and been raised by the filesystem. One awkward aspect of the patch is that we have to manually set the superblock's errseq_t after the sync_fs callback as opposed to just returning an error from syncfs. This is because the call chain looks something like this: sys_syncfs -> sync_filesystem -> __sync_filesystem -> /* The return value is ignored here sb->s_op->sync_fs(sb) _sync_blockdev /* Where the VFS fetches the error to raise to userspace */ errseq_check_and_advance Because of this we call errseq_set every time the sync_fs callback occurs. Due to the nature of this seen / unseen dichotomy, if the upperdir is an inconsistent state at the initial mount time, overlayfs will refuse to mount, as overlayfs cannot get a snapshot of the upperdir's errseq that will increment on error until the user calls syncfs. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Fixes: c86243b090bc ("ovl: provide a mount option "volatile"") Cc: stable@vger.kernel.org Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: do not get metacopy for userxattrMiklos Szeredi2020-12-141-0/+7
| | | | | | | | | | | When looking up an inode on the lower layer for which the mounter lacks read permisison the metacopy check will fail. This causes the lookup to fail as well, even though the directory is readable. So ignore EACCES for the "userxattr" case and assume no metacopy for the unreadable file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: user xattrMiklos Szeredi2020-12-141-2/+3
| | | | | | | | | | | | | | | | | | Optionally allow using "user.overlay." namespace instead of "trusted.overlay." This is necessary for overlayfs to be able to be mounted in an unprivileged namepsace. Make the option explicit, since it makes the filesystem format be incompatible. Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the "user.overlay.redirect" or "user.overlay.metacopy" xattrs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
* ovl: check privs before decoding file handleMiklos Szeredi2020-12-141-0/+3
| | | | | | | | | | | | | CAP_DAC_READ_SEARCH is required by open_by_handle_at(2) so check it in ovl_decode_real_fh() as well to prevent privilege escalation for unprivileged overlay mounts. [Amir] If the mounter is not capable in init ns, ovl_check_origin() and ovl_verify_index() will not function as expected and this will break index and nfs export features. So check capability in ovl_can_decode_fh(), to auto disable those features. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fhPavel Tikhomirov2020-11-121-1/+2
| | | | | | | | | This will be used in next patch to be able to change uuid checks and add uuid nullification based on ofs->config.index for a new "uuid=off" mode. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: enumerate private xattrsMiklos Szeredi2020-09-021-5/+26
| | | | | | | | Instead of passing the xattr name down to the ovl_do_*xattr() accessor functions, pass an enumerated value. The enum can use the same names as the the previous #define for each xattr name. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: pass ovl_fs down to functions accessing private xattrsMiklos Szeredi2020-09-021-10/+12
| | | | | | | This paves the way for optionally using the "user.overlay." xattr namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: drop flags argument from ovl_do_setxattr()Miklos Szeredi2020-09-021-1/+1
| | | | | | All callers pass zero flags to ovl_do_setxattr(). So drop this argument. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use ovl_do_getxattr() for private xattrMiklos Szeredi2020-09-021-5/+5
| | | | | | | | | | Use the convention of calling ovl_do_foo() for operations which are overlay specific. This patch is a no-op, and will have significance for supporting "user.overlay." xattr namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fold ovl_getxattr() into ovl_get_redirect_xattr()Miklos Szeredi2020-09-021-36/+17
| | | | | | | | This is a partial revert (with some cleanups) of commit 993a0b2aec52 ("ovl: Do not lose security.capability xattr over metadata file copy-up"), which introduced ovl_getxattr() in the first place. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* duplicate ovl_getxattr()Miklos Szeredi2020-09-021-2/+2
| | | | | | | | | | | | | ovl_getattr() returns the value of an xattr in a kmalloced buffer. There are two callers: ovl_copy_up_meta_inode_data() (copy_up.c) ovl_get_redirect_xattr() (util.c) This patch just copies ovl_getxattr() to copy_up.c, the following patches will deal with the differences in idividual callers. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: add accessor for ofs->upper_mntMiklos Szeredi2020-06-041-3/+3
| | | | | | | Next patch will remove ofs->upper_mnt, so add an accessor function for this field. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: verify permissions in ovl_path_open()Miklos Szeredi2020-06-021-1/+26
| | | | | | | | | | | | | | | | | | | | Check permission before opening a real file. ovl_path_open() is used by readdir and copy-up routines. ovl_permission() theoretically already checked copy up permissions, but it doesn't hurt to re-do these checks during the actual copy-up. For directory reading ovl_permission() only checks access to topmost underlying layer. Readdir on a merged directory accesses layers below the topmost one as well. Permission wasn't checked for these layers. Note: modifying ovl_permission() to perform this check would be far more complex and hence more bug prone. The result is less precise permissions returned in access(2). If this turns out to be an issue, we can revisit this bug. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: whiteout inode sharingChengguang Xu2020-05-131-1/+2
| | | | | | | | | | | | | | Share inode with different whiteout files for saving inode and speeding up delete operation. If EMLINK is encountered when linking a shared whiteout, create a new one. In case of any other error, disable sharing for this super block. Note: ofs->whiteout is protected by inode lock on workdir. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: allow remote upperMiklos Szeredi2020-03-171-0/+2
| | | | | | | | | | No reason to prevent upper layer being a remote filesystem. Do the revalidation in that case, just as we already do for lower layers. This lets virtiofs be used as upper layer, which appears to be a real use case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: decide if revalidate needed on a per-dentry basisMiklos Szeredi2020-03-171-0/+15
| | | | | | | | | | | | | | | | | | | Allow completely skipping ->revalidate() on a per-dentry basis, in case the underlying layers used for a dentry do not themselves have ->revalidate(). E.g. negative overlay dentry has no underlying layers, hence revalidate is unnecessary. Or if lower layer is remote but overlay dentry is pure-upper, then can skip revalidate. The following places need to update whether the dentry needs revalidate or not: - fill-super (root dentry) - lookup - create - fh_to_dentry Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: separate detection of remote upper layer from stacked overlayMiklos Szeredi2020-03-171-2/+1
| | | | | | | | | Following patch will allow remote as upper layer, but not overlay stacked on upper layer. Separate the two concepts. This patch is doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: simplify i_ino initializationAmir Goldstein2020-03-171-18/+0
| | | | | | | | | Move i_ino initialization to ovl_inode_init() to avoid the dance of setting i_ino in ovl_fill_inode() sometimes on the first call and sometimes on the seconds call. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix out of date comment and unreachable codeAmir Goldstein2020-03-171-2/+0
| | | | | | | | ovl_inode_update() is no longer called from create object code path. Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix lock in ovl_llseek()Amir Goldstein2020-03-121-2/+2
| | | | | | | | | | | | | ovl_inode_lock() is interruptible. When inode_lock() in ovl_llseek() was replaced with ovl_inode_lock(), we did not add a check for error. Fix this by making ovl_inode_lock() uninterruptible and change the existing call sites to use an _interruptible variant. Reported-by: syzbot+66a9752fa927f745385e@syzkaller.appspotmail.com Fixes: b1f9d3858f72 ("ovl: use ovl_inode_lock in ovl_llseek()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: layer is constMiklos Szeredi2020-01-241-1/+1
| | | | | | The ovl_layer struct is never modified except at initialization. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: simplify ovl_same_sb() helperAmir Goldstein2020-01-241-12/+0
| | | | | | | | | | | | | | | | No code uses the sb returned from this helper, so make it retrun a boolean and rename it to ovl_same_fs(). The xino mode is irrelevant when all layers are on same fs, so instead of describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino enabled. Create a new helper ovl_same_dev(), to use instead of the common check for (ovl_same_fs() || xinobits). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use pr_fmt auto generate prefixlijiazi2020-01-221-7/+7
| | | | | | | Use pr_fmt auto generate "overlayfs: " prefix. Signed-off-by: lijiazi <lijiazi@xiaomi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner2019-06-191-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ovl: detect overlapping layersAmir Goldstein2019-05-291-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Do not lose security.capability xattr over metadata file copy-upVivek Goyal2019-02-131-20/+35
| | | | | | | | | | | | | | | | | If a file has been copied up metadata only, and later data is copied up, upper loses any security.capability xattr it has (underlying filesystem clears it as upon file write). From a user's point of view, this is just a file copy-up and that should not result in losing security.capability xattr. Hence, before data copy up, save security.capability xattr (if any) and restore it on upper after data copy up is complete. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Fixes: 0c2888749363 ("ovl: A new xattr OVL_XATTR_METACOPY for file on upper") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: abstract ovl_inode lock with a helperAmir Goldstein2018-10-261-12/+13
| | | | | | | The abstraction improves code readabilty (to some). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: remove the 'locked' argument of ovl_nlink_{start,end}Amir Goldstein2018-10-261-16/+12
| | | | | | | | | | It just makes the interface strange without adding any significant value. The only case where locked is false and return value is 0 is in ovl_rename() when new is negative, so handle that case explicitly in ovl_rename(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: relax requirement for non null uuid of lower fsAmir Goldstein2018-10-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | We use uuid to associate an overlay lower file handle with a lower layer, so we can accept lower fs with null uuid as long as all lower layers with null uuid are on the same fs. This change allows enabling index and nfs_export features for the setup of single lower fs of type squashfs - squashfs supports file handles, but has a null uuid. This change also allows enabling index and nfs_export features for nested overlayfs, where the lower overlay has nfs_export enabled. Enabling the index feature with single lower squashfs fixes the unionmount-testsuite test: ./run --ov --squashfs --verify As a by-product, if, like the lower squashfs, upper fs also uses the generic export_encode_fh() implementation to export 32bit inode file handles (e.g. ext4), then the xino_auto config/module/mount option will enable unique overlay inode numbers. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix memory leak on unlink of indexed fileAmir Goldstein2018-09-241-1/+2
| | | | | | | | | | The memory leak was detected by kmemleak when running xfstests overlay/051,053 Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Check redirect on index as wellVivek Goyal2018-07-201-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we seem to check redirect only if upperdentry is found. But it is possible that there is no upperdentry but later we found an index. We need to check redirect on index as well and set it in ovl_inode->redirect. Otherwise link code can assume that dentry does not have redirect and place a new one which breaks things. In my testing overlay/033 test started failing in xfstests. Following are the details. For example do following. $ mkdir lower upper work merged - Make lower dir with 4 links. $ echo "foo" > lower/l0.txt $ ln lower/l0.txt lower/l1.txt $ ln lower/l0.txt lower/l2.txt $ ln lower/l0.txt lower/l3.txt - Mount with index on and metacopy on. $ mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,\ index=on,metacopy=on none merged - Link lower $ ln merged/l0.txt merged/l4.txt (This will metadata copy up of l0.txt and put an absolute redirect /l0.txt) $ echo 2 > /proc/sys/vm/drop/caches $ ls merged/l1.txt (Now l1.txt will be looked up. There is no upper dentry but there is lower dentry and index will be found. We don't check for redirect on index, hence ovl_inode->redirect will be NULL.) - Link Upper $ ln merged/l4.txt merged/l5.txt (Lookup of l4.txt will use inode from l1.txt lookup which is still in cache. It has ovl_inode->redirect NULL, hence link will put a new redirect and replace /l0.txt with /l4.txt - Drop caches. echo 2 > /proc/sys/vm/drop_caches - List l1.txt and it returns -ESTALE $ ls merged/l0.txt (It returns stale because, we found a metacopy of l0.txt in upper and it has redirect l4.txt but there is no file named l4.txt in lower layer. So lower data copy is not found and -ESTALE is returned.) So problem here is that we did not process redirect on index. Check redirect on index as well and then problem is fixed. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Do not set dentry type ORIGIN for broken hardlinksVivek Goyal2018-07-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far this decision was easy that we had to check only for oe->numlower and if it is non-zero, we knew there is copy up origin. (For non-dir we installed origin dentry in lowerstack[0]). But we don't create ORGIN xattr for broken hardlinks (index=off). And with metacopy feature it is possible that we will install lowerstack[0] but ORIGIN xattr is not there. It is data dentry of upper metacopy dentry which has been found using regular name based lookup or using REDIRECT. So with addition of this new case, just presence of oe->numlower is not sufficient to guarantee that ORIGIN xattr is present. So to differentiate between two cases, look at OVL_CONST_INO flag. If this flag is set and upperdentry is there, that means it can be marked as type ORIGIN. OVL_CONST_INO is not set if lower hardlink is broken or will be broken over copy up. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Treat metacopy dentries as type OVL_PATH_MERGEVivek Goyal2018-07-201-1/+2
| | | | | | | | | | | | | Right now OVL_PATH_MERGE is used only for merged directories. But conceptually, a metacopy dentry (backed by a lower data dentry) is a merged entity as well. So mark metacopy dentries as OVL_PATH_MERGE and ovl_rename() makes use of this property later to set redirect on a metacopy file. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Add helper ovl_inode_realdata()Vivek Goyal2018-07-201-0/+12
| | | | | | | | | | Add an helper to retrieve real data inode associated with overlay inode. This helper will ignore all metacopy inodes and will return only the real inode which has data. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: Store lower data inode in ovl_inodeVivek Goyal2018-07-201-1/+11
| | | | | | | | | | | | | | | | | Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>