summaryrefslogtreecommitdiffstats
path: root/fs/nfsd
Commit message (Collapse)AuthorAgeFilesLines
* NFSD: fix error handling in NFSv4.0 callbacksOlga Kornievskaia2021-03-111-0/+1
| | | | | | | | | | | | When the server tries to do a callback and a client fails it due to authentication problems, we need the server to set callback down flag in RENEW so that client can recover. Suggested-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/linux-nfs/FB84E90A-1A03-48B3-8BF7-D9D10AC2C9FE@oracle.com/T/#t
* NFSD: fix dest to src mount in inter-server COPYOlga Kornievskaia2021-03-091-1/+1
| | | | | | | | | | | A cleanup of the inter SSC copy needs to call fput() of the source file handle to make sure that file structure is freed as well as drop the reference on the superblock to unmount the source server. Fixes: 36e1e5ba90fb ("NFSD: Fix use-after-free warning when doing inter-server copy") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Dai Ngo <dai.ngo@oracle.com>
* Revert "nfsd4: a client's own opens needn't prevent delegations"J. Bruce Fields2021-03-091-40/+14
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 94415b06eb8aed13481646026dc995f04a3a534a. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when *any* client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* Revert "nfsd4: remove check_conflicting_opens warning"J. Bruce Fields2021-03-091-0/+1
| | | | | | | | | | This reverts commit 50747dd5e47b "nfsd4: remove check_conflicting_opens warning", as a prerequisite for reverting 94415b06eb8a, which has a serious bug. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* nfsd: don't abort copies earlyJ. Bruce Fields2021-03-061-1/+1
| | | | | | | | | | | | The typical result of the backwards comparison here is that the source server in a server-to-server copy will return BAD_STATEID within a few seconds of the copy starting, instead of giving the copy a full lease period, so the copy_file_range() call will end up unnecessarily returning a short read. Fixes: 624322f1adc5 "NFSD add COPY_NOTIFY operation" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* fs: nfsd: fix kconfig dependency warning for NFSD_V4Julian Braha2021-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | When NFSD_V4 is enabled and CRYPTO is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for CRYPTO_SHA256 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y] WARNING: unmet direct dependencies detected for CRYPTO_MD5 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y] This is because NFSD_V4 selects CRYPTO_MD5 and CRYPTO_SHA256, without depending on or selecting CRYPTO, despite those config options being subordinate to CRYPTO. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* nfsd: Don't keep looking up unhashed files in the nfsd file cacheTrond Myklebust2021-03-061-0/+2
| | | | | | | | | | | If a file is unhashed, then we're going to reject it anyway and retry, so make sure we skip it when we're doing the RCU lockless lookup. This avoids a number of unnecessary nfserr_jukebox returns from nfsd_file_acquire() Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds2021-02-238-33/+57
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
| * nfs: do not export idmapped mountsChristian Brauner2021-01-241-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Prevent nfs from exporting idmapped mounts until we have ported it to support exporting idmapped mounts. Link: https://lore.kernel.org/linux-api/20210123130958.3t6kvgkl634njpsm@wittgenstein Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: "J. Bruce Fields" <bfields@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * namei: prepare for idmapped mountsChristian Brauner2021-01-242-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The various vfs_*() helpers are called by filesystems or by the vfs itself to perform core operations such as create, link, mkdir, mknod, rename, rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace and pass it down. Afterwards the checks and operations are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * namei: introduce struct renamedataChristian Brauner2021-01-241-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to handle idmapped mounts we will extend the vfs rename helper to take two new arguments in follow up patches. Since this operations already takes a bunch of arguments add a simple struct renamedata and make the current helper use it before we extend it. Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * xattr: handle idmapped mountsTycho Andersen2021-01-241-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When interacting with extended attributes the vfs verifies that the caller is privileged over the inode with which the extended attribute is associated. For posix access and posix default extended attributes a uid or gid can be stored on-disk. Let the functions handle posix extended attributes on idmapped mounts. If the inode is accessed through an idmapped mount we need to map it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. This has no effect for e.g. security xattrs since they don't store uids or gids and don't perform permission checks on them like posix acls do. Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * acl: handle idmapped mountsChristian Brauner2021-01-243-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition the fileystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and, posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems to handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * attr: handle idmapped mountsChristian Brauner2021-01-242-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
| * namei: make permission helpers idmapped mount awareChristian Brauner2021-01-242-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | Merge tag 'nfsd-5.12-1' of ↵Linus Torvalds2021-02-221-7/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first
| * | nfsd: register pernet ops last, unregister firstJ. Bruce Fields2021-02-151-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These pernet operations may depend on stuff set up or torn down in the module init/exit functions. And they may be called at any time in between. So it makes more sense for them to be the last to be registered in the init function, and the first to be unregistered in the exit function. In particular, without this, the drc slab is being destroyed before all the per-net drcs are shut down, resulting in an "Objects remaining in nfsd_drc on __kmem_cache_shutdown()" warning in exit_nfsd. Reported-by: Zhi Li <yieli@redhat.com> Fixes: 3ba75830ce17 "nfsd4: drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds2021-02-211-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
| * | | block: remove unnecessary argument from blk_execute_rqGuoqing Jiang2021-01-241-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can remove 'q' from blk_execute_rq as well after the previous change in blk_execute_rq_nowait. And more importantly it never really was needed to start with given that we can trivial derive it from struct request. Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | nfsd: skip some unnecessary stats in the v4 caseJ. Bruce Fields2021-01-301-17/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the typical case of v4 and an i_version-supporting filesystem, we can skip a stat which is only required to fake up a change attribute from ctime. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfs: use change attribute for NFS re-exportsJ. Bruce Fields2021-01-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When exporting NFS, we may as well use the real change attribute returned by the original server instead of faking up a change attribute from the ctime. Note we can't do that by setting I_VERSION--that would also turn on the logic in iversion.h which treats the lower bit specially, and that doesn't make sense for NFS. So instead we define a new export operation for filesystems like NFS that want to manage the change attribute themselves. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSv4_2: SSC helper should use its own config.Dai Ngo2021-01-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently NFSv4_2 SSC helper, nfs_ssc, incorrectly uses GRACE_PERIOD as its config. Fix by adding new config NFS_V4_2_SSC_HELPER which depends on NFS_V4_2 and is automatically selected when NFSD_V4 is enabled. Also removed the file name from a comment in nfs_ssc.c. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: cstate->session->se_client -> cstate->clpJ. Bruce Fields2021-01-282-11/+10
| | | | | | | | | | | | | | | | | | | | | I'm not sure why we're writing this out the hard way in so many places. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: simplify nfsd4_check_open_reclaimJ. Bruce Fields2021-01-283-19/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | The set_client() was already taken care of by process_open1(). The comments here are mostly redundant with the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: remove unused set_client argumentJ. Bruce Fields2021-01-281-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Every caller is setting this argument to false, so we don't need it. Also cut this comment a bit and remove an unnecessary warning. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: find_cpntf_state cleanupJ. Bruce Fields2021-01-251-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I think this unusual use of struct compound_state could cause confusion. It's not that much more complicated just to open-code this stateid lookup. The only change in behavior should be a different error return in the case the copy is using a source stateid that is a revoked delegation, but I doubt that matters. Signed-off-by: J. Bruce Fields <bfields@redhat.com> [ cel: squashed in fix reported by Coverity ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: refactor set_clientJ. Bruce Fields2021-01-251-16/+16
| | | | | | | | | | | | | | | | | | | | | This'll be useful elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: rename lookup_clientid->set_clientJ. Bruce Fields2021-01-251-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | I think this is a better name, and I'm going to reuse elsewhere the code that does the lookup itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: simplify nfsd_renewJ. Bruce Fields2021-01-251-6/+3
| | | | | | | | | | | | | | | | | | | | | You can take the single-exit thing too far, I think. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: simplify process_lockJ. Bruce Fields2021-01-251-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly, this STALE_CLIENTID check is already handled by: nfs4_preprocess_confirmed_seqid_op()-> nfs4_preprocess_seqid_op()-> nfsd4_lookup_stateid()-> set_client()-> STALE_CLIENTID() (This may cause it to return a different error in some cases where there are multiple things wrong; pynfs test SEQ10 regressed on this commit because of that, but I think that's the test's fault, and I've fixed it separately.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd4: simplify process_lookup1J. Bruce Fields2021-01-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This STALE_CLIENTID check is redundant with the one in lookup_clientid(). There's a difference in behavior is in case of memory allocation failure, which I think isn't a big deal. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: report per-export statsAmir Goldstein2021-01-257-16/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Collect some nfsd stats per export in addition to the global stats. A new nfsdfs export_stats file is created. It uses the same ops as the exports file to iterate the export entries and we use the file's name to determine the reported info per export. For example: $ cat /proc/fs/nfsd/export_stats # Version 1.1 # Path Client Start-time # Stats /test localhost 92 fh_stale: 0 io_read: 9 io_write: 1 Every export entry reports the start time when stats collection started, so stats collecting scripts can know if stats where reset between samples. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: protect concurrent access to nfsd stats countersAmir Goldstein2021-01-258-53/+192
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | nfsd: remove unused stats countersAmir Goldstein2021-01-252-35/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 501cb1849f86 ("nfsd: rip out the raparms cache") removed the code that updates read-ahead cache stats counters, commit 8bbfa9f3889b ("knfsd: remove the nfsd thread busy histogram") removed code that updates the thread busy stats counters back in 2009 and code that updated filehandle cache stats was removed back in 2002. Remove the unused stats counters from nfsd_stats struct and print hardcoded zeros in /proc/net/rpc/nfsd. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Clean up after updating NFSv3 ACL decodersChuck Lever2021-01-252-22/+0
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_streamChuck Lever2021-01-251-18/+13
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv3 GETACL argument decoder to use struct xdr_streamChuck Lever2021-01-253-6/+17
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Clean up after updating NFSv2 ACL decodersChuck Lever2021-01-252-19/+0
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 ACL ACCESS argument decoder to use struct xdr_streamChuck Lever2021-01-251-5/+6
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 ACL GETATTR argument decoder to use struct xdr_streamChuck Lever2021-01-251-11/+1
| | | | | | | | | | | | | | | | | | | | | Since the ACL GETATTR procedure is the same as the normal GETATTR procedure, simply re-use nfssvc_decode_fhandleargs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_streamChuck Lever2021-01-252-18/+13
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 GETACL argument decoder to use struct xdr_streamChuck Lever2021-01-254-7/+17
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Remove argument length checking in nfsd_dispatch()Chuck Lever2021-01-251-34/+0
| | | | | | | | | | | | | | | | | | | | | | | | Now that the argument decoders for NFSv2 and NFSv3 use the xdr_stream mechanism, the version-specific length checking logic in nfsd_dispatch() is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 SYMLINK argument decoder to use struct xdr_streamChuck Lever2021-01-251-103/+10
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 CREATE argument decoder to use struct xdr_streamChuck Lever2021-01-251-6/+4
| | | | | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 SETATTR argument decoder to use struct xdr_streamChuck Lever2021-01-251-6/+76
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 LINK argument decoder to use struct xdr_streamChuck Lever2021-01-251-6/+4
| | | | | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 RENAME argument decoder to use struct xdr_streamChuck Lever2021-01-251-7/+5
| | | | | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update NFSv2 diropargs decoding to use struct xdr_streamChuck Lever2021-01-251-5/+34
| | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFSD: Update the NFSv2 READDIR argument decoder to use struct xdr_streamChuck Lever2021-01-251-5/+7
| | | | | | | | | | | | | | | | | | | | | As an additional clean up, move code not related to XDR decoding into readdir's .pc_func call out. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>