linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	smack: Record transmuting in smk_transmuted	Roberto Sassu	2023-05-11	2	-12/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	smack_dentry_create_files_as() determines whether transmuting should occur based on the label of the parent directory the new inode will be added to, and not the label of the directory where it is created. This helps for example to do transmuting on overlayfs, since the latter first creates the inode in the working directory, and then moves it to the correct destination. However, despite smack_dentry_create_files_as() provides the correct label, smack_inode_init_security() does not know from passed information whether or not transmuting occurred. Without this information, smack_inode_init_security() cannot set SMK_INODE_CHANGED in smk_flags, which will result in the SMACK64TRANSMUTE xattr not being set in smack_d_instantiate(). Thus, add the smk_transmuted field to the task_smack structure, and set it in smack_dentry_create_files_as() to smk_task if transmuting occurred. If smk_task is equal to smk_transmuted in smack_inode_init_security(), act as if transmuting was successful but without taking the label from the parent directory (the inode label was already set correctly from the current credentials in smack_inode_alloc_security()). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	smack: Retrieve transmuting information in smack_inode_getsecurity()	Roberto Sassu	2023-05-11	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \|	Enhance smack_inode_getsecurity() to retrieve the value for SMACK64TRANSMUTE from the inode security blob, similarly to SMACK64. This helps to display accurate values in the situation where the security labels come from mount options and not from xattrs. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Merge tag 'Smack-for-6.4' of https://github.com/cschaufler/smack-next	Linus Torvalds	2023-04-24	1	-40/+24
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull smack updates from Casey Schaufler: "There are two changes, one small and one more substantial: - Remove of an unnecessary cast - The mount option processing introduced with the mount rework makes copies of mount option values. There is no good reason to make copies of Smack labels, as they are maintained on a list and never removed. The code now uses pointers to entries on the list, reducing processing time and memory use" * tag 'Smack-for-6.4' of https://github.com/cschaufler/smack-next: Smack: Improve mount process memory use smack_lsm: remove unnecessary type casting
\| *	Smack: Improve mount process memory use	Casey Schaufler	2023-04-05	1	-39/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing mount processing code in Smack makes many unnecessary copies of Smack labels. Because Smack labels never go away once imported it is safe to use pointers to them rather than copies. Replace the use of copies of label names to pointers to the global label list entries. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| *	smack_lsm: remove unnecessary type casting	XU pengfei	2023-03-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unnecessary type casting. The type of inode variable is struct inode *, so no type casting required. Signed-off-by: XU pengfei <xupengfei@nfschina.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
* \|	selinux: remove the runtime disable functionality	Paul Moore	2023-03-20	1	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After working with the larger SELinux-based distros for several years, we're finally at a place where we can disable the SELinux runtime disable functionality. The existing kernel deprecation notice explains the functionality and why we want to remove it: The selinuxfs "disable" node allows SELinux to be disabled at runtime prior to a policy being loaded into the kernel. If disabled via this mechanism, SELinux will remain disabled until the system is rebooted. The preferred method of disabling SELinux is via the "selinux=0" boot parameter, but the selinuxfs "disable" node was created to make it easier for systems with primitive bootloaders that did not allow for easy modification of the kernel command line. Unfortunately, allowing for SELinux to be disabled at runtime makes it difficult to secure the kernel's LSM hooks using the "__ro_after_init" feature. It is that last sentence, mentioning the '__ro_after_init' hardening, which is the real motivation for this change, and if you look at the diffstat you'll see that the impact of this patch reaches across all the different LSMs, helping prevent tampering at the LSM hook level. From a SELinux perspective, it is important to note that if you continue to disable SELinux via "/etc/selinux/config" it may appear that SELinux is disabled, but it is simply in an uninitialized state. If you load a policy with `load_policy -i`, you will see SELinux come alive just as if you had loaded the policy during early-boot. It is also worth noting that the "/sys/fs/selinux/disable" file is always writable now, regardless of the Kconfig settings, but writing to the file has no effect on the system, other than to display an error on the console if a non-zero/true value is written. Finally, in the several years where we have been working on deprecating this functionality, there has only been one instance of someone mentioning any user visible breakage. In this particular case it was an individual's kernel test system, and the workaround documented in the deprecation notice ("selinux=0" on the kernel command line) resolved the issue without problem. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
*	Merge tag 'Smack-for-6.3' of https://github.com/cschaufler/smack-next	Linus Torvalds	2023-02-22	1	-3/+14
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull smack update from Casey Schaufler: "One fix for resetting CIPSO labeling" * tag 'Smack-for-6.3' of https://github.com/cschaufler/smack-next: smackfs: Added check catlen
\| *	smackfs: Added check catlen	Denis Arefev	2023-02-21	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the catlen is 0, the memory for the netlbl_lsm_catmap structure must be allocated anyway, otherwise the check of such rules is not completed correctly. Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
* \|	fs: port acl to mnt_idmap	Christian Brauner	2023-01-19	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* \|	fs: port xattr to mnt_idmap	Christian Brauner	2023-01-19	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* \|	fs: port ->permission() to pass mnt_idmap	Christian Brauner	2023-01-19	1	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
*	Merge tag 'lsm-pr-20221212' of ↵	Linus Torvalds	2022-12-13	1	-9/+10
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Improve the error handling in the device cgroup such that memory allocation failures when updating the access policy do not potentially alter the policy. - Some minor fixes to reiserfs to ensure that it properly releases LSM-related xattr values. - Update the security_socket_getpeersec_stream() LSM hook to take sockptr_t values. Previously the net/BPF folks updated the getsockopt code in the network stack to leverage the sockptr_t type to make it easier to pass both kernel and __user pointers, but unfortunately when they did so they didn't convert the LSM hook. While there was/is no immediate risk by not converting the LSM hook, it seems like this is a mistake waiting to happen so this patch proactively does the LSM hook conversion. - Convert vfs_getxattr_alloc() to return an int instead of a ssize_t and cleanup the callers. Internally the function was never going to return anything larger than an int and the callers were doing some very odd things casting the return value; this patch fixes all that and helps bring a bit of sanity to vfs_getxattr_alloc() and its callers. - More verbose, and helpful, LSM debug output when the system is booted with "lsm.debug" on the command line. There are examples in the commit description, but the quick summary is that this patch provides better information about which LSMs are enabled and the ordering in which they are processed. - General comment and kernel-doc fixes and cleanups. * tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: Fix description of fs_context_parse_param lsm: Add/fix return values in lsm_hooks.h and fix formatting lsm: Clarify documentation of vm_enough_memory hook reiserfs: Add missing calls to reiserfs_security_free() lsm,fs: fix vfs_getxattr_alloc() return type and caller error paths device_cgroup: Roll back to original exceptions after copy failure LSM: Better reporting of actual LSMs at boot lsm: make security_socket_getpeersec_stream() sockptr_t safe audit: Fix some kernel-doc warnings lsm: remove obsoleted comments for security hooks fs: edit a comment made in bad taste
\| *	lsm: make security_socket_getpeersec_stream() sockptr_t safe	Paul Moore	2022-11-04	1	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 4ff09db1b79b ("bpf: net: Change sk_getsockopt() to take the sockptr_t argument") made it possible to call sk_getsockopt() with both user and kernel address space buffers through the use of the sockptr_t type. Unfortunately at the time of conversion the security_socket_getpeersec_stream() LSM hook was written to only accept userspace buffers, and in a desire to avoid having to change the LSM hook the commit author simply passed the sockptr_t's userspace buffer pointer. Since the only sk_getsockopt() callers at the time of conversion which used kernel sockptr_t buffers did not allow SO_PEERSEC, and hence the security_socket_getpeersec_stream() hook, this was acceptable but also very fragile as future changes presented the possibility of silently passing kernel space pointers to the LSM hook. There are several ways to protect against this, including careful code review of future commits, but since relying on code review to catch bugs is a recipe for disaster and the upstream eBPF maintainer is "strongly against defensive programming", this patch updates the LSM hook, and all of the implementations to support sockptr_t and safely handle both user and kernel space buffers. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
* \|	smack: implement get, set and remove acl hook	Christian Brauner	2022-10-20	1	-0/+71
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. So far posix acls were passed as a void blob to the security and integrity modules. Some of them like evm then proceed to interpret the void pointer and convert it into the kernel internal struct posix acl representation to perform their integrity checking magic. This is obviously pretty problematic as that requires knowledge that only the vfs is guaranteed to have and has lead to various bugs. Add a proper security hook for setting posix acls and pass down the posix acls in their appropriate vfs format instead of hacking it through a void pointer stored in the uapi format. I spent considerate time in the security module infrastructure and audited all codepaths. Smack has no restrictions based on the posix acl values passed through it. The capability hook doesn't need to be called either because it only has restrictions on security.* xattrs. So these all becomes very simple hooks for smack. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
*	Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	2022-10-06	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull vfs constification updates from Al Viro: "whack-a-mole: constifying struct path " tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: constify path spufs: constify path nd_jump_link(): constify path audit_init_parent(): constify path __io_setxattr(): constify path do_proc_readlink(): constify path overlayfs: constify path fs/notify: constify path may_linkat(): constify path do_sys_name_to_handle(): constify path ->getprocattr(): attribute name is const char *, TYVM...
\| *	->getprocattr(): attribute name is const char *, TYVM...	Al Viro	2022-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cast of ->d_name.name to char * is completely wrong - nothing is allowed to modify its contents. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next	Linus Torvalds	2022-10-03	2	-12/+17
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull smack updates from Casey Schaufler: "Two minor code clean-ups: one removes constants left over from the old mount API, while the other gets rid of an unneeded variable. The other change fixes a flaw in handling IPv6 labeling" * tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next: smack: cleanup obsolete mount option flags smack: lsm: remove the unneeded result variable SMACK: Add sk_clone_security LSM hook
\| * \|	smack: cleanup obsolete mount option flags	Xiu Jianfeng	2022-09-27	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These mount option flags are obsolete since commit 12085b14a444 ("smack: switch to private smack_mnt_opts"), remove them. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	smack: lsm: remove the unneeded result variable	Xu Panda	2022-09-27	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return the value smk_ptrace_rule_check() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	SMACK: Add sk_clone_security LSM hook	Lontke Michael	2022-09-27	1	-0/+16
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using smk_of_current() during sk_alloc_security hook leads in rare cases to a faulty initialization of the security context of the created socket. By adding the LSM hook sk_clone_security to SMACK this initialization fault is corrected by copying the security context of the old socket pointer to the newly cloned one. Co-authored-by: Martin Ostertag: <martin.ostertag@elektrobit.com> Signed-off-by: Lontke Michael <michael.lontke@elektrobit.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
* /	Smack: Provide read control for io_uring_cmd	Casey Schaufler	2022-08-26	1	-0/+32
\|/ \| \| \| \| \| \| \| \| \| \| \|	Limit io_uring "cmd" options to files for which the caller has Smack read access. There may be cases where the cmd option may be closer to a write access than a read, but there is no way to make that determination. Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
*	smack: Remove the redundant lsm_inode_alloc	Xiu Jianfeng	2022-08-01	1	-7/+0
\| \| \| \| \| \| \| \| \| \|	It's not possible for inode->i_security to be NULL here because every inode will call inode_init_always and then lsm_inode_alloc to alloc memory for inode->security, this is what LSM infrastructure management do, so remove this redundant code. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	smack: Replace kzalloc + strncpy with kstrndup	GONG, Ruiqi	2022-08-01	1	-5/+2
\| \| \| \| \| \| \| \| \| \|	Simplify the code by using kstrndup instead of kzalloc and strncpy in smk_parse_smack(), which meanwhile remove strncpy as [1] suggests. [1]: https://github.com/KSPP/linux/issues/90 Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Merge tag 'pull-18-rc1-work.mount' of ↵	Linus Torvalds	2022-06-04	1	-0/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount handling updates from Al Viro: "Cleanups (and one fix) around struct mount handling. The fix is usermode_driver.c one - once you've done kern_mount(), you must kern_unmount(); simple mntput() will end up with a leak. Several failure exits in there messed up that way... In practice you won't hit those particular failure exits without fault injection, though" * tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move mount-related externs from fs.h to mount.h blob_to_mnt(): kern_unmount() is needed to undo kern_mount() m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb... linux/mount.h: trim includes uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)
\| *	move mount-related externs from fs.h to mount.h	Al Viro	2022-05-19	1	-0/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	smack: Remove redundant assignments	Michal Orzel	2022-05-23	1	-1/+0
\|/ \| \| \| \| \| \| \| \| \|	Get rid of redundant assignments which end up in values not being read either because they are overwritten or the function ends. Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Fix incorrect type in assignment of ipv6 port for audit	Casey Schaufler	2022-02-28	1	-1/+1
\| \| \| \| \| \| \| \|	Remove inappropriate use of ntohs() and assign the port value directly. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()	Paul Moore	2021-11-22	2	-21/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The security_task_getsecid_subj() LSM hook invites misuse by allowing callers to specify a task even though the hook is only safe when the current task is referenced. Fix this by removing the task_struct argument to the hook, requiring LSM implementations to use the current task. While we are changing the hook declaration we also rename the function to security_current_getsecid_subj() in an effort to reinforce that the hook captures the subjective credentials of the current task and not an arbitrary task on the system. Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
*	Merge tag 'selinux-pr-20211101' of ↵	Linus Torvalds	2021-11-01	1	-0/+46
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM/SELinux/Smack controls and auditing for io-uring. As usual, the individual commit descriptions have more detail, but we were basically missing two things which we're adding here: + establishment of a proper audit context so that auditing of io-uring ops works similarly to how it does for syscalls (with some io-uring additions because io-uring ops are not syscalls) + additional LSM hooks to enable access control points for some of the more unusual io-uring features, e.g. credential overrides. The additional audit callouts and LSM hooks were done in conjunction with the io-uring folks, based on conversations and RFC patches earlier in the year. - Fixup the binder credential handling so that the proper credentials are used in the LSM hooks; the commit description and the code comment which is removed in these patches are helpful to understand the background and why this is the proper fix. - Enable SELinux genfscon policy support for securityfs, allowing improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA. * tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: security: Return xattr name from security_dentry_init_security() selinux: fix a sock regression in selinux_ip_postroute_compat() binder: use cred instead of task for getsecid binder: use cred instead of task for selinux checks binder: use euid from cred instead of using task LSM: Avoid warnings about potentially unused hook variables selinux: fix all of the W=1 build warnings selinux: make better use of the nf_hook_state passed to the NF hooks selinux: fix race condition when computing ocontext SIDs selinux: remove unneeded ipv6 hook wrappers selinux: remove the SELinux lockdown implementation selinux: enable genfscon labeling for securityfs Smack: Brutalist io_uring support selinux: add support for the io_uring access controls lsm,io_uring: add LSM hooks to io_uring io_uring: convert io_uring to the secure anon inode interface fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() audit: add filtering for io_uring records audit,io_uring,io-wq: add some basic audit support to io_uring audit: prepare audit_context for use in calling contexts beyond syscalls
\| *	Smack: Brutalist io_uring support	Casey Schaufler	2021-09-19	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add Smack privilege checks for io_uring. Use CAP_MAC_OVERRIDE for the override_creds case and CAP_MAC_ADMIN for creating a polling thread. These choices are based on conjecture regarding the intent of the surrounding code. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: make the smack_uring_* funcs static, remove debug code] Signed-off-by: Paul Moore <paul@paul-moore.com>
* \|	Merge tag 'Smack-for-5.16' of https://github.com/cschaufler/smack-next	Linus Torvalds	2021-11-01	3	-44/+34
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull smack updates from Casey Schaufler: "Multiple corrections to smackfs: - a change for overlayfs support that corrects the initial attributes on created files - code clean-up for netlabel processing - several fixes in smackfs for a variety of reasons - Errors reported by W=1 have been addressed All told, nothing challenging" * tag 'Smack-for-5.16' of https://github.com/cschaufler/smack-next: smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi smackfs: use __GFP_NOFAIL for smk_cipso_doi() Smack: fix W=1 build warnings smack: remove duplicated hook function Smack:- Use overlay inode label in smack_inode_copy_up() smack: Guard smack_ipv6_lock definition within a SMACK_IPV6_PORT_LABELING block smackfs: Fix use-after-free in netlbl_catmap_walk()
\| * \|	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi	Tetsuo Handa	2021-10-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	syzbot is reporting UAF at cipso_v4_doi_search() [1], for smk_cipso_doi() is calling kfree() without removing from the cipso_v4_doi_list list after netlbl_cfg_cipsov4_map_add() returned an error. We need to use netlbl_cfg_cipsov4_del() in order to remove from the list and wait for RCU grace period before kfree(). Link: https://syzkaller.appspot.com/bug?extid=93dba5b91f0fed312cbd [1] Reported-by: syzbot <syzbot+93dba5b91f0fed312cbd@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6c2e8ac0953fccdd ("netlabel: Update kernel configuration API") Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	smackfs: use __GFP_NOFAIL for smk_cipso_doi()	Tetsuo Handa	2021-10-22	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	syzbot is reporting kernel panic at smk_cipso_doi() due to memory allocation fault injection [1]. The reason for need to use panic() was not explained. But since no fix was proposed for 18 months, for now let's use __GFP_NOFAIL for utilizing syzbot resource on other bugs. Link: https://syzkaller.appspot.com/bug?extid=89731ccb6fec15ce1c22 [1] Reported-by: syzbot <syzbot+89731ccb6fec15ce1c22@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	Smack: fix W=1 build warnings	Casey Schaufler	2021-10-13	1	-12/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A couple of functions had malformed comment blocks. Namespace parameters were added without updating the comment blocks. These are all repaired in the Smack code, so "% make W=1 security/smack" is warning free. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	smack: remove duplicated hook function	Florian Westphal	2021-10-12	1	-23/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipv4 and ipv6 hook functions are identical, remove one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	Smack:- Use overlay inode label in smack_inode_copy_up()	Vishal Goel	2021-09-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently in "smack_inode_copy_up()" function, process label is changed with the label on parent inode. Due to which, process is assigned directory label and whatever file or directory created by the process are also getting directory label which is wrong label. Changes has been done to use label of overlay inode instead of parent inode. Signed-off-by: Vishal Goel <vishal.goel@samsung.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	smack: Guard smack_ipv6_lock definition within a SMACK_IPV6_PORT_LABELING block	Sebastian Andrzej Siewior	2021-09-24	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mutex smack_ipv6_lock is only used with the SMACK_IPV6_PORT_LABELING block but its definition is outside of the block. This leads to a defined-but-not-used warning on PREEMPT_RT. Moving smack_ipv6_lock down to the block where it is used where it used raises the question why is smk_ipv6_port_list read if nothing is added to it. Turns out, only smk_ipv6_port_check() is using it outside of an ifdef SMACK_IPV6_PORT_LABELING block. However two of three caller invoke smk_ipv6_port_check() from a ifdef block and only one is using __is_defined() macro which requires the function and smk_ipv6_port_list to be around. Put the lock and list inside an ifdef SMACK_IPV6_PORT_LABELING block to avoid the warning regarding unused mutex. Extend the ifdef-block to also cover smk_ipv6_port_check(). Make smack_socket_connect() use ifdef instead of __is_defined() to avoid complains about missing function. Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| * \|	smackfs: Fix use-after-free in netlbl_catmap_walk()	Pawan Gupta	2021-09-15	1	-1/+4
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Syzkaller reported use-after-free bug as described in [1]. The bug is triggered when smk_set_cipso() tries to free stale category bitmaps while there are concurrent reader(s) using the same bitmaps. Wait for RCU grace period to finish before freeing the category bitmaps in smk_set_cipso(). This makes sure that there are no more readers using the stale bitmaps and freeing them should be safe. [1] https://lore.kernel.org/netdev/000000000000a814c505ca657a4e@google.com/ Reported-by: syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
* /	selinux,smack: fix subjective/objective credential use mixups	Paul Moore	2021-09-23	1	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jann Horn reported a problem with commit eb1231f73c4d ("selinux: clarify task subjective and objective credentials") where some LSM hooks were attempting to access the subjective credentials of a task other than the current task. Generally speaking, it is not safe to access another task's subjective credentials and doing so can cause a number of problems. Further, while looking into the problem, I realized that Smack was suffering from a similar problem brought about by a similar commit 1fb057dcde11 ("smack: differentiate between subjective and objective task credentials"). This patch addresses this problem by restoring the use of the task's objective credentials in those cases where the task is other than the current executing task. Not only does this resolve the problem reported by Jann, it is arguably the correct thing to do in these cases. Cc: stable@vger.kernel.org Fixes: eb1231f73c4d ("selinux: clarify task subjective and objective credentials") Fixes: 1fb057dcde11 ("smack: differentiate between subjective and objective task credentials") Reported-by: Jann Horn <jannh@google.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
*	smack: mark 'smack_enabled' global variable as __initdata	Austin Kim	2021-07-20	2	-2/+2
\| \| \| \| \| \| \| \|	Mark 'smack_enabled' as __initdata since it is only used during initialization code. Signed-off-by: Austin Kim <austin.kim@lge.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Smack: Fix wrong semantics in smk_access_entry()	Tianjia Zhang	2021-07-20	1	-9/+8
\| \| \| \| \| \| \| \| \| \|	In the smk_access_entry() function, if no matching rule is found in the rust_list, a negative error code will be used to perform bit operations with the MAY_ enumeration value. This is semantically wrong. This patch fixes this issue. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Smack: fix doc warning	ChenXiaoSong	2021-06-08	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Fix gcc W=1 warning: security/smack/smack_access.c:342: warning: Function parameter or member 'ad' not described in 'smack_log' security/smack/smack_access.c:403: warning: Function parameter or member 'skp' not described in 'smk_insert_entry' security/smack/smack_access.c:487: warning: Function parameter or member 'level' not described in 'smk_netlbl_mls' security/smack/smack_access.c:487: warning: Function parameter or member 'len' not described in 'smk_netlbl_mls' Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Revert "Smack: Handle io_uring kernel thread privileges"	Jens Axboe	2021-05-18	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 942cb357ae7d9249088e3687ee6a00ed2745a0c7. The io_uring PF_IO_WORKER threads no longer have PF_KTHREAD set, so no need to special case them for credential checks. Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	smackfs: restrict bytes count in smk_set_cipso()	Tetsuo Handa	2021-05-10	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Oops, I failed to update subject line. From 07571157c91b98ce1a4aa70967531e64b78e8346 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Mon, 12 Apr 2021 22:25:06 +0900 Subject: [PATCH] smackfs: restrict bytes count in smk_set_cipso() Commit 7ef4c19d245f3dc2 ("smackfs: restrict bytes count in smackfs write functions") missed that count > SMK_CIPSOMAX check applies to only format == SMK_FIXED24_FMT case. Reported-by: syzbot <syzbot+77c53db50c9fff774e8e@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	security/smack/: fix misspellings using codespell tool	Xiong Zhenwu	2021-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	A typo is found out by codespell tool in 383th line of smackfs.c: $ codespell ./security/smack/ ./smackfs.c:383: numer ==> number Fix a typo found by codespell. Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Merge tag 'landlock_v34' of ↵	Linus Torvalds	2021-05-01	2	-26/+15
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull Landlock LSM from James Morris: "Add Landlock, a new LSM from Mickaël Salaün. Briefly, Landlock provides for unprivileged application sandboxing. From Mickaël's cover letter: "The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]" The cover letter and v34 posting is here: https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/ See also: https://landlock.io/ This code has had extensive design discussion and review over several years" Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1] Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2] * tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: landlock: Enable user space to infer supported features landlock: Add user and kernel documentation samples/landlock: Add a sandbox manager example selftests/landlock: Add user space tests landlock: Add syscall implementations arch: Wire up Landlock syscalls fs,security: Add sb_delete hook landlock: Support filesystem access-control LSM: Infrastructure management of the superblock landlock: Add ptrace restrictions landlock: Set up the security framework and manage credentials landlock: Add ruleset and domain management landlock: Add object management
\| *	LSM: Infrastructure management of the superblock	Casey Schaufler	2021-04-22	2	-26/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move management of the superblock->sb_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules, the modules tell the infrastructure how much space is required, and the space is allocated there. Cc: John Johansen <john.johansen@canonical.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net Signed-off-by: James Morris <jamorris@linux.microsoft.com>
* \|	smack: differentiate between subjective and objective task credentials	Paul Moore	2021-03-22	2	-14/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the split of the security_task_getsecid() into subjective and objective variants it's time to update Smack to ensure it is using the correct task creds. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
* \|	lsm: separate security_task_getsecid() into subjective and objective variants	Paul Moore	2021-03-22	1	-1/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Of the three LSMs that implement the security_task_getsecid() LSM hook, all three LSMs provide the task's objective security credentials. This turns out to be unfortunate as most of the hook's callers seem to expect the task's subjective credentials, although a small handful of callers do correctly expect the objective credentials. This patch is the first step towards fixing the problem: it splits the existing security_task_getsecid() hook into two variants, one for the subjective creds, one for the objective creds. void security_task_getsecid_subj(struct task_struct p, u32 secid); void security_task_getsecid_obj(struct task_struct p, u32 secid); While this patch does fix all of the callers to use the correct variant, in order to keep this patch focused on the callers and to ease review, the LSMs continue to use the same implementation for both hooks. The net effect is that this patch should not change the behavior of the kernel in any way, it will be up to the latter LSM specific patches in this series to change the hook implementations and return the correct credentials. Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
*	Merge tag 'idmapped-mounts-v5.12' of ↵	Linus Torvalds	2021-02-23	1	-9/+13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...