summaryrefslogtreecommitdiffstats
path: root/fs/verity
Commit message (Collapse)AuthorAgeFilesLines
* fsverity: skip PKCS#7 parser when keyring is emptyEric Biggers2023-08-201-0/+16
| | | | | | | | | | | | | | | | If an fsverity builtin signature is given for a file but the ".fs-verity" keyring is empty, there's no real reason to run the PKCS#7 parser. Skip this to avoid the PKCS#7 attack surface when builtin signature support is configured into the kernel but is not being used. This is a hardening improvement, not a fix per se, but I've added Fixes and Cc stable to get it out to more users. Fixes: 432434c9f8e1 ("fs-verity: support builtin file signatures") Cc: stable@vger.kernel.org Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20230820173237.2579-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: move sysctl registration out of signature.cEric Biggers2023-07-113-32/+34
| | | | | | | | | | | | | | | | | | | | | Currently the registration of the fsverity sysctls happens in signature.c, which couples it to CONFIG_FS_VERITY_BUILTIN_SIGNATURES. This makes it hard to add new sysctls unrelated to builtin signatures. Also, some users have started checking whether the directory /proc/sys/fs/verity exists as a way to tell whether fsverity is supported. This isn't the intended method; instead, the existence of /sys/fs/$fstype/features/verity should be checked, or users should just try to use the fsverity ioctls. Regardless, it should be made to work as expected without a dependency on CONFIG_FS_VERITY_BUILTIN_SIGNATURES. Therefore, move the sysctl registration into init.c. With CONFIG_FS_VERITY_BUILTIN_SIGNATURES, nothing changes. Without it, but with CONFIG_FS_VERITY, an empty list of sysctls is now registered. Link: https://lore.kernel.org/r/20230705212743.42180-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: simplify handling of errors during initcallEric Biggers2023-07-115-78/+28
| | | | | | | | | | | | | | Since CONFIG_FS_VERITY is a bool, not a tristate, fs/verity/ can only be builtin or absent entirely; it can't be a loadable module. Therefore, the error code that gets returned from the fsverity_init() initcall is never used. If any part of the initcall does fail, which should never happen, the kernel will be left in a bad state. Following the usual convention for builtin code, just panic the kernel if any of part of the initcall fails. Link: https://lore.kernel.org/r/20230705212743.42180-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: explicitly check that there is no algorithm 0Eric Biggers2023-07-111-0/+8
| | | | | | | | | Since libfsverity and some other code would break if 0 is ever allocated as an FS_VERITY_HASH_ALG_* value, make fsverity_check_hash_algs() explicitly check that there is no algorithm 0. Link: https://lore.kernel.org/r/20230705211719.37713-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: improve documentation for builtin signature supportEric Biggers2023-06-205-15/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | fsverity builtin signatures (CONFIG_FS_VERITY_BUILTIN_SIGNATURES) aren't the only way to do signatures with fsverity, and they have some major limitations. Yet, more users have tried to use them, e.g. recently by https://github.com/ostreedev/ostree/pull/2640. In most cases this seems to be because users aren't sufficiently familiar with the limitations of this feature and what the alternatives are. Therefore, make some updates to the documentation to try to clarify the properties of this feature and nudge users in the right direction. Note that the Integrity Policy Enforcement (IPE) LSM, which is not yet upstream, is planned to use the builtin signatures. (This differs from IMA, which uses its own signature mechanism.) For that reason, my earlier patch "fsverity: mark builtin signatures as deprecated" (https://lore.kernel.org/r/20221208033548.122704-1-ebiggers@kernel.org), which marked builtin signatures as "deprecated", was controversial. This patch therefore stops short of marking the feature as deprecated. I've also revised the language to focus on better explaining the feature and what its alternatives are. Link: https://lore.kernel.org/r/20230620041937.5809-1-ebiggers@kernel.org Reviewed-by: Colin Walters <walters@verbum.org> Reviewed-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: rework fsverity_get_digest() againEric Biggers2023-06-141-11/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address several issues with the calling convention and documentation of fsverity_get_digest(): - Make it provide the hash algorithm as either a FS_VERITY_HASH_ALG_* value or HASH_ALGO_* value, at the caller's choice, rather than only a HASH_ALGO_* value as it did before. This allows callers to work with the fsverity native algorithm numbers if they want to. HASH_ALGO_* is what IMA uses, but other users (e.g. overlayfs) should use FS_VERITY_HASH_ALG_* to match fsverity-utils and the fsverity UAPI. - Make it return the digest size so that it doesn't need to be looked up separately. Use the return value for this, since 0 works nicely for the "file doesn't have fsverity enabled" case. This also makes it clear that no other errors are possible. - Rename the 'digest' parameter to 'raw_digest' and clearly document that it is only useful in combination with the algorithm ID. This hopefully clears up a point of confusion. - Export it to modules, since overlayfs will need it for checking the fsverity digests of lowerdata files (https://lore.kernel.org/r/dd294a44e8f401e6b5140029d8355f88748cd8fd.1686565330.git.alexl@redhat.com). Acked-by: Mimi Zohar <zohar@linux.ibm.com> # for the IMA piece Link: https://lore.kernel.org/r/20230612190047.59755-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: simplify error handling in verify_data_block()Eric Biggers2023-06-041-34/+21
| | | | | | | | | | Clean up the error handling in verify_data_block() to (a) eliminate the 'err' variable which has caused some confusion because the function actually returns a bool, (b) reduce the compiled code size slightly, and (c) execute one fewer branch in the success case. Link: https://lore.kernel.org/r/20230604022312.48532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: don't use bio_first_page_all() in fsverity_verify_bio()Eric Biggers2023-06-041-5/+5
| | | | | | | | | | | | | | | | | | bio_first_page_all(bio)->mapping->host is not compatible with large folios, since the first page of the bio is not necessarily the head page of the folio, and therefore it might not have the mapping pointer set. Therefore, move the dereference of ->mapping->host into verify_data_blocks(), which works with a folio. (Like the commit that this Fixes, this hasn't actually been tested with large folios yet, since the filesystems that use fs/verity/ don't support that yet. But based on code review, I think this is needed.) Fixes: 5d0f0e57ed90 ("fsverity: support verifying data from large folios") Link: https://lore.kernel.org/r/20230604022101.48342-1-ebiggers@kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: constify fsverity_hash_algEric Biggers2023-06-043-11/+11
| | | | | | | | Now that fsverity_hash_alg doesn't have an embedded mempool, it can be 'const' almost everywhere. Add it. Link: https://lore.kernel.org/r/20230604022348.48658-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: use shash API instead of ahash APIEric Biggers2023-06-044-201/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "ahash" API, like the other scatterlist-based crypto APIs such as "skcipher", comes with some well-known limitations. First, it can't easily be used with vmalloc addresses. Second, the request struct can't be allocated on the stack. This adds complexity and a possible failure point that needs to be worked around, e.g. using a mempool. The only benefit of ahash over "shash" is that ahash is needed to access traditional memory-to-memory crypto accelerators, i.e. drivers/crypto/. However, this style of crypto acceleration has largely fallen out of favor and been superseded by CPU-based acceleration or inline crypto engines. Also, ahash needs to be used asynchronously to take full advantage of such hardware, but fs/verity/ has never done this. On all systems that aren't actually using one of these ahash-only crypto accelerators, ahash just adds unnecessary overhead as it sits between the user and the underlying shash algorithms. Also, XFS is planned to cache fsverity Merkle tree blocks in the existing XFS buffer cache. As a result, it will be possible for a single Merkle tree block to be split across discontiguous pages (https://lore.kernel.org/r/20230405233753.GU3223426@dread.disaster.area). This data will need to be hashed. It is easiest to work with a vmapped address in this case. However, ahash is incompatible with this. Therefore, let's convert fs/verity/ from ahash to shash. This simplifies the code, and it should also slightly improve performance for everyone who wasn't actually using one of these ahash-only crypto accelerators, i.e. almost everyone (or maybe even everyone)! Link: https://lore.kernel.org/r/20230516052306.99600-1-ebiggers@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fdsEric Biggers2023-04-111-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read the file's data, instead of direct pagecache accesses. An unintended consequence of this is that the 'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became reachable by fuzz tests. This happens if FS_IOC_ENABLE_VERITY is called on a fd opened with access mode 3, which means "ioctl access only". Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds. But ioctl-only fds are a weird Linux extension that is rarely used and that few people even know about. (The documentation for FS_IOC_ENABLE_VERITY even specifically says it requires O_RDONLY.) It's probably not worthwhile to make the ioctl internally open a new fd just to handle this case. Thus, just reject the ioctl on such fds for now. Fixes: 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") Reported-by: syzbot+51177e4144d764827c45@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b28 Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406215106.235829-1-ebiggers@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: explicitly check for buffer overflow in build_merkle_tree()Eric Biggers2023-04-111-0/+10
| | | | | | | | | | | | | | | | | | | | The new Merkle tree construction algorithm is a bit fragile in that it may overflow the 'root_hash' array if the tree actually generated does not match the calculated tree parameters. This should never happen unless there is a filesystem bug that allows the file size to change despite deny_write_access(), or a bug in the Merkle tree logic itself. Regardless, it's fairly easy to check for buffer overflow here, so let's do so. This is a robustness improvement only; this case is not currently known to be reachable. I've added a Fixes tag anyway, since I recommend that this be included in kernels that have the mentioned commit. Fixes: 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230328041505.110162-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: use WARN_ON_ONCE instead of WARN_ONEric Biggers2023-04-113-5/+5
| | | | | | | | | | | | | As per Linus's suggestion (https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com), use WARN_ON_ONCE instead of WARN_ON. This barely adds any extra overhead, and it makes it so that if any of these ever becomes reachable (they shouldn't, but that's the point), the logs can't be flooded. Link: https://lore.kernel.org/r/20230406181542.38894-1-ebiggers@kernel.org Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
* fs-verity: simplify sysctls with register_sysctl()Luis Chamberlain2023-03-271-8/+1
| | | | | | | | | | register_sysctl_paths() is only needed if your child (directories) have entries but this does not so just use register_sysctl() so to do away with the path specification. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230302202826.776286-10-mcgrof@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITYEric Biggers2023-03-151-12/+13
| | | | | | | | | | | | | | | | The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing performance problems and is hindering adoption of fsverity. It was intended to solve a race condition where unverified pages might be left in the pagecache. But actually it doesn't solve it fully. Since the incomplete solution for this race condition has too much performance impact for it to be worth it, let's remove it for now. Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl") Cc: stable@vger.kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: Remove WQ_UNBOUND from fsverity read workqueueNathan Huckleberry2023-03-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This is problematic for latency sensitive workloads, like I/O post-processing. Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related scheduler latency and improves app cold startup times by ~30ms. WQ_UNBOUND was also removed from the dm-verity workqueue for the same reason [1]. This code was tested by running Android app startup benchmarks and measuring how long the fsverity workqueue spent in the runnable state. Before Total workqueue scheduler latency: 553800us After Total workqueue scheduler latency: 18962us [1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/ Signed-off-by: Nathan Huckleberry <nhuck@google.com> Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: support verifying data from large foliosEric Biggers2023-01-271-21/+22
| | | | | | | | | | | | | Try to make fs/verity/verify.c aware of large folios. This includes making fsverity_verify_bio() support the case where the bio contains large folios, and adding a function fsverity_verify_folio() which is the equivalent of fsverity_verify_page(). There's no way to actually test this with large folios yet, but I've tested that this doesn't cause any regressions. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230127221529.299560-1-ebiggers@kernel.org
* fsverity: support enabling with tree block size < PAGE_SIZEEric Biggers2023-01-091-136/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make FS_IOC_ENABLE_VERITY support values of fsverity_enable_arg::block_size other than PAGE_SIZE. To make this possible, rework build_merkle_tree(), which was reading data and hash pages from the file and assuming that they were the same thing as "blocks". For reading the data blocks, just replace the direct pagecache access with __kernel_read(), to naturally read one block at a time. (A disadvantage of the above is that we lose the two optimizations of hashing the pagecache pages in-place and forcing the maximum readahead. That shouldn't be very important, though.) The hash block reads are a bit more difficult to handle, as the only way to do them is through fsverity_operations::read_merkle_tree_page(). Instead, let's switch to the single-pass tree construction algorithm that fsverity-utils uses. This eliminates the need to read back any hash blocks while the tree is being built, at the small cost of an extra block-sized memory buffer per Merkle tree level. This is probably what I should have done originally. Taken together, the above two changes result in page-size independent code that is also a bit simpler than what we had before. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-8-ebiggers@kernel.org
* fsverity: support verification with tree block size < PAGE_SIZEEric Biggers2023-01-093-98/+296
| | | | | | | | | | | | | | | | | | | | | | | | | | Add support for verifying data from verity files whose Merkle tree block size is less than the page size. The main use case for this is to allow a single Merkle tree block size to be used across all systems, so that only one set of fsverity file digests and signatures is needed. To do this, eliminate various assumptions that the Merkle tree block size and the page size are the same: - Make fsverity_verify_page() a wrapper around a new function fsverity_verify_blocks() which verifies one or more blocks in a page. - When a Merkle tree block is needed, get the corresponding page and only verify and use the needed portion. (The Merkle tree continues to be read and cached in page-sized chunks; that doesn't need to change.) - When the Merkle tree block size and page size differ, use a bitmap fsverity_info::hash_block_verified to keep track of which Merkle tree blocks have been verified, as PageChecked cannot be used directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-7-ebiggers@kernel.org
* fsverity: replace fsverity_hash_page() with fsverity_hash_block()Eric Biggers2023-01-094-22/+21
| | | | | | | | | | | | | | In preparation for allowing the Merkle tree block size to differ from PAGE_SIZE, replace fsverity_hash_page() with fsverity_hash_block(). The new function is similar to the old one, but it operates on the block at the given offset in the page instead of on the full page. (For now, all callers still pass a full page.) Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-6-ebiggers@kernel.org
* fsverity: use EFBIG for file too large to enable verityEric Biggers2023-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Currently, there is an implementation limit where files can't have more than 8 Merkle tree levels. With SHA-256 and 4K blocks, this limit is never reached, since a file would need to be larger than 2**64 bytes to need 9 levels. However, with SHA-512, 9 levels are needed for files larger than about 1.15 EB, which is possible on btrfs. Therefore, this limit technically became reachable when btrfs added fsverity support. Meanwhile, support for merkle_tree_block_size < PAGE_SIZE will introduce another implementation limit on file size, resulting from the use of an in-memory bitmap to track which Merkle tree blocks have been verified. In any case, currently FS_IOC_ENABLE_VERITY fails with EINVAL when the file is too large. This is undocumented, and also ambiguous since EINVAL can mean other things too. Let's change the error code to EFBIG, which is much clearer, and document it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-5-ebiggers@kernel.org
* fsverity: store log2(digest_size) precomputedEric Biggers2023-01-093-4/+6
| | | | | | | | | | Add log_digestsize to struct merkle_tree_params so that it can be used in verify.c. Also save memory by using u8 for all the log_* fields. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-4-ebiggers@kernel.org
* fsverity: simplify Merkle tree readahead size calculationEric Biggers2023-01-093-16/+10
| | | | | | | | | | | | | | First, calculate max_ra_pages more efficiently by using the bio size. Second, calculate the number of readahead pages from the hash page index, instead of calculating it ahead of time using the data page index. This ends up being a bit simpler, especially since level 0 is last in the tree, so we can just limit the readahead to the tree size. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-3-ebiggers@kernel.org
* fsverity: use unsigned long for level_startEric Biggers2023-01-092-6/+16
| | | | | | | | | | | | | | | | | | fs/verity/ isn't consistent with whether Merkle tree block indices are 'unsigned long' or 'u64'. There's no real point to using u64 for them, though, since (a) a Merkle tree with over ULONG_MAX blocks would only be needed for a file larger than MAX_LFS_FILESIZE, and (b) for reads, the status of all Merkle tree blocks has to be tracked in memory. Therefore, let's make things a bit more efficient on 32-bit systems by using 'unsigned long[]' for merkle_tree_params::level_start, instead of 'u64[]'. Also, to be extra safe, explicitly check that there aren't more than ULONG_MAX Merkle tree blocks. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-2-ebiggers@kernel.org
* fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUGEric Biggers2023-01-017-59/+2
| | | | | | | | | | | | | | | I've gotten very little use out of these debug messages, and I'm not aware of anyone else having used them. Indeed, sprinkling pr_debug around is not really a best practice these days, especially for filesystem code. Tracepoints are used instead. Let's just remove these and start from a clean slate. This change does not affect info, warning, and error messages. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221215060420.60692-1-ebiggers@kernel.org
* fsverity: pass pos and size to ->write_merkle_tree_blockEric Biggers2023-01-011-2/+2
| | | | | | | | | | | | | | | fsverity_operations::write_merkle_tree_block is passed the index of the block to write and the log base 2 of the block size. However, all implementations of it use these parameters only to calculate the position and the size of the block, in bytes. Therefore, make ->write_merkle_tree_block take 'pos' and 'size' parameters instead of 'index' and 'log_blocksize'. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-5-ebiggers@kernel.org
* fsverity: optimize fsverity_cleanup_inode() on non-verity filesEric Biggers2023-01-011-8/+2
| | | | | | | | | | | Make fsverity_cleanup_inode() an inline function that checks for non-NULL ->i_verity_info, then (if needed) calls __fsverity_cleanup_inode() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-4-ebiggers@kernel.org
* fsverity: optimize fsverity_prepare_setattr() on non-verity filesEric Biggers2023-01-011-13/+3
| | | | | | | | | | Make fsverity_prepare_setattr() an inline function that does the IS_VERITY() check, then (if needed) calls __fsverity_prepare_setattr() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-3-ebiggers@kernel.org
* fsverity: optimize fsverity_file_open() on non-verity filesEric Biggers2023-01-011-18/+2
| | | | | | | | | | Make fsverity_file_open() an inline function that does the IS_VERITY() check, then (if needed) calls __fsverity_file_open() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-2-ebiggers@kernel.org
* fsverity: simplify fsverity_get_digest()Eric Biggers2022-11-293-17/+13
| | | | | | | | | | | | | | | | | Instead of looking up the algorithm by name in hash_algo_name[] to get its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg struct. Verify at boot time that every fsverity_hash_alg has a valid hash_algo ID with matching digest size. Remove an unnecessary memset() of the whole digest array to 0 before the digest is copied into it. Finally, remove the pr_debug statement. There is already a pr_debug for the fsverity digest when the file is opened. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/r/20221129045139.69803-1-ebiggers@kernel.org
* fsverity: stop using PG_error to track error statusEric Biggers2022-11-281-6/+6
| | | | | | | | | | | | | | | | | | | | As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
* Merge tag 'for-6.1-tag' of ↵Linus Torvalds2022-10-061-2/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There's a bunch of performance improvements, most notably the FIEMAP speedup, the new block group tree to speed up mount on large filesystems, more io_uring integration, some sysfs exports and the usual fixes and core updates. Summary: Performance: - outstanding FIEMAP speed improvement - algorithmic change how extents are enumerated leads to orders of magnitude speed boost (uncached and cached) - extent sharing check speedup (2.2x uncached, 3x cached) - add more cancellation points, allowing to interrupt seeking in files with large number of extents - more efficient hole and data seeking (4x uncached, 1.3x cached) - sample results: 256M, 32K extents: 4s -> 29ms (~150x) 512M, 64K extents: 30s -> 59ms (~550x) 1G, 128K extents: 225s -> 120ms (~1800x) - improved inode logging, especially for directories (on dbench workload throughput +25%, max latency -21%) - improved buffered IO, remove redundant extent state tracking, lowering memory consumption and avoiding rb tree traversal - add sysfs tunable to let qgroup temporarily skip exact accounting when deleting snapshot, leading to a speedup but requiring a rescan after that, will be used by snapper - support io_uring and buffered writes, until now it was just for direct IO, with the no-wait semantics implemented in the buffered write path it now works and leads to speed improvement in IOPS (2x), throughput (2.2x), latency (depends, 2x to 150x) - small performance improvements when dropping and searching for extent maps as well as when flushing delalloc in COW mode (throughput +5MB/s) User visible changes: - new incompatible feature block-group-tree adding a dedicated tree for tracking block groups, this allows a much faster load during mount and avoids seeking unlike when it's scattered in the extent tree items - this reduces mount time for many-terabyte sized filesystems - conversion tool will be provided so existing filesystem can also be updated in place - to reduce test matrix and feature combinations requires no-holes and free-space-tree (mkfs defaults since 5.15) - improved reporting of super block corruption detected by scrub - scrub also tries to repair super block and does not wait until next commit - discard stats and tunables are exported in sysfs (/sys/fs/btrfs/FSID/discard) - qgroup status is exported in sysfs (/sys/sys/fs/btrfs/FSID/qgroups/) - verify that super block was not modified when thawing filesystem Fixes: - FIEMAP fixes - fix extent sharing status, does not depend on the cached status where merged - flush delalloc so compressed extents are reported correctly - fix alignment of VMA for memory mapped files on THP - send: fix failures when processing inodes with no links (orphan files and directories) - fix race between quota enable and quota rescan ioctl - handle more corner cases for read-only compat feature verification - fix missed extent on fsync after dropping extent maps Core: - lockdep annotations to validate various transactions states and state transitions - preliminary support for fs-verity in send - more effective memory use in scrub for subpage where sector is smaller than page - block group caching progress logic has been removed, load is now synchronous - simplify end IO callbacks and bio handling, use chained bios instead of own tracking - add no-wait semantics to several functions (tree search, nocow, flushing, buffered write - cleanups and refactoring MM changes: - export balance_dirty_pages_ratelimited_flags" * tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits) btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer btrfs: drop extent map range more efficiently btrfs: avoid pointless extent map tree search when flushing delalloc btrfs: remove unnecessary next extent map search btrfs: remove unnecessary NULL pointer checks when searching extent maps btrfs: assert tree is locked when clearing extent map from logging btrfs: remove unnecessary extent map initializations btrfs: remove the refcount warning/check at free_extent_map() btrfs: add helper to replace extent map range with a new extent map btrfs: move open coded extent map tree deletion out of inode eviction btrfs: use cond_resched_rwlock_write() during inode eviction btrfs: use extent_map_end() at btrfs_drop_extent_map_range() btrfs: move btrfs_drop_extent_cache() to extent_map.c btrfs: fix missed extent on fsync after dropping extent maps btrfs: remove stale prototype of btrfs_write_inode btrfs: enable nowait async buffered writes btrfs: assert nowait mode is not used for some btree search functions btrfs: make btrfs_buffered_write nowait compatible btrfs: plumb NOWAIT through the write path btrfs: make lock_and_cleanup_extent_if_need nowait compatible ...
| * btrfs: send: add support for fs-verityBoris Burkov2022-09-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preserve the fs-verity status of a btrfs file across send/recv. There is no facility for installing the Merkle tree contents directly on the receiving filesystem, so we package up the parameters used to enable verity found in the verity descriptor. This gives the receive side enough information to properly enable verity again. Note that this means that receive will have to re-compute the whole Merkle tree, similar to how compression worked before encoded_write. Since the file becomes read-only after verity is enabled, it is important that verity is added to the send stream after any file writes. Therefore, when we process a verity item, merely note that it happened, then actually create the command in the send stream during 'finish_inode_if_needed'. This also creates V3 of the send stream format, without any format changes besides adding the new commands and attributes. Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
* | fs-verity: use kmap_local_page() instead of kmap()Eric Biggers2022-08-191-3/+3
| | | | | | | | | | | | | | | | | | | | Convert the use of kmap() to its recommended replacement kmap_local_page(). This avoids the overhead of doing a non-local mapping, which is unnecessary in this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818224010.43778-1-ebiggers@kernel.org
* | fs-verity: use memcpy_from_page()Eric Biggers2022-08-191-12/+2
|/ | | | | | | | | | | Replace extract_hash() with the memcpy_from_page() helper function. This is simpler, and it has the side effect of replacing the use of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818223903.43710-1-ebiggers@kernel.org
* fs-verity: mention btrfs supportEric Biggers2022-07-151-5/+5
| | | | | | | | btrfs supports fs-verity since Linux v5.15. Document this. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20220610000616.18225-1-ebiggers@kernel.org
* Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2022-05-241-15/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
| * mm/readahead: Convert page_cache_async_readahead to take a folioMatthew Wilcox (Oracle)2022-05-081-15/+14
| | | | | | | | | | | | | | | | Removes a couple of calls to compound_head and saves a few bytes. Also convert verity's read_file_data_page() to be folio-based. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'integrity-v5.19' of ↵Linus Torvalds2022-05-243-7/+44
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA updates from Mimi Zohar: "New is IMA support for including fs-verity file digests and signatures in the IMA measurement list as well as verifying the fs-verity file digest based signatures, both based on policy. In addition, are two bug fixes: - avoid reading UEFI variables, which cause a page fault, on Apple Macs with T2 chips. - remove the original "ima" template Kconfig option to address a boot command line ordering issue. The rest is a mixture of code/documentation cleanup" * tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: Fix sparse warnings in keyring_handler evm: Clean up some variables evm: Return INTEGRITY_PASS for enum integrity_status value '0' efi: Do not import certificates from UEFI Secure Boot for T2 Macs fsverity: update the documentation ima: support fs-verity file digest based version 3 signatures ima: permit fsverity's file digests in the IMA measurement list ima: define a new template field named 'd-ngv2' and templates fs-verity: define a function to return the integrity protected file digest ima: use IMA default hash algorithm for integrity violations ima: fix 'd-ng' comments and documentation ima: remove the IMA_TEMPLATE Kconfig option ima: remove redundant initialization of pointer 'file'.
| * | fs-verity: define a function to return the integrity protected file digestMimi Zohar2022-05-013-7/+44
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Define a function named fsverity_get_digest() to return the verity file digest and the associated hash algorithm (enum hash_algo). This assumes that before calling fsverity_get_digest() the file must have been opened, which is even true for the IMA measure/appraise on file open policy rule use case (func=FILE_CHECK). do_open() calls vfs_open() immediately prior to ima_file_check(). Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | fs-verity: Use struct_size() helper in enable_verity()Zhang Jianhua2022-05-191-1/+1
| | | | | | | | | | | | | | | | | | Follow the best practice for allocating a variable-sized structure. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> [ebiggers: adjusted commit message] Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220519022450.2434483-1-chris.zjh@huawei.com
* | fs-verity: remove unused parameter desc_size in fsverity_create_info()Zhang Jianhua2022-05-184-16/+9
|/ | | | | | | | | | | | | | | | The parameter desc_size in fsverity_create_info() is useless and it is not referenced anywhere. The greatest meaning of desc_size here is to indecate the size of struct fsverity_descriptor and futher calculate the size of signature. However, the desc->sig_size can do it also and it is indeed, so remove it. Therefore, it is no need to acquire desc_size by fsverity_get_descriptor() in ensure_verity_info(), so remove the parameter desc_ret in fsverity_get_descriptor() too. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220518132256.2297655-1-chris.zjh@huawei.com
* fs: Remove ->readpages address space operationMatthew Wilcox (Oracle)2022-04-011-2/+2
| | | | | | | | | | | All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
* fs-verity: fix signed integer overflow with i_size near S64_MAXEric Biggers2021-09-222-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | If the file size is almost S64_MAX, the calculated number of Merkle tree levels exceeds FS_VERITY_MAX_LEVELS, causing FS_IOC_ENABLE_VERITY to fail. This is unintentional, since as the comment above the definition of FS_VERITY_MAX_LEVELS states, it is enough for over U64_MAX bytes of data using SHA-256 and 4K blocks. (Specifically, 4096*128**8 >= 2**64.) The bug is actually that when the number of blocks in the first level is calculated from i_size, there is a signed integer overflow due to i_size being signed. Fix this by treating i_size as unsigned. This was found by the new test "generic: test fs-verity EFBIG scenarios" (https://lkml.kernel.org/r/b1d116cd4d0ea74b9cd86f349c672021e005a75c.1631558495.git.boris@bur.io). This didn't affect ext4 or f2fs since those have a smaller maximum file size, but it did affect btrfs which allows files up to S64_MAX bytes. Reported-by: Boris Burkov <boris@bur.io> Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl") Fixes: fd2d1acfcadf ("fs-verity: add the hook for file ->open()") Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Boris Burkov <boris@bur.io> Link: https://lore.kernel.org/r/20210916203424.113376-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* fsverity: relax build time dependency on CRYPTO_SHA256Ard Biesheuvel2021-04-221-2/+6
| | | | | | | | | | | | | | | | | | CONFIG_CRYPTO_SHA256 denotes the generic C implementation of the SHA-256 shash algorithm, which is selected as the default crypto shash provider for fsverity. However, fsverity has no strict link time dependency, and the same shash could be exposed by an optimized implementation, and arm64 has a number of those (scalar, NEON-based and one based on special crypto instructions). In such cases, it makes little sense to require that the generic C implementation is incorporated as well, given that it will never be called. To address this, relax the 'select' clause to 'imply' so that the generic driver can be omitted from the build if desired. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds2021-02-231-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
| * fs: add file and path permissions helpersChristian Brauner2021-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add two simple helpers to check permissions on a file and path respectively and convert over some callers. It simplifies quite a few codepaths and also reduces the churn in later patches quite a bit. Christoph also correctly points out that this makes codepaths (e.g. ioctls) way easier to follow that would otherwise have to do more complex argument passing than necessary. Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | fs-verity: support reading signature with ioctlEric Biggers2021-02-071-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for FS_VERITY_METADATA_TYPE_SIGNATURE to FS_IOC_READ_VERITY_METADATA. This allows a userspace server program to retrieve the built-in signature (if present) of a verity file for serving to a client which implements fs-verity compatible verification. See the patch which introduced FS_IOC_READ_VERITY_METADATA for more details. The ability for userspace to read the built-in signatures is also useful because it allows a system that is using the in-kernel signature verification to migrate to userspace signature verification. This has been tested using a new xfstest which calls this ioctl via a new subcommand for the 'fsverity' program from fsverity-utils. Link: https://lore.kernel.org/r/20210115181819.34732-7-ebiggers@kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
* | fs-verity: support reading descriptor with ioctlEric Biggers2021-02-071-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for FS_VERITY_METADATA_TYPE_DESCRIPTOR to FS_IOC_READ_VERITY_METADATA. This allows a userspace server program to retrieve the fs-verity descriptor of a file for serving to a client which implements fs-verity compatible verification. See the patch which introduced FS_IOC_READ_VERITY_METADATA for more details. "fs-verity descriptor" here means only the part that userspace cares about because it is hashed to produce the file digest. It doesn't include the signature which ext4 and f2fs append to the fsverity_descriptor struct when storing it on-disk, since that way of storing the signature is an implementation detail. The next patch adds a separate metadata_type value for retrieving the signature separately. This has been tested using a new xfstest which calls this ioctl via a new subcommand for the 'fsverity' program from fsverity-utils. Link: https://lore.kernel.org/r/20210115181819.34732-6-ebiggers@kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
* | fs-verity: support reading Merkle tree with ioctlEric Biggers2021-02-071-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for FS_VERITY_METADATA_TYPE_MERKLE_TREE to FS_IOC_READ_VERITY_METADATA. This allows a userspace server program to retrieve the Merkle tree of a verity file for serving to a client which implements fs-verity compatible verification. See the patch which introduced FS_IOC_READ_VERITY_METADATA for more details. This has been tested using a new xfstest which calls this ioctl via a new subcommand for the 'fsverity' program from fsverity-utils. Link: https://lore.kernel.org/r/20210115181819.34732-5-ebiggers@kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com>