diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-05-07 21:12:44 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-05-07 21:12:44 -0700 |
commit | 5abe37954e9a315c35c9490f78d55f307c3c636b (patch) | |
tree | fa2034b03b270c48ac516a8fe308654443b4a7e2 /Documentation/admin-guide | |
parent | e5fef2a9732580c5bd30c0097f5e9091a3d58ce5 (diff) | |
parent | db90f41916cf04c020062f8d8b0385942248283e (diff) | |
download | linux-5abe37954e9a315c35c9490f78d55f307c3c636b.tar.gz linux-5abe37954e9a315c35c9490f78d55f307c3c636b.tar.bz2 linux-5abe37954e9a315c35c9490f78d55f307c3c636b.zip |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Add as a feature case-insensitive directories (the casefold feature)
using Unicode 12.1.
Also, the usual largish number of cleanups and bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
ext4: fix ext4_show_options for file systems w/o journal
unicode: refactor the rule for regenerating utf8data.h
docs: ext4.rst: document case-insensitive directories
ext4: Support case-insensitive file name lookups
ext4: include charset encoding information in the superblock
MAINTAINERS: add Unicode subsystem entry
unicode: update unicode database unicode version 12.1.0
unicode: introduce test module for normalized utf8 implementation
unicode: implement higher level API for string handling
unicode: reduce the size of utf8data[]
unicode: introduce code for UTF-8 normalization
unicode: introduce UTF-8 character database
ext4: actually request zeroing of inode table after grow
ext4: cond_resched in work-heavy group loops
ext4: fix use-after-free race with debug_want_extra_isize
ext4: avoid drop reference to iloc.bh twice
ext4: ignore e_value_offs for xattrs with value-in-ea-inode
ext4: protect journal inode's blocks using block_validity
ext4: use BUG() instead of BUG_ON(1)
...
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/ext4.rst | 38 |
1 files changed, 38 insertions, 0 deletions
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst index e506d3dae510..059ddcbe769d 100644 --- a/Documentation/admin-guide/ext4.rst +++ b/Documentation/admin-guide/ext4.rst @@ -91,10 +91,48 @@ Currently Available * large block (up to pagesize) support * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force the ordering) +* Case-insensitive file name lookups [1] Filesystems with a block size of 1k may see a limit imposed by the directory hash tree having a maximum depth of two. +case-insensitive file name lookups +====================================================== + +The case-insensitive file name lookup feature is supported on a +per-directory basis, allowing the user to mix case-insensitive and +case-sensitive directories in the same filesystem. It is enabled by +flipping the +F inode attribute of an empty directory. The +case-insensitive string match operation is only defined when we know how +text in encoded in a byte sequence. For that reason, in order to enable +case-insensitive directories, the filesystem must have the +casefold feature, which stores the filesystem-wide encoding +model used. By default, the charset adopted is the latest version of +Unicode (12.1.0, by the time of this writing), encoded in the UTF-8 +form. The comparison algorithm is implemented by normalizing the +strings to the Canonical decomposition form, as defined by Unicode, +followed by a byte per byte comparison. + +The case-awareness is name-preserving on the disk, meaning that the file +name provided by userspace is a byte-per-byte match to what is actually +written in the disk. The Unicode normalization format used by the +kernel is thus an internal representation, and not exposed to the +userspace nor to the disk, with the important exception of disk hashes, +used on large case-insensitive directories with DX feature. On DX +directories, the hash must be calculated using the casefolded version of +the filename, meaning that the normalization format used actually has an +impact on where the directory entry is stored. + +When we change from viewing filenames as opaque byte sequences to seeing +them as encoded strings we need to address what happens when a program +tries to create a file with an invalid name. The Unicode subsystem +within the kernel leaves the decision of what to do in this case to the +filesystem, which select its preferred behavior by enabling/disabling +the strict mode. When Ext4 encounters one of those strings and the +filesystem did not require strict mode, it falls back to considering the +entire string as an opaque byte sequence, which still allows the user to +operate on that file, but the case-insensitive lookups won't work. + Options ======= |