| Commit message (Collapse) | Author | Age | Files | Lines |
|\
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logic. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
|
======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted
------------------------------------------------------
syz-executor273/5027 is trying to acquire lock:
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
but task is already holding lock:
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}:
down_read+0x9c/0x470 kernel/locking/rwsem.c:1520
f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532
__f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:377 [inline]
f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420
f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558
f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740
f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
do_mkdirat+0x2a9/0x330 fs/namei.c:4140
__do_sys_mkdir fs/namei.c:4160 [inline]
__se_sys_mkdir fs/namei.c:4158 [inline]
__x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (&fi->i_sem){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
down_write+0x93/0x200 kernel/locking/rwsem.c:1573
f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline]
ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146
ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309
ovl_make_workdir fs/overlayfs/super.c:711 [inline]
ovl_get_workdir fs/overlayfs/super.c:864 [inline]
ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400
vfs_get_super+0xf9/0x290 fs/super.c:1152
vfs_get_tree+0x88/0x350 fs/super.c:1519
do_new_mount fs/namespace.c:3335 [inline]
path_mount+0x1492/0x1ed0 fs/namespace.c:3662
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount fs/namespace.c:3861 [inline]
__x64_sys_mount+0x293/0x310 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  rlock(&fi->i_xattr_sem);
                               lock(&fi->i_sem);
                               lock(&fi->i_xattr_sem);
  lock(&fi->i_sem);
Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com
Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock"
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
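The report above boils down to two lock classes being taken in opposite orders. The following is a minimal sketch of that idea, not lockdep's actual algorithm: it records "held -> acquired" edges between lock names and reports when a new acquisition would close a cycle. The class LockOrderGraph and its methods are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy model: record 'held -> acquired' edges between lock names."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        """Depth-first search: is dst reachable from src along recorded edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record held -> new; return True if this ordering closes a cycle."""
        cycle = self._reachable(new, held)
        self.edges[held].add(new)
        return cycle


graph = LockOrderGraph()
# Existing dependency (#1 in the report): i_xattr_sem taken after i_sem.
first = graph.acquire("i_sem", "i_xattr_sem")
# New acquisition (#0 in the report): i_sem taken while holding i_xattr_sem.
second = graph.acquire("i_xattr_sem", "i_sem")
print(first, second)
```

Because the model keys on lock names rather than lock instances, it flags the inversion even when the two semaphores belong to different inodes.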
|
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
Use common implementation of file type conversion helpers.
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
When converting an inline directory to a regular one, f2fs is leaking
uninitialized memory to disk because it doesn't initialize the entire
directory block. Fix this by zero-initializing the block.
This bug was introduced by commit 4ec17d688d74 ("f2fs: avoid unneeded
initializing when converting inline dentry"), which didn't consider the
security implications of leaking uninitialized memory to disk.
This was found by running xfstest generic/435 on a KMSAN-enabled kernel.
Fixes: 4ec17d688d74 ("f2fs: avoid unneeded initializing when converting inline dentry")
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
After the changes below:
commit 14db0b3c7b83 ("fscrypt: stop using PG_error to track error status")
commit 98dc08bae678 ("fsverity: stop using PG_error to track error status")
there is no place in f2fs that sets the PG_error flag on a page, so let's
remove the remaining PG_error usage in f2fs, as a step towards freeing the
PG_error flag for other uses.
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
This patch adds support for recording the detailed reason for an
FSCORRUPTED error in f2fs_super_block.s_errors[].
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
This is simpler, and as a side effect it replaces several uses of
kmap_atomic() with its recommended replacement kmap_local_page().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
Yanming reported a reproducible kernel bug in the kernel Bugzilla [1].
The kernel message is shown below:
kernel BUG at fs/inode.c:611!
Call Trace:
evict+0x282/0x4e0
__dentry_kill+0x2b2/0x4d0
dput+0x2dd/0x720
do_renameat2+0x596/0x970
__x64_sys_rename+0x78/0x90
do_syscall_64+0x3b/0x90
[1] https://bugzilla.kernel.org/show_bug.cgi?id=215895
The bug occurs because the fuzzed inode has both the inline_data and
encrypted flags set. During f2fs_evict_inode(), as the inode was deleted
by rename(), the conflicting flags cause an inline data conversion. The
page cache is then polluted and the panic is triggered in clear_inode().
Try fixing the bug by doing more sanity checks for inline data inodes in
sanity_check_inode().
Cc: stable@vger.kernel.org
Reported-by: Ming Yan <yanming@tju.edu.cn>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
f2fs rw_semaphores work better if writers can starve readers,
especially for the checkpoint thread, because writers are strictly
more important than reader threads. This prevents significant priority
inversion between low-priority readers that blocked while trying to
acquire the read lock and a second acquisition of the write lock that
might be blocking high priority work.
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
This patch tries to mitigate lock contention between f2fs_write_checkpoint and
f2fs_get_node_info along with nat_tree_lock.
The idea is, if checkpoint is currently running, other threads that try to grab
nat_tree_lock would be better to wait for checkpoint.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
This patch adds a new function, f2fs_dquot_initialize(), to wrap
dquot_initialize(), and supports injecting a fault into
f2fs_dquot_initialize() to simulate an internal failure in
dquot_initialize().
Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
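The value 65536 in the usage lines above is a single-bit mask, 1 << 16. A minimal sketch of building and decoding such masks follows; the helper names fault_mask and fault_bits are hypothetical, and which fault each bit index selects is defined by the kernel source, not by this sketch.

```python
def fault_mask(bit: int) -> int:
    """Build a mask with only the given bit index set."""
    return 1 << bit


def fault_bits(fault_type: int) -> list:
    """Return the bit indexes that are set in a fault_type mask."""
    return [bit for bit in range(fault_type.bit_length())
            if (fault_type >> bit) & 1]


# The mask from the commit message has exactly one bit set.
print(fault_bits(65536))
# Masks for several bits can be OR-ed together before being written.
print(fault_mask(16) | fault_mask(0))
```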
|
Restructure the f2fs page private layout for the reasons below:
There are some cases that f2fs wants to set a flag in a page to
indicate a specified status of page:
a) page is in transaction list for atomic write
b) page contains dummy data for aligned write
c) page is migrating for GC
d) page contains inline data for inline inode flush
e) page belongs to merkle tree, and is verified for fsverity
f) page is dirty and has filesystem/inode reference count for writeback
g) page is temporary and has decompress io context reference for compression
There are existing places in the page structure that we can use to store
f2fs private status/data:
- page.flags: PG_checked, PG_private
- page.private
However, using them was a mess, which could cause potential
conflicts:
             page.private  PG_private  PG_checked  page._refcount (+1 at most)
a)           -1            set                     +1
b)           -2            set
c), d), e)                             set
f)           0             set                     +1
g)           pointer       set
The other problem is that page.flags has no free slot. If we avoid
storing zero in page.private together with the PG_private flag, a non-zero
page.private value can itself indicate the PG_private status, which may
give us a chance to reclaim the PG_private slot for other uses. [1]
Another concern is that f2fs scales poorly when it needs to indicate
more page statuses.
So in this patch, let's restructure f2fs' page.private as below to
solve above issues:
Layout A: lowest bit should be 1
| bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... |
bit 0 PAGE_PRIVATE_NOT_POINTER
bit 1 PAGE_PRIVATE_ATOMIC_WRITE
bit 2 PAGE_PRIVATE_DUMMY_WRITE
bit 3 PAGE_PRIVATE_ONGOING_MIGRATION
bit 4 PAGE_PRIVATE_INLINE_INODE
bit 5 PAGE_PRIVATE_REF_RESOURCE
bit 6- f2fs private data
Layout B: lowest bit should be 0
page.private is a wrapped pointer.
After the change:
     page.private  PG_private  PG_checked  page._refcount (+1 at most)
a)   11            set                     +1
b)   101           set                     +1
c)   1001          set                     +1
d)   10001         set                     +1
e)                             set
f)   100001        set                     +1
g)   pointer       set                     +1
[1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
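The "Layout A" encoding above can be sketched with plain integer bit operations. This is an illustrative model only, not the kernel macros: the bit positions come from the commit message, while the helper names set_private_flag, is_flag_word and has_private_flag are hypothetical.

```python
# Bit positions taken from "Layout A" in the commit message.
PAGE_PRIVATE_NOT_POINTER = 0
PAGE_PRIVATE_ATOMIC_WRITE = 1
PAGE_PRIVATE_DUMMY_WRITE = 2
PAGE_PRIVATE_ONGOING_MIGRATION = 3
PAGE_PRIVATE_INLINE_INODE = 4
PAGE_PRIVATE_REF_RESOURCE = 5


def set_private_flag(private: int, bit: int) -> int:
    """Set a status bit and mark the word as 'not a pointer' (bit 0)."""
    return private | (1 << PAGE_PRIVATE_NOT_POINTER) | (1 << bit)


def is_flag_word(private: int) -> bool:
    """Layout A has bit 0 set; Layout B (a wrapped pointer) has it clear."""
    return bool(private & (1 << PAGE_PRIVATE_NOT_POINTER))


def has_private_flag(private: int, bit: int) -> bool:
    """A status bit only counts when the word is in Layout A."""
    return is_flag_word(private) and bool(private & (1 << bit))


# Reproduce the binary values listed in the "After the change" table.
for bit in range(PAGE_PRIVATE_ATOMIC_WRITE, PAGE_PRIVATE_REF_RESOURCE + 1):
    print(format(set_private_flag(0, bit), "b"))
```

Keeping bit 0 as the discriminator relies on pointers being at least 2-byte aligned, so a real pointer never has its lowest bit set.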
|
The variable dn.node_changed is left uninitialized when a call to
f2fs_get_node_page() fails. This uninitialized value is then used
in the call to f2fs_balance_fs(), which may or may not balance
dirty node and dentry pages depending on the uninitialized state of
the variable. Fix this by only calling f2fs_balance_fs() if err is
not set.
Thanks to Jaegeuk Kim for suggesting an appropriate fix.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 2a3407607028 ("f2fs: call f2fs_balance_fs only when node was changed")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
Occasionally, fsck detects corrupted quota data:
Info: checkpoint state = 45 : crc compacted_summary unmount
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1543036928, 762) != expected (1543032832, 762)
[ASSERT] (fsck_chk_quota_files:1986) --> Quota file is missing or invalid quota file content found.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1352478720, 344) != expected (1352474624, 344)
[ASSERT] (fsck_chk_quota_files:1986) --> Quota file is missing or invalid quota file content found.
[FSCK] Unreachable nat entries [Ok..] [0x0]
[FSCK] SIT valid block bitmap checking [Ok..]
[FSCK] Hard link checking for regular file [Ok..] [0x0]
[FSCK] valid_block_count matching with CP [Ok..] [0xdf299]
[FSCK] valid_node_count matcing with CP (de lookup) [Ok..] [0x2b01]
[FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0x2b01]
[FSCK] valid_inode_count matched with CP [Ok..] [0x2665]
[FSCK] free segment_count matched with CP [Ok..] [0xcb04]
[FSCK] next block offset is free [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs [Fail]
The root cause is:
If we open a file with the read-only flag, disk quota info won't be
initialized for this file. However, a following mmap() will force
conversion of the inline inode via f2fs_convert_inline_inode(), which may
increase block usage for this inode without updating quota data, causing
inconsistent disk quota info.
The issue happens in the following stack:
open(file, O_RDONLY)
mmap(file)
- f2fs_convert_inline_inode
 - f2fs_convert_inline_page
  - f2fs_reserve_block
   - f2fs_reserve_new_block
    - f2fs_reserve_new_blocks
     - f2fs_i_blocks_write
      - dquot_claim_block
inode->i_blocks increases, but dqb_curspace keeps its old value because the
dquots are NULL.
To fix this issue, let's call dquot_initialize() anyway in both
f2fs_truncate() and f2fs_convert_inline_inode() functions to avoid potential
inconsistent quota data issue.
Fixes: 0abd675e97e6 ("f2fs: support plain user/group quota")
Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com>
Signed-off-by: Dehe Gu <gudehe@huawei.com>
Signed-off-by: Junchao Jiang <jiangjunchao1@huawei.com>
Signed-off-by: Ge Qiu <qiuge@huawei.com>
Signed-off-by: Yi Chen <chenyi77@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
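The accounting gap described above can be modelled in a few lines. This is a toy sketch, not the kernel's quota code: Inode, Dquot and the functions below are simplified stand-ins, and the 4096-byte block size is an assumption (it matches the 4096-byte difference between the "actual" and "expected" values in the fsck output).

```python
BLOCK_SIZE = 4096   # assumed block size; equals the gap shown by fsck
SECTOR_SIZE = 512   # i_blocks is counted in 512-byte units


class Dquot:
    """Stand-in for a disk quota record."""

    def __init__(self):
        self.dqb_curspace = 0


class Inode:
    """Stand-in for an inode; dquot stays None until quota is initialized."""

    def __init__(self):
        self.i_blocks = 0
        self.dquot = None


def dquot_initialize(inode, dquot):
    """Attach the quota record if it is not attached yet."""
    if inode.dquot is None:
        inode.dquot = dquot


def reserve_block(inode):
    """Charge one block to the inode and, if attached, to its quota record."""
    inode.i_blocks += BLOCK_SIZE // SECTOR_SIZE
    if inode.dquot is not None:
        inode.dquot.dqb_curspace += BLOCK_SIZE


def convert_inline_inode(inode, dquot, initialize_quota):
    """Model of the conversion path with or without quota initialization."""
    if initialize_quota:
        dquot_initialize(inode, dquot)
    reserve_block(inode)


before_fix, quota_a = Inode(), Dquot()
convert_inline_inode(before_fix, quota_a, initialize_quota=False)
after_fix, quota_b = Inode(), Dquot()
convert_inline_inode(after_fix, quota_b, initialize_quota=True)
print(before_fix.i_blocks, quota_a.dqb_curspace)
print(after_fix.i_blocks, quota_b.dqb_curspace)
```

In the first case the inode grows by one block while the quota record is untouched, which is the kind of mismatch fsck reports.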
|
The inline inode stat is not updated in f2fs_recover_inline_data(); fix it.
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
In the third scenario, data blocks should be removed instead of inline_data.
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
generic_make_request: Trying to write to read-only block-device dm-5 (partno 0)
WARNING: CPU: 7 PID: 546 at block/blk-core.c:2190 generic_make_request_checks+0x664/0x690
pc : generic_make_request_checks+0x664/0x690
lr : generic_make_request_checks+0x664/0x690
Call trace:
generic_make_request_checks+0x664/0x690
generic_make_request+0xf0/0x3a4
submit_bio+0x80/0x250
__submit_merged_bio+0x368/0x4e0
__submit_merged_write_cond.llvm.12294350193007536502+0xe0/0x3e8
f2fs_wait_on_page_writeback+0x84/0x128
f2fs_convert_inline_page+0x35c/0x6f8
f2fs_convert_inline_inode+0xe0/0x2e0
f2fs_file_mmap+0x48/0x9c
mmap_region+0x41c/0x74c
do_mmap+0x40c/0x4fc
vm_mmap_pgoff+0xb8/0x114
vm_mmap+0x34/0x48
elf_map+0x68/0x108
load_elf_binary+0x538/0xb70
search_binary_handler+0xac/0x1dc
exec_binprm+0x50/0x15c
__do_execve_file+0x620/0x740
__arm64_sys_execve+0x54/0x68
el0_svc_common+0x9c/0x168
el0_svc_handler+0x60/0x6c
el0_svc+0x8/0xc
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
Expand f2fs's casefolding support to include encrypted directories. To
index casefolded+encrypted directories, we use the SipHash of the
casefolded name, keyed by a key derived from the directory's fscrypt
master key. This ensures that the dirhash doesn't leak information
about the plaintext filenames.
Encryption keys are unavailable during roll-forward recovery, so we
can't compute the dirhash when recovering a new dentry in an encrypted +
casefolded directory. To avoid having to force a checkpoint when a new
file is fsync'ed, store the dirhash on-disk appended to i_name.
This patch incorporates work by Eric Biggers <ebiggers@google.com>
and Jaegeuk Kim <jaegeuk@kernel.org>.
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from
f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed
memory, so clean up to use kfree() instead of kvfree() to free
memory allocated by them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
- don't panic kernel if f2fs_get_node_page() fails in
f2fs_recover_inline_data() or f2fs_recover_inline_xattr();
- return error number of f2fs_truncate_blocks() to
f2fs_recover_inline_data()'s caller;
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
to show f2fs_fiemap()'s result as below:
f2fs_fiemap: dev = (251,0), ino = 7, lblock:0, pblock:1625292800, len:2097152, flags:0, ret:0
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added some knobs to enhance compression feature
and harden testing environment. In addition, we've fixed several bugs
reported from Android devices such as long discarding latency, device
hanging during quota_sync, etc.
Enhancements:
- support lzo-rle algorithm
- add two ioctls to release and reserve blocks for compression
- support partial truncation/fiemap on compressed file
- introduce sysfs entries to attach IO flags explicitly
- add iostat trace point along with read io stat
Bug fixes:
- fix long discard latency
- flush quota data by f2fs_quota_sync correctly
- fix to recover parent inode number for power-cut recovery
- fix lz4/zstd output buffer budget
- parse checkpoint mount option correctly
- avoid infinite loop to wait for flushing node/meta pages
- manage discard space correctly
And some refactoring and clean up patches were added"
* tag 'f2fs-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
f2fs: attach IO flags to the missing cases
f2fs: add node_io_flag for bio flags likewise data_io_flag
f2fs: remove unused parameter of f2fs_put_rpages_mapping()
f2fs: handle readonly filesystem in f2fs_ioc_shutdown()
f2fs: avoid utf8_strncasecmp() with unstable name
f2fs: don't return vmalloc() memory from f2fs_kmalloc()
f2fs: fix retry logic in f2fs_write_cache_pages()
f2fs: fix wrong discard space
f2fs: compress: don't compress any datas after cp stop
f2fs: remove unneeded return value of __insert_discard_tree()
f2fs: fix wrong value of tracepoint parameter
f2fs: protect new segment allocation in expand_inode_data
f2fs: code cleanup by removing ifdef macro surrounding
f2fs: avoid inifinite loop to wait for flushing node pages at cp_error
f2fs: flush dirty meta pages when flushing them
f2fs: fix checkpoint=disable:%u%%
f2fs: compress: fix zstd data corruption
f2fs: add compressed/gc data read IO stat
f2fs: fix potential use-after-free issue
f2fs: compress: don't handle non-compressed data in workqueue
...
|
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
We need to call fscrypt_free_filename() to free the memory allocated by
fscrypt_setup_filename().
Fixes: b06af2aff28b ("f2fs: convert inline_dir early before starting rename")
Cc: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
No need to pull the fiemap definitions into almost every file in the
kernel build.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
If we hit an error during rename, we'll get two dentries in different
directories.
Chao added a check for room in the inline_dir, which avoids needless
conversion. This should be done by inode_lock(&old_dir).
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
Otherwise, it can cause circular locking dependency reported by mm.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
If an inode is newly created, the inode page may not be synchronized with
the inode cache, so fields like .i_inline or .i_extra_isize could be wrong.
In the call path below, we may access such wrong fields, resulting in a
failure to migrate a valid target block.
Thread A                                        Thread B
- f2fs_create
 - f2fs_add_link
  - f2fs_add_dentry
   - f2fs_init_inode_metadata
    - f2fs_add_inline_entry
     - f2fs_new_inode_page
     - f2fs_put_page
     : inode page wasn't updated with inode cache
                                                - gc_data_segment
                                                 - is_alive
                                                  - f2fs_get_node_page
                                                  - datablock_addr
                                                   - offset_in_addr
                                                   : access uninitialized fields
Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
In the error path of f2fs_convert_inline_page(), we failed to truncate the
newly reserved block in .i_addrs[0] when get_node_info() failed; fix it.
Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, f2fs only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when an invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in f2fs are implemented
in fs/unicode, and are based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistically accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
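The name-preserving, case-insensitive lookup described above can be approximated in user space. This is a rough sketch only: it uses Python's unicodedata NFD normalization and str.casefold() as stand-ins for the kernel's fs/unicode NFDi + CF implementation, which it does not reproduce exactly, and the CasefoldDir class is a hypothetical name.

```python
import unicodedata


def fold(name: str) -> str:
    """Approximate lookup key: NFD-normalize, case-fold, normalize again."""
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFD", folded)


class CasefoldDir:
    """Toy +F directory: indexed by folded name, stores the name as created."""

    def __init__(self):
        self._entries = {}  # folded key -> original (preserved) name

    def create(self, name: str) -> None:
        key = fold(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name: str):
        """Return the preserved on-disk name, or None if there is no match."""
        return self._entries.get(fold(name))


d = CasefoldDir()
d.create("Makefile")
d.create("caf\u00e9")               # precomposed e-acute
print(d.lookup("MAKEFILE"))
print(d.lookup("cafe\u0301") == "caf\u00e9")  # decomposed form matches
```

Indexing by the folded key mirrors the idea of hashing the casefolded string, while the stored value keeps the exact bytes the user supplied.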
|
Adjust f2fs_fiemap() to support fiemap() on directory inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
f2fs always uses EFAULT as the error number to indicate that the
filesystem is corrupted, but other filesystems use EUCLEAN for that
condition, so we should change to follow them.
This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.
EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
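The two aliases above map filesystem-specific conditions onto generic errno values. A small user-space sketch of the same mapping, assuming a Linux system where Python's errno module defines EUCLEAN and EBADMSG; the describe() helper is hypothetical and only for illustration.

```python
import errno
import os

# Aliases mirroring the two macros from the commit message.
EFSBADCRC = errno.EBADMSG      # Bad CRC detected
EFSCORRUPTED = errno.EUCLEAN   # Filesystem is corrupted


def describe(err: int) -> str:
    """Return the symbolic errno name followed by the system's description."""
    return f"{errno.errorcode[err]}: {os.strerror(err)}"


print(describe(EFSBADCRC))
print(describe(EFSCORRUPTED))
```

User-space tools that see EUCLEAN from a system call can then treat f2fs corruption the same way they treat corruption reported by other filesystems.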
|
- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
With the mkfs and mount options below, generic/339 of fstests reports that
the scratch image becomes corrupted.
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
[ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...)
level:8, dir_level:0, pgofs:951, correct range:[900, 901]
Old kernels always reserved 200 bytes in the inode layout for inline data
and inline directories, even if inline_xattr was disabled. Newer kernels try
to reclaim that space for inodes without inline xattrs, but the layout size
of an inline dentry must stay fixed, so the reserved space is kept for it.
The problem is that after inline dentry conversion the inline dentry layout
no longer exists. If we still reserve the inline xattr space, then after
dentry updates there will be a hole in the inline xattr space, which can
break the hierarchical hash directory structure.
This patch fixes the issue by reclaiming the inline xattr space after
inline dentry conversion.
Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202883
A deadlock sometimes occurs when the SYS_getdents64 system call is made
while fsync() is called by another process.
Monkey was running on Android 9.0:
1. task 9785 holds sbi->cp_rwsem and is waiting for lock_page()
2. task 10349 holds mm_sem and is waiting for sbi->cp_rwsem
3. task 9709 holds lock_page() and is waiting for mm_sem
So this is a deadlock scenario.
The task stacks shown by the crash tool are as follows:
crash_arm64> bt ffffffc03c354080
PID: 9785 TASK: ffffffc03c354080 CPU: 1 COMMAND: "RxIoScheduler-3"
>> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8
crash-arm64> bt 10349
PID: 10349 TASK: ffffffc018b83080 CPU: 1 COMMAND: "BUGLY_ASYNC_UPL"
>> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
PC: 00000033 LR: 00000000 SP: 00000000 PSTATE: ffffffffffffffff
crash-arm64> bt 9709
PID: 9709 TASK: ffffffc03e7f3080 CPU: 1 COMMAND: "IntentService[A"
>> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
>> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
PC: ffffff8008274114 [compat_filldir64+120]
LR: ffffff80083584d4 [f2fs_fill_dentries+448]
SP: ffffffc001e67b80 PSTATE: 80400145
X29: ffffffc001e67b80 X28: 0000000000000000 X27: 000000000000001a
X26: 00000000000093d7 X25: ffffffc070d52480 X24: 0000000000000008
X23: 0000000000000028 X22: 00000000d43dfd60 X21: ffffffc001e67e90
X20: 0000000000000011 X19: ffffff80093a4000 X18: 0000000000000000
X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000
X14: ffffffffffffffff X13: 0000000000000008 X12: 0101010101010101
X11: 7f7f7f7f7f7f7f7f X10: 6a6a6a6a6a6a6a6a X9: 7f7f7f7f7f7f7f7f
X8: 0000000080808000 X7: ffffff800827409c X6: 0000000080808000
X5: 0000000000000008 X4: 00000000000093d7 X3: 000000000000001a
X2: 0000000000000011 X1: ffffffc070d52480 X0: 0000000000800238
>> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
PC: 0000003c LR: 00000000 SP: 00000000 PSTATE: 000000d9
X12: f48a02ff X11: d4678960 X10: d43dfc00 X9: d4678ae4
X8: 00000058 X7: d4678994 X6: d43de800 X5: 000000d9
X4: d43dfc0c X3: d43dfc10 X2: d46799c8 X1: 00000000
X0: 00001068
The potential deadlock below can happen between three threads:
Thread A                                  Thread B                                  Thread C
- f2fs_do_sync_file
 - f2fs_write_checkpoint
  - down_write(&sbi->node_change) -- 1)
                                          - do_page_fault
                                           - down_write(&mm->mmap_sem) -- 2)
                                            - do_wp_page
                                             - f2fs_vm_page_mkwrite
                                                                                    - getdents64
                                                                                     - f2fs_read_inline_dir
                                                                                      - lock_page -- 3)
  - f2fs_sync_node_pages
   - lock_page -- 3)
                                              - __do_map_lock
                                               - down_read(&sbi->node_change) -- 1)
                                                                                      - f2fs_fill_dentries
                                                                                       - dir_emit
                                                                                        - compat_filldir64
                                                                                         - do_page_fault
                                                                                          - down_read(&mm->mmap_sem) -- 2)
Since f2fs_readdir is protected by inode.i_rwsem, there should not be any
updates to the inode page, so it is safe to look up dents in the inode page
without holding its lock. Drop the lock to improve readdir concurrency and
avoid the potential deadlock.
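The three-way cycle described above can be modelled as a small wait-for graph. The sketch below is a simplified userspace illustration, not kernel code: it treats every lock as exclusive (ignoring the read/write distinction of the rwsems), and `struct demo_task` and `demo_has_deadlock()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* The three locks involved, numbered 1) to 3) in the scenario above. */
enum { LOCK_NODE_CHANGE, LOCK_MMAP_SEM, LOCK_PAGE };

/* Each task holds at most one lock and waits for at most one; -1 means none. */
struct demo_task {
    int holds;
    int waits_for;
};

/*
 * Follow "waits for a lock -> task holding that lock" edges from every task.
 * If the walk returns to its starting task, the tasks form a deadlock cycle.
 */
bool demo_has_deadlock(const struct demo_task *tasks, int n)
{
    for (int start = 0; start < n; start++) {
        int cur = start;

        for (int step = 0; step < n; step++) {
            int want = tasks[cur].waits_for;
            int owner = -1;

            if (want < 0)
                break;                  /* current task is not blocked */
            for (int i = 0; i < n; i++)
                if (tasks[i].holds == want)
                    owner = i;
            if (owner < 0)
                break;                  /* wanted lock is free */
            if (owner == start)
                return true;            /* walked back to the start */
            cur = owner;
        }
    }
    return false;
}

/* Usage example: threads A, B and C from the scenario above. */
void demo_deadlock_usage(void)
{
    struct demo_task before[] = {
        { LOCK_NODE_CHANGE, LOCK_PAGE },        /* A: holds 1), waits 3) */
        { LOCK_MMAP_SEM, LOCK_NODE_CHANGE },    /* B: holds 2), waits 1) */
        { LOCK_PAGE, LOCK_MMAP_SEM },           /* C: holds 3), waits 2) */
    };

    assert(demo_has_deadlock(before, 3));
}
```

Once readdir no longer takes the page lock, thread C holds nothing that thread A needs, so the cycle cannot close.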
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Previously, we changed the lock from cp_rwsem to node_change, which solved
the deadlock caused by the race condition below:
Thread A                                Thread B
- f2fs_setattr
 - f2fs_lock_op -- read_lock
 - dquot_transfer
  - __dquot_transfer
   - dquot_acquire
    - commit_dqblk
     - f2fs_quota_write
      - f2fs_write_begin
       - f2fs_write_failed
                                        - write_checkpoint
                                         - block_operations
                                          - f2fs_lock_all -- write_lock
        - f2fs_truncate_blocks
         - f2fs_lock_op -- read_lock
But that breaks the semantics of cp_rwsem. In other callers, such as:
- f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
- f2fs_direct_IO -> f2fs_write_failed
we allow the dnode to be truncated without cp_rwsem held, resulting in an
incorrect sit bitmap update, which can cause further data corruption.
So this patch reverts the previous fix and instead fixes the deadlock by
skipping the call to f2fs_truncate_blocks() in f2fs_write_failed() only for
quota files, keeping the preallocated data/node blocks at the tail of the
quota file. We can expect that the preallocated space will soon be used to
store quota info.
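The behaviour described above, skipping truncation only for quota files, can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct demo_inode`, its fields and `demo_write_failed()` are hypothetical, and the assignment to `allocated_end` merely stands in for the real call to f2fs_truncate_blocks().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified inode model. */
struct demo_inode {
    bool is_quota_file;   /* assumed flag marking a quota sysfile */
    long size;            /* logical file size (i_size) */
    long allocated_end;   /* end offset of the blocks allocated so far */
};

/* Called after a write that tried to extend the file up to `to` has failed. */
void demo_write_failed(struct demo_inode *inode, long to)
{
    if (to <= inode->size)
        return;     /* nothing was preallocated past i_size */
    if (inode->is_quota_file)
        return;     /* keep the preallocated tail for later quota writes */

    /* Stands in for f2fs_truncate_blocks(): drop blocks past i_size. */
    inode->allocated_end = inode->size;
}

/* Usage example: a regular file is truncated back, a quota file is not. */
void demo_write_failed_usage(void)
{
    struct demo_inode regular = { false, 4096, 8192 };
    struct demo_inode quota = { true, 4096, 8192 };

    demo_write_failed(&regular, 8192);
    demo_write_failed(&quota, 8192);
    assert(regular.allocated_end == 4096);
    assert(quota.allocated_end == 8192);
}
```

Skipping the truncation means the quota path never needs to take the read lock again while a checkpoint is waiting for the write lock.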
Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check the PageWriteback status, so let's clean up by relocating the check
into f2fs_wait_on_page_writeback().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
One report shows a memory allocation failure during mount:
(unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
(show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
(dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
(warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
(__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
(kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
(build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
(f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
(mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
(f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
(mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
(vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
(do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
(SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| | |
Pull XArray conversion from Matthew Wilcox:
"The XArray provides an improved interface to the radix tree data
structure, providing locking as part of the API, specifying GFP flags
at allocation time, eliminating preloading, less re-walking the tree,
more efficient iterations and not exposing RCU-protected pointers to
its users.
This patch set
1. Introduces the XArray implementation
2. Converts the pagecache to use it
3. Converts memremap to use it
The page cache is the most complex and important user of the radix
tree, so converting it was most important. Converting the memremap
code removes the only other user of the multiorder code, which allows
us to remove the radix tree code that supported it.
I have 40+ followup patches to convert many other users of the radix
tree over to the XArray, but I'd like to get this part in first. The
other conversions haven't been in linux-next and aren't suitable for
applying yet, but you can see them in the xarray-conv branch if you're
interested"
* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
radix tree: Remove multiorder support
radix tree test: Convert multiorder tests to XArray
radix tree tests: Convert item_delete_rcu to XArray
radix tree tests: Convert item_kill_tree to XArray
radix tree tests: Move item_insert_order
radix tree test suite: Remove multiorder benchmarking
radix tree test suite: Remove __item_insert
memremap: Convert to XArray
xarray: Add range store functionality
xarray: Move multiorder_check to in-kernel tests
xarray: Move multiorder_shrink to kernel tests
xarray: Move multiorder account test in-kernel
radix tree test suite: Convert iteration test to XArray
radix tree test suite: Convert tag_tagged_items to XArray
radix tree: Remove radix_tree_clear_tags
radix tree: Remove radix_tree_maybe_preload_order
radix tree: Remove split/join code
radix tree: Remove radix_tree_update_node_t
page cache: Finish XArray conversion
dax: Convert page fault handlers to XArray
...
| | |
This is a straightforward conversion.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
| | |
For journalled quota mode, let checkpoint flush dirty dquot data and quota
file data to guarantee that all quota sysfiles are persisted in the last
checkpoint. This way, we avoid corrupting quota sysfiles on sudden
power-off (SPO).
The implementation is as follows:
1. Add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are cached
dquot metadata changes in the quota subsystem, so that a later checkpoint
should:
a) flush dquot metadata into the quota file.
b) flush the quota file to storage to keep file usage consistent.
2. Add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so that later:
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in the last cp pack as a hint
for fsck to repair.
3. Add a global state SBI_QUOTA_SKIP_FLUSH. During checkpoint, if quota
data is updated very heavily, it may cause a hung task in block_operation().
To avoid this, if the retry count exceeds a threshold, just skip
flushing and retry in the next checkpoint().
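The three states above can be read as a small decision procedure run at checkpoint time. The sketch below is one possible reading, not the patch's code: the flag names come from the commit message, but their bit values, the `enum demo_quota_action` values and `demo_quota_action()` are assumptions made for illustration.

```c
#include <assert.h>

/* Flag names are from the commit message; the bit values are assumptions. */
enum {
    SBI_QUOTA_NEED_FLUSH  = 1 << 0,
    SBI_QUOTA_NEED_REPAIR = 1 << 1,
    SBI_QUOTA_SKIP_FLUSH  = 1 << 2,
};

/* Hypothetical summary of what checkpoint does with quota data. */
enum demo_quota_action {
    DEMO_QUOTA_NOTHING,         /* no cached dquot changes */
    DEMO_QUOTA_FLUSH,           /* flush dquot metadata and quota file */
    DEMO_QUOTA_SKIP_MARK_FSCK,  /* skip sync, set CP_QUOTA_NEED_FSCK_FLAG */
    DEMO_QUOTA_DEFER,           /* too many retries, try next checkpoint */
};

enum demo_quota_action demo_quota_action(unsigned int *flags, int retries,
                                         int max_retries)
{
    /* A failed quota operation means fsck must repair the quota files. */
    if (*flags & SBI_QUOTA_NEED_REPAIR)
        return DEMO_QUOTA_SKIP_MARK_FSCK;
    if (!(*flags & SBI_QUOTA_NEED_FLUSH))
        return DEMO_QUOTA_NOTHING;
    /* Heavy quota updates: give up for now to avoid a hung task. */
    if (retries > max_retries) {
        *flags |= SBI_QUOTA_SKIP_FLUSH;
        return DEMO_QUOTA_DEFER;
    }
    return DEMO_QUOTA_FLUSH;
}

/* Usage example. */
void demo_quota_usage(void)
{
    unsigned int flags = SBI_QUOTA_NEED_FLUSH;

    assert(demo_quota_action(&flags, 0, 8) == DEMO_QUOTA_FLUSH);
    assert(demo_quota_action(&flags, 9, 8) == DEMO_QUOTA_DEFER);
}
```

Checking the repair state first reflects point 2 above: once a quota operation has failed, syncing dquot metadata is skipped regardless of the other states.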
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
If caller of __get_meta_page() can handle error, let's propagate error
from __get_meta_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
As Wen Xu reported in bugzilla, after an image was injected with random data
by fuzzing, an inline inode could contain an invalid reserved blkaddr. During
inline conversion we then hit an illegal memory access reported by KASAN. The
root cause is that, when writing out the converted inline page, we use the
invalid reserved blkaddr to update the sit bitmap, which results in accessing
memory beyond the sit bitmap boundary.
To fix this issue, do a sanity check on the reserved block address of the
inline inode to avoid the above condition.
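The general pattern, validating an on-disk block address before using it to index a bitmap, can be sketched in userspace as below. This is not the check the patch adds: `struct demo_sit`, `DEMO_NR_BLOCKS` and `demo_mark_block()` are hypothetical, and returning -EUCLEAN for a corrupted address is an assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_NR_BLOCKS 64u      /* hypothetical number of tracked blocks */

/* Hypothetical validity bitmap covering a contiguous block address range. */
struct demo_sit {
    uint32_t first_blkaddr;
    uint8_t bitmap[DEMO_NR_BLOCKS / 8];
};

/*
 * Mark a block as valid, rejecting addresses outside the tracked range so
 * that a corrupted on-disk value can never index past the bitmap.
 */
int demo_mark_block(struct demo_sit *sit, uint32_t blkaddr)
{
    uint32_t off;

    if (blkaddr < sit->first_blkaddr ||
        blkaddr - sit->first_blkaddr >= DEMO_NR_BLOCKS)
        return -EUCLEAN;        /* treat as filesystem corruption */

    off = blkaddr - sit->first_blkaddr;
    sit->bitmap[off / 8] |= (uint8_t)(1u << (off % 8));
    return 0;
}

/* Usage example: an in-range address is accepted, a fuzzed one is not. */
void demo_mark_block_usage(void)
{
    struct demo_sit sit = { 100, { 0 } };

    assert(demo_mark_block(&sit, 100) == 0);
    assert(demo_mark_block(&sit, 0xdeadbeefu) == -EUCLEAN);
}
```

Doing the range check before the subtraction result is used keeps the unsigned arithmetic from wrapping into a large, out-of-bounds offset.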
https://bugzilla.kernel.org/show_bug.cgi?id=200179
[ 1428.846352] BUG: KASAN: use-after-free in update_sit_entry+0x80/0x7f0
[ 1428.846618] Read of size 4 at addr ffff880194483540 by task a.out/2741
[ 1428.846855] CPU: 0 PID: 2741 Comm: a.out Tainted: G W 4.17.0+ #1
[ 1428.846858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 1428.846860] Call Trace:
[ 1428.846868] dump_stack+0x71/0xab
[ 1428.846875] print_address_description+0x6b/0x290
[ 1428.846881] kasan_report+0x28e/0x390
[ 1428.846888] ? update_sit_entry+0x80/0x7f0
[ 1428.846898] update_sit_entry+0x80/0x7f0
[ 1428.846906] f2fs_allocate_data_block+0x6db/0xc70
[ 1428.846914] ? f2fs_get_node_info+0x14f/0x590
[ 1428.846920] do_write_page+0xc8/0x150
[ 1428.846928] f2fs_outplace_write_data+0xfe/0x210
[ 1428.846935] ? f2fs_do_write_node_page+0x170/0x170
[ 1428.846941] ? radix_tree_tag_clear+0xff/0x130
[ 1428.846946] ? __mod_node_page_state+0x22/0xa0
[ 1428.846951] ? inc_zone_page_state+0x54/0x100
[ 1428.846956] ? __test_set_page_writeback+0x336/0x5d0
[ 1428.846964] f2fs_convert_inline_page+0x407/0x6d0
[ 1428.846971] ? f2fs_read_inline_data+0x3b0/0x3b0
[ 1428.846978] ? __get_node_page+0x335/0x6b0
[ 1428.846987] f2fs_convert_inline_inode+0x41b/0x500
[ 1428.846994] ? f2fs_convert_inline_page+0x6d0/0x6d0
[ 1428.847000] ? kasan_unpoison_shadow+0x31/0x40
[ 1428.847005] ? kasan_kmalloc+0xa6/0xd0
[ 1428.847024] f2fs_file_mmap+0x79/0xc0
[ 1428.847029] mmap_region+0x58b/0x880
[ 1428.847037] ? arch_get_unmapped_area+0x370/0x370
[ 1428.847042] do_mmap+0x55b/0x7a0
[ 1428.847048] vm_mmap_pgoff+0x16f/0x1c0
[ 1428.847055] ? vma_is_stack_for_current+0x50/0x50
[ 1428.847062] ? __fsnotify_update_child_dentry_flags.part.1+0x160/0x160
[ 1428.847068] ? do_sys_open+0x206/0x2a0
[ 1428.847073] ? __fget+0xb4/0x100
[ 1428.847079] ksys_mmap_pgoff+0x278/0x360
[ 1428.847085] ? find_mergeable_anon_vma+0x50/0x50
[ 1428.847091] do_syscall_64+0x73/0x160
[ 1428.847098] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1428.847102] RIP: 0033:0x7fb1430766ba
[ 1428.847103] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d 89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[ 1428.847162] RSP: 002b:00007ffc651d9388 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 1428.847167] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb1430766ba
[ 1428.847170] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[ 1428.847173] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000000
[ 1428.847176] R10: 0000000000008002 R11: 0000000000000246 R12: 0000000000000000
[ 1428.847179] R13: 0000000000001000 R14: 0000000000008002 R15: 0000000000000000
[ 1428.847252] Allocated by task 2683:
[ 1428.847372] kasan_kmalloc+0xa6/0xd0
[ 1428.847380] kmem_cache_alloc+0xc8/0x1e0
[ 1428.847385] getname_flags+0x73/0x2b0
[ 1428.847390] user_path_at_empty+0x1d/0x40
[ 1428.847395] vfs_statx+0xc1/0x150
[ 1428.847401] __do_sys_newlstat+0x7e/0xd0
[ 1428.847405] do_syscall_64+0x73/0x160
[ 1428.847411] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1428.847466] Freed by task 2683:
[ 1428.847566] __kasan_slab_free+0x137/0x190
[ 1428.847571] kmem_cache_free+0x85/0x1e0
[ 1428.847575] filename_lookup+0x191/0x280
[ 1428.847580] vfs_statx+0xc1/0x150
[ 1428.847585] __do_sys_newlstat+0x7e/0xd0
[ 1428.847590] do_syscall_64+0x73/0x160
[ 1428.847596] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1428.847648] The buggy address belongs to the object at ffff880194483300
which belongs to the cache names_cache of size 4096
[ 1428.847946] The buggy address is located 576 bytes inside of
4096-byte region [ffff880194483300, ffff880194484300)
[ 1428.848234] The buggy address belongs to the page:
[ 1428.848366] page:ffffea0006512000 count:1 mapcount:0 mapping:ffff8801f3586380 index:0x0 compound_mapcount: 0
[ 1428.848606] flags: 0x17fff8000008100(slab|head)
[ 1428.848737] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f3586380
[ 1428.848931] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1428.849122] page dumped because: kasan: bad access detected
[ 1428.849305] Memory state around the buggy address:
[ 1428.849436] ffff880194483400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849620] ffff880194483480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849804] >ffff880194483500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849985] ^
[ 1428.850120] ffff880194483580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.850303] ffff880194483600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.850498] ==================================================================
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
In the error path of f2fs_move_rehashed_dirents, the inode page could be
under writeback, so we should wait for inode page writeback before updating it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds the "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Introduce clear_radix_tree_dirty_tag to hold the common code, as a cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Since the layout of a regular dentry block is different from that of an
inline dentry block, zero_user_segment starting from MAX_INLINE_DATA(dir) is
not correct for a regular dentry block. Besides, the bitmap is already copied
and used, so there is no need to zero the page at all; just remove the
zero_user_segment call.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Some pages may be tagged with PageError by the read path. When we later
write such pages with valid contents, writepage should clear the bit, as
ext4 does. This patch clears PageError in that case.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|