summaryrefslogtreecommitdiffstats
path: root/fs/f2fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'f2fs-for-6-6-rc1' of ↵Linus Torvalds2023-09-0214-168/+261
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we don't have a highlighted feature enhancement, but mostly have fixed issues mainly in two parts: 1) zoned block device, and 2) compression support. For zoned block device, we've tried to improve the power-off recovery flow as much as possible. For compression, we found some corner cases caused by wrong compression policy and logics. Other than them, there were some reverts and stat corrections. Bug fixes: - use finish zone command when closing a zone - check zone type before sending async reset zone command - fix to assign compress_level for lz4 correctly - fix error path of f2fs_submit_page_read() - don't {,de}compress non-full cluster - send small discard commands during checkpoint back - flush inode if atomic file is aborted - correct to account gc/cp stats And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups" * tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits) f2fs: use finish zone command when closing a zone f2fs: compress: fix to assign compress_level for lz4 correctly f2fs: fix error path of f2fs_submit_page_read() f2fs: clean up error handling in sanity_check_{compress_,}inode() f2fs: avoid false alarm of circular locking Revert "f2fs: do not issue small discard commands during checkpoint" f2fs: doc: fix description of max_small_discards f2fs: should update REQ_TIME for direct write f2fs: fix to account cp stats correctly f2fs: fix to account gc stats correctly f2fs: remove unneeded check condition in __f2fs_setxattr() f2fs: fix to update i_ctime in __f2fs_setxattr() Revert "f2fs: fix to do sanity check on extent cache correctly" f2fs: increase usage of folio_next_index() helper f2fs: Only lfs mode is allowed with zoned block device feature f2fs: check zone type before sending async reset zone command f2fs: compress: don't {,de}compress non-full cluster f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted f2fs: don't reopen the main block device in f2fs_scan_devices f2fs: fix to avoid mmap vs set_compress_option case ...
| * f2fs: use finish zone command when closing a zoneDaeho Jeong2023-08-251-6/+13
| | | | | | | | | | | | | | Use the finish zone command first when a zone should be closed. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: compress: fix to assign compress_level for lz4 correctlyChao Yu2023-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After remount, F2FS_OPTION().compress_level was assgin to LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it. 1. mount /dev/vdb /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...) 2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/ 3. mount|grep f2fs /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...) Fixes: 00e120b5e4b5 ("f2fs: assign default compression level") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix error path of f2fs_submit_page_read()Chao Yu2023-08-231-0/+3
| | | | | | | | | | | | | | | | In error path of f2fs_submit_page_read(), it missed to call iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: clean up error handling in sanity_check_{compress_,}inode()Chao Yu2023-08-231-19/+4
| | | | | | | | | | | | | | | | | | In sanity_check_{compress_,}inode(), it doesn't need to set SBI_NEED_FSCK in each error case, instead, we can set the flag in do_read_inode() only once when sanity_check_inode() fails. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: avoid false alarm of circular lockingJaegeuk Kim2023-08-212-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted ------------------------------------------------------ syz-executor273/5027 is trying to acquire lock: ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644 but task is already holding lock: ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fi->i_xattr_sem){.+.+}-{3:3}: down_read+0x9c/0x470 kernel/locking/rwsem.c:1520 f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532 __f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179 f2fs_acl_create fs/f2fs/acl.c:377 [inline] f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420 f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558 f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740 f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 do_mkdirat+0x2a9/0x330 fs/namei.c:4140 __do_sys_mkdir fs/namei.c:4160 [inline] __se_sys_mkdir fs/namei.c:4158 [inline] __x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (&fi->i_sem){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 down_write+0x93/0x200 kernel/locking/rwsem.c:1573 f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644 f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline] ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146 ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309 ovl_make_workdir fs/overlayfs/super.c:711 [inline] ovl_get_workdir fs/overlayfs/super.c:864 [inline] ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400 vfs_get_super+0xf9/0x290 fs/super.c:1152 vfs_get_tree+0x88/0x350 fs/super.c:1519 do_new_mount fs/namespace.c:3335 [inline] path_mount+0x1492/0x1ed0 fs/namespace.c:3662 do_mount fs/namespace.c:3675 [inline] __do_sys_mount fs/namespace.c:3884 [inline] __se_sys_mount fs/namespace.c:3861 [inline] __x64_sys_mount+0x293/0x310 fs/namespace.c:3861 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&fi->i_xattr_sem); lock(&fi->i_sem); lock(&fi->i_xattr_sem); lock(&fi->i_sem); Cc: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock" Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * Revert "f2fs: do not issue small discard commands during checkpoint"Chao Yu2023-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we have two mechanisms to cache & submit small discards: a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint will cache small discard candidates w/ configured maximum number. b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM interface is asynchronized, so it won't submit discard directly. Finally, discard thread will submit them in background periodically. However, after commit 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint"), the mechanism a) is broken, since no matter how we configure the sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small discard candidates any more. echo 0 > /sys/fs/f2fs/vdb/max_small_discards xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync" xfs_io /mnt/f2fs/file -c "fpunch 0 4k" sync cat /proc/fs/f2fs/vdb/discard_plist_info |head -2 echo 100 > /sys/fs/f2fs/vdb/max_small_discards rm /mnt/f2fs/file xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync" xfs_io /mnt/f2fs/file -c "fpunch 0 4k" sync cat /proc/fs/f2fs/vdb/discard_plist_info |head -2 Before the patch: Discard pend list(Show diacrd_cmd count on each entry, .:not exist): 0 . . . . . . . . Discard pend list(Show diacrd_cmd count on each entry, .:not exist): 0 3 1 . . . . . . After the patch: Discard pend list(Show diacrd_cmd count on each entry, .:not exist): 0 . . . . . . . . Discard pend list(Show diacrd_cmd count on each entry, .:not exist): 0 . . . . . . . . This patch reverts commit 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint") in order to fix this issue. Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: should update REQ_TIME for direct writeZhiguo Niu2023-08-141-0/+1
| | | | | | | | | | | | | | | | | | | | The sending interval of discard and GC should also consider direct write requests; filesystem is not idle if there is direct write. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to account cp stats correctlyChao Yu2023-08-148-17/+50
| | | | | | | | | | | | | | | | | | cp_foreground_calls sysfs entry shows total CP call count rather than foreground CP call count, fix it. Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to account gc stats correctlyChao Yu2023-08-147-35/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported, status debugfs entry shows inconsistent GC stats as below: GC calls: 6008 (BG: 6161) - data segments : 3053 (BG: 3053) - node segments : 2955 (BG: 2955) Total GC calls is larger than BGGC calls, the reason is: - f2fs_stat_info.call_count accounts total migrated section count by f2fs_gc() - f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from background gc_thread Another issue is gc_foreground_calls sysfs entry shows total GC call count rather than FGGC call count. This patch changes as below for fix: - account GC calls and migrated segment count separately - support to account migrated section count if it enables large section mode - fix to show correct value in gc_foreground_calls sysfs entry Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove unneeded check condition in __f2fs_setxattr()Chao Yu2023-08-141-1/+1
| | | | | | | | | | | | | | | | It has checked return value of write_all_xattrs(), remove unneeded following check condition. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to update i_ctime in __f2fs_setxattr()Chao Yu2023-08-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic/728 - output mismatch (see /media/fstests/results//generic/728.out.bad) --- tests/generic/728.out 2023-07-19 07:10:48.362711407 +0000 +++ /media/fstests/results//generic/728.out.bad 2023-07-19 08:39:57.000000000 +0000 QA output created by 728 +Expected ctime to change after setxattr. +Expected ctime to change after removexattr. Silence is golden ... (Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad' to see the entire diff) generic/729 1s It needs to update i_ctime after {set,remove}xattr, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * Revert "f2fs: fix to do sanity check on extent cache correctly"Chao Yu2023-08-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports a f2fs bug as below: UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19 index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]') Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 inline_data_addr fs/f2fs/f2fs.h:3275 [inline] __recover_inline_status fs/f2fs/inode.c:113 [inline] do_read_inode fs/f2fs/inode.c:480 [inline] f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604 f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601 mount_bdev+0x276/0x3b0 fs/super.c:1391 legacy_get_tree+0xef/0x190 fs/fs_context.c:611 vfs_get_tree+0x8c/0x270 fs/super.c:1519 do_new_mount+0x28f/0xae0 fs/namespace.c:3335 do_mount fs/namespace.c:3675 [inline] __do_sys_mount fs/namespace.c:3884 [inline] __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The issue was bisected to: commit d48a7b3a72f121655d95b5157c32c7d555e44c05 Author: Chao Yu <chao@kernel.org> Date: Mon Jan 9 03:49:20 2023 +0000 f2fs: fix to do sanity check on extent cache correctly The root cause is we applied both v1 and v2 of the patch, v2 is the right fix, so it needs to revert v1 in order to fix reported issue. v1: commit d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly") https://lore.kernel.org/lkml/20230109034920.492914-1-chao@kernel.org/ v2: commit 269d11948100 ("f2fs: fix to do sanity check on extent cache correctly") https://lore.kernel.org/lkml/20230207134808.1827869-1-chao@kernel.org/ Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000fcf0690600e4d04d@google.com/ Fixes: d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: increase usage of folio_next_index() helperMinjie Du2023-08-141-2/+1
| | | | | | | | | | | | | | | | | | Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Only lfs mode is allowed with zoned block device featureChunhai Guo2023-08-141-5/+5
| | | | | | | | | | | | | | | | | | | | Now f2fs support four block allocation modes: lfs, adaptive, fragment:segment, fragment:block. Only lfs mode is allowed with zoned block device feature. Fixes: 6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option") Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: check zone type before sending async reset zone commandShin'ichiro Kawasaki2023-08-142-11/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 25f9080576b9 ("f2fs: add async reset zone command support") introduced "async reset zone commands" by calling __submit_zone_reset_cmd() in async discard operations. However, __submit_zone_reset_cmd() is called regardless of zone type of discard target zone. When devices have conventional zones, zone reset commands are sent to the conventional zones and cause I/O errors. Avoid the I/O errors by checking that the discard target zone type is sequential write required. If not, handle the discard operation in same manner as non-zoned, regular block devices. For that purpose, add a new helper function f2fs_bdev_index() which gets index of the zone reset target device. Fixes: 25f9080576b9 ("f2fs: add async reset zone command support") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: compress: don't {,de}compress non-full clusterChao Yu2023-08-141-12/+8
| | | | | | | | | | | | | | | | f2fs won't compress non-full cluster in tail of file, let's skip dirtying and rewrite such cluster during f2fs_ioc_{,de}compress_file. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: allow f2fs_ioc_{,de}compress_file to be interruptedChao Yu2023-08-141-0/+12
| | | | | | | | | | | | | | | | | | This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that, userspace won't be blocked when manual {,de}compression on large file is interrupted by signal. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't reopen the main block device in f2fs_scan_devicesChristoph Hellwig2023-08-141-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | f2fs_scan_devices reopens the main device since the very beginning, which has always been useless, and also means that we don't pass the right holder for the reopen, which now leads to a warning as the core super.c holder ops aren't passed in for the reopen. Fixes: 3c62be17d4f5 ("f2fs: support multiple devices") Fixes: 0718afd47f70 ("block: introduce holder ops") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to avoid mmap vs set_compress_option caseChao Yu2023-08-142-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compression option in inode should not be changed after they have been used, however, it may happen in below race case: Thread A Thread B - f2fs_ioc_set_compress_option - check f2fs_is_mmap_file() - check get_dirty_pages() - check F2FS_HAS_BLOCKS() - f2fs_file_mmap - set_inode_flag(FI_MMAP_FILE) - fault - do_page_mkwrite - f2fs_vm_page_mkwrite - f2fs_get_block_locked - fault_dirty_shared_page - set_page_dirty - update i_compress_algorithm - update i_log_cluster_size - update i_cluster_size Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock, meanwhile add mmap file check condition in f2fs_may_compress() as well. Fixes: e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: get out of a repeat loop when getting a locked data pageJaegeuk Kim2023-08-141-6/+2
| | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=216050 Somehow we're getting a page which has a different mapping. Let's avoid the infinite loop. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: flush inode if atomic file is abortedJaegeuk Kim2023-08-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's flush the inode being aborted atomic operation to avoid stale dirty inode during eviction in this call stack: f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs] f2fs_abort_atomic_write+0xc4/0xf0 [f2fs] f2fs_evict_inode+0x3f/0x690 [f2fs] ? sugov_start+0x140/0x140 evict+0xc3/0x1c0 evict_inodes+0x17b/0x210 generic_shutdown_super+0x32/0x120 kill_block_super+0x21/0x50 deactivate_locked_super+0x31/0x90 cleanup_mnt+0x100/0x160 task_work_run+0x59/0x90 do_exit+0x33b/0xa50 do_group_exit+0x2d/0x80 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This triggers f2fs_bug_on() in f2fs_evict_inode: f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)); This fixes the syzbot report: loop0: detected capacity change from 0 to 131072 F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Found nat_bits in checkpoint F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:869! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0 Call Trace: <TASK> evict+0x2ed/0x6b0 fs/inode.c:665 dispose_list+0x117/0x1e0 fs/inode.c:698 evict_inodes+0x345/0x440 fs/inode.c:748 generic_shutdown_super+0xaf/0x480 fs/super.c:478 kill_block_super+0x64/0xb0 fs/super.c:1417 kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa9a/0x29a0 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f309be71a09 Code: Unable to access opcode bytes at 0x7f309be719df. RSP: 002b:00007fff171df518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f309bef7330 RCX: 00007f309be71a09 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 00007f309bef1e40 R10: 0000000000010600 R11: 0000000000000246 R12: 00007f309bef7330 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0 Cc: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't handle error case of f2fs_compress_alloc_page()Chao Yu2023-08-141-13/+1
| | | | | | | | | | | | | | | | f2fs_compress_alloc_page() uses mempool to allocate memory, it never fail, don't handle error case in its callers. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * Revert "f2fs: clean up w/ sbi->log_sectors_per_block"Jaegeuk Kim2023-08-141-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bfd476623999118d9c509cb0fa9380f2912bc225. Shinichiro Kawasaki reported: When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block devices with 4kb logical block size, I observe mount failure as follows. When I revert this commit, the failure goes away. [ 167.781975][ T1555] F2FS-fs (dm-0): IO Block Size: 4 KB [ 167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint [ 171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8] [ 171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000) [ 171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed: (errno=-5) The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with "sbi->log_sectors_per_block". However, I think these two are not equal when the device has 4k logical block size. The former uses Linux kernel sector size 512 byte. The latter use 512b sector size or 4kb sector size depending on the device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device and reflects it to the value sbi->log_sector_size_per_block. This causes unexpected write pointer calculations in check_zone_write_pointer(). This resulted in unexpected zone reset and the mount failure. [1] https://lkml.kernel.org/linux-f2fs-devel/20230711050101.GA19128@lst.de/ [2] https://lore.kernel.org/linux-f2fs-devel/20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/ Cc: stable@vger.kernel.org Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: bfd476623999 ("f2fs: clean up w/ sbi->log_sectors_per_block") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds2023-08-292-1/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
| * | fs: add CONFIG_BUFFER_HEADChristoph Hellwig2023-08-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | fs: rename and move block_page_mkwrite_returnChristoph Hellwig2023-08-021-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2023-08-282-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull iomap updates from Darrick Wong: "We've got some big changes for this release -- I'm very happy to be landing willy's work to enable large folios for the page cache for general read and write IOs when the fs can make contiguous space allocations, and Ritesh's work to track sub-folio dirty state to eliminate the write amplification problems inherent in using large folios. As a bonus, io_uring can now process write completions in the caller's context instead of bouncing through a workqueue, which should reduce io latency dramatically. IOWs, XFS should see a nice performance bump for both IO paths. Summary: - Make large writes to the page cache fill sparse parts of the cache with large folios, then use large memcpy calls for the large folio. - Track the per-block dirty state of each large folio so that a buffered write to a single byte on a large folio does not result in a (potentially) multi-megabyte writeback IO. - Allow some directio completions to be performed in the initiating task's context instead of punting through a workqueue. This will reduce latency for some io_uring requests" * tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits) iomap: support IOCB_DIO_CALLER_COMP io_uring/rw: add write support for IOCB_DIO_CALLER_COMP fs: add IOCB flags related to passing back dio completions iomap: add IOMAP_DIO_INLINE_COMP iomap: only set iocb->private for polled bio iomap: treat a write through cache the same as FUA iomap: use an unsigned type for IOMAP_DIO_* defines iomap: cleanup up iomap_dio_bio_end_io() iomap: Add per-block dirty state tracking to improve performance iomap: Allocate ifs in ->write_begin() early iomap: Refactor iomap_write_delalloc_punch() function out iomap: Use iomap_punch_t typedef iomap: Fix possible overflow condition in iomap_write_delalloc_scan iomap: Add some uptodate state handling helpers for ifs state bitmap iomap: Drop ifs argument from iomap_set_range_uptodate() iomap: Rename iomap_page to iomap_folio_state and others iomap: Copy larger chunks from userspace iomap: Create large folios in the buffered write path filemap: Allow __filemap_get_folio to allocate large folios filemap: Add fgf_t typedef ...
| * | filemap: Add fgf_t typedefMatthew Wilcox (Oracle)2023-07-242-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Similarly to gfp_t, define fgf_t as its own type to prevent various misuses and confusion. Leave the flags as FGP_* for now to reduce the size of this patch; they will be converted to FGF_* later. Move the documentation to the definition of the type insted of burying it in the __filemap_get_folio() documentation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
* | Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds2023-08-282-7/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
| * \ Merge tag 'vfs-6.6-merge-2' of ↵Christian Brauner2023-08-231-3/+5
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux Pull filesystem freezing updates from Darrick Wong: New code for 6.6: * Allow the kernel to initiate a freeze of a filesystem. The kernel and userspace can both hold a freeze on a filesystem at the same time; the freeze is not lifted until /both/ holders lift it. This will enable us to fix a longstanding bug in XFS online fsck. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Message-Id: <20230822182604.GB11286@frogsfrogsfrogs> Signed-off-by: Christian Brauner <brauner@kernel.org>
| | * | fs: distinguish between user initiated freeze and kernel initiated freezeDarrick J. Wong2023-07-171-3/+5
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can freeze a filesystem using the FIFREEZE ioctl or by suspending the block device; this state persists until userspace thaws the filesystem with the FITHAW ioctl or resuming the block device. Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for the fsfreeze ioctl") we only allow the first freeze command to succeed. The kernel may decide that it is necessary to freeze a filesystem for its own internal purposes, such as suspends in progress, filesystem fsck activities, or quiescing a device prior to removal. Userspace thaw commands must never break a kernel freeze, and kernel thaw commands shouldn't undo userspace's freeze command. Introduce a couple of freeze holder flags and wire it into the sb_writers state. One kernel and one userspace freeze are allowed to coexist at the same time; the filesystem will not thaw until both are lifted. I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing behaviors. Cc: mcgrof@kernel.org Cc: jack@suse.cz Cc: hch@infradead.org Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
| * / fs: use the super_block as holder when mounting file systemsChristoph Hellwig2023-08-111-4/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | The file system type is not a very useful holder as it doesn't allow us to go back to the actual file system instance. Pass the super_block instead which is useful when passed back to the file system driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230802154131.2221419-7-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | fs: pass the request_mask to generic_fillattrJeff Layton2023-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | f2fs: convert to ctime accessor functionsJeff Layton2023-07-249-31/+33
|/ | | | | | | | | | | In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-41-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'f2fs-for-6.5-rc1' of ↵Linus Torvalds2023-07-0518-404/+1031
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we've mainly investigated the zoned block device support along with patches such as correcting write pointers between f2fs and storage, adding asynchronous zone reset flow, and managing the number of open zones. Other than them, f2fs adds another mount option, "errors=x" to specify how to handle when it detects an unexpected behavior at runtime. Enhancements: - support 'errors=remount-ro|continue|panic' mount option - enforce some inode flag policies - allow .tmp compression given extensions - add some ioctls to manage the f2fs compression - improve looped node chain flow - avoid issuing small-sized discard commands during checkpoint - implement an asynchronous zone reset Bug fixes: - fix deadlock in xattr and inode page lock - fix and add sanity check in some error paths - fix to avoid NULL pointer dereference f2fs_write_end_io() along with put_super - set proper flags to quota files - fix potential deadlock due to unpaired node_write lock use - fix over-estimating free section during FG GC - fix the wrong condition to determine atomic context As usual, also there are a number of patches with code refactoring and minor clean-ups" * tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits) f2fs: fix to do sanity check on direct node in truncate_dnode() f2fs: only set release for file that has compressed data f2fs: fix compile warning in f2fs_destroy_node_manager() f2fs: fix error path handling in truncate_dnode() f2fs: fix deadlock in i_xattr_sem and inode page lock f2fs: remove unneeded page uptodate check/set f2fs: update mtime and ctime in move file range method f2fs: compress tmp files given extension f2fs: refactor struct f2fs_attr macro f2fs: convert to use sbi directly f2fs: remove redundant assignment to variable err f2fs: do not issue small discard commands during checkpoint f2fs: check zone write pointer points to the end of zone f2fs: add f2fs_ioc_get_compress_blocks f2fs: cleanup MIN_INLINE_XATTR_SIZE f2fs: add helper to check compression level f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method f2fs: do more sanity check on inode f2fs: compress: fix to check validity of i_compress_flag field f2fs: add sanity compress level check for compressed file ...
| * f2fs: fix to do sanity check on direct node in truncate_dnode()Chao Yu2023-06-303-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports below bug: BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574 Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000 CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351 print_report mm/kasan/report.c:462 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:572 f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574 truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944 f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154 f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721 f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749 f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799 f2fs_truncate include/linux/fs.h:825 [inline] f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006 notify_change+0xb2c/0x1180 fs/attr.c:483 do_truncate+0x143/0x200 fs/open.c:66 handle_truncate fs/namei.c:3295 [inline] do_open fs/namei.c:3640 [inline] path_openat+0x2083/0x2750 fs/namei.c:3791 do_filp_open+0x1ba/0x410 fs/namei.c:3818 do_sys_openat2+0x16d/0x4c0 fs/open.c:1356 do_sys_open fs/open.c:1372 [inline] __do_sys_creat fs/open.c:1448 [inline] __se_sys_creat fs/open.c:1442 [inline] __x64_sys_creat+0xcd/0x120 fs/open.c:1442 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is, inodeA references inodeB via inodeB's ino, once inodeA is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's node page, it traverse mapping data from node->i.i_addr[0] to node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access. This patch fixes to add sanity check on dnode page in truncate_dnode(), so that, it can help to avoid triggering such issue, and once it encounters such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE error into superblock, later fsck can detect such issue and try repairing. Also, it removes f2fs_truncate_data_blocks() for cleanup due to the function has only one caller, and uses f2fs_truncate_data_blocks_range() instead. Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: only set release for file that has compressed dataSheng Yong2023-06-301-3/+5
| | | | | | | | | | | | | | | | | | | | If a file is not comprssed yet or does not have compressed data, for example, its data has a very low compression ratio, do not set FI_COMPRESS_RELEASED flag. Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix compile warning in f2fs_destroy_node_manager()Chao Yu2023-06-302-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | fs/f2fs/node.c: In function ‘f2fs_destroy_node_manager’: fs/f2fs/node.c:3390:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=] 3390 | } Merging below pointer arrays into common one, and reuse it by cast type. struct nat_entry *natvec[NATVEC_SIZE]; struct nat_entry_set *setvec[SETVEC_SIZE]; Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix error path handling in truncate_dnode()Chao Yu2023-06-301-1/+3
| | | | | | | | | | | | | | | | | | If truncate_node() fails in truncate_dnode(), it missed to call f2fs_put_page(), fix it. Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix deadlock in i_xattr_sem and inode page lockJaegeuk Kim2023-06-302-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thread #1: [122554.641906][ T92] f2fs_getxattr+0xd4/0x5fc -> waiting for f2fs_down_read(&F2FS_I(inode)->i_xattr_sem); [122554.641927][ T92] __f2fs_get_acl+0x50/0x284 [122554.641948][ T92] f2fs_init_acl+0x84/0x54c [122554.641969][ T92] f2fs_init_inode_metadata+0x460/0x5f0 [122554.641990][ T92] f2fs_add_inline_entry+0x11c/0x350 -> Locked dir->inode_page by f2fs_get_node_page() [122554.642009][ T92] f2fs_do_add_link+0x100/0x1e4 [122554.642025][ T92] f2fs_create+0xf4/0x22c [122554.642047][ T92] vfs_create+0x130/0x1f4 Thread #2: [123996.386358][ T92] __get_node_page+0x8c/0x504 -> waiting for dir->inode_page lock [123996.386383][ T92] read_all_xattrs+0x11c/0x1f4 [123996.386405][ T92] __f2fs_setxattr+0xcc/0x528 [123996.386424][ T92] f2fs_setxattr+0x158/0x1f4 -> f2fs_down_write(&F2FS_I(inode)->i_xattr_sem); [123996.386443][ T92] __f2fs_set_acl+0x328/0x430 [123996.386618][ T92] f2fs_set_acl+0x38/0x50 [123996.386642][ T92] posix_acl_chmod+0xc8/0x1c8 [123996.386669][ T92] f2fs_setattr+0x5e0/0x6bc [123996.386689][ T92] notify_change+0x4d8/0x580 [123996.386717][ T92] chmod_common+0xd8/0x184 [123996.386748][ T92] do_fchmodat+0x60/0x124 [123996.386766][ T92] __arm64_sys_fchmodat+0x28/0x3c Cc: <stable@vger.kernel.org> Fixes: 27161f13e3c3 "f2fs: avoid race in between read xattr & write xattr" Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove unneeded page uptodate check/setYunlei He2023-06-261-2/+0
| | | | | | | | | | | | | | | | | | This patch remove unneeded page uptodate check/set in f2fs_vm_page_mkwrite, which already done in set_page_dirty. Signed-off-by: Yunlei He <heyunlei@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: update mtime and ctime in move file range methodYunlei He2023-06-261-0/+11
| | | | | | | | | | | | | | | | | | Mtime and ctime stay old value without update after move file range ioctl. This patch add time update. Signed-off-by: Yunlei He <heyunlei@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: compress tmp files given extensionJaegeuk Kim2023-06-261-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's compress tmp files for the given extension list. This patch does not change the previous behavior, but allow the cases as below. Extention example: "ext" - abc.ext : allow - abc.ext.abc : allow - abc.extm : not allow Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: refactor struct f2fs_attr macroYangtao Li2023-06-261-91/+149
| | | | | | | | | | | | | | | | | | | | | | | | This patch provides a large number of variants of F2FS_RW_ATTR and F2FS_RO_ATTR macros, reducing the number of parameters required to initialize the f2fs_attr structure. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304152234.wjaY3IYm-lkp@intel.com/ Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: convert to use sbi directlyYangtao Li2023-06-261-6/+6
| | | | | | | | | | | | | | | | F2FS_I_SB(inode) is redundant. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove redundant assignment to variable errColin Ian King2023-06-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | The assignment to variable err is redundant since the code jumps to label next and err is then re-assigned a new value on the call to sanity_check_node_chain. Remove the assignment. Cleans up clang scan build warning: fs/f2fs/recovery.c:464:6: warning: Value stored to 'err' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: do not issue small discard commands during checkpointJaegeuk Kim2023-06-261-1/+1
| | | | | | | | | | | | | | If there're huge # of small discards, this will increase checkpoint latency insanely. Let's issue small discards only by trim. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: check zone write pointer points to the end of zoneDaeho Jeong2023-06-261-2/+6
| | | | | | | | | | | | | | | | | | We don't need to report an issue, when the zone write pointer already points to the end of the zone, since the zone mismatch is already taken care. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add f2fs_ioc_get_compress_blocksSheng Yong2023-06-261-6/+17
| | | | | | | | | | | | | | | | | | This patch adds f2fs_ioc_get_compress_blocks() to provide a common f2fs_get_compress_blocks(). Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>