summaryrefslogtreecommitdiffstats
path: root/fs/f2fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * f2fs: fix to limit gc_pin_file_thresholdChao Yu2024-05-094-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | type of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and f2fs_sb_info.gc_pin_file_threshold is __le16, unsigned int, and u64, so it will cause truncation during comparison and persistence. Unifying variable of these three variables to unsigned short, and add an upper boundary limitation for gc_pin_file_threshold. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove unused GC_FAILURE_PINChao Yu2024-05-094-22/+13
| | | | | | | | | | | | | | | | | | | | | | | | After commit 3db1de0e582c ("f2fs: change the current atomic write way"), we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[] array to i_pin_failure for cleanup. Meanwhile, let's define i_current_depth and i_gc_failures as union variable due to they won't be valid at the same time. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: use f2fs_{err,info}_ratelimited() for cleanupChao Yu2024-05-092-33/+26
| | | | | | | | | | | | | | | | Commit b1c9d3f833ba ("f2fs: support printk_ratelimited() in f2fs_printk()") missed some cases, cover all remains for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix block migration when section is not aligned to pow2Wu Bo2024-05-091-9/+8
| | | | | | | | | | | | | | | | | | | | | | As for zoned-UFS, f2fs section size is forced to zone size. And zone size may not aligned to pow2. Fixes: 859fca6b706e ("f2fs: swap: support migrating swapfile in aligned write mode") Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: zone: fix to don't trigger OPU on pinfile for direct IOChao Yu2024-04-291-2/+3
| | | | | | | | | | | | | | | | | | Otherwise, it breaks pinfile's sematics. Cc: Daeho Jeong <daeho43@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()Chao Yu2024-04-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports a kernel bug as below: F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ================================================================== BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076 CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838 __do_sys_ioctl fs/ioctl.c:902 [inline] __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is we missed to do sanity check on i_xattr_nid during f2fs_iget(), so that in fiemap() path, current_nat_addr() will access nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering kasan bug report, fix it. Reported-and-tested-by: syzbot+3694e283cf5c40df6d14@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/00000000000094036c0616e72a1d@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to avoid allocating WARM_DATA segment for direct IOChao Yu2024-04-294-6/+15
| | | | | | | | | | | | | | | | | | If active_log is not 6, we never use WARM_DATA segment, let's avoid allocating WARM_DATA segment for direct IO. Signed-off-by: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove redundant parameter in is_next_segment_free()Yifan Zhao2024-04-291-3/+2
| | | | | | | | | | | | | | | | is_next_segment_free() takes a redundant `type` parameter. Remove it. Signed-off-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: use helper to print zone conditionWu Bo2024-04-251-14/+3
| | | | | | | | | | | | | | | | To make code clean, use blk_zone_cond_str() to print debug information. Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix false alarm on invalid block addressJaegeuk Kim2024-04-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | f2fs_ra_meta_pages can try to read ahead on invalid block address which is not the corruption case. Cc: <stable@kernel.org> # v6.9+ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218770 Fixes: 31f85ccc84b8 ("f2fs: unify the error handling of f2fs_is_valid_blkaddr") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: clear writeback when compression failedJaegeuk Kim2024-04-251-2/+38
| | | | | | | | | | | | | | | | Let's stop issuing compressed writes and clear their writeback flags. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove unnecessary block size check in init_f2fs_fs()Chao Yu2024-04-191-6/+0
| | | | | | | | | | | | | | | | After commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size"), F2FS_BLKSIZE equals to PAGE_SIZE, remove unnecessary check condition. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix comment in sanity_check_raw_super()Chao Yu2024-04-191-1/+1
| | | | | | | | | | | | | | | | Commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size") missed to adjust comment in sanity_check_raw_super(), fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: convert f2fs__page tracepoint class to use folioChao Yu2024-04-193-9/+9
| | | | | | | | | | | | | | | | Convert f2fs__page tracepoint class() and its instances to use folio and related functionality, and rename it to f2fs__folio(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: convert f2fs_read_inline_data() to use folioChao Yu2024-04-193-25/+24
| | | | | | | | | | | | | | | | Convert f2fs_read_inline_data() to use folio and related functionality, and also convert its caller to use folio. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: convert f2fs_read_single_page() to use folioChao Yu2024-04-191-13/+14
| | | | | | | | | | | | | | | | Convert f2fs_read_single_page() to use folio and related functionality. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: convert f2fs_mpage_readpages() to use folioChao Yu2024-04-191-40/+40
| | | | | | | | | | | | | | | | Convert f2fs_mpage_readpages() to use folio and related functionality. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: assign the write hint per stream by defaultJaegeuk Kim2024-04-194-7/+89
| | | | | | | | | | | | | | | | | | This reverts commit 930e2607638d ("f2fs: remove obsolete whint_mode"), as we decide to pass write hints to the disk. Cc: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: allow direct io of pinned files for zoned storageDaeho Jeong2024-04-141-1/+2
| | | | | | | | | | | | | | | | | | Since the allocation happens in conventional LU for zoned storage, we can allow direct io for that. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: prevent writing without fallocate() for pinned filesDaeho Jeong2024-04-141-9/+16
| | | | | | | | | | | | | | | | | | | | In a case writing without fallocate(), we can't guarantee it's allocated in the conventional area for zoned stroage. To make it consistent across storage devices, we disallow it regardless of storage device types. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: use folio_test_writebackJaegeuk Kim2024-04-128-13/+13
| | | | | | | | | | | | | | Let's convert PageWriteback to folio_test_writeback. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add REQ_TIME time update for some user behaviorsZhiguo Niu2024-04-121-3/+12
| | | | | | | | | | | | | | | | | | | | some user behaviors requested filesystem operations, which will cause filesystem not idle. Meanwhile adjust some f2fs_update_time(REQ_TIME) positions. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: write missing last sum blk of file pinning sectionDaeho Jeong2024-04-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | While do not allocating a new section in advance for file pinning area, I missed that we should write the sum block for the last segment of a file pinning section. Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't set RO when shutting down f2fsJaegeuk Kim2024-04-121-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shutdown does not check the error of thaw_super due to readonly, which causes a deadlock like below. f2fs_ioc_shutdown(F2FS_GOING_DOWN_FULLSYNC) issue_discard_thread - bdev_freeze - freeze_super - f2fs_stop_checkpoint() - f2fs_handle_critical_error - sb_start_write - set RO - waiting - bdev_thaw - thaw_super_locked - return -EINVAL, if sb_rdonly() - f2fs_stop_discard_thread -> wait for kthread_stop(discard_thread); Reported-by: "Light Hsieh (謝明燈)" <Light.Hsieh@mediatek.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to check pinfile flag in f2fs_move_file_range()Chao Yu2024-04-121-1/+2
| | | | | | | | | | | | | | | | | | ioctl(F2FS_IOC_MOVE_RANGE) can truncate or punch hole on pinned file, fix to disallow it. Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to relocate check condition in f2fs_fallocate()Chao Yu2024-04-121-9/+11
| | | | | | | | | | | | | | | | | | | | compress and pinfile flag should be checked after inode lock held to avoid race condition, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: compress: fix to relocate check condition in f2fs_ioc_{,de}compress_file()Chao Yu2024-04-121-8/+4
| | | | | | | | | | | | | | | | | | | | | | Compress flag should be checked after inode lock held to avoid racing w/ f2fs_setflags_common() , fix it. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: compress: fix to relocate check condition in ↵Chao Yu2024-04-121-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | f2fs_{release,reserve}_compress_blocks() Compress flag should be checked after inode lock held to avoid racing w/ f2fs_setflags_common(), fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix zoned block device information initializationWenjie Qi2024-04-092-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | If the max open zones of zoned devices are less than the active logs of F2FS, the device may error due to insufficient zone resources when multiple active logs are being written at the same time. Signed-off-by: Wenjie Qi <qwjhust@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to adjust appropirate defragment pg_endZhiguo Niu2024-04-021-5/+7
| | | | | | | | | | | | | | | | | | | | | | A length that exceeds the real size of the inode may be specified from user, although these out-of-range areas are not mapped, but they still need to be check in while loop, which is unnecessary. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove clear SB_INLINECRYPT flag in default_optionsYunlei He2024-03-291-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set. If create new file or open file during this gap, these files will not use inlinecrypt. Worse case, it may lead to data corruption if wrappedkey_v0 is enable. Thread A: Thread B: -f2fs_remount -f2fs_file_open or f2fs_new_inode -default_options <- clear SB_INLINECRYPT flag -fscrypt_select_encryption_impl -parse_options <- set SB_INLINECRYPT again Signed-off-by: Yunlei He <heyunlei@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to wait on page writeback in __clone_blkaddrs()Chao Yu2024-03-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In below race condition, dst page may become writeback status in __clone_blkaddrs(), it needs to wait writeback before update, fix it. Thread A GC Thread - f2fs_move_file_range - filemap_write_and_wait_range(dst) - gc_data_segment - f2fs_down_write(dst) - move_data_page - set_page_writeback(dst_page) - f2fs_submit_page_write - f2fs_up_write(dst) - f2fs_down_write(dst) - __exchange_data_block - __clone_blkaddrs - f2fs_get_new_data_page - memcpy_page Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: support to map continuous holes or preallocated addressChao Yu2024-03-291-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports to map continuous holes or preallocated addresses to improve performace of lookuping mapping info during read DIO. [testcase 1] xfs_io -f /mnt/f2fs/hole -c "truncate 1m" -c "fsync" xfs_io -d /mnt/f2fs/hole -c "pread -b 1m 0 1m" [before] f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 ...... f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576 [after] f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0 f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576 [testcase 2] xfs_io -f /mnt/f2fs/preallocated -c "falloc 0 1m" -c "fsync" xfs_io -d /mnt/f2fs/preallocated -c "pread -b 1m 0 1m" [before] f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 ...... f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576 [after] f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0 f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0xffffffff, len = 0x100, flags = 4, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0 f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: multidev: fix to recognize valid zero block addressChao Yu2024-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Yi Zhang in mailing list [1], kernel warning was catched during zbd/010 test as below: ./check zbd/010 zbd/010 (test gap zone support with F2FS) [failed] runtime ... 3.752s something found in dmesg: [ 4378.146781] run blktests zbd/010 at 2024-02-18 11:31:13 [ 4378.192349] null_blk: module loaded [ 4378.209860] null_blk: disk nullb0 created [ 4378.413285] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1) [ 4378.422334] scsi host15: scsi_debug: version 0191 [20210520] dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0 [ 4378.434922] scsi 15:0:0:0: Direct-Access-ZBC Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 4378.443343] scsi 15:0:0:0: Power-on or device reset occurred [ 4378.449371] sd 15:0:0:0: Attached scsi generic sg5 type 20 [ 4378.449418] sd 15:0:0:0: [sdf] Host-managed zoned block device ... (See '/mnt/tests/gitlab.com/api/v4/projects/19168116/repository/archive.zip/storage/blktests/blk/blktests/results/nodev/zbd/010.dmesg' WARNING: CPU: 22 PID: 44011 at fs/iomap/iter.c:51 CPU: 22 PID: 44011 Comm: fio Not tainted 6.8.0-rc3+ #1 RIP: 0010:iomap_iter+0x32b/0x350 Call Trace: <TASK> __iomap_dio_rw+0x1df/0x830 f2fs_file_read_iter+0x156/0x3d0 [f2fs] aio_read+0x138/0x210 io_submit_one+0x188/0x8c0 __x64_sys_io_submit+0x8c/0x1a0 do_syscall_64+0x86/0x170 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Shinichiro Kawasaki helps to analyse this issue and proposes a potential fixing patch in [2]. Quoted from reply of Shinichiro Kawasaki: "I confirmed that the trigger commit is dbf8e63f48af as Yi reported. I took a look in the commit, but it looks fine to me. So I thought the cause is not in the commit diff. I found the WARN is printed when the f2fs is set up with multiple devices, and read requests are mapped to the very first block of the second device in the direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached() modify map->m_pblk as the physical block address from each block device. It becomes zero when it is mapped to the first block of the device. However, f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the whole f2fs, across the all block devices. It compares map->m_pblk against NULL_ADDR == 0, then go into the unexpected branch and sets the invalid iomap->length. The WARN catches the invalid iomap->length. This WARN is printed even for non-zoned block devices, by following steps. - Create two (non-zoned) null_blk devices memory backed with 128MB size each: nullb0 and nullb1. # mkfs.f2fs /dev/nullb0 -c /dev/nullb1 # mount -t f2fs /dev/nullb0 "${mount_dir}" # dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192 # dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct ..." So, the root cause of this issue is: when multi-devices feature is on, f2fs_map_blocks() may return zero blkaddr in non-primary device, which is a verified valid block address, however, f2fs_iomap_begin() treats it as an invalid block address, and then it triggers the warning in iomap framework code. Finally, as discussed, we decide to use a more simple and direct way that checking (map.m_flags & F2FS_MAP_MAPPED) condition instead of (map.m_pblk != NULL_ADDR) to fix this issue. Thanks a lot for the effort of Yi Zhang and Shinichiro Kawasaki on this issue. [1] https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/ [2] https://lore.kernel.org/linux-f2fs-devel/gngdj77k4picagsfdtiaa7gpgnup6fsgwzsltx6milmhegmjff@iax2n4wvrqye/ Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 1517c1a7a445 ("f2fs: implement iomap operations") Fixes: 8d3c1fa3fa5e ("f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce map_is_mergeable() for cleanupChao Yu2024-03-261-6/+17
| | | | | | | | | | | | | | No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to detect inconsistent nat entry during truncationChao Yu2024-03-261-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Roman Smirnov reported as below: " There is a possible bug in f2fs_truncate_inode_blocks(): if (err < 0 && err != -ENOENT) goto fail; ... offset[1] = 0; offset[0]++; nofs += err; If err = -ENOENT then nofs will sum with an error code, which is strange behaviour. Also if nofs < ENOENT this will cause an overflow. err will be equal to -ENOENT with the following call stack: truncate_nodes() f2fs_get_node_page() __get_node_page() read_node_page() " If nat is corrupted, truncate_nodes() may return -ENOENT, and f2fs_truncate_inode_blocks() doesn't handle such error correctly, fix it. Reported-by: Roman Smirnov <r.smirnov@omp.ru> Closes: https://lore.kernel.org/linux-f2fs-devel/085b27fd2b364a3c8c3a9ca77363e246@omp.ru Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Prevent s_writer rw_sem count mismatch in f2fs_evict_inodeYeongjin Gil2024-03-261-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If f2fs_evict_inode is called between freeze_super and thaw_super, the s_writer rwsem count may become negative, resulting in hang. CPU1 CPU2 f2fs_resize_fs() f2fs_evict_inode() f2fs_freeze set SBI_IS_FREEZING skip sb_start_intwrite f2fs_unfreeze clear SBI_IS_FREEZING sb_end_intwrite To solve this problem, the call to sb_end_write is determined by whether sb_start_intwrite is called, rather than the current freezing status. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: support .shutdown in f2fs_sopsChao Yu2024-03-263-27/+51
| | | | | | | | | | | | | | | | Support .shutdown callback in f2fs_sops, then, it can be called to shut down the file system when underlying block device is marked dead. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds2024-05-191-2/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
| * | f2fs: convert f2fs_clear_page_cache_dirty_tag to use a folioMatthew Wilcox (Oracle)2024-05-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removes uses of page_mapping() and page_index(). Link: https://lkml.kernel.org/r/20240423225552.4113447-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | Merge tag 'vfs-6.10.misc' of ↵Linus Torvalds2024-05-131-1/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
| * | fs: claw back a few FMODE_* bitsChristian Brauner2024-04-071-1/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to change for individual files. Imho, such flags don't need to live in f_mode they might as well live in the fops structs itself. And the fops struct already has that lonely mmap_supported_flags member. We might as well turn that into a generic fop_flags member and move a few flags from FMODE_* space into FOP_* space. That gets us four FMODE_* bits back and the ability for new static flags that are about file ops to not have to live in FMODE_* space but in their own FOP_* space. It's not the most beautiful thing ever but it gets the job done. Yes, there'll be an additional pointer chase but hopefully that won't matter for these flags. I suspect there's a few more we can move into there and that we can also redirect a bunch of new flag suggestions that follow this pattern into the fop_flags field instead of f_mode. Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
* / fs,block: yield devices earlyChristian Brauner2024-03-271-1/+1
|/ | | | | | | | | | | | | | | | | Currently a device is only really released once the umount returns to userspace due to how file closing works. That ultimately could cause an old umount assumption to be violated that concurrent umount and mount don't fail. So an exclusively held device with a temporary holder should be yielded before the filesystem is gone. Add a helper that allows callers to do that. This also allows us to remove the two holder ops that Linus wasn't excited about. Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'f2fs-for-6.9-rc1' of ↵Linus Torvalds2024-03-1819-824/+999
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this round, there are a number of updates on mainly two areas: Zoned block device support and Per-file compression. For example, we've found several issues to support Zoned block device especially having large sections regarding to GC and file pinning used for Android devices. In compression side, we've fixed many corner race conditions that had broken the design assumption. Enhancements: - Support file pinning for Zoned block device having large section - Enhance the data recovery after sudden power cut on Zoned block device - Add more error injection cases to easily detect the kernel panics - add a proc entry show the entire disk layout - Improve various error paths paniced by BUG_ON in block allocation and GC - support SEEK_DATA and SEEK_HOLE for compression files Bug fixes: - avoid use-after-free issue in f2fs_filemap_fault - fix some race conditions to break the atomic write design assumption - fix to truncate meta inode pages forcely - resolve various per-file compression issues wrt the space management and compression policies - fix some swap-related bugs In addition, we removed deprecated codes such as io_bits and heap_allocation, and also fixed minor error handling routines with neat debugging messages" * tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits) f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault f2fs: truncate page cache before clearing flags when aborting atomic write f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag f2fs: prevent atomic write on pinned file f2fs: fix to handle error paths of {new,change}_curseg() f2fs: unify the error handling of f2fs_is_valid_blkaddr f2fs: zone: fix to remove pow2 check condition for zoned block device f2fs: fix to truncate meta inode pages forcely f2fs: compress: fix reserve_cblocks counting error when out of space f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks f2fs: add a proc entry show disk layout f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup f2fs: fix to check return value of f2fs_gc_range f2fs: fix to check return value __allocate_new_segment f2fs: fix to do sanity check in update_sit_entry f2fs: fix to reset fields for unloaded curseg f2fs: clean up new_curseg() f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate() f2fs: fix blkofs_end correctly in f2fs_migrate_blocks() f2fs: ro: don't start discard thread for readonly image ...
| * f2fs: fix to avoid use-after-free issue in f2fs_filemap_faultChao Yu2024-03-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports a f2fs bug as below: BUG: KASAN: slab-use-after-free in f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49 Read of size 8 at addr ffff88807bb22680 by task syz-executor184/5058 CPU: 0 PID: 5058 Comm: syz-executor184 Not tainted 6.7.0-syzkaller-09928-g052d534373b7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x163/0x540 mm/kasan/report.c:488 kasan_report+0x142/0x170 mm/kasan/report.c:601 f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49 __do_fault+0x131/0x450 mm/memory.c:4376 do_shared_fault mm/memory.c:4798 [inline] do_fault mm/memory.c:4872 [inline] do_pte_missing mm/memory.c:3745 [inline] handle_pte_fault mm/memory.c:5144 [inline] __handle_mm_fault+0x23b7/0x72b0 mm/memory.c:5285 handle_mm_fault+0x27e/0x770 mm/memory.c:5450 do_user_addr_fault arch/x86/mm/fault.c:1364 [inline] handle_page_fault arch/x86/mm/fault.c:1507 [inline] exc_page_fault+0x456/0x870 arch/x86/mm/fault.c:1563 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570 The root cause is: in f2fs_filemap_fault(), vmf->vma may be not alive after filemap_fault(), so it may cause use-after-free issue when accessing vmf->vma->vm_flags in trace_f2fs_filemap_fault(). So it needs to keep vm_flags in separated temporary variable for tracepoint use. Fixes: 87f3afd366f7 ("f2fs: add tracepoint for f2fs_vm_page_mkwrite()") Reported-and-tested-by: syzbot+763afad57075d3f862f2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000e8222b060f00db3b@google.com Cc: Ed Tsai <Ed.Tsai@mediatek.com> Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: truncate page cache before clearing flags when aborting atomic writeSunmin Jeong2024-03-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In f2fs_do_write_data_page, FI_ATOMIC_FILE flag selects the target inode between the original inode and COW inode. When aborting atomic write and writeback occur simultaneously, invalid data can be written to original inode if the FI_ATOMIC_FILE flag is cleared meanwhile. To prevent the problem, let's truncate all pages before clearing the flag Atomic write thread Writeback thread f2fs_abort_atomic_write clear_inode_flag(inode, FI_ATOMIC_FILE) __writeback_single_inode do_writepages f2fs_do_write_data_page - use dn of original inode truncate_inode_pages_final Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flagSunmin Jeong2024-03-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In f2fs_update_inode, i_size of the atomic file isn't updated until FI_ATOMIC_COMMITTED flag is set. When committing atomic write right after the writeback of the inode, i_size of the raw inode will not be updated. It can cause the atomicity corruption due to a mismatch between old file size and new data. To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED Atomic write thread Writeback thread __writeback_single_inode write_inode f2fs_update_inode - skip i_size update f2fs_ioc_commit_atomic_write f2fs_commit_atomic_write set_inode_flag(inode, FI_ATOMIC_COMMITTED) f2fs_do_sync_file f2fs_fsync_node_pages - skip f2fs_update_inode since the inode is clean Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: prevent atomic write on pinned fileDaeho Jeong2024-03-121-1/+2
| | | | | | | | | | | | | | | | | | Since atomic write way was changed to out-place-update, we should prevent it on pinned files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to handle error paths of {new,change}_curseg()Zhiguo Niu2024-03-124-27/+45
| | | | | | | | | | | | | | | | | | | | | | {new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: unify the error handling of f2fs_is_valid_blkaddrZhiguo Niu2024-03-127-62/+29
| | | | | | | | | | | | | | | | | | | | | | | | There are some cases of f2fs_is_valid_blkaddr not handled as ERROR_INVALID_BLKADDR,so unify the error handling about all of f2fs_is_valid_blkaddr. Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>