path: root/fs/f2fs
Commit message | Author | Age | Files | Lines
* Merge tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs | Linus Torvalds | 2022-12-14 | 19 | -880/+1453
|\
| |   Pull f2fs updates from Jaegeuk Kim:
| |
| |   "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE and a per-block age-based extent cache.
| |
| |   F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic write feature which guarantees a per-file atomicity. It would be more efficient than AtomicFile implementation in Android framework.
| |
| |   The per-block age-based extent cache implements another type of extent cache in memory which keeps the per-block age in a file, so that block allocator could split the hot and cold data blocks more accurately.
| |
| |   Enhancements:
| |    - introduce F2FS_IOC_START_ATOMIC_REPLACE
| |    - refactor extent_cache to add a new per-block-age-based extent cache support
| |    - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs
| |    - add proc entry to show discard_plist info
| |    - optimize iteration over sparse directories
| |    - add barrier mount option
| |
| |   Bug fixes:
| |    - avoid victim selection from previous victim section
| |    - fix to enable compress for newly created file if extension matches
| |    - set zstd compress level correctly
| |    - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot
| |    - correct i_size change for atomic writes
| |    - allow to read node block after shutdown
| |    - allow to set compression for inlined file
| |    - fix gc mode when gc_urgent_high_remaining is 1
| |    - should put a page when checking the summary info
| |
| |   Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc"
| |
| |   * tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
| |     f2fs: reset wait_ms to default if any of the victims have been selected
| |     f2fs: fix some format WARNING in debug.c and sysfs.c
| |     f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
| |     f2fs: fix iostat parameter for discard
| |     f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
| |     f2fs: add block_age-based extent cache
| |     f2fs: allocate the extent_cache by default
| |     f2fs: refactor extent_cache to support for read and more
| |     f2fs: remove unnecessary __init_extent_tree
| |     f2fs: move internal functions into extent_cache.c
| |     f2fs: specify extent cache for read explicitly
| |     f2fs: introduce f2fs_is_readonly() for readability
| |     f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro
| |     f2fs: do some cleanup for f2fs module init
| |     MAINTAINERS: Add f2fs bug tracker link
| |     f2fs: remove the unused flush argument to change_curseg
| |     f2fs: open code allocate_segment_by_default
| |     f2fs: remove struct segment_allocation default_salloc_ops
| |     f2fs: introduce discard_urgent_util sysfs node
| |     f2fs: define MIN_DISCARD_GRANULARITY macro
| |     ...
| * f2fs: reset wait_ms to default if any of the victims have been selected | Yuwei Guan | 2022-12-12 | 1 | -0/+4
| |   In non-foreground gc mode, if no victim is selected, the gc process will wait for no_gc_sleep_time before waking up again. In this subsequent time, even though a victim will be selected, the gc process still waits for no_gc_sleep_time before waking up. The configuration of wait_ms is not reasonable.
| |
| |   After any of the victims have been selected, we need to reset wait_ms to default sleep time from no_gc_sleep_time.
| |
| |   Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
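The wait_ms behaviour described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: the struct, the function name next_wait_ms() and the two interval values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sleep intervals in milliseconds (not the kernel defaults). */
enum { DEF_SLEEP_MS = 30000, NO_GC_SLEEP_MS = 300000 };

/* Hypothetical per-thread state: how long to sleep before the next round. */
struct gc_sleep_state {
	unsigned int wait_ms;
};

/*
 * Called once per background GC round. If no victim was found, back off to
 * the long no-GC interval. If a victim was found while we were still using
 * the long interval, fall back to the default interval instead of keeping it.
 */
static unsigned int next_wait_ms(struct gc_sleep_state *st, bool victim_selected)
{
	if (!victim_selected)
		st->wait_ms = NO_GC_SLEEP_MS;
	else if (st->wait_ms == NO_GC_SLEEP_MS)
		st->wait_ms = DEF_SLEEP_MS;
	return st->wait_ms;
}
```

Without the second branch, a round that finds a victim would still sleep for the long interval, which is the behaviour the commit message calls unreasonable.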
| * f2fs: fix some format WARNING in debug.c and sysfs.c | Yangtao Li | 2022-12-12 | 2 | -27/+28
| |   To fix:
| |
| |   WARNING: function definition argument 'struct f2fs_attr *' should also have an identifier name
| |   + ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *);
| |
| |   WARNING: return sysfs_emit(...) formats should include a terminating newline
| |   + return sysfs_emit(buf, "(none)");
| |
| |   WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
| |   + unsigned npages = NODE_MAPPING(sbi)->nrpages;
| |
| |   WARNING: Missing a blank line after declarations
| |   + unsigned npages = COMPRESS_MAPPING(sbi)->nrpages;
| |   + si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
| |
| |   WARNING: quoted string split across lines
| |   + seq_printf(s, "CP merge (Queued: %4d, Issued: %4d, Total: %4d, "
| |   + "Cur time: %4d(ms), Peak time: %4d(ms))\n",
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super() | Yangtao Li | 2022-12-12 | 2 | -8/+6
| |   No need to call f2fs_issue_discard_timeout() in f2fs_put_super, when no discard command requires issue. Since the caller of f2fs_issue_discard_timeout() usually judges the number of discard commands before using it. Let's move this logic to f2fs_issue_discard_timeout().
| |
| |   By the way, use f2fs_realtime_discard_enable to simplify the code.
| |
| |   Reported-by: kernel test robot <lkp@intel.com>
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix iostat parameter for discard | Yangtao Li | 2022-12-12 | 1 | -1/+1
| |   The other data we count uses the number of bytes as the basic unit, but discard uses the number of cmds as the statistical unit. In fact the discard command contains the number of blocks, so let's change to the number of bytes as the base unit.
| |
| |   Fixes: b0af6d491a6b ("f2fs: add app/fs io stat")
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
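The unit change above amounts to converting a block count into bytes before adding it to the statistics. A minimal sketch, assuming 4 KiB blocks; the struct and function names are hypothetical and are not the kernel's iostat interface:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB blocks, so a shift of 12 bits. */
#define MODEL_BLKSIZE_BITS 12

/* Hypothetical counter holding the accumulated discard volume in bytes. */
struct model_iostat {
	uint64_t discard_bytes;
};

/* Account one discard command covering nr_blocks blocks, in bytes. */
static void model_account_discard(struct model_iostat *st, uint64_t nr_blocks)
{
	st->discard_bytes += nr_blocks << MODEL_BLKSIZE_BITS;
}
```

Counting bytes rather than commands keeps the discard figure comparable with the other byte-based counters.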
| * f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache | Colin Ian King | 2022-12-12 | 1 | -2/+2
| |   There is a spelling mistake in a label name. Fix it.
| |
| |   Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add block_age-based extent cache | Jaegeuk Kim | 2022-12-12 | 11 | -7/+329
| |   This patch introduces a runtime hot/cold data separation method for f2fs, in order to improve the accuracy for data temperature classification, reduce the garbage collection overhead after long-term data updates.
| |
| |   Enhanced hot/cold data separation can record data block update frequency as "age" of the extent per inode, and make use of the age info to indicate better temperature type for data block allocation:
| |    - It records total data blocks allocated since mount;
| |    - When file extent has been updated, it calculates the count of data blocks allocated since last update as the age of the extent;
| |    - Before the data block is allocated, it searches for the age info and chooses the suitable segment for allocation.
| |
| |   Test and result:
| |    - Prepare: create about 30000 files
| |      * 3% for cold files (with cold file extension like .apk, from 3M to 10M)
| |      * 50% for warm files (with random file extension like .FcDxq, from 1K to 4M)
| |      * 47% for hot files (with hot file extension like .db, from 1K to 256K)
| |    - create(5%)/random update(90%)/delete(5%) the files
| |      * total write amount is about 70G
| |      * fsync will be called for .db files, and buffered write will be used for other files
| |
| |   The storage of test device is large enough (128G) so that it will not switch to SSR mode during the test.
| |
| |   Benefit: dirty segment count increment reduced by about 14%
| |    - before: Dirty +21110
| |    - after: Dirty +18286
| |
| |   Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
| |   Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
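The three steps listed in this commit message (count allocated blocks, derive an extent's age on update, pick a temperature before allocation) can be sketched as a small user-space model. All names, and the idea of two caller-supplied thresholds, are assumptions for illustration; the commit message does not state how the kernel maps an age to a segment type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical temperature classes used by this sketch. */
enum model_temp { MODEL_HOT, MODEL_WARM, MODEL_COLD };

/* Hypothetical per-extent record. */
struct model_extent_age {
	uint64_t last_blocks; /* global allocation count at the last update */
	uint64_t age;         /* blocks allocated between the last two updates */
};

/*
 * Step 2: when the extent is updated, its age is the number of data blocks
 * allocated (filesystem-wide, since mount) since its previous update.
 */
static void model_extent_updated(struct model_extent_age *e,
				 uint64_t total_allocated)
{
	e->age = total_allocated - e->last_blocks;
	e->last_blocks = total_allocated;
}

/*
 * Step 3: before allocation, map the age to a temperature. A small age means
 * the extent is rewritten often. The thresholds are caller-chosen assumptions.
 */
static enum model_temp model_pick_temp(uint64_t age, uint64_t hot_thr,
				       uint64_t warm_thr)
{
	if (age <= hot_thr)
		return MODEL_HOT;
	if (age <= warm_thr)
		return MODEL_WARM;
	return MODEL_COLD;
}
```

An extent that is rewritten after only a few allocations elsewhere gets a small age and lands in the hot class, while one untouched across many allocations ages into the cold class.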
| * f2fs: allocate the extent_cache by default | Jaegeuk Kim | 2022-12-12 | 4 | -24/+27
| |   Let's allocate it to remove the runtime complexity.
| |
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: refactor extent_cache to support for read and more | Jaegeuk Kim | 2022-12-12 | 10 | -281/+434
| |   This patch prepares extent_cache to be ready for addition.
| |
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove unnecessary __init_extent_tree | Jaegeuk Kim | 2022-12-12 | 1 | -16/+5
| |   Added into the caller.
| |
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: move internal functions into extent_cache.c | Jaegeuk Kim | 2022-12-12 | 2 | -76/+81
| |   No functional change.
| |
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: specify extent cache for read explicitly | Jaegeuk Kim | 2022-12-12 | 7 | -18/+18
| |   Let's describe it's read extent cache.
| |
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce f2fs_is_readonly() for readability | Yangtao Li | 2022-12-12 | 2 | -3/+7
| |   Introduce f2fs_is_readonly() and use it to simplify code.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro | Yangtao Li | 2022-12-12 | 1 | -5/+1
| |   F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() have never been used since they were introduced by this commit 76f105a2dbcd ("f2fs: add feature facility in superblock").
| |
| |   So let's remove them. BTW, convert f2fs_sb_has_##name to return bool.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: do some cleanup for f2fs module init | Yangtao Li | 2022-12-08 | 5 | -62/+14
| |   Just for cleanup, no functional changes.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove the unused flush argument to change_curseg | Christoph Hellwig | 2022-12-08 | 1 | -9/+7
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: open code allocate_segment_by_default | Christoph Hellwig | 2022-12-08 | 1 | -26/+24
| |   allocate_segment_by_default has just two callers, which use very different code paths inside it based on the force parameter. Just open code the logic in the two callers using a new helper to decide if a new segment should be allocated.
| |
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove struct segment_allocation default_salloc_ops | Christoph Hellwig | 2022-12-08 | 2 | -15/+2
| |   There is only a single instance of these ops, so remove the indirection and call allocate_segment_by_default directly.
| |
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce discard_urgent_util sysfs node | Yangtao Li | 2022-11-28 | 3 | -1/+12
| |   Through this node, you can control whether the background discard runs more aggressively once the space utilization rate is reached.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: define MIN_DISCARD_GRANULARITY macro | Yangtao Li | 2022-11-28 | 3 | -3/+6
| |   Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(), let's use macro instead of number.
| |
| |   Suggested-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: init discard policy after thread wakeup | Yangtao Li | 2022-11-28 | 1 | -11/+9
| |   Under the current logic, after the discard thread wakes up, it will not run according to the expected policy, but will use the expected policy before sleep. Move the strategy selection to after the thread wakes up, so that the running state of the thread meets expectations.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: avoid victim selection from previous victim section | Yonggil Song | 2022-11-28 | 1 | -2/+3
| |   When f2fs chooses GC victim in large section & LFS mode, next_victim_seg[gc_type] is referenced first. After segment is freed, next_victim_seg[gc_type] has the next segment number. However, next_victim_seg[gc_type] still has the last segment number even after the last segment of section is freed. In this case, when f2fs chooses a victim for the next GC round, the last segment of previous victim section is chosen as a victim.
| |
| |   Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in large section.
| |
| |   Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection")
| |   Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
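The rule described in this fix can be sketched as a pure function: after a segment is freed, the next victim hint is the following segment, unless the freed one was the last in its section, in which case the hint is cleared. The function name, the MODEL_NULL_SEGNO value and the assumption that sections are aligned groups of segs_per_sec segments are illustrative choices, not taken from the kernel source.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "no pending victim segment". */
#define MODEL_NULL_SEGNO (~0u)

/*
 * Given the segment that was just freed and the number of segments per
 * section, return the hint for the next GC round. Sections are assumed to
 * start at multiples of segs_per_sec.
 */
static unsigned int model_next_victim(unsigned int freed_segno,
				      unsigned int segs_per_sec)
{
	/* One past the last segment of the section containing freed_segno. */
	unsigned int sec_end = (freed_segno / segs_per_sec + 1) * segs_per_sec;

	if (freed_segno + 1 < sec_end)
		return freed_segno + 1;
	return MODEL_NULL_SEGNO; /* last segment of the section: clear the hint */
}
```

Clearing the hint at the section boundary is what stops the next round from picking the already-freed last segment again.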
| * f2fs: truncate blocks in batch in __complete_revoke_list() | Chao Yu | 2022-11-28 | 1 | -7/+2
| |   Use f2fs_do_truncate_blocks() to truncate all blocks in-batch in __complete_revoke_list().
| |
| |   Signed-off-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: make __queue_discard_cmd() return void | Yangtao Li | 2022-11-28 | 1 | -5/+6
| |   Since __queue_discard_cmd() never returns an error, let's make it return void.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: move set_file_temperature into f2fs_new_inode | Sheng Yong | 2022-11-28 | 1 | -33/+29
| |   Since the file name is already passed to f2fs_new_inode(), let's move set_file_temperature() into f2fs_new_inode().
| |
| |   Signed-off-by: Sheng Yong <shengyong@oppo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to enable compress for newly created file if extension matches | Sheng Yong | 2022-11-28 | 2 | -167/+164
| |   If compress_extension is set, and a newly created file matches the extension, the file could be marked as compression file. However, if inline_data is also enabled, there is no chance to check its extension since f2fs_should_compress() always returns false.
| |
| |   This patch moves set_compress_inode(), which does the extension check, into f2fs_should_compress() to check extensions before setting inline data flag.
| |
| |   Fixes: 7165841d578e ("f2fs: fix to check inline_data during compressed inode conversion")
| |   Signed-off-by: Sheng Yong <shengyong@oppo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: set zstd compress level correctly | Sheng Yong | 2022-11-28 | 1 | -1/+1
| |   Fixes: cf30f6a5f0c6 ("lib: zstd: Add kernel-specific API")
| |   Signed-off-by: Sheng Yong <shengyong@oppo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Reviewed-by: Nick Terrell <terrelln@fb.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: change type for 'sbi->readdir_ra' | Yuwei Guan | 2022-11-28 | 4 | -3/+8
| |   Before this patch, the variable 'readdir_ra' takes effect if it's equal to '1' or not, so we can change type for it from 'int' to 'bool'.
| |
| |   Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: cleanup for 'f2fs_tuning_parameters' function | Yuwei Guan | 2022-11-28 | 1 | -5/+3
| |   A cleanup patch for 'f2fs_tuning_parameters' function.
| |
| |   Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to alloc_mode changed after remount on a small volume device | Yuwei Guan | 2022-11-28 | 1 | -2/+5
| |   The commit 84b89e5d943d8 ("f2fs: add auto tuning for small devices") added tuning for small volume devices: alloc_mode is now tuned to 'reuse' if the device is small. But the alloc_mode changes to 'default' on remount of such a small device.
| |
| |   This patch fixes the alloc_mode change on remount for a small volume device.
| |
| |   Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove submit label in __submit_discard_cmd() | Yangtao Li | 2022-11-28 | 1 | -4/+3
| |   Complaint from Matthew Wilcox in another similar place:
| |
| |   "submit? You don't submit anything at the 'submit' label. it should be called 'skip' or something. But I think this is just badly written and you don't need a goto at all."
| |
| |   Let's remove submit label for readability.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to do sanity check on i_extra_isize in is_alive() | Chao Yu | 2022-11-28 | 1 | -6/+12
| |   syzbot found a f2fs bug:
| |
| |   BUG: KASAN: slab-out-of-bounds in data_blkaddr fs/f2fs/f2fs.h:2891 [inline]
| |   BUG: KASAN: slab-out-of-bounds in is_alive fs/f2fs/gc.c:1117 [inline]
| |   BUG: KASAN: slab-out-of-bounds in gc_data_segment fs/f2fs/gc.c:1520 [inline]
| |   BUG: KASAN: slab-out-of-bounds in do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734
| |   Read of size 4 at addr ffff888076557568 by task kworker/u4:3/52
| |
| |   CPU: 1 PID: 52 Comm: kworker/u4:3 Not tainted 6.1.0-rc4-syzkaller-00362-gfef7fd48922d #0
| |   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
| |   Workqueue: writeback wb_workfn (flush-7:0)
| |   Call Trace:
| |    <TASK>
| |    __dump_stack lib/dump_stack.c:88 [inline]
| |    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
| |    print_address_description mm/kasan/report.c:284 [inline]
| |    print_report+0x15e/0x45d mm/kasan/report.c:395
| |    kasan_report+0xbb/0x1f0 mm/kasan/report.c:495
| |    data_blkaddr fs/f2fs/f2fs.h:2891 [inline]
| |    is_alive fs/f2fs/gc.c:1117 [inline]
| |    gc_data_segment fs/f2fs/gc.c:1520 [inline]
| |    do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734
| |    f2fs_gc+0x88c/0x20a0 fs/f2fs/gc.c:1831
| |    f2fs_balance_fs+0x544/0x6b0 fs/f2fs/segment.c:410
| |    f2fs_write_inode+0x57e/0xe20 fs/f2fs/inode.c:753
| |    write_inode fs/fs-writeback.c:1440 [inline]
| |    __writeback_single_inode+0xcfc/0x1440 fs/fs-writeback.c:1652
| |    writeback_sb_inodes+0x54d/0xf90 fs/fs-writeback.c:1870
| |    wb_writeback+0x2c5/0xd70 fs/fs-writeback.c:2044
| |    wb_do_writeback fs/fs-writeback.c:2187 [inline]
| |    wb_workfn+0x2dc/0x12f0 fs/fs-writeback.c:2227
| |    process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
| |    worker_thread+0x665/0x1080 kernel/workqueue.c:2436
| |    kthread+0x2e4/0x3a0 kernel/kthread.c:376
| |    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
| |
| |   The root cause is that we forgot to do a sanity check on .i_extra_isize in the path below, resulting in access to an invalid address later. Fix it.
| |   - gc_data_segment
| |    - is_alive
| |     - data_blkaddr
| |      - offset_in_addr
| |
| |   Reported-by: syzbot+f8f3dfa4abc489e768a1@syzkaller.appspotmail.com
| |   Link: https://lore.kernel.org/linux-f2fs-devel/0000000000003cb3c405ed5c17f9@google.com/T/#u
| |   Signed-off-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
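Why an unchecked i_extra_isize leads to an out-of-bounds read can be shown with a small model: the extra-attribute size shifts the index at which block addresses start, so a corrupted value pushes the index past the end of the array. The struct layout, the slot count and the function names below are assumptions for illustration and do not mirror the on-disk f2fs inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of 4-byte slots in the model inode. */
#define MODEL_ADDR_SLOTS 8

/* Hypothetical inode: the first extra_isize bytes of addr[] hold attributes. */
struct model_inode {
	uint16_t extra_isize;
	uint32_t addr[MODEL_ADDR_SLOTS];
};

/* The size must be slot-aligned and leave at least one address slot. */
static bool model_extra_isize_valid(const struct model_inode *in)
{
	return in->extra_isize % sizeof(uint32_t) == 0 &&
	       in->extra_isize / sizeof(uint32_t) < MODEL_ADDR_SLOTS;
}

/* Read the ofs-th block address, refusing to index past the array. */
static bool model_read_blkaddr(const struct model_inode *in, unsigned int ofs,
			       uint32_t *out)
{
	unsigned int base;

	if (!model_extra_isize_valid(in))
		return false; /* corrupted size: do not compute an index from it */
	base = in->extra_isize / sizeof(uint32_t);
	if (ofs >= MODEL_ADDR_SLOTS - base)
		return false;
	*out = in->addr[base + ofs];
	return true;
}
```

With the validity check removed, an extra_isize of 4000 would yield a base of 1000 and a read far outside the eight-slot array, which is the class of access KASAN reported.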
| * f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE | Daeho Jeong | 2022-11-28 | 4 | -7/+31
| |   Introduce a new ioctl to replace the whole content of a file atomically, which means it induces truncate and content update at the same time. We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with F2FS_IOC_COMMIT_ATOMIC_WRITE, or abort it with F2FS_IOC_ABORT_ATOMIC_WRITE.
| |
| |   Signed-off-by: Daeho Jeong <daehojeong@google.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
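The start / commit / abort life cycle named in this commit can be illustrated with a user-space model. It does not issue the ioctls and says nothing about how the kernel implements them; struct model_file and the model_* functions are hypothetical. The point shown is only the visible semantics: the old content and size stay in place until commit, and an abort leaves them untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file: committed content plus a staging area. */
struct model_file {
	char *data;          /* committed content */
	size_t size;         /* committed size */
	char *staged;        /* content written since the replace started */
	size_t staged_size;
	bool in_atomic;
};

/* Start a replace: staging begins empty, i.e. the new content is truncated. */
static void model_start_replace(struct model_file *f)
{
	free(f->staged);
	f->staged = NULL;
	f->staged_size = 0;
	f->in_atomic = true;
}

/* Append to the staging area only; the committed content is not touched. */
static int model_write(struct model_file *f, const char *buf, size_t len)
{
	char *p;

	if (!f->in_atomic)
		return -1;
	if (len == 0)
		return 0;
	p = realloc(f->staged, f->staged_size + len);
	if (!p)
		return -1;
	memcpy(p + f->staged_size, buf, len);
	f->staged = p;
	f->staged_size += len;
	return 0;
}

/* Commit: the staged content becomes the whole file in one step. */
static void model_commit(struct model_file *f)
{
	if (!f->in_atomic)
		return;
	free(f->data);
	f->data = f->staged;
	f->size = f->staged_size;
	f->staged = NULL;
	f->staged_size = 0;
	f->in_atomic = false;
}

/* Abort: drop the staged content; committed data and size are unchanged. */
static void model_abort(struct model_file *f)
{
	free(f->staged);
	f->staged = NULL;
	f->staged_size = 0;
	f->in_atomic = false;
}
```

In this model a reader between start and commit still sees the previous content, which is the per-file atomicity the pull request message refers to.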
| * f2fs: fix to set flush_merge opt and show noflush_merge | Yangtao Li | 2022-11-11 | 1 | -2/+11
| |   Some minor modifications to flush_merge and related parameters:
| |
| |   1. The FLUSH_MERGE opt is set by default only in non-ro mode.
| |   2. When ro and merge are set at the same time, an error is reported.
| |   3. Display noflush_merge mount opt.
| |
| |   Suggested-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
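The three rules in this commit message can be written down as a small decision function. This is a sketch under assumptions: the enum, the function names and the choice of -EINVAL as the reported error are illustrative, since the message only says that "an error is reported".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of what the user passed at mount time. */
enum model_fm_opt { MODEL_FM_UNSET, MODEL_FM_OFF, MODEL_FM_ON };

/*
 * Rule 1: with no explicit option, flush merge defaults to on only when the
 *         mount is not read-only.
 * Rule 2: read-only together with an explicit flush_merge is rejected
 *         (the error code is an assumption of this sketch).
 */
static int model_resolve_flush_merge(bool readonly, enum model_fm_opt opt,
				     bool *enabled)
{
	if (readonly && opt == MODEL_FM_ON)
		return -EINVAL;
	if (opt == MODEL_FM_UNSET)
		*enabled = !readonly;
	else
		*enabled = (opt == MODEL_FM_ON);
	return 0;
}

/* Rule 3: the disabled state is shown explicitly as noflush_merge. */
static const char *model_show_flush_merge(bool enabled)
{
	return enabled ? "flush_merge" : "noflush_merge";
}
```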
| * f2fs: initialize locks earlier in f2fs_fill_super() | Tetsuo Handa | 2022-11-11 | 1 | -18/+20
| |   syzbot is reporting lockdep warning at f2fs_handle_error() [1], for spin_lock(&sbi->error_lock) is called before spin_lock_init() is called. For safe locking in error handling, move initialization of locks (and obvious structures) in f2fs_fill_super() to immediately after memory allocation.
| |
| |   Link: https://syzkaller.appspot.com/bug?extid=40642be9b7e0bb28e0df [1]
| |   Reported-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
| |   Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
| |   Tested-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: optimize iteration over sparse directories | Chao Yu | 2022-11-11 | 4 | -19/+41
| |   Wei Chen reports a kernel bug as below:
| |
| |   INFO: task syz-executor.0:29056 blocked for more than 143 seconds.
| |   Not tainted 5.15.0-rc5 #1
| |   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| |   task:syz-executor.0 state:D stack:14632 pid:29056 ppid: 6574 flags:0x00000004
| |   Call Trace:
| |    __schedule+0x4a1/0x1720
| |    schedule+0x36/0xe0
| |    rwsem_down_write_slowpath+0x322/0x7a0
| |    fscrypt_ioctl_set_policy+0x11f/0x2a0
| |    __f2fs_ioctl+0x1a9f/0x5780
| |    f2fs_ioctl+0x89/0x3a0
| |    __x64_sys_ioctl+0xe8/0x140
| |    do_syscall_64+0x34/0xb0
| |    entry_SYSCALL_64_after_hwframe+0x44/0xae
| |
| |   Eric did some investigation on this issue, quoted from reply of Eric:
| |
| |   "Well, the quality of this bug report has a lot to be desired (not on upstream kernel, reproducer is full of totally irrelevant stuff, not sent to the mailing list of the filesystem whose disk image is being fuzzed, etc.). But what is going on is that f2fs_empty_dir() doesn't consider the case of a directory with an extremely large i_size on a malicious disk image.
| |
| |   Specifically, the reproducer mounts an f2fs image with a directory that has an i_size of 14814520042850357248, then calls FS_IOC_SET_ENCRYPTION_POLICY on it.
| |
| |   That results in a call to f2fs_empty_dir() to check whether the directory is empty. f2fs_empty_dir() then iterates through all 3616826182336513 blocks the directory allegedly contains to check whether any contain anything. i_rwsem is held during this, so anything else that tries to take it will hang."
| |
| |   In order to solve this issue, let's use f2fs_get_next_page_offset() to speed up iteration by skipping holes for all below functions:
| |   - f2fs_empty_dir
| |   - f2fs_readdir
| |   - find_in_level
| |
| |   The reason why we can speed up iteration was described in 'commit 3cf4574705b4 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")'.
| |
| |   Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page() instead of f2fs_get_lock_data_page(): since i_rwsem is held in the caller of f2fs_empty_dir(), there shouldn't be any races, so it's fine to not lock the dentry page while looking up dirents in the page.
| |
| |   Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/
| |   Reported-by: Wei Chen <harperchen1110@gmail.com>
| |   Cc: Eric Biggers <ebiggers@google.com>
| |   Signed-off-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
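The effect of skipping holes can be shown with a model of a sparse directory: only the populated blocks are stored, and the emptiness check jumps from one populated block to the next instead of visiting every block index up to the claimed size. The data layout and the helper names are assumptions for illustration; they stand in for, and do not reproduce, f2fs_get_next_page_offset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical populated directory block, kept sorted by index. */
struct model_dblock {
	uint64_t index;   /* block index within the directory */
	bool has_dentry;  /* true if the block holds at least one entry */
};

/* Position of the first populated block at or after 'from', or n if none. */
static size_t model_next_populated(const struct model_dblock *blks, size_t n,
				   uint64_t from)
{
	size_t i = 0;

	while (i < n && blks[i].index < from)
		i++;
	return i;
}

/*
 * Emptiness check that skips holes. 'nblocks' is the block count implied by
 * i_size; 'visited' reports how many blocks were actually inspected.
 */
static bool model_dir_is_empty(const struct model_dblock *blks, size_t n,
			       uint64_t nblocks, uint64_t *visited)
{
	uint64_t pos = 0;

	*visited = 0;
	while (pos < nblocks) {
		size_t i = model_next_populated(blks, n, pos);

		if (i == n || blks[i].index >= nblocks)
			break; /* only holes remain */
		(*visited)++;
		if (blks[i].has_dentry)
			return false;
		pos = blks[i].index + 1;
	}
	return true;
}
```

For the directory from the report, which claims 3616826182336513 blocks, the work in this model depends on the number of populated blocks rather than on the claimed size.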
| * f2fs: fix to avoid accessing uninitialized spinlock | Chao Yu | 2022-11-11 | 1 | -3/+3
| |   syzbot reports a kernel bug:
| |
| |    __dump_stack lib/dump_stack.c:88 [inline]
| |    dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
| |    assign_lock_key+0x22a/0x240 kernel/locking/lockdep.c:981
| |    register_lock_class+0x287/0x9b0 kernel/locking/lockdep.c:1294
| |    __lock_acquire+0xe4/0x1f60 kernel/locking/lockdep.c:4934
| |    lock_acquire+0x1a7/0x400 kernel/locking/lockdep.c:5668
| |    __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
| |    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
| |    spin_lock include/linux/spinlock.h:350 [inline]
| |    f2fs_save_errors fs/f2fs/super.c:3868 [inline]
| |    f2fs_handle_error+0x29/0x230 fs/f2fs/super.c:3896
| |    f2fs_iget+0x215/0x4bb0 fs/f2fs/inode.c:516
| |    f2fs_fill_super+0x47d3/0x7b50 fs/f2fs/super.c:4222
| |    mount_bdev+0x26c/0x3a0 fs/super.c:1401
| |    legacy_get_tree+0xea/0x180 fs/fs_context.c:610
| |    vfs_get_tree+0x88/0x270 fs/super.c:1531
| |    do_new_mount+0x289/0xad0 fs/namespace.c:3040
| |    do_mount fs/namespace.c:3383 [inline]
| |    __do_sys_mount fs/namespace.c:3591 [inline]
| |    __se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568
| |    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
| |    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
| |    entry_SYSCALL_64_after_hwframe+0x63/0xcd
| |
| |   F2FS-fs (loop1): Failed to read F2FS meta data inode
| |
| |   The root cause is that sbi->error_lock may be accessed before its initialization, fix it.
| |
| |   Link: https://lore.kernel.org/linux-f2fs-devel/0000000000007edb6605ecbb6442@google.com/T/#u
| |   Reported-by: syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com
| |   Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
| |   Signed-off-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: correct i_size change for atomic writes | Daeho Jeong | 2022-11-11 | 4 | -12/+33
| |   We need to make sure i_size doesn't change until atomic write commit is successful and restore it when commit fails.
| |
| |   Signed-off-by: Daeho Jeong <daehojeong@google.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add proc entry to show discard_plist info | Yangtao Li | 2022-11-11 | 1 | -0/+41
| |   This patch adds a new proc entry to show discard_plist information in more detail, which is very helpful to know the discard pend list count clearly.
| |
| |   Such as:
| |
| |   Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
| |     0  390  156   85   67   46   37   26   14
| |     8   17   12    9    9    6   12   11   10
| |    16    5    9    2    4    8    3    4    1
| |    24    3    2    2    5    2    4    5    4
| |    32    3    3    2    3    .    3    3    1
| |    40    .    4    1    3    2    1    2    1
| |    48    1    .    1    1    .    1    1    .
| |    56    .    1    1    1    .    2    .    1
| |    64    1    2    .    .    .    .    .    .
| |    72    .    1    .    .    .    .    .    .
| |    80    3    1    .    .    1    1    .    .
| |    88    1    .    .    .    1    .    .    1
| |   ......
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: allow to read node block after shutdown | Jaegeuk Kim | 2022-11-11 | 1 | -2/+1
| |   If block address is still alive, we should give a valid node block even after shutdown. Otherwise, we can see zero data when reading out a file.
| |
| |   Cc: stable@vger.kernel.org
| |   Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully")
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: replace ternary operator with max() | wangkailong@jari.cn | 2022-11-01 | 1 | -1/+1
| |   Fix the following coccicheck warning:
| |
| |   ./fs/f2fs/segment.c:877:24-25: WARNING opportunity for max()
| |
| |   Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: replace gc_urgent_high_remaining with gc_remaining_trials | Yangtao Li | 2022-11-01 | 4 | -15/+16
| |   The user can set the trial count limit for GC urgent and idle mode with the replacement, gc_remaining_trials. If GC thread gets to the limit, the mode will turn back to GC normal mode finally.
| |
| |   It was applied only to GC_URGENT, while this patch expands it for GC_IDLE.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
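The trial limit described here behaves like a countdown that returns the GC thread to normal mode when it reaches zero. A minimal sketch follows; the mode names, the struct, and the convention that a zero counter means "no limit set" are assumptions of this model and are not taken from the kernel source.

```c
#include <assert.h>

/* Hypothetical, simplified set of GC modes. */
enum model_gc_mode { MODEL_GC_NORMAL, MODEL_GC_IDLE, MODEL_GC_URGENT };

/* Hypothetical control block for the GC thread. */
struct model_gc_ctl {
	enum model_gc_mode mode;
	unsigned int remaining_trials; /* 0 means no limit in this sketch */
};

/*
 * Called after each GC trial. In urgent or idle mode with a limit set, count
 * the trial and fall back to normal mode once the limit is used up.
 */
static void model_gc_trial_done(struct model_gc_ctl *c)
{
	if (c->mode == MODEL_GC_NORMAL || c->remaining_trials == 0)
		return;
	if (--c->remaining_trials == 0)
		c->mode = MODEL_GC_NORMAL;
}
```

The same countdown serves both non-normal modes, matching the commit's note that the limit previously applied only to urgent mode.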
| * f2fs: use sysfs_emit instead of sprintf | Jaegeuk Kim | 2022-11-01 | 1 | -18/+18
| |   Let's use sysfs_emit.
| |
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce gc_mode sysfs node | Yangtao Li | 2022-11-01 | 2 | -6/+10
| |   Revert "f2fs: make gc_urgent and gc_segment_mode sysfs node readable".
| |
| |   Add a gc_mode sysfs node to show the current gc_mode as a string.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super() | Chao Yu | 2022-11-01 | 1 | -1/+1
| |   In error path of f2fs_fill_super(), this patch fixes to call f2fs_destroy_post_read_wq() once if we fail in f2fs_start_ckpt_thread().
| |
| |   Fixes: 261eeb9c1585 ("f2fs: introduce checkpoint_merge mount option")
| |   Signed-off-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix return val in f2fs_start_ckpt_thread() | Yangtao Li | 2022-11-01 | 3 | -11/+12
| |   Return PTR_ERR(cprc->f2fs_issue_ckpt) instead of -ENOMEM.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
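The difference between returning a fixed -ENOMEM and returning PTR_ERR() of the failed pointer can be sketched with a user-space imitation of the kernel's error-pointer convention. The lower-case helpers below are re-implementations written for this example, and model_create_worker() is a hypothetical stand-in for a thread-creation call that can fail for more than one reason.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Error pointers occupy the top MODEL_MAX_ERRNO addresses in this sketch. */
#define MODEL_MAX_ERRNO 4095

/* User-space imitations of the kernel's ERR_PTR / PTR_ERR / IS_ERR helpers. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static int model_dummy_task; /* stands in for a successfully created thread */

/* Hypothetical creator: fails with -fail_with when fail_with is non-zero. */
static void *model_create_worker(int fail_with)
{
	return fail_with ? model_err_ptr(-fail_with) : (void *)&model_dummy_task;
}

/* Propagate the real failure reason instead of a blanket -ENOMEM. */
static int model_start_thread(int fail_with)
{
	void *task = model_create_worker(fail_with);

	if (model_is_err(task))
		return (int)model_ptr_err(task);
	return 0;
}
```

A caller of model_start_thread() can now tell an interrupted creation (-EINTR) apart from a real allocation failure.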
| * f2fs: Fix typo in comments | Keoseong Park | 2022-11-01 | 1 | -1/+1
| |   Change "truncateion" to "truncation".
| |
| |   Signed-off-by: Keoseong Park <keosung.park@samsung.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce max_ordered_discard sysfs node | Yangtao Li | 2022-11-01 | 3 | -1/+16
| |   The current max_ordered_discard is a fixed value; change it to be configurable through the sysfs node.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: allow to set compression for inlined file | Jaegeuk Kim | 2022-11-01 | 1 | -0/+4
| |   The below commit disallows setting compression on an empty created file which has inline_data. Let's fix it.
| |
| |   Fixes: 7165841d578e ("f2fs: fix to check inline_data during compressed inode conversion")
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add barrier mount option | Yangtao Li | 2022-11-01 | 1 | -0/+7
| |   This patch adds a mount option, barrier, in f2fs. The barrier option is the opposite of nobarrier. If this option is set, cache_flush commands are allowed to be issued.
| |
| |   Signed-off-by: Yangtao Li <frank.li@vivo.com>
| |   Reviewed-by: Chao Yu <chao@kernel.org>
| |   Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>