summaryrefslogtreecommitdiffstats
path: root/fs/f2fs/super.c
diff options
context:
space:
mode:
authorChao Yu <yuchao0@huawei.com>2018-08-02 23:03:19 +0800
committerJaegeuk Kim <jaegeuk@kernel.org>2018-08-10 16:19:05 -0700
commit50fa53eccf9f911a5b435248a2b0bd484fd82e5e (patch)
treef2fbb65982fcd3dfc251509a2e1800207ae7ad3e /fs/f2fs/super.c
parent6e45f2a59ffb440f28719ab3a68bee5b8e9df16b (diff)
downloadlinux-stable-50fa53eccf9f911a5b435248a2b0bd484fd82e5e.tar.gz
linux-stable-50fa53eccf9f911a5b435248a2b0bd484fd82e5e.tar.bz2
linux-stable-50fa53eccf9f911a5b435248a2b0bd484fd82e5e.zip
f2fs: fix to avoid broken of dnode block list
f2fs recovery flow is relying on dnode block link list, it means fsynced file recovery depends on previous dnode's persistence in the list, so during fsync() we should wait on all regular inode's dnode writebacked before issuing flush. By this way, we can avoid dnode block list being broken by out-of-order IO submission due to IO scheduler or driver. Sheng Yong helps to do the test with this patch: Target:/data (f2fs, -) 64MB / 32768KB / 4KB / 8 1 / PERSIST / Index Base: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08 2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7 3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48 Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333 After: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 798.81 202.5 41143 40613.87 602.71 838.08 913.83 2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27 3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91 Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333 Patched/Original: 0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189 It looks like atomic write will suffer performance regression. I suspect that the criminal is that we forcing to wait all dnode being in storage cache before we issue PREFLUSH+FUA. BTW, will commit ("f2fs: don't need to wait for node writes for atomic write") cause the problem: we will lose data of last transaction after SPO, even if atomic write return no error: - atomic_open(); - write() P1, P2, P3; - atomic_commit(); - writeback data: P1, P2, P3; - writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing last transaction. - preflush + fua; - power-cut If we don't wait dnode writeback for atomic_write: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85 2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77 3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92 Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18 Patched/Original: 0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294 SQLite's performance recovers. Jaegeuk: "Practically, I don't see db corruption becase of this. We can excuse to lose the last transaction." Finally, we decide to keep original implementation of atomic write interface sematics that we don't wait all dnode writeback before preflush+fua submission. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Diffstat (limited to 'fs/f2fs/super.c')
-rw-r--r--fs/f2fs/super.c6
1 files changed, 6 insertions, 0 deletions
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 879ff1b22357..b0e2b017f390 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1036,6 +1036,10 @@ static void f2fs_put_super(struct super_block *sb)
/* our cp_error case, we can wait for any writeback page */
f2fs_flush_merged_writes(sbi);
+ f2fs_wait_on_all_pages_writeback(sbi);
+
+ f2fs_bug_on(sbi, sbi->fsync_node_num);
+
iput(sbi->node_inode);
iput(sbi->meta_inode);
@@ -2911,6 +2915,8 @@ try_onemore:
f2fs_init_ino_entry_info(sbi);
+ f2fs_init_fsync_node_info(sbi);
+
/* setup f2fs internal modules */
err = f2fs_build_segment_manager(sbi);
if (err) {