summaryrefslogtreecommitdiffstats
path: root/fs/f2fs/super.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-f2fs-4.10' of ↵Linus Torvalds2016-12-141-40/+241
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...
| * f2fs: fix to access nullified flush_cmd_control pointerJaegeuk Kim2016-12-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | f2fs_sync_file() remount_ro - f2fs_readonly - destroy_flush_cmd_control - f2fs_issue_flush - no fcc pointer! So, this patch doesn't free fcc in this case, but just stop its kernel thread which sends flush commands. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: detect wrong layoutJaegeuk Kim2016-12-071-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect that as well. Refer this in f2fs-tools. mkfs.f2fs: detect small partition by overprovision ratio and # of segments Reported-and-Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * Revert "f2fs: use percpu_counter for # of dirty pages in inode"Jaegeuk Kim2016-12-051-6/+1
| | | | | | | | | | | | | | | | | | | | This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76. The perpcu_counter doesn't provide atomicity in single core and consume more DRAM. That incurs fs_mark test failure due to ENOMEM. Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix incorrect free inode count in ->statfsChao Yu2016-11-251-1/+2
| | | | | | | | | | | | | | | | | | While calculating inode count that we can create at most in the left space, we should consider space which data/node blocks occupied, since we create data/node mixly in main area. So fix the wrong calculation in ->statfs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: support multiple devicesJaegeuk Kim2016-11-251-29/+109
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements multiple devices support for f2fs. Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big volume under one f2fs instance. Internal block management is very simple, but we will modify block allocation and background GC policy to boost IO speed by exploiting them accoording to each device speed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove checkpoint in f2fs_freezeJaegeuk Kim2016-11-231-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic freeze_super() calls sync_filesystems() before f2fs_freeze(). So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068, it triggers circular locking problem below due to gc_mutex for checkpoint. ====================================================== [ INFO: possible circular locking dependency detected ] 4.9.0-rc1+ #132 Tainted: G OE ------------------------------------------------------- 1. wait for __sb_start_write() by [<ffffffff9845f353>] dump_stack+0x85/0xc2 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffff98289991>] iput+0x171/0x2c0 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs] [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs] [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs] [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffff982a4f90>] ? do_fsync+0x70/0x70 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100 [<ffffffff982a50b5>] sys_sync+0x55/0x90 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 2. wait for sbi->gc_mutex by [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs] [<ffffffff9826b6ef>] freeze_super+0xcf/0x190 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Cache zoned block devices zone typeDamien Le Moal2016-11-231-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With the zoned block device feature enabled, section discard need to do a zone reset for sections contained in sequential zones, and a regular discard (if supported) for sections stored in conventional zones. Avoid the need for a costly report zones to obtain a section zone type when discarding it by caching the types of the device zones in the super block information. This cache is initialized at mount time for mounts with the zoned block device feature enabled. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Do not allow adaptive mode for host-managed zoned block devicesDamien Le Moal2016-11-231-0/+7
| | | | | | | | | | | | | | | | | | The LFS mode is mandatory for host-managed zoned block devices as update in place optimizations are not possible for segments in sequential zones. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Always enable discard for zoned blocks devicesDamien Le Moal2016-11-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | Zone write pointer reset acts as discard for zoned block devices. So if the zoned block device feature is enabled, always declare that discard is enabled, even if the device does not actually support the command. For the same reason, prevent the use the "nodicard" mount option. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Suppress discard warning message for zoned block devicesDamien Le Moal2016-11-231-1/+1
| | | | | | | | | | | | | | | | For zoned block devices, discard is replaced by zone reset. So do not warn if the device does not supports discard. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Check zoned block feature for host-managed zoned block devicesDamien Le Moal2016-11-231-0/+20
| | | | | | | | | | | | | | | | | | | | The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted with zone alignment optimization. This is optional for host-aware devices, but mandatory for host-managed zoned block devices. So check that the feature is set in this latter case. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Use generic zoned block device terminologyDamien Le Moal2016-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | SMR stands for "Shingled Magnetic Recording" which makes sense only for hard disk drives (spinning rust). The ZBC/ZAC standards enable management of SMR disks, but solid state drives may also support those standards. So rename the HMSMR feature to BLKZONED to avoid a HDD centric terminology. For the same reason, rename f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Add missing break in switch-caseDamien Le Moal2016-11-231-0/+1
| | | | | | | | | | Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: avoid infinite loop in the EIO case on recover_orphan_inodesJaegeuk Kim2016-11-231-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch should fix an infinite loop case below. F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs] F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix. ... [<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs] [<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs] [<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs] [<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs] [<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs] [<ffffffffb41dff21>] do_writepages+0x21/0x30 [<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760 [<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190 [<ffffffffb42a0959>] write_inode_now+0x99/0xc0 [<ffffffffb4289a16>] iput+0x1f6/0x2c0 [<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs] [<ffffffffb426c394>] ? sget_userns+0x4f4/0x530 [<ffffffffb426c692>] mount_bdev+0x182/0x1b0 [<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffffb426d038>] mount_fs+0x38/0x170 [<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160 [<ffffffffb4291d9e>] do_mount+0x1be/0xd60 [<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220 [<ffffffffb4292c54>] SyS_mount+0x94/0xd0 [<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove percpu_count due to performance regressionJaegeuk Kim2016-11-231-11/+5
| | | | | | | | | | | | | | This patch removes percpu_count usage due to performance regression in iozone. Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: keep dirty inodes selectively for checkpointJaegeuk Kim2016-11-231-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is to avoid no free segment bug during checkpoint caused by a number of dirty inodes. The case was reported by Chao like this. 1. mount with lazytime option 2. fill 4k file until disk is full 3. sync filesystem 4. read all files in the image 5. umount In this case, we actually don't need to flush dirty inode to inode page during checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to release discard entries during checkpointChao Yu2016-11-231-1/+0
| | | | | | | | | | | | | | | | | | | | In f2fs_fill_super, if there is any IO error occurs during recovery, cached discard entries will be leaked, in order to avoid this, make write_checkpoint() handle memory release by itself, besides, move clear_prefree_segments to write_checkpoint for readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | block,fs: use REQ_* flags directlyChristoph Hellwig2016-11-011-1/+1
|/ | | | | | | | | Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* f2fs: support checkpoint error injectionChao Yu2016-09-301-0/+1
| | | | | | | | | This patch adds to support checkpoint error injection in f2fs for testing fatal error tolerance, it will be useful that it can simulate abnormal power off by f2fs itself instead of calling godown ioctl by running apps. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to recover old fault injection config in ->remount_fsChao Yu2016-09-301-0/+6
| | | | | | | | In ->remount_fs, we didn't recover original fault injection config if we encounter error, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do fault injection initialization in default_optionsChao Yu2016-09-301-4/+4
| | | | | | | | Do fault injection initialization in default_options to keep consistent with other default option configurating. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support configuring fault injection per superblockChao Yu2016-09-301-42/+15
| | | | | | | | | | | | | | | | | Previously, we only support global fault injection configuration, so that when we configure type/rate of fault injection through sysfs, mount option, it will influence all f2fs partition which is being used. It is not make sence, since it will be not convenient if developer want to test separated partitions with different fault injection rate/type simultaneously, also it's not possible to enable fault injection in one partition and disable fault injection in other one. >From now on, we move global configuration of fault injection in module into per-superblock, hence injection testing can be more flexible. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: adjust display format of segment bitChao Yu2016-09-301-1/+1
| | | | | | | | | | | | | | | | | Just adjust segment bit info printed in procfs. Before: 1008 5|0 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1009 3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13 1010 3|1 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 After: 1008 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1009 4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef 1010 4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove dirty inode pages in error pathJaegeuk Kim2016-09-301-0/+1
| | | | | | | | | When getting EIO while handling orphan inodes, we can get some dirty node pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try to flush node pages. But in this case, we should prevent to do that, since we will try again from the start. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: handle errors during recover_orphan_inodesJaegeuk Kim2016-09-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes to handle EIO during recover_orphan_inode() given the below panic. F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs] ------------[ cut here ]------------ RIP: 0010:[<ffffffffc0b244e3>] [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs] RSP: 0018:ffff92f8b7fb7c30 EFLAGS: 00010246 RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8 RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58 R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620 FS: 00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0 Stack: ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28 ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0 ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d Call Trace: [<ffffffffbc27ff55>] evict+0xc5/0x1a0 [<ffffffffbc28090d>] iput+0x1ad/0x2c0 [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs] [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs] [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0 [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffffbc264e49>] mount_fs+0x39/0x170 [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160 [<ffffffffbc2881df>] do_mount+0x1cf/0xd00 [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170 [<ffffffffbc289003>] SyS_mount+0x83/0xd0 [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce cp_lock to protect updating of ckpt_flagsChao Yu2016-09-301-2/+3
| | | | | | | | | | This patch introduces spinlock to protect updating process of ckpt_flags field in struct f2fs_checkpoint, it avoids incorrectly updating in race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use crc and cp version to determine roll-forward recoveryJaegeuk Kim2016-09-301-1/+4
| | | | | | | | | | | | | | Previously, we used cp_version only to detect recoverable dnodes. In order to avoid same garbage cp_version, we needed to truncate the next dnode during checkpoint, resulting in additional discard or data write. If we can distinguish this by using crc in addition to cp_version, we can remove this overhead. There is backward compatibility concern where it changes node_footer layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to detect new layout. New layout will be activated only when this flag is set. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support IO error injectionChao Yu2016-09-221-0/+1
| | | | | | | | This patch adds to support IO error injection for testing IO error tolerance of f2fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: enable inline_dentry by default and add noinline_dentry optionChao Yu2016-08-291-0/+8
| | | | | | | | | Make inline_dentry as default mount option to improve space usage and IO performance in scenario of numerous small directory. It adds noinline_dentry mount option, instead. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Revert "f2fs: use percpu_rw_semaphore"Jaegeuk Kim2016-08-191-5/+1
| | | | | | | | | LKP reported -36.3% regression of fsmark.files_per_sec due to this patch. I've confirmed that fxmark [1] has also slight regression for DWAL. [1] https://github.com/sslab-gatech/fxmark This reverts commit ec795418c41850056feb956534edf059dc1155d4.
* Merge branch 'work.misc' of ↵Linus Torvalds2016-07-281-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...
| * drop redundant ->owner initializationsAl Viro2016-05-291-1/+0
| | | | | | | | | | | | | | it's not needed for file_operations of inodes located on fs defined in the hosting module and for file_operations that go into procfs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | f2fs: fix to avoid data update racing between GC and DIOChao Yu2016-07-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Datas in file can be operated by GC and DIO simultaneously, so we will face race case as below: For write case: Thread A Thread B - generic_file_direct_write - invalidate_inode_pages2_range - f2fs_direct_IO - do_blockdev_direct_IO - do_direct_IO - get_more_blocks - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - do_write_data_page migrate data block to new block address - dio_bio_submit update user data to old block address For read case: Thread A Thread B - generic_file_direct_write - invalidate_inode_pages2_range - f2fs_direct_IO - do_blockdev_direct_IO - do_direct_IO - get_more_blocks - f2fs_balance_fs - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - do_write_data_page migrate data block to new block address - write_checkpoint - do_checkpoint - clear_prefree_segments - f2fs_issue_discard discard old block adress - dio_bio_submit update user buffer from obsolete block address In order to fix this, for one file, we should let DIO and GC getting exclusion against with each other. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: avoid mark_inode_dirtyJaegeuk Kim2016-07-081-17/+26
| | | | | | | | | | | | Let's check inode's dirtiness before calling mark_inode_dirty. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: fix incorrect f_bfree calculation in ->statfsChao Yu2016-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | As manual described, f_bfree indicates total free blocks in fs, in f2fs, it includes two parts: visible free blocks and over-provision blocks. This patch corrrects the calculation. fsblkcnt_t f_bfree; /* free blocks in fs */ Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: use percpu_rw_semaphoreJaegeuk Kim2016-07-081-1/+5
| | | | | | | | | | | | | | | | This patch replaces rw_semaphore with percpu_rw_semaphore for: sbi->cp_rwsem nm_i->nat_tree_lock Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: add nodiscard mount optionChao Yu2016-07-081-0/+4
| | | | | | | | | | | | | | This patch adds 'nodiscard' mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: detect host-managed SMR by feature flagJaegeuk Kim2016-07-061-6/+10
| | | | | | | | | | | | | | If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: call update_inode_page for orphan inodesJaegeuk Kim2016-07-061-15/+1
| | | | | | | | | | | | Let's store orphan inode pages right away. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: introduce mode=lfs mount optionJaegeuk Kim2016-06-131-0/+28
| | | | | | | | | | | | | | | | | | This mount option is to enable original log-structured filesystem forcefully. So, there should be no random writes for main area. Especially, this supports host-managed SMR device. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: avoid reverse IO order for NODE and DATAJaegeuk Kim2016-06-081-0/+2
| | | | | | | | | | | | | | There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(), which incur unnecessary reversed bio submission. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: remove obsolete parameter in f2fs_truncateJaegeuk Kim2016-06-071-1/+1
| | | | | | | | | | | | We don't need lock parameter, which is always true. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: avoid wrong count on dirty inodesJaegeuk Kim2016-06-071-2/+2
| | | | | | | | | | | | | | The number should be covered by spin_lock. Otherwise we can see wrong count in f2fs_stat. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: inject to produce some orphan inodesJaegeuk Kim2016-06-021-0/+1
| | | | | | | | Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: remove writepages lockJaegeuk Kim2016-06-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | This patch removes writepages lock. We can improve multi-threading performance. tiobench, 32 threads, 4KB write per fsync on SSD Before: 25.88 MB/s After: 28.03 MB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: set flush_merge by defaultJaegeuk Kim2016-06-021-0/+6
| | | | | | | | | | | | This patch sets flush_merge by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: add lazytime mount optionJaegeuk Kim2016-06-021-0/+14
| | | | | | | | | | | | This patch adds lazytime support. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: avoid unnecessary updating inode during fsyncJaegeuk Kim2016-06-021-0/+4
| | | | | | | | | | | | | | If roll-forward recovery can recover i_size, we don't need to update inode's metadata during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: flush inode metadata when checkpoint is doingJaegeuk Kim2016-06-021-2/+52
| | | | | | | | | | | | | | This patch registers all the inodes which have dirty metadata to sync when checkpoint is doing. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>