diff options
author | Zheng Liu <wenqing.lz@taobao.com> | 2013-07-01 08:12:37 -0400 |
---|---|---|
committer | Theodore Ts'o <tytso@mit.edu> | 2013-07-01 08:12:37 -0400 |
commit | d3922a777f9b4c4df898d326fa940f239af4f9b6 (patch) | |
tree | 76f63123d146bb7bbc4a3478e301bff68ab08ed7 /fs/ext4/super.c | |
parent | 2c00ef3ee309142041c7395f42aa1d49fc9f44b9 (diff) | |
download | linux-d3922a777f9b4c4df898d326fa940f239af4f9b6.tar.gz linux-d3922a777f9b4c4df898d326fa940f239af4f9b6.tar.bz2 linux-d3922a777f9b4c4df898d326fa940f239af4f9b6.zip |
ext4: improve extent cache shrink mechanism to avoid to burn CPU time
Now we maintain an proper in-order LRU list in ext4 to reclaim entries
from extent status tree when we are under heavy memory pressure. For
keeping this order, a spin lock is used to protect this list. But this
lock burns a lot of CPU time. We can use the following steps to trigger
it.
% cd /dev/shm
% dd if=/dev/zero of=ext4-img bs=1M count=2k
% mkfs.ext4 ext4-img
% mount -t ext4 -o loop ext4-img /mnt
% cd /mnt
% for ((i=0;i<160;i++)); do truncate -s 64g $i; done
% for ((i=0;i<160;i++)); do cp $i /dev/null &; done
% perf record -a -g
% perf report
This commit tries to fix this problem. Now a new member called
i_touch_when is added into ext4_inode_info to record the last access
time for an inode. Meanwhile we never need to keep a proper in-order
LRU list. So this can avoid to burns some CPU time. When we try to
reclaim some entries from extent status tree, we use list_sort() to get
a proper in-order list. Then we traverse this list to discard some
entries. In ext4_sb_info, we use s_es_last_sorted to record the last
time of sorting this list. When we traverse the list, we skip the inode
that is newer than this time, and move this inode to the tail of LRU
list. When the head of the list is newer than s_es_last_sorted, we will
sort the LRU list again.
In this commit, we break the loop if s_extent_cache_cnt == 0 because
that means that all extents in extent status tree have been reclaimed.
Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
changed to save a local variable in these functions.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Diffstat (limited to 'fs/ext4/super.c')
-rw-r--r-- | fs/ext4/super.c | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 54701fca4515..cc8201180b30 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -773,7 +773,7 @@ static void ext4_put_super(struct super_block *sb) ext4_abort(sb, "Couldn't clean up the journal"); } - ext4_es_unregister_shrinker(sb); + ext4_es_unregister_shrinker(sbi); del_timer(&sbi->s_err_report); ext4_release_system_zone(sb); ext4_mb_release(sb); @@ -862,6 +862,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) rwlock_init(&ei->i_es_lock); INIT_LIST_HEAD(&ei->i_es_lru); ei->i_es_lru_nr = 0; + ei->i_touch_when = 0; ei->i_reserved_data_blocks = 0; ei->i_reserved_meta_blocks = 0; ei->i_allocated_meta_blocks = 0; @@ -3799,7 +3800,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_err_report.data = (unsigned long) sb; /* Register extent status tree shrinker */ - ext4_es_register_shrinker(sb); + ext4_es_register_shrinker(sbi); err = percpu_counter_init(&sbi->s_freeclusters_counter, ext4_count_free_clusters(sb)); @@ -4127,7 +4128,7 @@ failed_mount_wq: sbi->s_journal = NULL; } failed_mount3: - ext4_es_unregister_shrinker(sb); + ext4_es_unregister_shrinker(sbi); del_timer(&sbi->s_err_report); if (sbi->s_flex_groups) ext4_kvfree(sbi->s_flex_groups); |