path: root/fs/ext4/extents.c
author: Aditya Kali <adityakali@google.com> 2011-09-09 19:20:51 -0400
committer: Theodore Ts'o <tytso@mit.edu> 2011-09-09 19:20:51 -0400
commit: 5356f2615cd558c57a1f7d7528d1ad4de3640d96
tree: e3590bf14d9a21c4eb365105886382bfb1131b95 /fs/ext4/extents.c
parent: d8990240d8c911064447f8aa5a440f9345a6d692
ext4: attempt to fix race in bigalloc code path

There is currently a race between delayed-allocation writes and writeback when the bigalloc feature is in use. The race arose because we needed to determine which blocks in a cluster are under delayed allocation, and we used a buffer_delay(bh) check to do so. However, the writeback code path clears this bit without any synchronization, which resulted in a race and an ext4 warning similar to:

EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0 reserved data blocks

The race existed in two places:
(1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from the writeback code path;
(2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where buffer_delay(bh) is set).

To fix (1), this patch introduces a new buffer_head state bit, BH_Da_Mapped. This bit is set under the protection of EXT4_I(inode)->i_data_sem once the delayed-allocated blocks have actually been mapped at writeout time. We can now reliably check for this bit inside ext4_find_delalloc_range() to determine whether the reservation for the blocks has already been claimed.

To fix (2), it was necessary to set buffer_delay(bh) under the protection of i_data_sem. So I extracted the very beginning of ext4_map_blocks() into a new function, ext4_da_map_blocks(), and performed the required setting of the BH_Delay bit and the quota reservation under the protection of i_data_sem.

Together, these two fixes make the checks of buffer_delay(bh) and buffer_da_mapped(bh) consistent, thus removing the race.

Tested: I was able to reproduce the problem by running 'dd' and 'fsync' in parallel. xfstests also used to reproduce this race occasionally. After the fix, both my test and xfstests passed and the warning message was no longer observed.

Google-Bug-Id: 4997027
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
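
As a rough user-space illustration of the check that fix (1) relies on, the sketch below models the two state bits as plain flags. All model_* names and the bit values are hypothetical stand-ins, not the kernel's definitions; the kernel tests buffer_delay(bh) and buffer_da_mapped(bh) on a struct buffer_head, and the i_data_sem locking is deliberately left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BH_Delay and BH_Da_Mapped state bits. */
enum model_bh_bit {
    MODEL_BH_DELAY     = 1u << 0, /* block is under delayed allocation */
    MODEL_BH_DA_MAPPED = 1u << 1, /* delalloc block was mapped at writeout */
};

/* Hypothetical, minimal model of a buffer head: only the state word. */
struct model_bh {
    unsigned int state;
};

/*
 * Mirrors the shape of `buffer_delay(bh) && !buffer_da_mapped(bh)`:
 * the reservation is treated as unclaimed only if the block is delayed
 * and has not yet been mapped during writeout.
 */
static bool model_reservation_unclaimed(const struct model_bh *bh)
{
    assert(bh != NULL);
    return (bh->state & MODEL_BH_DELAY) &&
           !(bh->state & MODEL_BH_DA_MAPPED);
}

/* True if any block in the modeled cluster still has an unclaimed reservation. */
static bool model_cluster_has_delalloc(const struct model_bh *bhs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (model_reservation_unclaimed(&bhs[i]))
            return true;
    }
    return false;
}
```

In this model, a block with both bits set is skipped, which corresponds to a delayed block whose reservation writeback has already claimed.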
Diffstat (limited to 'fs/ext4/extents.c')
-rw-r--r-- fs/ext4/extents.c 38
1 files changed, 12 insertions, 26 deletions
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9b119308daea..ad39627c1fbc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3296,28 +3296,9 @@ static int ext4_find_delalloc_range(struct inode *inode,
while ((i >= lblk_start) && (i <= lblk_end)) {
page = find_get_page(mapping, index);
- if (!page || !PageDirty(page))
+ if (!page)
goto nextpage;
- if (PageWriteback(page)) {
- /*
- * This might be a race with allocation and writeout. In
- * this case we just assume that the rest of the range
- * will eventually be written and there wont be any
- * delalloc blocks left.
- * TODO: the above assumption is troublesome, but might
- * work better in practice. other option could be note
- * somewhere that the cluster is getting written out and
- * detect that here.
- */
- page_cache_release(page);
- trace_ext4_find_delalloc_range(inode,
- lblk_start, lblk_end,
- search_hint_reverse,
- 0, i);
- return 0;
- }
-
if (!page_has_buffers(page))
goto nextpage;
@@ -3340,7 +3321,11 @@ static int ext4_find_delalloc_range(struct inode *inode,
continue;
}
- if (buffer_delay(bh)) {
+ /* Check if the buffer is delayed allocated and that it
+ * is not yet mapped. (when da-buffers are mapped during
+ * their writeout, their da_mapped bit is set.)
+ */
+ if (buffer_delay(bh) && !buffer_da_mapped(bh)) {
page_cache_release(page);
trace_ext4_find_delalloc_range(inode,
lblk_start, lblk_end,
@@ -4106,6 +4091,7 @@ got_allocated_blocks:
ext4_da_update_reserve_space(inode, allocated_clusters,
1);
if (reserved_clusters < allocated_clusters) {
+ struct ext4_inode_info *ei = EXT4_I(inode);
int reservation = allocated_clusters -
reserved_clusters;
/*
@@ -4148,11 +4134,11 @@ got_allocated_blocks:
* remaining blocks finally gets written, we
* could claim them.
*/
- while (reservation) {
- ext4_da_reserve_space(inode,
- map->m_lblk);
- reservation--;
- }
+ dquot_reserve_block(inode,
+ EXT4_C2B(sbi, reservation));
+ spin_lock(&ei->i_block_reservation_lock);
+ ei->i_reserved_data_blocks += reservation;
+ spin_unlock(&ei->i_block_reservation_lock);
}
}
}
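
The last hunk replaces a per-cluster loop with one quota reservation and one counter update under a spinlock. The user-space sketch below models only the counter side of that pattern. The model_* names are hypothetical, the spinlock is approximated with a C11 atomic_flag, and the cluster-to-block conversion is assumed to be a left shift by the cluster size in bits; the quota call itself is not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical model of the fields touched by the hunk. */
struct model_inode_info {
    atomic_flag reservation_lock;       /* stands in for i_block_reservation_lock */
    unsigned int reserved_data_blocks;  /* stands in for i_reserved_data_blocks */
};

/* Assumed cluster-to-block conversion: shift by the cluster size in bits. */
static unsigned long model_c2b(unsigned int cluster_bits, unsigned long clusters)
{
    assert(cluster_bits < sizeof(unsigned long) * CHAR_BIT);
    return clusters << cluster_bits;
}

static void model_spin_lock(atomic_flag *lock)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void model_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Add the whole reservation in one locked update instead of looping. */
static void model_add_reservation(struct model_inode_info *ei,
                                  unsigned int reservation)
{
    model_spin_lock(&ei->reservation_lock);
    ei->reserved_data_blocks += reservation;
    model_spin_unlock(&ei->reservation_lock);
}
```

For example, with 4 cluster bits, model_c2b(4, 3) gives 48 blocks, while the counter itself grows by the 3 reserved clusters, matching how the hunk passes EXT4_C2B(sbi, reservation) to the quota call but adds reservation to the counter.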