diff options
author | Filipe Manana <fdmanana@suse.com> | 2020-07-22 12:28:37 +0100 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2020-07-27 12:55:47 +0200 |
commit | 3d6448e631591756da36efb3ea6355ff6f383c3a (patch) | |
tree | e685dc3f2eba36929926754ced2494b6e48a5f87 /fs/btrfs/block-group.c | |
parent | 88c4703f00a9e8ab60133690cfdaccea6c07e560 (diff) | |
download | linux-3d6448e631591756da36efb3ea6355ff6f383c3a.tar.gz linux-3d6448e631591756da36efb3ea6355ff6f383c3a.tar.bz2 linux-3d6448e631591756da36efb3ea6355ff6f383c3a.zip |
btrfs: fix race between page release and a fast fsync
When releasing an extent map, done through the page release callback, we
can race with an ongoing fast fsync and cause the fsync to miss a new
extent and not log it. The steps for this to happen are the following:
1) A page is dirtied for some inode I;
2) Writeback for that page is triggered by a path other than fsync, for
example by the system due to memory pressure;
3) When the ordered extent for the extent (a single 4K page) finishes,
we unpin the corresponding extent map and set its generation to N,
the current transaction's generation;
4) The btrfs_releasepage() callback is invoked by the system due to
memory pressure for that no longer dirty page of inode I;
5) At the same time, some task calls fsync on inode I, joins transaction
N, and at btrfs_log_inode() it sees that the inode does not have the
full sync flag set, so we proceed with a fast fsync. But before we get
into btrfs_log_changed_extents() and lock the inode's extent map tree:
6) Through btrfs_releasepage() we end up at try_release_extent_mapping()
and we remove the extent map for the new 4Kb extent, because it is
neither pinned anymore nor locked. By calling remove_extent_mapping(),
we remove the extent map from the list of modified extents, since the
extent map does not have the logging flag set. We unlock the inode's
extent map tree;
7) The task doing the fast fsync now enters btrfs_log_changed_extents(),
locks the inode's extent map tree and iterates its list of modified
extents, which no longer has the 4Kb extent in it, so it does not log
the extent;
8) The fsync finishes;
9) Before transaction N is committed, a power failure happens. After
replaying the log, the 4K extent of inode I will be missing, since
it was not logged due to the race with try_release_extent_mapping().
So fix this by teaching try_release_extent_mapping() to not remove an
extent map if it's still in the list of modified extents.
Fixes: ff44c6e36dc9dc ("Btrfs: do not hold the write_lock on the extent tree while logging")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/block-group.c')
0 files changed, 0 insertions, 0 deletions