btrfs: use tagged writepage to mitigate livelock of snapshot

[ Upstream commit 3cd24c698004d2f7668e0eb9fc1f096f533c791b ]

A snapshot is expected to be fast. However, if writers are steadily
creating dirty pages in the subvolume, the snapshot may take a very long
time to complete, because the flusher keeps finding newly dirtied pages
to write. To fix this, use tagged writepage for the snapshot flusher, as
the generic write_cache_pages() already does, so that pages dirtied after
the snapshot command are skipped.

This does not change the semantics of which data end up in the snapshot
when pages are being dirtied during the snapshot operation. Both before
and after this change, a sync is issued before the snapshot is taken; any
IO in flight just after that sync may end up in the snapshot, but this
depends on other system effects that might still sync the IO.

We ran a simple snapshot speed test on an Intel D-1531 box:

  fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G \
      --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 \
      --filename=/mnt/sub/testfile --name=job1 --group_reporting &
  sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

  original: 1m58s
  patched:  6.54s

This is the best case for this patch: with a sequential writer, nearly
all pages dirtied after the snapshot command are skipped.

For a random-write test with multiple writers:

  fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G \
      --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 \
      --filename=/mnt/sub/testfile --name=job1 --group_reporting &
  sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

  original: 15.83s
  patched:  10.35s

The improvement is smaller than in the sequential write case, since only
half of the pages dirtied after the snapshot command are skipped.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
author: Ethan Lien <ethanlien@synology.com> 2018-11-01 14:49:03 +0800
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2019-02-12 19:47:11 +0100
commit: f5d5b54349125f4765b51153960676ea4d81b82b
tree: f581137536914c2f1ce35f7b42142e383ea6625e /fs/btrfs/extent_io.c
parent: 4d54106091517b73e01204bfab981ed77d9d63a8
1 file changed, 15 insertions, 2 deletions
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dd6faab02bb..79f82f2ec4d5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3928,12 +3928,25 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			range_whole = 1;
 		scanned = 1;
 	}
-	if (wbc->sync_mode == WB_SYNC_ALL)
+
+	/*
+	 * We do the tagged writepage as long as the snapshot flush bit is set
+	 * and we are the first one who do the filemap_flush() on this inode.
+	 *
+	 * The nr_to_write == LONG_MAX is needed to make sure other flushers do
+	 * not race in and drop the bit.
+	 */
+	if (range_whole && wbc->nr_to_write == LONG_MAX &&
+	    test_and_clear_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
+			       &BTRFS_I(inode)->runtime_flags))
+		wbc->tagged_writepages = 1;
+
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
 		tag = PAGECACHE_TAG_TOWRITE;
 	else
 		tag = PAGECACHE_TAG_DIRTY;
 retry:
-	if (wbc->sync_mode == WB_SYNC_ALL)
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
 		tag_pages_for_writeback(mapping, index, end);
 	done_index = index;
 	while (!done && !nr_to_write_done && (index <= end) &&