summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/btrfs_inode.h
diff options
context:
space:
mode:
authorFilipe Manana <fdmanana@suse.com>2016-05-12 13:53:36 +0100
committerFilipe Manana <fdmanana@suse.com>2016-05-13 01:59:36 +0100
commit5f9a8a51d8b95505d8de8b7191ae2ed8c504d4af (patch)
treed97a7f5d321694e09c3046e9027c23a02d6a5878 /fs/btrfs/btrfs_inode.h
parentf78c436c3931e7df713688028f2b4faf72bf9f2a (diff)
downloadlinux-stable-5f9a8a51d8b95505d8de8b7191ae2ed8c504d4af.tar.gz
linux-stable-5f9a8a51d8b95505d8de8b7191ae2ed8c504d4af.tar.bz2
linux-stable-5f9a8a51d8b95505d8de8b7191ae2ed8c504d4af.zip
Btrfs: add semaphore to synchronize direct IO writes with fsync
Due to the optimization of lockless direct IO writes (the inode's i_mutex is not held) introduced in commit 38851cc19adb ("Btrfs: implement unlocked dio write"), we started having races between such writes with concurrent fsync operations that use the fast fsync path. These races were addressed in the patches titled "Btrfs: fix race between fsync and lockless direct IO writes" and "Btrfs: fix race between fsync and direct IO writes for prealloc extents". The races happened because the direct IO path, like every other write path, does create extent maps followed by the corresponding ordered extents while the fast fsync path collected first ordered extents and then it collected extent maps. This made it possible to log file extent items (based on the collected extent maps) without waiting for the corresponding ordered extents to complete (get their IO done). The two fixes mentioned before added a solution that consists of making the direct IO path create first the ordered extents and then the extent maps, while the fsync path attempts to collect any new ordered extents once it collects the extent maps. This was simple and did not require adding any synchonization primitive to any data structure (struct btrfs_inode for example) but it makes things more fragile for future development endeavours and adds an exceptional approach compared to the other write paths. This change adds a read-write semaphore to the btrfs inode structure and makes the direct IO path create the extent maps and the ordered extents while holding read access on that semaphore, while the fast fsync path collects extent maps and ordered extents while holding write access on that semaphore. The logic for direct IO write path is encapsulated in a new helper function that is used both for cow and nocow direct IO writes. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com>
Diffstat (limited to 'fs/btrfs/btrfs_inode.h')
-rw-r--r--fs/btrfs/btrfs_inode.h10
1 files changed, 10 insertions, 0 deletions
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 61205e3bbefa..1da5753d886d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -196,6 +196,16 @@ struct btrfs_inode {
struct list_head delayed_iput;
long delayed_iput_count;
+ /*
+ * To avoid races between lockless (i_mutex not held) direct IO writes
+ * and concurrent fsync requests. Direct IO writes must acquire read
+ * access on this semaphore for creating an extent map and its
+ * corresponding ordered extent. The fast fsync path must acquire write
+ * access on this semaphore before it collects ordered extents and
+ * extent maps.
+ */
+ struct rw_semaphore dio_sem;
+
struct inode vfs_inode;
};