gfs2: Rework freeze / thaw logic

So far, at mount time, gfs2 would take the freeze glock in shared mode and then immediately drop it again, turning it into a cached glock that can be reclaimed at any time. To freeze the filesystem cluster-wide, the node initiating the freeze would take the freeze glock in exclusive mode, which would cause the freeze glock's freeze_go_sync() callback to run on each node. There, gfs2 would freeze the filesystem and schedule gfs2_freeze_func() to run. gfs2_freeze_func() would re-acquire the freeze glock in shared mode, thaw the filesystem, and drop the freeze glock again. The initiating node would keep the freeze glock held in exclusive mode. To thaw the filesystem, the initiating node would drop the freeze glock again, which would allow gfs2_freeze_func() to resume on all nodes, leaving the filesystem in the thawed state. It turns out that in freeze_go_sync(), we cannot reliably and safely freeze the filesystem. This is primarily because the final unmount of a filesystem takes a write lock on the s_umount rw semaphore before calling into gfs2_put_super(), and freeze_go_sync() needs to call freeze_super() which also takes a write lock on the same semaphore, causing a deadlock. We could work around this by trying to take an active reference on the super block first, which would prevent unmount from running at the same time. But that can fail, and freeze_go_sync() isn't actually allowed to fail. To get around this, this patch changes the freeze glock locking scheme as follows: At mount time, each node takes the freeze glock in shared mode. To freeze a filesystem, the initiating node first freezes the filesystem locally and then drops and re-acquires the freeze glock in exclusive mode. All other nodes notice that there is contention on the freeze glock in their go_callback callbacks, and they schedule gfs2_freeze_func() to run. There, they freeze the filesystem locally and drop and re-acquire the freeze glock before re-thawing the filesystem. This is happening outside of the glock state engine, so there, we are allowed to fail. From a cluster point of view, taking and immediately dropping a glock is indistinguishable from taking the glock and only dropping it upon contention, so this new scheme is compatible with the old one. Thanks to Li Dong <lidong@vivo.com> for reporting a locking bug in gfs2_freeze_func() in a previous version of this commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
author: Andreas Gruenbacher <agruenba@redhat.com> 2022-11-14 23:34:50 +0100
committer: Andreas Gruenbacher <agruenba@redhat.com> 2023-07-03 22:25:02 +0200
commit: b77b4a4815a9651d1d6e07b8e6548eee9531a5eb (patch)
tree: a03c5c47deba377022ca106ea42c99f5765bccde /fs/gfs2/glops.c
parent: cad1e15804a83afd9a5c1d95a428d60d1f9c0340 (diff)
download: linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.tar.gz
linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.tar.bz2
linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.zip
1 files changed, 19 insertions, 33 deletions
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 01d433ed6ce7..7c48c7afd6a4 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -561,47 +561,33 @@ static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl,
 }
 
 /**
- * freeze_go_sync - promote/demote the freeze glock
+ * freeze_go_callback - A cluster node is requesting a freeze
  * @gl: the glock
+ * @remote: true if this came from a different cluster node
  */
 
-static int freeze_go_sync(struct gfs2_glock *gl)
+static void freeze_go_callback(struct gfs2_glock *gl, bool remote)
 {
-	int error = 0;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+	struct super_block *sb = sdp->sd_vfs;
+
+	if (!remote ||
+	    gl->gl_state != LM_ST_SHARED ||
+	    gl->gl_demote_state != LM_ST_UNLOCKED)
+		return;
 
 	/*
-	 * We need to check gl_state == LM_ST_SHARED here and not gl_req ==
-	 * LM_ST_EXCLUSIVE. That's because when any node does a freeze,
-	 * all the nodes should have the freeze glock in SH mode and they all
-	 * call do_xmote: One for EX and the others for UN. They ALL must
-	 * freeze locally, and they ALL must queue freeze work. The freeze_work
-	 * calls freeze_func, which tries to reacquire the freeze glock in SH,
-	 * effectively waiting for the thaw on the node who holds it in EX.
-	 * Once thawed, the work func acquires the freeze glock in
-	 * SH and everybody goes back to thawed.
+	 * Try to get an active super block reference to prevent racing with
+	 * unmount (see trylock_super()).  But note that unmount isn't the only
+	 * place where a write lock on s_umount is taken, and we can fail here
+	 * because of things like remount as well.
 	 */
-	if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) &&
-	    !test_bit(SDF_NORECOVERY, &sdp->sd_flags)) {
-		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
-		error = freeze_super(sdp->sd_vfs);
-		if (error) {
-			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
-				error);
-			if (gfs2_withdrawn(sdp)) {
-				atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
-				return 0;
-			}
-			gfs2_assert_withdraw(sdp, 0);
-		}
-		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
-		if (test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
-			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_FREEZE |
-				       GFS2_LFC_FREEZE_GO_SYNC);
-		else /* read-only mounts */
-			atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
+	if (down_read_trylock(&sb->s_umount)) {
+		atomic_inc(&sb->s_active);
+		up_read(&sb->s_umount);
+		if (!queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work))
+			deactivate_super(sb);
 	}
-	return 0;
 }
 
 /**
@@ -761,9 +747,9 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = {
 };
 
 const struct gfs2_glock_operations gfs2_freeze_glops = {
-	.go_sync = freeze_go_sync,
 	.go_xmote_bh = freeze_go_xmote_bh,
 	.go_demote_ok = freeze_go_demote_ok,
+	.go_callback = freeze_go_callback,
 	.go_type = LM_TYPE_NONDISK,
 	.go_flags = GLOF_NONDISK,
 };
author	Andreas Gruenbacher <agruenba@redhat.com>	2022-11-14 23:34:50 +0100
committer	Andreas Gruenbacher <agruenba@redhat.com>	2023-07-03 22:25:02 +0200
commit	b77b4a4815a9651d1d6e07b8e6548eee9531a5eb (patch)
tree	a03c5c47deba377022ca106ea42c99f5765bccde /fs/gfs2/glops.c
parent	cad1e15804a83afd9a5c1d95a428d60d1f9c0340 (diff)
download	linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.tar.gz linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.tar.bz2 linux-b77b4a4815a9651d1d6e07b8e6548eee9531a5eb.zip