ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

We are in a situation where we have to avoid recursive cluster locking,
but there is no way to check whether a cluster lock has already been
taken by a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
    .permission     = ocfs2_permission,
    .get_acl        = ocfs2_iop_get_acl,
    .set_acl        = ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

do_sys_open
 may_open
  inode_permission
   ocfs2_permission
    ocfs2_inode_lock() <=== first time
    generic_permission
     get_acl
      ocfs2_iop_get_acl
       ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
ocfs2_inode_lock() calls. Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in the
BAST (ocfs2_generic_handle_bast) when a downconvert is started on behalf
of the remote EX lock request. On the other hand, the recursive cluster
lock (the second one) will be blocked in __ocfs2_cluster_lock() because
of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why? Because
there is no chance for the first cluster lock on this node to be
unlocked: we block ourselves in the same code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. Introduce a new field, struct ocfs2_lock_res.l_holders, to keep
   track of the PIDs of the processes that have taken the cluster lock
   of this lock resource.
2. Introduce a new flag for ocfs2_inode_lock_full(),
   OCFS2_META_LOCK_GETBH; it means just getting back the disk inode bh
   for us if we have already got the cluster lock.
3. Export a helper, ocfs2_is_locked_by_me(), which is used to check
   whether we have already got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs callbacks,
to solve the recursive locking issue caused by the fact that vfs
routines can call into each other.
The performance penalty of processing the holder list should only be
seen in the few cases where the tracking logic is used, such as get/set
acl.

You may ask what happens if we take a PR lock the first time and want an
EX lock the second time. Fortunately, as far as I can see, this case
never happens in the real world, including permission check and
(get|set)_(acl|attr), and the gfs2 code makes the same assumption.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Eric Ren <zren@suse.com> 2017-02-22 15:40:41 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2017-02-22 16:41:27 -0800
commit: 439a36b8ef38657f765b80b775e2885338d72451
tree: 7552739472bfd86a4850ec2289d3a1df08839b43 /fs/ocfs2/ocfs2.h
parent: ca376b37867875b6f661bb24a3238636b74f766e
1 file changed, 1 insertion, 0 deletions
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 7e5958b0be6b..0c39d71c67a1 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -172,6 +172,7 @@ struct ocfs2_lock_res {
 
 	struct list_head         l_blocked_list;
 	struct list_head         l_mask_waiters;
+	struct list_head	 l_holders;
 
 	unsigned long		 l_flags;
 	char                     l_name[OCFS2_LOCK_ID_MAX_LEN];