linux.git - Linux kernel mainline tree

diff options

author	Xue jiufei <xuejiufei@huawei.com>	2014-06-04 16:06:14 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:54 -0700
commit	01c6222f876062355599e5a63560c514b6de25d2 (patch)
tree	bf29b34aac8ea3a95fefc0e11873ea3a77ffa858 /fs/gfs2
parent	a9e9acaeb0a981a6dfa54b32dd756103aeefa6a7 (diff)
download	linux-01c6222f876062355599e5a63560c514b6de25d2.tar.gz linux-01c6222f876062355599e5a63560c514b6de25d2.tar.bz2 linux-01c6222f876062355599e5a63560c514b6de25d2.zip

ocfs2/dlm: disallow node joining when recovery is on going

We found a race situation when dlm recovery and node joining occurs simultaneously if the network state is bad. N1 N4 start joining dlm and send query join to all live nodes set joining node to N1, return OK send query join to other live nodes and it may take a while call dlm_send_join_assert() to send assert join message when N2 is down, so keep trying to send message to N2 until find N2 is down send assert join message to N3, but connection is down with N3, so it may take a while become the recovery master for N2 and send begin reco message to other nodes in domain map but no N1 connection with N3 is rebuild, then send assert join to N4 call dlm_assert_joined_handler(), add N1 to domain_map dlm recovery done, send finalize message to nodes in domain map, including N1 receiving finalize message, trigger the BUG() because recovery master mismatch. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Diffstat (limited to 'fs/gfs2')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: