ocfs2: o2hb: add negotiate timer
This series of patches fixes the issue that, when storage goes down, all nodes fence themselves due to write timeout. With this patch set, all nodes keep running until storage comes back online, unless one of the following happens, in which case all nodes fence themselves as before:

1. an I/O error is returned
2. the network between nodes goes down
3. nodes panic

This patch (of 6):

When storage goes down, all nodes fence themselves due to write timeout. The negotiate timer is designed to avoid this: with it, a node waits until storage is up again.

The negotiate timer works as follows:

1. The timer expires before the write timeout timer; its timeout is currently half of the write timeout. It is re-queued along with the write timeout timer. If it expires, the node sends a NEGO_TIMEOUT message to the master node (the node with the lowest node number). This message does nothing except set a bit in a bitmap on the master node that records which nodes are negotiating the timeout.

2. If storage is down, every node sends this message to the master node. When the master node finds that its bitmap includes all online nodes, it sends a NEGO_APPROVE message to all nodes one by one; this message re-queues the write timeout timer and the negotiate timer. Any node that does not receive this message, or hits an error while handling it, will be fenced.

If storage comes back up at any time, o2hb_thread runs and re-queues all the timers, and these two steps have no effect.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>