git.samba.org - ctdb.git/commit

author	Amitay Isaacs <amitay@gmail.com>
	Tue, 23 Oct 2012 05:23:12 +0000 (16:23 +1100)
committer	Amitay Isaacs <amitay@gmail.com>
	Thu, 22 Nov 2012 02:01:36 +0000 (13:01 +1100)
commit	6479566a0a104b903f499979db594541ffc00a1f
tree	84a992f7b5337c447b99c12f5d9c3f88579f3a88
parent	5205d545e8d8c72d73b9d5fd148df6de30392fc8

recoverd: Track the nodes that fail takeover run and set culprit count

If any node fails the takeover run started from the main loop (either
because of a timeout or because it does not complete within the
takeover_timeout interval), the recovery master gives up on the
takeover run with the following message:

"Unable to setup public takeover addresses. Try again later"

As a side effect, monitoring stays disabled on all nodes. Before
ctdb_takeover_run() is called from the main loop, monitoring is disabled
via the startrecovery event. Because ctdb_takeover_run() fails, the
recovered event is never run and monitoring is never re-enabled.
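
The sequence described above can be illustrated with a toy model. This
is only a sketch, not the ctdb code: struct cluster, run_startrecovery(),
run_recovered() and main_loop_takeover() are hypothetical names invented
for the illustration, and the takeover result is passed in directly
instead of being produced by a real takeover run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cluster-wide state; not a ctdb structure. */
struct cluster {
    bool monitoring_enabled;
};

/* Stand-in for the startrecovery event: monitoring is switched off. */
static void run_startrecovery(struct cluster *c)
{
    c->monitoring_enabled = false;
}

/* Stand-in for the recovered event: monitoring is switched back on. */
static void run_recovered(struct cluster *c)
{
    c->monitoring_enabled = true;
}

/*
 * Toy version of the main loop path. takeover_result stands in for the
 * return value of the takeover run: 0 on success, non-zero on failure.
 */
static int main_loop_takeover(struct cluster *c, int takeover_result)
{
    assert(c != NULL);

    run_startrecovery(c);
    if (takeover_result != 0) {
        /* Early return: the recovered event is skipped, so
         * monitoring is left disabled. */
        return -1;
    }
    run_recovered(c);
    return 0;
}
```

In this model, calling main_loop_takeover() with a non-zero result
leaves monitoring_enabled false, which mirrors the side effect described
above.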

With this change, main_loop calls ctdb_takeover_run() with a
takeover_fail_callback. The callback is invoked if any node fails to
handle the takeip, releaseip or ipreallocated events in
ctdb_takeover_run(), so that the failing nodes are tracked and their
culprit count is set.
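
A minimal sketch of the failure-callback pattern follows. It is not the
code from this commit: the structures, the MAX_NODES limit, the callback
signature and the simplified takeover_run() stand-in are all assumptions
made for illustration. Only the idea is taken from the description above,
namely that a callback is invoked for each failing node and that node's
culprit count is increased.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 8  /* hypothetical limit, for the sketch only */

/* Hypothetical per-node bookkeeping kept by the recovery daemon. */
struct node_state {
    uint32_t culprit_count;  /* times this node caused a failure */
};

struct recoverd_state {
    struct node_state nodes[MAX_NODES];
    uint32_t num_nodes;
};

/* Assumed callback type: called once for each node that fails. */
typedef void (*takeover_fail_fn)(uint32_t pnn, int result,
                                 void *private_data);

/* Track the failing node by incrementing its culprit count. */
static void takeover_fail_callback(uint32_t pnn, int result,
                                   void *private_data)
{
    struct recoverd_state *rec = private_data;

    assert(rec != NULL);
    if (pnn >= rec->num_nodes) {
        return;  /* ignore node numbers outside the known range */
    }
    rec->nodes[pnn].culprit_count++;
    fprintf(stderr, "node %u failed takeover run (result %d)\n",
            (unsigned)pnn, result);
}

/*
 * Simplified stand-in for a takeover run. results[i] is the outcome
 * reported by node i; non-zero means that node failed an event.
 * Returns 0 if every node succeeded, -1 otherwise.
 */
static int takeover_run(const int *results, uint32_t n,
                        takeover_fail_fn fail_cb, void *private_data)
{
    int ret = 0;

    for (uint32_t i = 0; i < n; i++) {
        if (results[i] != 0) {
            if (fail_cb != NULL) {
                fail_cb(i, results[i], private_data);
            }
            ret = -1;
        }
    }
    return ret;
}
```

Passing the state through an opaque private_data pointer keeps the
takeover code independent of the recovery daemon's structures; the
caller decides what "tracking" a failed node means.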

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: a5c6bb1fffb8dc3960af113957a1fd080cc7c245

Conflicts:
include/ctdb_private.h
server/ctdb_takeover.c

include/ctdb_private.h
server/ctdb_recoverd.c
server/ctdb_takeover.c