recoverd: Track the nodes that fail takeover run and set culprit count
If any of the nodes fail takeover run (either due to timeout or failure
to complete within takeover_timeout interval) from main loop, recovery
master will give up trying takeover run with following message:
"Unable to setup public takeover addresses. Try again later"
And as a side-effect the monitoring is disabled on all the nodes. Before
ctdb_takeover_run() is called from main loop, monitoring get disabled via
startrecovery event. Since ctdb_takeover_run() fails, it never runs
recovered event and monitoring does not get re-enabled.
In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from:
cbe68821180e04988edf186dcf6d042edcab81de
Conflicts:
server/ctdb_recoverd.c