ctdbd: Stop takeovers and releases from colliding in mid-air
There's a race here where release and takeover events for an IP can
run at the same time. For example, a "ctdb deleteip" and a takeover
initiated by the recovery daemon. The timeline is as follows:
1. The release code registers a callback to update the VNN. The
callback is executed *after* the eventscripts run the releaseip
event.
2. The release code calls the eventscripts for the releaseip event,
removing IP from its interface.
The takeover code "updates" the VNN saying that IP is on some
iface.... even if/though the address is already there.
3. The release callback runs, removing the iface associated with IP in
the VNN.
The takeover code calls the eventscripts for the takeip event,
adding IP to an interface.
As a result, CTDB doesn't think it should be hosting IP but IP is on
an interface. The recovery daemon fixes this later... but it
shouldn't happen.
This patch can cause some additional noise in the logs:
Release of IP 10.0.2.133/24 on interface eth2 node:2
recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
Release of IP 10.0.2.133/24 rejected update for this IP already in flight
recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
recoverd:Failed to release local ip address
In this case the node has started releasing an IP when the recovery
daemon notices the addresses is still hosted and initiates another
release. This noise is harmless but annoying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit
bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)