ctdb-recoverd: Always check for recmaster before doing recovery
author     Amitay Isaacs <amitay@gmail.com>
           Tue, 6 Oct 2015 06:31:41 +0000 (17:31 +1100)
committer  Amitay Isaacs <amitay@samba.org>
           Wed, 7 Oct 2015 15:55:05 +0000 (17:55 +0200)
The recovery daemon checks whether it is the recovery master before
performing certain checks.  During those checks, a re-election can
change the recmaster.  In that case, the recovery daemon must not
perform a database recovery.  To handle this, do_recovery() now asks
for the current recmaster and returns early if this node no longer
holds that role.
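A minimal sketch of the re-check-before-acting idea, written outside CTDB. All names here (`recmaster_query_fn`, `node_state`, `may_start_recovery`, `fake_query`) are hypothetical illustrations, not CTDB APIs; the actual patch uses `ctdb_ctrl_getrecmaster()` as shown in the diff below. The point is that the cached recmaster is refreshed from the cluster immediately before deciding to recover, so a stale value left over from before a re-election cannot trigger a recovery.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback: asks the cluster who the recovery master is now.
 * Returns 0 on success and stores the answer in *recmaster. */
typedef int (*recmaster_query_fn)(void *ctx, uint32_t *recmaster);

/* Hypothetical per-node state: our node number and the last recmaster
 * value we saw, which may be stale after a re-election. */
struct node_state {
	uint32_t pnn;
	uint32_t cached_recmaster;
};

/* Refresh the cached recmaster, then report whether this node may run a
 * recovery.  Returns 0 if it may, -1 if the query failed or another node
 * now holds the recmaster role. */
static int may_start_recovery(struct node_state *node,
			      recmaster_query_fn query, void *ctx)
{
	uint32_t current;

	assert(node != NULL && query != NULL);

	if (query(ctx, &current) != 0) {
		fprintf(stderr, "Unable to get recmaster\n");
		return -1;
	}
	node->cached_recmaster = current;

	if (current != node->pnn) {
		fprintf(stderr, "Recovery master changed to %u, not recovering\n",
			(unsigned)current);
		return -1;
	}
	return 0;
}

/* Test double: the "cluster" answer is simply a stored value. */
static int fake_query(void *ctx, uint32_t *recmaster)
{
	*recmaster = *(const uint32_t *)ctx;
	return 0;
}
```

With `cluster` set to the node's own `pnn` the check passes; after simulating a re-election by changing `cluster`, the same call returns -1 and the cached value is updated to the new recmaster.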

This is not a complete fix, since the recovery master can still change
while the recovery is in progress.  The correct fix is to abort the
recovery if the recovery master changes.
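One way the "abort if the recovery master changes" approach could look is sketched below, assuming recovery is split into discrete stages and the recmaster is re-checked before each one. This is not CTDB code: `run_recovery`, `recovery_stage_fn`, `current_recmaster_fn`, the `recovery_result` values and the `sim_*` test doubles are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: returns the node currently holding the recmaster role. */
typedef uint32_t (*current_recmaster_fn)(void *ctx);

/* Hypothetical recovery stage; returns 0 on success. */
typedef int (*recovery_stage_fn)(void *ctx);

enum recovery_result {
	RECOVERY_DONE = 0,
	RECOVERY_FAILED = -1,
	RECOVERY_ABORTED = -2,
};

/* Run each stage in order, re-checking the recmaster before every stage
 * (not only once at the start) and aborting as soon as it has moved. */
static enum recovery_result run_recovery(uint32_t pnn,
					 current_recmaster_fn current_recmaster,
					 const recovery_stage_fn *stages,
					 size_t nstages, void *ctx)
{
	size_t i;

	assert(current_recmaster != NULL);
	for (i = 0; i < nstages; i++) {
		if (current_recmaster(ctx) != pnn) {
			return RECOVERY_ABORTED;
		}
		if (stages[i](ctx) != 0) {
			return RECOVERY_FAILED;
		}
	}
	return RECOVERY_DONE;
}

/* Test double: a cluster whose recmaster moves after a given stage. */
struct sim_cluster {
	uint32_t recmaster;
	int stages_run;
	int flip_after;		/* stage count that triggers re-election, -1 = never */
	uint32_t new_master;
};

static uint32_t sim_recmaster(void *ctx)
{
	return ((struct sim_cluster *)ctx)->recmaster;
}

static int sim_stage(void *ctx)
{
	struct sim_cluster *c = ctx;

	c->stages_run++;
	if (c->stages_run == c->flip_after) {
		c->recmaster = c->new_master;	/* simulate a re-election */
	}
	return 0;
}
```

Checking between stages narrows the window rather than closing it: a re-election can still happen while a single stage is running, so a real implementation would also need the stages themselves to tolerate or detect that.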

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Oct  7 17:55:05 CEST 2015 on sn-devel-104

ctdb/server/ctdb_recoverd.c

index 3c7f7f449cf14b1b8ae28564c7f32c479883a6d3..116531800cb6dfbd158e3c9ed6f53b48df38d80e 100644
@@ -2103,6 +2103,24 @@ static int do_recovery(struct ctdb_recoverd *rec,
 
        DEBUG(DEBUG_NOTICE, (__location__ " Starting do_recovery\n"));
 
+       /* Check if the current node is still the recmaster.  It's possible that
+        * re-election has changed the recmaster, but we have not yet updated
+        * that information.
+        */
+       ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(),
+                                    pnn, &ctdb->recovery_master);
+       if (ret != 0) {
+               DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster\n"));
+               return -1;
+       }
+
+       if (pnn != ctdb->recovery_master) {
+               DEBUG(DEBUG_NOTICE,
+                     ("Recovery master changed to %u, aborting recovery\n",
+                      ctdb->recovery_master));
+               return -1;
+       }
+
        /* if recovery fails, force it again */
        rec->need_recovery = true;