ctdb-tests: Try harder to avoid failures due to repeated recoveries
author    Martin Schwenke <martin@meltin.net>
          Tue, 10 Jun 2014 05:16:44 +0000 (15:16 +1000)
committer Amitay Isaacs <amitay@samba.org>
          Thu, 19 Jun 2014 21:41:13 +0000 (23:41 +0200)
commit    6a552f1a12ebe43f946bbbee2a3846b5a640ae4f
tree      48a7da00070e52f9516dc2b756652f3d8af85d09
parent    364bdadde3159dde1ddcc8c5fa4be981448f6833
ctdb-tests: Try harder to avoid failures due to repeated recoveries

About a year ago a check was added to _cluster_is_healthy() to ensure
that node 0 is not in recovery, the aim being to stop unexpected
recoveries from causing test failures.  However, this was misguided:
every test calls cluster_is_healthy() when it starts, so a test now
fails outright if an unexpected recovery happens to be in progress at
that point.

Instead, have cluster_is_healthy() warn if the cluster is in recovery.
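
The warning might look roughly like the following minimal sketch.
The _cluster_is_recovered helper is a made-up name, and the
onnode/$CTDB_TEST_WRAPPER call form is assumed for illustration
rather than taken from the actual script:

    # Sketch only: warn, rather than fail, when the cluster is
    # healthy but a recovery happens to be in progress.
    cluster_is_healthy ()
    {
        if ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_healthy ; then
            echo "Cluster is not healthy"
            return 1
        fi
        if ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_recovered ; then
            echo "WARNING: cluster is healthy but in recovery"
        fi
        return 0
    }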

Also:

* Rename wait_until_healthy() to wait_until_ready(), because it waits
  until the cluster is both healthy and out of recovery.

* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
  add a loop that keeps waiting (2 seconds at a time) while the
  cluster is back in recovery.  The logic here is that the
  re-recovery timeout has been set to 1 second, so sleeping for just
  1 second might race against the next recovery.  (A sketch of this
  wait appears after this list.)

* Use reverse logic in node_has_status() so that it works for "all".

* Tweak wait_until() so that it can handle timeouts with a
  recheck-interval specified.  (A sketch of this also appears after
  this list.)
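
For the restart_ctdb() change, the post-recovery wait might be
sketched like this (reusing the made-up _cluster_is_recovered helper
from the sketch above); the 2-second sleeps deliberately exceed the
1-second re-recovery timeout:

    # Sketch only: sleep past the 1-second re-recovery timeout,
    # then keep waiting while further recoveries occur.
    sleep 2
    while ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_recovered ; do
        echo "Cluster went back into recovery, waiting..."
        sleep 2
    done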
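
The wait_until() interface could take a recheck interval in a form
along these lines (the TIMEOUT/INTERVAL syntax is an assumption for
illustration, not necessarily what the script uses):

    # Sketch only: wait_until TIMEOUT[/INTERVAL] COMMAND ...
    # Retries COMMAND every INTERVAL seconds (default 1) until it
    # succeeds or TIMEOUT seconds have elapsed.
    wait_until ()
    {
        local timeout="${1%/*}"
        local interval=1
        case "$1" in
            */*) interval="${1#*/}" ;;
        esac
        shift

        local t="$timeout"
        while [ "$t" -gt 0 ] ; do
            if "$@" ; then
                return 0
            fi
            sleep "$interval"
            t=$(( t - interval ))
        done
        echo "Timed out after ${timeout} seconds waiting for: $*"
        return 1
    }

For example, "wait_until 60/2 cluster_is_healthy" would recheck every
2 seconds for up to 60 seconds.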

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb/tests/complex/34_nfs_tickle_restart.sh
ctdb/tests/scripts/integration.bash