git.samba.org - ctdb.git/commit

author	Martin Schwenke <martin@meltin.net>
	Fri, 7 Oct 2011 04:00:42 +0000 (15:00 +1100)
committer	Ronnie Sahlberg <ronniesahlberg@gmail.com>
	Thu, 13 Oct 2011 06:15:41 +0000 (17:15 +1100)
commit	e1cd38eee86ec3d826ba587aa29e587ec7384e56
tree	7e8652a033ea09e0f3c458b26c07e7bb733dbd29
parent	7b0ddb7b3b4b4ce42ee40872b66269920d9f472a

Make ctdb_diagnostics more resilient to uncontactable nodes.

Currently onnode takes about 20s to time out on each attempted ssh to
a down node.  With 40 or 50 invocations of onnode, ctdb_diagnostics
takes a long time to run.

Two changes to work around this:

* If EXTRA_SSH_OPTS (which is passed to ssh by onnode) does not
  contain a ConnectTimeout= setting then add a setting for a 5 second
  timeout.

* Filter the nodes before starting any diagnosis, taking out any "bad
  nodes" that are uncontactable via onnode.

  In the nodes summary at the beginning of the output, print
  information about any "bad nodes".

Signed-off-by: Martin Schwenke <martin@meltin.net>