eventscripts: Become unhealthy faster on nfsd failure
author Martin Schwenke <martin@meltin.net>
Mon, 12 Aug 2013 01:36:25 +0000 (11:36 +1000)
committer Amitay Isaacs <amitay@gmail.com>
Wed, 14 Aug 2013 06:10:30 +0000 (16:10 +1000)
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.
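
As a rough illustration of the new thresholds, the following sketch maps a
failure count to the actions from the two new rules in 20.nfsd.check.  It is
not the eventscript's actual parser: the function name actions_for_failcount
is hypothetical, and the sketch assumes that rules are tried top to bottom,
that the first matching rule wins, and that "% 10" matches when the failure
count is a positive multiple of 10.

```shell
#!/bin/sh
# Hypothetical helper (not part of CTDB): print the actions selected
# for a given consecutive nfsd RPC check failure count.
# Assumption: rules are tried in file order and the first match wins.
actions_for_failcount ()
{
    _failcount="$1"

    # Rule 1: "%   10 verbose restart:b unhealthy"
    # Assumed to match every 10th failure (10, 20, 30, ...).
    if [ "$_failcount" -gt 0 ] && [ $((_failcount % 10)) -eq 0 ] ; then
        echo "verbose restart:b unhealthy"
    # Rule 2: "-ge  2 verbose unhealthy"
    # Matches any failure count of 2 or more.
    elif [ "$_failcount" -ge 2 ] ; then
        echo "verbose unhealthy"
    fi
    # A single failure matches neither rule, so nothing is printed.
}

# Show the selected actions for a few sample failure counts.
for _n in 1 2 9 10 11 20 ; do
    echo "$_n: $(actions_for_failcount "$_n")"
done
```

Under these assumptions a single failure triggers no action, failures 2 to 9
mark the node unhealthy, and the 10th failure additionally attempts a
restart while the node stays unhealthy.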

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
config/nfs-rpc-checks.d/20.nfsd.check
tests/eventscripts/60.nfs.monitor.112.sh
tests/eventscripts/60.nfs.monitor.113.sh
tests/eventscripts/60.nfs.monitor.114.sh

index d738a3245e5e2575b85b8d3a99d22495607fc29a..aa4a2e709ca33b67ffdc6ed7c5a398684383bda8 100644 (file)
@@ -1,3 +1,2 @@
--ge 6 verbose unhealthy
--eq 4 verbose restart
--eq 2 restart:b
+%   10 verbose restart:b unhealthy
+-ge  2 verbose unhealthy
index c5c39b26e6792222534a88dbfd4bf17c0c313f4c..49ee3357498c093e30d5cb1876317644b2a344fe 100755 (executable)
@@ -9,7 +9,4 @@ define_test "knfsd down, 6 iterations"
 setup_nfs
 rpc_services_down "nfs"
 
-iterate_test 6 'ok_null' \
-    2 'rpc_set_service_failure_response "nfsd"' \
-    4 'rpc_set_service_failure_response "nfsd"' \
-    6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'
index caa49892a037f821ba010beb433b49c07507e96c..505df1b5275e49ddfa9c97e83e5047cb9c8d3923 100755 (executable)
@@ -12,7 +12,4 @@ rpc_services_down "nfs"
 CTDB_NFS_DUMP_STUCK_THREADS=5
 FAKE_NFSD_THREAD_PIDS=""
 
-iterate_test 6 'ok_null' \
-    2 'rpc_set_service_failure_response "nfsd"' \
-    4 'rpc_set_service_failure_response "nfsd"' \
-    6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'
index 8279395cde8e47764aed0fd1de9a1d2452a5dcca..496f5e7dee274a60e9d553d65e9b5a48a301cecc 100755 (executable)
@@ -12,7 +12,4 @@ rpc_services_down "nfs"
 CTDB_NFS_DUMP_STUCK_THREADS=5
 FAKE_NFSD_THREAD_PIDS="1001 1002 1003"
 
-iterate_test 6 'ok_null' \
-    2 'rpc_set_service_failure_response "nfsd"' \
-    4 'rpc_set_service_failure_response "nfsd"' \
-    6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'