Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems. Apparently these are rarely
fixed by restarting the NFS service, because the restart itself tends
to hang.

Mark the node unhealthy after 2 nfsd RPC check failures, instead of
waiting for 6 failures. Restart on every 10th failure to try to bring
the node back to good health.

Update unit tests to match.
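
The new policy can be sketched roughly as below. This is only an
illustration: nfsd_failure_action is a hypothetical helper, not a
function in CTDB, and it assumes that the rules are tried in order,
that the first matching rule wins, and that a single failure triggers
no action.

```shell
#!/bin/sh
# Hypothetical helper, not part of CTDB.  Maps a count of consecutive
# nfsd RPC check failures to the action described in this commit.
# Assumption: rules are tried in order and the first match wins.
nfsd_failure_action ()
{
    _fails="$1"

    if [ "$_fails" -gt 0 ] && [ $((_fails % 10)) -eq 0 ] ; then
        # Every 10th failure: try a restart, node stays unhealthy
        echo "restart unhealthy"
    elif [ "$_fails" -ge 2 ] ; then
        # From the 2nd failure onwards: node is unhealthy
        echo "unhealthy"
    else
        # No rule matched (assumed: first failure is tolerated)
        echo "none"
    fi
}

# Show the action for a few sample failure counts
for _n in 1 2 9 10 11 20 ; do
    echo "$_n: $(nfsd_failure_action "$_n")"
done
```

Checking the modulus before the threshold matters here: with the
order reversed, the 10th failure would match the "2 or more" rule
first and the restart would never be attempted.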
Signed-off-by: Martin Schwenke <martin@meltin.net>
--ge 6 verbose unhealthy
--eq 4 verbose restart
--eq 2 restart:b
+% 10 verbose restart:b unhealthy
+-ge 2 verbose unhealthy
setup_nfs
rpc_services_down "nfs"
-iterate_test 6 'ok_null' \
- 2 'rpc_set_service_failure_response "nfsd"' \
- 4 'rpc_set_service_failure_response "nfsd"' \
- 6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'
CTDB_NFS_DUMP_STUCK_THREADS=5
FAKE_NFSD_THREAD_PIDS=""
-iterate_test 6 'ok_null' \
- 2 'rpc_set_service_failure_response "nfsd"' \
- 4 'rpc_set_service_failure_response "nfsd"' \
- 6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'
CTDB_NFS_DUMP_STUCK_THREADS=5
FAKE_NFSD_THREAD_PIDS="1001 1002 1003"
-iterate_test 6 'ok_null' \
- 2 'rpc_set_service_failure_response "nfsd"' \
- 4 'rpc_set_service_failure_response "nfsd"' \
- 6 'rpc_set_service_failure_response "nfsd"'
+iterate_test 10 'rpc_set_service_failure_response "nfsd"'