sahlberg/ctdb.git
Martin Schwenke [Mon, 22 Aug 2011 05:58:23 +0000 (15:58 +1000)]
Tests - eventscripts - add output for "not implemented" in ctdb stub

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Aug 2011 05:56:57 +0000 (15:56 +1000)]
Tests - eventscripts - add an nmap stub

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 06:51:08 +0000 (16:51 +1000)]
Tests - eventscripts - stop timeouts waiting for backgrounded testparm

Not sleeping at all speeds up the tests.  However, it can also cause
timeouts.  Therefore, every time sleep is run we force the stub to do
a short 0.1s sleep instead of whatever is specified.  This should be
enough to avoid races.
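
A minimal sketch of such a stub, written as a shell function so it can be sourced (the real stub is presumably a separate "sleep" script placed early in $PATH). Passing 0.1 to the real sleep assumes GNU coreutils or BusyBox, since POSIX sleep only takes whole seconds.

```sh
# Hypothetical test stub: ignore the requested duration and always
# sleep for 0.1s.  "command" bypasses this function, so the real
# sleep binary is called instead of recursing.
sleep ()
{
    command sleep 0.1
}
```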

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 03:54:49 +0000 (13:54 +1000)]
Tests - add getdebug and checktcpport to ctdb eventscripts stub

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 03:54:20 +0000 (13:54 +1000)]
Tests - add hooks to simulate ctdb commands that aren't implemented

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 03:53:05 +0000 (13:53 +1000)]
Tests - add eventscripts testing stub for sleep command.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 01:35:38 +0000 (11:35 +1000)]
Tests - Change variable used to fake listening TCP ports.

Change from $FAKE_NETSTAT_TCP_LISTEN to $FAKE_TCP_LISTEN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 01:24:56 +0000 (11:24 +1000)]
Tests - new NFS share checking tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 01:22:51 +0000 (11:22 +1000)]
Tests - eventscripts exportfs stub should split lines

The real exportfs splits lines longer than 15 characters.  The stub
should do that too...
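
A sketch of the line splitting such a stub might do. It assumes input lines of the form "<path> <client>(<options>)"; the function name and the exact layout of the continuation line are assumptions, not copied from the real exportfs or the real stub.

```sh
# Hypothetical formatter: paths longer than 15 characters get the
# client on an indented continuation line; shorter ones stay on one line.
exportfs_stub_format ()
{
    awk '{
        if (length($1) > 15)
            printf "%s\n\t\t%s\n", $1, $2
        else
            printf "%-15s\t%s\n", $1, $2
    }'
}
```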

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 01:21:33 +0000 (11:21 +1000)]
Tests - add -T (trace) option to eventscripts run_test.sh

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Aug 2011 06:31:17 +0000 (16:31 +1000)]
Eventscripts - use ctdb scriptstatus -Y when replaying status

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 16 May 2011 04:23:28 +0000 (14:23 +1000)]
Eventscripts: add a synchronous synthetic reconfigure event.

In the current code services can only be reconfigured asynchronously.
This means that configuration file changes can be made, an asynchronous
reconfigure event can be triggered, and it always succeeds.  Some time
later, when the service is actually reconfigured, a failure may be
seen.

This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.

ctdb_service_check_reconfigure() is essentially reimplemented.

If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle.  This is to avoid reconfigures trampling
on each other.  In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.

If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.

The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.

As before, if a monitor event causes a scheduled asynchronous
reconfigure to occur then it will replay the previous status for the
service, given that a reconfigure can cause temporary instability.
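
The collision handling and the subprocess can be sketched roughly as follows. This assumes an atomic mkdir lock in a made-up location and a service_reconfigure() supplied by the eventscript; the real implementation may track in-flight reconfigures differently.

```sh
# Hypothetical lock location for this sketch.
reconfigure_lock_dir="${TMPDIR:-/tmp}/ctdb_reconfigure_sketch.lock"

# Run service_reconfigure() synchronously so that a failure shows up in
# our return status.  Status 2 means "collided, retry later".
synchronous_reconfigure ()
{
    # mkdir is atomic: it fails if another reconfigure holds the lock
    if ! mkdir "$reconfigure_lock_dir" 2>/dev/null ; then
        echo "Reconfigure already in progress - retry later"
        return 2
    fi

    # Subshell: an "exit" inside service_reconfigure() only ends the
    # subprocess, so we can still release the lock and report status.
    _rc=0
    ( service_reconfigure ) || _rc=$?

    rmdir "$reconfigure_lock_dir"
    return $_rc
}
```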

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Aug 2011 06:43:53 +0000 (16:43 +1000)]
Eventscripts - call ctdb_check_args() in 00.ctdb

This is the first eventscript.  Sanity check as early as possible and
everyone benefits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Aug 2011 06:36:19 +0000 (16:36 +1000)]
Eventscripts - call ctdb_check_args() instead of doing hand checking

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Aug 2011 06:32:34 +0000 (16:32 +1000)]
Eventscripts - new function ctdb_check_args()

Pass this "$@" to do common eventscript argument checking.

For regular use putting this in 00.ctdb would be enough.  However, for
developer testing it can be useful to call this in other eventscripts.
For example, 10.interfaces and 13.per_ip_routing currently check these
by hand.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 04:20:58 +0000 (14:20 +1000)]
Eventscripts - ctdb_check_tcp_ports() bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Aug 2011 03:55:55 +0000 (13:55 +1000)]
Eventscripts - fix debugging buglet in ctdb_check_tcp_ports_ctdb()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 8 Aug 2011 03:13:59 +0000 (13:13 +1000)]
Eventscripts: New configuration variable CTDB_SERVICE_AUTOSTARTSTOP.

Some of the current auto-start/stop logic is broken, particularly for
Samba.  Fixing it is non-trivial.

If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services
when told to newly manage or no longer manage them.  This defaults to
"yes".

However, if using a canned configuration file that doesn't set
$CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic
from working.  Therefore, this works around CQ S1026685 - on the
system in question another daemon controls service auto-start/stop and
CTDB just gets in the way.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 17 Aug 2011 07:42:07 +0000 (17:42 +1000)]
Eventscripts - in 60.nfs uniquify the share check directory list

There are sites that have multiple entries for the same export.  This
optimises the share check in this case.
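
A sketch of the idea, assuming the directory list is built from exportfs output on stdin (the function name is hypothetical). "sort -u" both orders and de-duplicates the list, so each directory is checked only once. Paths containing spaces are not handled.

```sh
# Lines starting with "/" carry the export path in their first field,
# whether or not exportfs wrapped the client onto the following line.
nfs_share_dirs ()
{
    awk '/^\// { print $1 }' | sort -u
}
```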

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 25 Aug 2011 23:39:25 +0000 (09:39 +1000)]
Logging:  when we log stdout/stderr messages from eventscripts to the system log, prefix every line of output with the name of the eventscript.

CQ S1028412
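
The change itself lives in ctdbd's C logging code. As a shell illustration of the resulting format only (the prefix_output name is hypothetical), every line gets the eventscript name prepended:

```sh
# Prefix every line read from stdin with an eventscript name ($1).
prefix_output ()
{
    awk -v prefix="$1" '{ print prefix ": " $0 }'
}
```

For example, piping "50.samba monitor" output through "prefix_output 50.samba" turns a line "foo" into "50.samba: foo".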

Ronnie Sahlberg [Tue, 23 Aug 2011 06:35:08 +0000 (16:35 +1000)]
LibCTDB : update the ctdb tool to use libctdb to read the recovery mode

Ronnie Sahlberg [Tue, 23 Aug 2011 06:32:38 +0000 (16:32 +1000)]
LibCTDB : update the ctdb tool to use libctdb to query for the recmaster

Ronnie Sahlberg [Tue, 23 Aug 2011 06:15:34 +0000 (16:15 +1000)]
LibCTDB : initialize ctdb->pnn to -1 when we create a new context
but before we learn the pnn of the local node

Ronnie Sahlberg [Tue, 23 Aug 2011 05:13:40 +0000 (15:13 +1000)]
LibCTDB : change the ctdb_fetch_lock_once test tool to use libctdb instead of the old client

Ronnie Sahlberg [Tue, 23 Aug 2011 05:00:27 +0000 (15:00 +1000)]
LibCTDB : add support for getrecmode

Ronnie Sahlberg [Tue, 23 Aug 2011 02:43:16 +0000 (12:43 +1000)]
LibCTDB: add commands that let an application query how many commands are active,
i.e. have been sent but have not yet received a reply.
Applications may use this to check whether it is "safe" to stop the event system and sleep,
or whether they should first wait for all activity to the ctdb daemons to cease.

Volker Lendecke [Mon, 22 Aug 2011 14:40:58 +0000 (16:40 +0200)]
Fix a const warning

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Mon, 22 Aug 2011 14:39:32 +0000 (16:39 +0200)]
Remove an unused variable

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "unpack_reply_control" does not need the ctdb_connection parameter

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "unpack_reply_call" does not need the ctdb_connection parameter

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "ctdb_request_free" does not need the ctdb_connection parameter

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 14:36:20 +0000 (16:36 +0200)]
libctdb: Make sure ctdb_request->ctdb is filled correctly

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 12:47:09 +0000 (14:47 +0200)]
libctdb: Ensure 0-termination of sun_path

Rusty, please check!

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:59:48 +0000 (13:59 +0200)]
libctdb: Fix a few format warnings

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:57:58 +0000 (13:57 +0200)]
libctdb: Add license header to messages.c

Rusty, please check!

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:37:23 +0000 (13:37 +0200)]
libctdb: Reorder attachdb

No code change; this makes the sequence of what happens easier to read

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:55:24 +0000 (13:55 +0200)]
libctdb: Reorder set_message_handler

No code change; this is for better readability

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:54:36 +0000 (13:54 +0200)]
libctdb: Correct 4bfdfda, stddef.h is needed by libctdb_private.h

Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Wed, 17 Aug 2011 12:46:43 +0000 (14:46 +0200)]
Add missing #include to libctdb/ctdb.c

We need it for the "offsetof" macro; with it included, offsetof no
longer has to be redeclared in libctdb_private.h

Signed-off-by: Michael Adam <obnox@samba.org>
Ronnie Sahlberg [Wed, 17 Aug 2011 04:10:04 +0000 (14:10 +1000)]
Merge remote branch 'martins/eventscripts'

Martin Schwenke [Wed, 17 Aug 2011 04:02:45 +0000 (14:02 +1000)]
Eventscripts - new default TCP port checker using "ctdb checktcpport"

New function ctdb_check_tcp_ports_ctdb().  This should be fast... and
is now the default checker.  If it fails in an unexpected way we fall
back to the nmap and netstat checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 17 Aug 2011 02:12:20 +0000 (12:12 +1000)]
Eventscripts - generalise TCP port checking plus new nmap-based checker

Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().

Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.

ctdb_check_ctdb_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try.  Default
value is currently "nmap netstat".  If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with messages saying the nmap checker failed.  This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.
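
A sketch of the checker loop. It assumes each ctdb_check_tcp_ports_<name>() function returns 0 when all ports are listening, 1 when one is not, and any other status when the checker itself cannot run; that status convention and the function name are assumptions for this sketch. An undefined checker function gives status 127 ("command not found"), which naturally triggers the fallback to the next checker.

```sh
ctdb_check_tcp_ports_sketch ()
{
    for _c in ${CTDB_TCP_PORT_CHECKERS:-nmap netstat} ; do
        _rc=0
        "ctdb_check_tcp_ports_${_c}" "$@" || _rc=$?
        case "$_rc" in
            0|1)
                # A definitive answer: all listening (0) or not (1)
                return $_rc
                ;;
        esac
        # This checker could not run (e.g. nmap not installed)
        echo "DEBUG: TCP port checker \"${_c}\" failed (status ${_rc})" >&2
    done
    echo "ERROR: no usable TCP port checker" >&2
    return 1
}
```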

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 17 Aug 2011 00:27:01 +0000 (10:27 +1000)]
Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging

Use the new debug function to conditionally print the netstat output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 5 Aug 2011 06:39:57 +0000 (16:39 +1000)]
Eventscripts - weaken TCP port check message if CTDB has just been started.

Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started.  The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.

This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service.  When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message.  This means that until the node actually becomes
healthy we see more friendly messages.
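
A sketch of the flag-file logic. The flag directory and both function names are hypothetical; only the start, first-success and INFO-versus-ERROR behaviour comes from the description above.

```sh
# Hypothetical location for the "recently started" flag files.
started_flag_dir="${TMPDIR:-/tmp}/ctdb_started_sketch"

# Call when a service is started (not restarted).
mark_service_started ()
{
    mkdir -p "$started_flag_dir" && touch "$started_flag_dir/$1"
}

# $1 = service name, $2 = exit status of the TCP port check
report_tcp_port_check ()
{
    if [ "$2" -eq 0 ] ; then
        # First success: from now on failures are real errors
        rm -f "$started_flag_dir/$1"
        return 0
    fi
    if [ -e "$started_flag_dir/$1" ] ; then
        echo "INFO: $1 not listening yet (recently started)"
    else
        echo "ERROR: $1 not listening"
    fi
    return 1
}
```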

The subtext is that we quite often hear false-positive "recreates" of
CQ S1024874 (samba stopped responding on port 445) when ctdbd is
started.  This reduces the chances of people reporting such false
recreates...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 01:32:06 +0000 (11:32 +1000)]
Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.

ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port.  There are 2 problems with this:

* Netstat is run on each loop iteration when it need only be run once.

* The -a option is used to list all connections but the function only
  cares about the listening ports.  There may be many thousands of
  non-listening ports to grep through.

This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option.  It also only runs netstat once before the
main loop.

When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.
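
A sketch of the reworked check, running "netstat -l -t -n" once and then testing every port against its output. The function name is hypothetical and the pattern assumes Linux-style "address:port" local addresses (BSD-style "address.port" is also accepted).

```sh
check_tcp_ports_netstat_sketch ()
{
    # Listening TCP sockets only (-l rather than -a), numeric, run once
    _out=$(netstat -l -t -n) || return 2

    for _p in "$@" ; do
        # Local address ends in ":<port>" (or ".<port>") then whitespace
        if ! printf '%s\n' "$_out" | grep -q "[.:]${_p}[[:space:]]" ; then
            echo "ERROR: nothing listening on TCP port ${_p}"
            # Dump what netstat saw, to help with debugging
            printf '%s\n' "$_out"
            return 1
        fi
    done
    return 0
}
```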

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 23:44:11 +0000 (09:44 +1000)]
Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()

The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG).  If no args are given
then use stdin - this allows the function to be used with here
documents.
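
A minimal sketch of such a function, as an illustration rather than the actual code in the functions file:

```sh
# Print only if $CTDB_CURRENT_DEBUGLEVEL is at least 4 (DEBUG).
# With no arguments, copy stdin instead, so here documents work.
debug ()
{
    if [ "${CTDB_CURRENT_DEBUGLEVEL:-0}" -ge 4 ] ; then
        if [ $# -gt 0 ] ; then
            echo "$@"
        else
            cat
        fi
    fi
}
```

With this shape, "debug <<EOF" followed by several lines prints those lines only at DEBUG level.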

To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Wed, 17 Aug 2011 00:16:35 +0000 (10:16 +1000)]
Add a new command 'ctdb checktcpport <port>'
that tries to bind to the specified port on INADDR_ANY.

This can be used to test whether a service is listening on that port.

Errors are printed to stdout and the exit status is either 0, if we managed to bind to the port (in which case the service is NOT listening on that port), or the value of errno that stopped us from binding to the port.

On Linux, errno for EADDRINUSE is 98, so a script using this command should check the exit status against the value 98.
If this command returns 98 it means the service is listening to the specified port.
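
A sketch of how an eventscript might interpret that status. The function name and its 0/1/2 return convention are assumptions; 98 is the Linux value of EADDRINUSE, so this is not portable to other systems.

```sh
# Return 0 if something is listening on TCP port $1, 1 if not, and 2 if
# "ctdb checktcpport" failed for some other reason (the caller may then
# fall back to another checker).  Output is discarded.
port_is_listening ()
{
    _rc=0
    ctdb checktcpport "$1" >/dev/null 2>&1 || _rc=$?
    case "$_rc" in
        98) return 0 ;;  # EADDRINUSE (Linux): the port is in use
        0)  return 1 ;;  # bind succeeded, so nothing is listening
        *)  return 2 ;;  # some other errno
    esac
}
```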

Ronnie Sahlberg [Tue, 16 Aug 2011 23:59:42 +0000 (09:59 +1000)]
Don't use a persistence timeout value that is too large

Martin Schwenke [Tue, 16 Aug 2011 23:14:23 +0000 (09:14 +1000)]
Eventscripts - conditionally inherit ctdbd debug level in each monitor event

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 23:00:46 +0000 (09:00 +1000)]
Eventscripts - new function ctdb_set_current_debuglevel()

This function ensures that CTDB_CURRENT_DEBUGLEVEL is set.  It works
like this:

1. If it is already set then do nothing, since it might have been set
   some other way.

   The recommended "other way" would be to add a file in rc.local.d/.

2. If it is not set then set it by sourcing
   /var/ctdb/eventscript_debuglevel.

3. If this file does not exist then create it using output from "ctdb
   getdebug".

If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.

If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...
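
The steps above might be sketched like this. get_ctdb_debuglevel is a hypothetical helper assumed to print the numeric level obtained from "ctdb getdebug" (whose output format is not parsed here), the function name is illustrative, and the file location can be overridden for testing.

```sh
# Default location from the description above; may be overridden.
: "${debuglevel_file:=/var/ctdb/eventscript_debuglevel}"

ctdb_set_current_debuglevel_sketch ()
{
    # Already set some other way (e.g. from rc.local.d/): do nothing.
    if [ -n "$CTDB_CURRENT_DEBUGLEVEL" ] ; then
        return 0
    fi

    # Asked to "create", or no file yet: write it from ctdb's level.
    if [ "$1" = "create" ] || [ ! -r "$debuglevel_file" ] ; then
        _level=$(get_ctdb_debuglevel 2>/dev/null) || _level=0
        # Not an integer?  Silently use 0 rather than spamming logs.
        [ "$_level" -eq "$_level" ] 2>/dev/null || _level=0
        echo "CTDB_CURRENT_DEBUGLEVEL=$_level" >"$debuglevel_file"
    fi

    # Set the variable by sourcing the file.
    . "$debuglevel_file"
}
```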

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 03:28:40 +0000 (13:28 +1000)]
Eventscripts - ensure the statd update-trigger file always exists.

See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 03:18:40 +0000 (13:18 +1000)]
Eventscripts: remove "return 0" from 50.samba service_stop().

This potentially masks errors and was basically included by accident.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 05:53:04 +0000 (15:53 +1000)]
Change the errors for 10.interface to clearly state ERROR: for error messages

Update the test system to catch the new error strings generated by this change

Ronnie Sahlberg [Mon, 15 Aug 2011 05:43:15 +0000 (15:43 +1000)]
Merge remote branch 'martins/eventscript_tests'

Martin Schwenke [Mon, 15 Aug 2011 05:40:35 +0000 (15:40 +1000)]
Tests - exportfs stub needs to print out export options.

This is needed due to bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 05:27:50 +0000 (15:27 +1000)]
Merge remote branch 'martins/eventscript.10.interface'

Ronnie Sahlberg [Mon, 15 Aug 2011 05:22:20 +0000 (15:22 +1000)]
Merge remote branch 'martins/60_nfs_regression'

Ronnie Sahlberg [Mon, 15 Aug 2011 05:20:18 +0000 (15:20 +1000)]
Merge remote branch 'martins/eventscript.60.nfs.rpc'

Ronnie Sahlberg [Mon, 15 Aug 2011 05:16:06 +0000 (15:16 +1000)]
Merge remote branch 'martins/test_suite'

Ronnie Sahlberg [Mon, 15 Aug 2011 05:15:12 +0000 (15:15 +1000)]
Merge remote branch 'martins/eventscript_tests'

Martin Schwenke [Mon, 15 Aug 2011 03:53:39 +0000 (13:53 +1000)]
Tests - ctdb listvars test should allow alphanumericals in tunable names.

This matches the new "LCP2PublicIPs" tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 00:23:50 +0000 (10:23 +1000)]
Change the default for ip failover to be LCP2 and not DeterministicIPs

Martin Schwenke [Tue, 5 Jul 2011 07:21:57 +0000 (17:21 +1000)]
Eventscripts: 10.interfaces - make startup event actually mark interfaces up!

The startup event intends to mark interfaces up.  However, it doesn't
actually do that because $INTERFACES is empty.

This uses the function get_all_interfaces() to list the
interfaces... and then mark them up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 07:20:09 +0000 (17:20 +1000)]
Eventscripts: 10.interfaces - startup comment says assume all interfaces good.

Interfaces are currently marked down.  Mark them up instead, as per
the comment... and discussion with Ronnie.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 07:18:30 +0000 (17:18 +1000)]
Eventscripts: 10.interfaces - new function get_all_interfaces().

Move existing interface listing code to new function in preparation
for using it in startup event.

While we're here change the "sort | uniq" into "sort -u" and save some
complexity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 07:07:39 +0000 (17:07 +1000)]
Eventscripts: 10.interface clean-ups - minor tweaks and new comments.

* sed can read files, it doesn't need a file piped to it
* use $() subshells instead of `` - they seem to quote better in dash
* tweak the uniquifying code so that it is easier to read
* add comments
* remove some extraneous semicolons at ends of lines

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Aug 2011 06:30:54 +0000 (16:30 +1000)]
Tests: re-enable the NFS eventscript tests - they work again.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Aug 2011 06:28:09 +0000 (16:28 +1000)]
Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.

This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:50:47 +0000 (16:50 +1000)]
Eventscripts: 10.interface clean-ups - variable name fix-ups.

Change most of the uppercase variable names to lowercase for
consistency with other variables, readability and so they can be
easily distinguished from environment/configuration variables.  Change
the name of 2 of the variables to add some clarity.  Changes are as
follows:

  INTERFACES   -> all_interfaces
  IFACES       -> ctdb_interfaces
  IFACE        -> iface
  I            -> i
  REALIFACE    -> realiface

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:27:01 +0000 (16:27 +1000)]
Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces().

The logic in the monitor event itself is very complex.  Nearly all of
it can go away by adding a single check of
$CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of
monitor_interfaces() and reversing the sense of the corresponding
check.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:10:23 +0000 (16:10 +1000)]
Eventscripts: 10.interfaces clean-up - use more descriptive variable names.

The name of variable $ok gives no clue to its meaning/use so this
changes that variable to be named $up_interfaces_found.

The return logic relating to $ok and $fail is difficult to read, so
these variables are given true/false values, allowing the return logic
to be simplified.
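
A sketch of why true/false values simplify things: holding the commands "true" or "false" in the variables lets them be run directly as conditions. How $CTDB_PARTIALLY_ONLINE_INTERFACES is combined with them here is an assumption based on the monitor_interfaces() description, and the function name is hypothetical.

```sh
# $up_interfaces_found and $fail hold the commands "true" or "false".
monitor_interfaces_result ()
{
    if ! $fail ; then
        return 0        # no interface failed
    fi
    # Some interface is down: tolerate it only if partial operation is
    # allowed and at least one interface is still up.
    if $up_interfaces_found && [ "$CTDB_PARTIALLY_ONLINE_INTERFACES" = "yes" ] ; then
        return 0
    fi
    return 1
}
```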

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 05:53:54 +0000 (15:53 +1000)]
Eventscripts: 10.interfaces cleanup - new functions mark_up(), mark_down().

The same few lines of logic are used every time an interface is marked up or down.

This encapsulates those few lines in 2 new functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jan 2011 22:40:11 +0000 (09:40 +1100)]
Eventscripts: change failure counts and behaviour for statd and nfsd.

We reduce the number of failures before attempting a restart.
However, after 6 failures we mark the cluster unhealthy and no longer
try to restart.  If the previous 2 attempts didn't work then there
isn't any use in bogging the system down with an attempted restart on
every monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 17 Dec 2010 05:25:04 +0000 (16:25 +1100)]
Eventscripts: clean up 60.nfs monitor event.

This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable.  An example of usage is
as follows:

  nfs_check_rpc_service "mountd" \
    -ge 10 "verbose restart:b unhealthy" \
    -eq 5 "restart:b"

The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked.  The RPC service corresponding to this command
is checked for availability using the rpcinfo command.  If the service
is available then the function succeeds and subsequent arguments are
ignored.

If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:

1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.

The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.

In the example above:

* If the counter is >= 10 then a verbose message is printed
  describing the failure, the service is restarted in the background
  and the node is marked as unhealthy (via an "exit 1" from the
  function).

* If the counter is == 5 then the service is restarted in the
  background.

For more action options please see the code.
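
A sketch of the argument processing described above, not the real function. rpc_service_available stands in for the rpcinfo check, the counter directory is made up, resetting the counter on success is an assumption, the actions only print what they would do, and "unhealthy" uses "return 1" instead of "exit 1" so the sketch can be tested.

```sh
# Hypothetical per-service failure counter location for this sketch.
rpc_counter_dir="${TMPDIR:-/tmp}/nfs_rpc_counters_sketch"

nfs_check_rpc_service_sketch ()
{
    _svc="$1" ; shift
    _counter="$rpc_counter_dir/$_svc"
    mkdir -p "$rpc_counter_dir"

    if rpc_service_available "$_svc" ; then
        rm -f "$_counter"       # assumption: success resets the counter
        return 0
    fi

    _count=$(( $(cat "$_counter" 2>/dev/null || echo 0) + 1 ))
    echo "$_count" >"$_counter"

    # Remaining arguments: groups of <test operator> <limit> <actions>
    while [ $# -ge 3 ] ; do
        if [ "$_count" "$1" "$2" ] ; then
            for _action in $3 ; do
                case "$_action" in
                    verbose)   echo "ERROR: $_svc failed (count $_count)" ;;
                    restart:b) echo "restarting $_svc in background" ;;
                    unhealthy) return 1 ;;  # real code uses "exit 1"
                esac
            done
            return 0    # only the first matching check is acted on
        fi
        shift 3
    done
    return 0
}
```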

This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check.  It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 05:33:46 +0000 (15:33 +1000)]
Tests: Re-enable the Samba eventscript tests.

They work again.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 05:32:28 +0000 (15:32 +1000)]
Revert "Tests: tweak some samba tests to cope with debug from ctdb_check_tcp_ports()."

This reverts commit 557ac30e60516742da10b83bfbbbb41430c977a2.

Martin Schwenke [Wed, 13 Apr 2011 02:37:42 +0000 (12:37 +1000)]
Eventscripts: fix regression in 60.nfs export checking.

Commit 35a60a63a9b5c7d98dde514ae552239506b691c9 introduced a
regression, reported by "Jonathan Buzzard" <J.Buzzard@dundee.ac.uk>,
as follows:

  Basically the use of sed in the following code snippet does not work
  for long exports where exportfs wraps the host or network onto the
  next line.

         exportfs | grep -v '^#' | grep '^/' |
         sed -e 's/[[:space:]]*[^[:space:]]*$//' |
         ctdb_check_directories

  The result is that the you get lots of blank lines being sent to
  ctdb_check_directories which causes the host to be marked as
  unhealthy and then thrashing sets in of the managed IP's making the
  whole cluster unusable.

This tightens up the sed expression so that it is less likely to
produce a spurious empty line.  It also removes an unnecessary "grep -v".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 11 Aug 2011 04:15:22 +0000 (14:15 +1000)]
Merge remote branch 'martins/eventscript.10.interface'

Ronnie Sahlberg [Thu, 11 Aug 2011 04:01:02 +0000 (14:01 +1000)]
Merge remote branch 'martins/eventscript_infrastructure'

Martin Schwenke [Mon, 23 May 2011 06:00:05 +0000 (16:00 +1000)]
Eventscripts: in 60.nfs move statd-notify code to service_reconfigure().

This means that it now occurs on every reconfigure event.  As a result
the ipreallocated event is removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 03:55:02 +0000 (13:55 +1000)]
Eventscripts - 60.nfs should define service_reconfigure().

Not $service_reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 11 Aug 2011 01:45:59 +0000 (11:45 +1000)]
When starting and stopping ctdb through the init-script, make sure we first clear all public IPs before we start the daemon, in case they are still hanging around from a previous kill -9, and also make sure we drop them after we have stopped the daemon when shutting down

CQ S1027550

Martin Schwenke [Thu, 13 Jan 2011 22:31:56 +0000 (09:31 +1100)]
Eventscripts: improvements to ctdb_service_check_reconfigure().

* Make this function applicable to "ipreallocated" event too.

* Monitor event should not always succeed just because we reconfigure.

  If the service was unhealthy before the reconfigure and we end the
  reconfigure with "exit 0" then we can cause the node's health status
  to flip-flop.

  To avoid this we return the status of the service from the previous
  monitor event.
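A minimal sketch of the "return the previous monitor status" idea,
assuming (hypothetically) two files in $service_state_dir: a
"reconfigure" flag and "last_monitor_status" holding the exit status of
the previous monitor run.  record_monitor_status() and the file names
are invented for illustration; the real functions may store this
differently.

```shell
#!/bin/sh
# Sketch only: the file layout and record_monitor_status() are hypothetical.
service_name="${service_name:-foo}"
service_state_dir="${service_state_dir:-/tmp/ctdb-reconf-sketch/$service_name}"

# Hypothetical helper: remember the result of a monitor health check.
record_monitor_status ()
{
    mkdir -p "$service_state_dir"
    echo "$1" >"$service_state_dir/last_monitor_status"
}

# Flag the service so the next check performs a reconfigure.
ctdb_service_set_reconfigure ()
{
    mkdir -p "$service_state_dir"
    : >"$service_state_dir/reconfigure"
}

# Default hook: an eventscript overrides this with its own definition.
service_reconfigure ()
{
    :
}

ctdb_service_check_reconfigure ()
{
    _flag="$service_state_dir/reconfigure"
    _status_file="$service_state_dir/last_monitor_status"

    # Nothing to do unless the service has been flagged.
    [ -f "$_flag" ] || return 0

    service_reconfigure
    rm -f "$_flag"

    # End the event with the previous monitor status instead of a
    # blanket "exit 0", so an unhealthy service stays unhealthy.
    _last=0
    if [ -f "$_status_file" ]; then
        read _last <"$_status_file"
    fi
    exit "$_last"
}
```

If the previous monitor run failed, the reconfigure pass still exits 1,
so the node's health does not flip to OK just because a reconfigure
happened.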

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set.
Martin Schwenke [Fri, 27 May 2011 04:37:37 +0000 (14:37 +1000)]
Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: 50.samba needs null service_reconfigure() function.
Martin Schwenke [Mon, 23 May 2011 05:37:09 +0000 (15:37 +1000)]
Eventscripts: 50.samba needs null service_reconfigure() function.

Samba doesn't need to do anything for configuration changes.  It will
notice configuration changes and reload automatically.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output.
Martin Schwenke [Thu, 13 Jan 2011 22:42:18 +0000 (09:42 +1100)]
Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: improvements to 41.httpd.
Martin Schwenke [Thu, 13 Jan 2011 22:43:01 +0000 (09:43 +1100)]
Eventscripts: improvements to 41.httpd.

* Reduce the failure count thresholds so that restart attempts happen sooner.

* Use service_start() and service_stop() for the restart.
  ctdb_service_start() resets the failure count, which isn't very
  useful in this context.
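A small sketch of why the restart should not go through
ctdb_service_start().  It assumes, as the commit says, that
ctdb_service_start() resets the failure count before starting; the
thresholds (restart_after, unhealthy_at), the in-memory counter, the
stub start/stop functions and httpd_monitor() are all hypothetical.

```shell
#!/bin/sh
# Sketch only: thresholds, counter variable and stubs are hypothetical.
restart_after=2   # failures before a restart attempt
unhealthy_at=5    # failures before the monitor event fails
fail_count=0

service_start () { echo "start"; }
service_stop ()  { echo "stop"; }

# Assumed behaviour: resets the failure count, then starts the service.
ctdb_service_start ()
{
    fail_count=0
    service_start
}

# One monitor pass; $1 is "ok" or "fail" in place of a real health check.
# Returns 1 once the failure count reaches $unhealthy_at.
httpd_monitor ()
{
    if [ "$1" = "ok" ]; then
        fail_count=0
        return 0
    fi
    fail_count=$((fail_count + 1))
    if [ "$fail_count" -ge "$unhealthy_at" ]; then
        return 1
    fi
    if [ "$fail_count" -ge "$restart_after" ]; then
        # Plain stop/start keeps fail_count, so persistent failures still
        # reach $unhealthy_at.  Calling ctdb_service_start here would reset
        # the count to 0 and the node would never be marked unhealthy.
        service_stop
        service_start
    fi
    return 0
}
```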

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscript functions: new function ctdb_check_counter().
Martin Schwenke [Fri, 17 Dec 2010 05:10:56 +0000 (16:10 +1100)]
Eventscript functions: new function ctdb_check_counter().

This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.

It takes 4 optional arguments:

1. _msg - If "error", then a counter that satisfies the comparison
   causes an error message and exit 1.  Anything else fails silently:
   the function just returns 1.
   Default is "error".

2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
   Default is -ge.

3. _limit - Limit for the counter to be used in comparison.  Default is
   $service_fail_limit.

4. _service_name - Used to identify the counter.  Default is
   $service_name.

For example:

  ctdb_check_counter error -ge 5 foo

will print a message and exit 1 if the counter for foo is >= 5,
whereas

  ctdb_check_counter check -ge 5 foo

will just return 1 if the counter for foo is >= 5, and

  ctdb_check_counter

will print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.
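A minimal sketch of the argument handling described above.  For
illustration only, it assumes each counter is a file under a
hypothetical $COUNTER_DIR whose size in bytes is the count;
ctdb_counter_init() and ctdb_counter_incr() here are stand-ins, and the
real storage used by _ctdb_counter_common() may differ.

```shell
#!/bin/sh
# Sketch only: COUNTER_DIR and _counter_file() are hypothetical.
COUNTER_DIR="${COUNTER_DIR:-/tmp/ctdb-counter-sketch}"
service_name="${service_name:-foo}"
service_fail_limit="${service_fail_limit:-3}"

_counter_file ()
{
    echo "$COUNTER_DIR/${1:-$service_name}"
}

# Stand-in: reset a counter by truncating its file.
ctdb_counter_init ()
{
    mkdir -p "$COUNTER_DIR"
    : >"$(_counter_file "$1")"
}

# Stand-in: increment a counter by appending one byte.
ctdb_counter_incr ()
{
    mkdir -p "$COUNTER_DIR"
    printf '1' >>"$(_counter_file "$1")"
}

# Usage: ctdb_check_counter [msg] [op] [limit] [service_name]
ctdb_check_counter ()
{
    _msg="${1:-error}"
    _op="${2:--ge}"
    _limit="${3:-$service_fail_limit}"
    _service="${4:-$service_name}"

    # Counter value is the file size in bytes; 0 if the file is absent.
    _file="$(_counter_file "$_service")"
    _size=0
    if [ -f "$_file" ]; then
        _size=$(wc -c <"$_file" | tr -d ' ')
    fi

    if [ "$_size" "$_op" "$_limit" ]; then
        if [ "$_msg" = "error" ]; then
            echo "ERROR: counter for $_service is $_size ($_op $_limit)"
            exit 1
        fi
        return 1
    fi
    return 0
}
```

Because the operator is passed straight to test, the same function
covers both the "limit reached" (-ge) and "exact value" (-eq) cases.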

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: remove unused remove_ip() function.
Martin Schwenke [Tue, 28 Jun 2011 04:57:11 +0000 (14:57 +1000)]
Eventscripts: remove unused remove_ip() function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.
Martin Schwenke [Thu, 13 Jan 2011 22:31:05 +0000 (09:31 +1100)]
Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.

When stopping (as opposed to restarting) it is useful to see this
information.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: fix typo in _ctdb_counter_common().
Martin Schwenke [Thu, 13 Jan 2011 22:29:16 +0000 (09:29 +1100)]
Eventscripts: fix typo in _ctdb_counter_common().

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: improve log messages in ctdb_start_stop_service().
Martin Schwenke [Thu, 13 Jan 2011 22:30:21 +0000 (09:30 +1100)]
Eventscripts: improve log messages in ctdb_start_stop_service().

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscript functions: fix counter regression.
Martin Schwenke [Wed, 15 Dec 2010 23:11:33 +0000 (10:11 +1100)]
Eventscript functions: fix counter regression.

d362be7d32079ac1390d67056ce107bfbca2c937 wasn't well thought out.
Subsequent commits depend on ctdb_counter_init() taking an argument,
so this makes those cases work.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor.
Martin Schwenke [Wed, 15 Dec 2010 22:50:44 +0000 (09:50 +1100)]
Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: make 50.samba use $service_state_dir.
Martin Schwenke [Fri, 17 Dec 2010 05:29:21 +0000 (16:29 +1100)]
Eventscripts: make 50.samba use $service_state_dir.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: update 60.nfs to use ctdb_service_check_reconfigure.
Martin Schwenke [Wed, 15 Dec 2010 22:45:28 +0000 (09:45 +1100)]
Eventscripts: update 60.nfs to use ctdb_service_check_reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: update 60.nfs to use ctdb_setup_service_state_dir.
Martin Schwenke [Wed, 15 Dec 2010 21:57:46 +0000 (08:57 +1100)]
Eventscripts: update 60.nfs to use ctdb_setup_service_state_dir.

The state directory basename becomes "nfs" rather than "statd".  One
line of code is moved from the "startup" event to service_start().

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: update 40.vsftpd to use ctdb_service_check_reconfigure.
Martin Schwenke [Wed, 15 Dec 2010 22:48:25 +0000 (09:48 +1100)]
Eventscripts: update 40.vsftpd to use ctdb_service_check_reconfigure.

To simplify we also remove the reconfigure from the recovered event
because the monitor event will handle this very quickly anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: update 41.httpd to use ctdb_service_check_reconfigure.
Martin Schwenke [Wed, 15 Dec 2010 22:47:10 +0000 (09:47 +1100)]
Eventscripts: update 41.httpd to use ctdb_service_check_reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years ago Eventscripts: rejig the reconfigure infrastructure.
Martin Schwenke [Wed, 15 Dec 2010 08:19:21 +0000 (19:19 +1100)]
Eventscripts: rejig the reconfigure infrastructure.

* Add an optional service name argument to existing reconfigure
  functions.

* Use function service_reconfigure() instead of variable
  $service_reconfigure to specify how a service is reconfigured.

* New function ctdb_service_check_reconfigure() reconfigures a service
  if it is flagged for reconfigure.

* Remove $service_reconfigure settings from 40.vsftpd and 41.httpd -
  they're the defaults.
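A sketch of the function-based hook with an optional service name
argument.  It assumes hypothetical per-service flag files under
$RECONF_DIR, and the default service_reconfigure() shown (stop, then
start) is an assumption.  An eventscript replaces the default simply by
defining its own service_reconfigure() after the shared functions are
loaded, since the later shell definition wins.

```shell
#!/bin/sh
# Sketch only: RECONF_DIR and the flag-file naming are hypothetical.
RECONF_DIR="${RECONF_DIR:-/tmp/ctdb-reconf-flags}"
service_name="${service_name:-vsftpd}"

# Optional first argument names the service; defaults to $service_name.
ctdb_service_set_reconfigure ()
{
    mkdir -p "$RECONF_DIR"
    : >"$RECONF_DIR/${1:-$service_name}.reconfigure"
}

ctdb_service_unset_reconfigure ()
{
    rm -f "$RECONF_DIR/${1:-$service_name}.reconfigure"
}

ctdb_service_needs_reconfigure ()
{
    [ -e "$RECONF_DIR/${1:-$service_name}.reconfigure" ]
}

# Assumed default: reconfigure by restarting.  service_stop() and
# service_start() are expected to be defined by the eventscript.
service_reconfigure ()
{
    service_stop
    service_start
}

# Reconfigure the (optionally named) service only if it is flagged.
ctdb_service_check_reconfigure ()
{
    ctdb_service_needs_reconfigure "$1" || return 0
    service_reconfigure
    ctdb_service_unset_reconfigure "$1"
}
```

A script such as 50.samba that needs no action would define
"service_reconfigure () { : ; }" after loading the functions.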

Signed-off-by: Martin Schwenke <martin@meltin.net>