obnox/ctdb.git
Martin Schwenke [Fri, 11 Jan 2013 03:09:14 +0000 (14:09 +1100)]
tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments

If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.

At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately.  This behaviour isn't very
friendly.

The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.

The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
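
A minimal sketch of how an initscript might wait on this. It assumes only what the message above states: that "ctdb runstate STATE..." exits successfully when ctdbd is in one of the listed run states. The function name wait_for_runstate and the timeout handling are hypothetical and are not taken from the CTDB initscript.

```sh
# Hypothetical helper (not from the CTDB initscript): poll
# "ctdb runstate <states...>" until it succeeds or a timeout expires.
# Usage: wait_for_runstate TIMEOUT_SECONDS STATE...
wait_for_runstate ()
{
    _timeout="$1" ; shift
    _count=0
    # "ctdb runstate" with arguments succeeds only if ctdbd is in one
    # of the listed run states, so its exit status drives the loop.
    while ! ctdb runstate "$@" >/dev/null 2>&1 ; do
        if [ "$_count" -ge "$_timeout" ] ; then
            return 1
        fi
        sleep 1
        _count=$((_count + 1))
    done
    return 0
}
```

A caller could then use something like `wait_for_runstate 10 startup running || echo "ctdbd did not get past setup"`, where the 10 second limit is an arbitrary illustrative value.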

Martin Schwenke [Fri, 11 Jan 2013 03:07:12 +0000 (14:07 +1100)]
tools/ctdb: New command runstate to print current runstate

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Tue, 21 May 2013 06:18:28 +0000 (16:18 +1000)]
ctdbd: New control CTDB_CONTROL_GET_RUNSTATE

Also new client function ctdb_ctrl_get_runstate().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 10 Jan 2013 05:48:39 +0000 (16:48 +1100)]
ctdbd: Start logging process earlier

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Thu, 10 Jan 2013 05:33:36 +0000 (16:33 +1100)]
ctdbd: Only start recovery daemon and timed events after setup event

This deconstructs ctdb_start_transport(), which did much more than
starting the transport.

This removes a very unlikely race and adds some clarity.  The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Thu, 10 Jan 2013 05:06:25 +0000 (16:06 +1100)]
ctdbd: Replace ctdb->done_startup with ctdb->runstate

This allows states, including startup and shutdown states, to be
clearly tracked.  This doesn't include regular runtime "states", which
are handled by node flags.

Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Thu, 23 May 2013 06:06:47 +0000 (16:06 +1000)]
tools/ctdb: Remove duplicate command definition for "sync"

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 8 May 2013 13:29:55 +0000 (23:29 +1000)]
logging: Make sure ringbuffer messages are terminated with a newline

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 8 May 2013 06:25:30 +0000 (16:25 +1000)]
tests: Fix output of run_tests usage

Amitay Isaacs [Wed, 8 May 2013 03:45:55 +0000 (13:45 +1000)]
locking: Set lock helper path once

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 8 May 2013 00:42:08 +0000 (10:42 +1000)]
locking: Remove functions that are not used anymore

These functions were used in the locking child process to do the locking.
With the locking helper, these are not required.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 05:13:44 +0000 (15:13 +1000)]
locking: Remove functions that are not used anymore

These functions were used in the locking child process to do the locking.
With the locking helper, these are not required.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 05:07:49 +0000 (15:07 +1000)]
locking: Use separate locking helper binary for locking

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:32:46 +0000 (14:32 +1000)]
locking: Create commandline arguments for locking helper

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 22 Apr 2013 05:36:27 +0000 (15:36 +1000)]
locking: Add a standalone helper to lock record/db

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:14:16 +0000 (14:14 +1000)]
locking: Use database iterator for unmarking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:16:07 +0000 (14:16 +1000)]
locking: Add handler function for unmarking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:12:40 +0000 (14:12 +1000)]
locking: Use database iterator for marking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:07:11 +0000 (14:07 +1000)]
locking: Add handler function for marking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:10:06 +0000 (14:10 +1000)]
locking: Use database iterator for unlocking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:06:46 +0000 (14:06 +1000)]
locking: Add handler function for unlocking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:08:51 +0000 (14:08 +1000)]
locking: Use database iterator for locking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 04:06:27 +0000 (14:06 +1000)]
locking: Add handler function for locking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 30 Apr 2013 03:23:59 +0000 (13:23 +1000)]
locking: Refactor code to iterate over databases based on priority

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 1 May 2013 02:55:22 +0000 (12:55 +1000)]
locking: Add newline to debug logs

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Thu, 23 May 2013 03:04:06 +0000 (13:04 +1000)]
tools/ctdb: Fix racy ipreallocate code

This code tried to find the recovery master and send an ipreallocate
request to that node.  When a node is stopped, this code asked the
stopped node for the recovery master.  A stopped node does not have
up-to-date information on the current recovery master, so ipreallocate
requests were sent to the wrong node and ignored by that node, which is
not the recovery master.

Send ipreallocate request to all active nodes.  That way we guarantee
that the current recovery master will see it and respond to it.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Wed, 22 May 2013 05:37:46 +0000 (15:37 +1000)]
ctdbd: Print version string in the daemon startup

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 22 May 2013 04:23:17 +0000 (14:23 +1000)]
build: Rename version.h to ctdb_version.h

This avoids a clash with version.h from the Samba tree.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Thu, 9 May 2013 05:43:10 +0000 (15:43 +1000)]
logging: Fix a bug in ringbuffer

When ringbuffer is full, it does not return any entries.  Simplify
ringbuffer logic by keeping track of number of log entries rather than
last entry.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 13 May 2013 05:27:04 +0000 (15:27 +1000)]
recoverd: takeover_run_core() should not use modified node flags

Modifying the node flags with IP-allocation-only flags is not
necessary.  It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.

Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags.  As well as being safer, this makes the IP allocation code more
self contained and a little bit clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 20 May 2013 00:47:07 +0000 (10:47 +1000)]
ctdbd: Update confusing log message

Inactive can also mean stopped.  To add information, just print the
flags instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 17 May 2013 06:46:41 +0000 (16:46 +1000)]
Packaging: maketarball.sh should be a bash script due to pushd use

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 17 May 2013 06:42:25 +0000 (16:42 +1000)]
scripts: Rework notify.sh to use notify.d/ directory

This makes it easier to add notification handlers.

Signed-off-by: Martin Schwenke <martin@meltin.net>
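
The directory-of-handlers idea can be sketched as follows. This is an illustration under assumptions, not the actual notify.sh: the function name run_notify_handlers is hypothetical, and passing the event name as the single argument to each handler is assumed rather than stated in the message above.

```sh
# Hypothetical sketch, not the real notify.sh: run every executable
# handler in a notify.d/ style directory, passing the event name on.
# Usage: run_notify_handlers DIRECTORY EVENT
run_notify_handlers ()
{
    _dir="$1"
    _event="$2"
    for _handler in "$_dir"/* ; do
        # Skip anything that is not an executable regular file.  This
        # also covers the unexpanded glob when the directory is empty.
        [ -f "$_handler" ] && [ -x "$_handler" ] || continue
        "$_handler" "$_event"
    done
    return 0
}
```

With this shape, adding a notification handler only requires dropping an executable file into the directory. No existing script needs editing.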
Martin Schwenke [Tue, 14 May 2013 06:20:32 +0000 (16:20 +1000)]
ctdbd: Log a message when recovery master changes

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Tue, 14 May 2013 05:38:08 +0000 (15:38 +1000)]
ctdbd: Log add and delete of IPs

At the moment, when someone deletes all the IPs on a node, all we see
are the release IP messages and we have to guess why.

Some would argue that add/delete are more significant than
take/release so they should be logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 14 May 2013 05:30:53 +0000 (15:30 +1000)]
ctdbd: Removed bogus comment in ctdb_find_iface()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 14 May 2013 04:56:26 +0000 (14:56 +1000)]
eventscripts: Fix regression in _loadconfig()

fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 9 May 2013 10:44:11 +0000 (20:44 +1000)]
initscript: If CTDB doesn't become ready, print a message before killing

Signed-off-by: Martin Schwenke <martin@meltin.net>
Christian Ambach [Wed, 8 May 2013 06:45:09 +0000 (08:45 +0200)]
build: Create sudoers.d dir during make install

Otherwise make install into a non-standard prefix will fail.

Signed-off-by: Christian Ambach <ambi@samba.org>
Amitay Isaacs [Tue, 14 May 2013 13:18:32 +0000 (23:18 +1000)]
eventscripts: Do not use bashism for string comparison

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
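
The log does not show the line that was changed. As a hedged illustration of the usual case, the portable form of a string comparison uses a single "=" inside "[ ... ]", whereas "==" and "[[ ... ]]" are extensions that a strictly POSIX /bin/sh may reject. The helper name is_same_string and the variable below are hypothetical.

```sh
# Illustrative only: POSIX test(1) specifies "=" for string equality.
# "==" and "[[ ... ]]" are extensions and are not guaranteed to work
# when the script is run by a strictly POSIX /bin/sh.
is_same_string ()
{
    [ "$1" = "$2" ]
}

mode="enabled"   # illustrative value, not a CTDB setting
if is_same_string "$mode" "enabled" ; then
    echo "match"
fi
```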
Martin Schwenke [Thu, 9 May 2013 02:53:48 +0000 (12:53 +1000)]
recoverd: Move IP flags into ctdb_takeover.c

These should never be seen outside the IP allocation code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 9 May 2013 02:51:57 +0000 (12:51 +1000)]
recoverd: Clear IP flags after IP allocation algorithm has run

If these flags are left set they will confuse other recovery daemon
code.

Factor the clearing code into new function clear_ipflags().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 3 May 2013 10:46:15 +0000 (20:46 +1000)]
recoverd: Remove unused mask argument and initial mask calculation

This has been replaced by set_ipflags() and associated functionality.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 3 May 2013 10:41:32 +0000 (20:41 +1000)]
recoverd: When calculating rebalance candidates don't consider flags

This is really a check to see if a node is already hosting IPs.  If
so, we assume it was previously healthy so it isn't considered as a
rebalance candidate.  There's no need to limit this to healthy nodes,
since this is checked elsewhere.

Due to this the variable newly_healthy is renamed everywhere to
rebalance_candidates.

The mask argument is now completely unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 3 May 2013 10:13:40 +0000 (20:13 +1000)]
recoverd: Remove unused mask argument from IP allocation functions

This is a no-op and is in a separate commit to make the previous
commit less cumbersome.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 3 May 2013 05:57:21 +0000 (15:57 +1000)]
tests/takeover: Add takeover tests, mostly for NoIPHostOnAllDisabled

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 3 May 2013 06:59:20 +0000 (16:59 +1000)]
recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled

This really needs to be per-node.  The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).

* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.

* Enhance set_ipflags_internal() and set_ipflags() to set up
  NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
  and/or whether nodes are disabled/inactive.

* Replace can_node_serve_ip() with functions can_node_host_ip() and
  can_node_takeover_ip().  These functions are the only ones that need
  to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST.  They
  can make the decision without looking at any other flags due to
  previous setup.

* Remove explicit flag checking in IP allocation functions (including
  unassign_unsuitable_ips()) and just call can_node_host_ip() and
  can_node_takeover_ip() as appropriate.

* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 3 May 2013 06:56:24 +0000 (16:56 +1000)]
recoverd: Factor out new function all_nodes_are_disabled()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 3 May 2013 05:55:01 +0000 (15:55 +1000)]
tests/takeover: Allow per-node tunable settings

Implemented for CTDB_SET_NoIPTakeover.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 3 May 2013 06:21:16 +0000 (16:21 +1000)]
recoverd: Refactor code to get NoIPTakeover tunable from all nodes

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 3 May 2013 05:53:13 +0000 (15:53 +1000)]
tests: Unit test diff output should use filtered output

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 3 May 2013 05:41:26 +0000 (15:41 +1000)]
recoverd: Add debug message when dropping IPs in IP allocation

Update tests accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Apr 2013 02:30:33 +0000 (12:30 +1000)]
eventscripts: NFS RPC checks no longer support "knfsd"

No longer used, support removed from test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Apr 2013 02:17:31 +0000 (12:17 +1000)]
eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services

* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs

* Installation and packaging additions to handle nfs-rpc-checks.d/

* Unit test updates, including deleting 1 test that sanity checked
  test infrastructure

* Test infrastructure changes to use nfs-rpc-checks.d/

Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs.  To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Apr 2013 01:14:48 +0000 (11:14 +1000)]
eventscripts: NFS RPC checks allow "nfsd" in addition to "knfsd"

Want nfs_check_rpc_services() to support filenames without the 'k'.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 20:42:54 +0000 (06:42 +1000)]
eventscripts: New function nfs_check_rpc_services()

This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.

nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator.  The files have one limit check and
a set of actions per line.  The program name is extracted from the
file name.

Signed-off-by: Martin Schwenke <martin@meltin.net>
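
Extracting the program name from the file name could look roughly like this. It is a sketch under assumptions: the helper name rpc_program_from_file is hypothetical, and the "NN.<program>.check" naming pattern is inferred from the single "20.nfsd.check" example mentioned elsewhere in this log. The format of the lines inside each file is not reproduced here because the log does not show it.

```sh
# Hypothetical helper, not the real nfs_check_rpc_services(): derive
# the RPC program name from a check file name, for example
# "nfs-rpc-checks.d/20.nfsd.check" gives "nfsd".
rpc_program_from_file ()
{
    _f="${1##*/}"       # strip any leading directory components
    _f="${_f#[0-9]*.}"  # strip the shortest leading "NN." prefix
    _f="${_f%.check}"   # strip the trailing ".check" suffix
    echo "$_f"
}
```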
Martin Schwenke [Mon, 22 Apr 2013 20:28:27 +0000 (06:28 +1000)]
eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 20:27:02 +0000 (06:27 +1000)]
eventscripts: Factor out common code from nfs_check_rpc_service()

This creates new function _nfs_check_rpc_common().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 20:17:15 +0000 (06:17 +1000)]
eventscripts: Remove ganesha support from nfs_check_rpc_service()

This is unused so doesn't need to be maintained.  An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 20:14:43 +0000 (06:14 +1000)]
Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"

This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.

This change is unused and is just complicating the function.

Conflicts:
config/functions

Martin Schwenke [Mon, 22 Apr 2013 19:54:12 +0000 (05:54 +1000)]
eventscripts: Move rpc.statd existence check into nfs_check_rpc_service()

The code in 60.nfs is going to be genericised, so make all the checks
look the same.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 05:45:13 +0000 (15:45 +1000)]
eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 05:33:12 +0000 (15:33 +1000)]
eventscripts: Remove unused function ctdb_check_counter_limit()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 05:23:20 +0000 (15:23 +1000)]
eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()

ctdb_check_counter_limit() can soon be removed...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 05:19:52 +0000 (15:19 +1000)]
eventscripts: Might as well try to stat the reclock file first

It is in the background but it still might cause the counter to be
reset before it is checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 05:16:44 +0000 (15:16 +1000)]
eventscripts: Make the early exit in 01.reclock earlier

That way we don't even check the counter...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 6 May 2013 06:23:25 +0000 (16:23 +1000)]
eventscripts: Minor cleanups for killtcp/tickle functions

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 01:39:46 +0000 (11:39 +1000)]
eventscripts: Tweak the timeout check in kill_tcp_connections()

This has 2 advantages:

1. It uses get_tcp_connections_for_ip() to check for leftover
   connections, instead of custom code.

2. It checks for the timeout condition before sleeping.  The current
   code sleeps and then checks, so wastes a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>
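
The second point, checking the limit before sleeping, can be shown with a generic retry loop. This is only a sketch of the loop shape: retry_until is a hypothetical name, and the command passed to it stands in for the real "are any connections left?" test, which this log does not show.

```sh
# Hypothetical sketch of the loop shape, not kill_tcp_connections():
# run a command until it succeeds, giving up after MAX attempts.
# Usage: retry_until MAX COMMAND [ARG...]
retry_until ()
{
    _max="$1" ; shift
    _count=0
    while ! "$@" ; do
        _count=$((_count + 1))
        # Test the limit before sleeping, so that the final failed
        # attempt returns immediately instead of wasting a second.
        if [ "$_count" -ge "$_max" ] ; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

Had the sleep come before the limit test, every timed-out call would take one second longer without any further attempt being made.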
Martin Schwenke [Mon, 29 Apr 2013 20:31:30 +0000 (06:31 +1000)]
eventscripts: In killtcp/tickle functions, $_failed should be boolean

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 20:27:58 +0000 (06:27 +1000)]
eventscripts: Remove unused $_killcount from tickle_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 20:25:26 +0000 (06:25 +1000)]
eventscripts: Refactor connection listing in killtcp and tickle functions

Uses new function get_tcp_connections_for_ip().  This avoids using a
temporary file and running netstat twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 20:19:18 +0000 (06:19 +1000)]
eventscripts: Reimplement kill_tcp_connections_local_only()

... using kill_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 20:14:01 +0000 (06:14 +1000)]
eventscripts: Change handling of one-way kills in kill_tcp_connections()

This change is a no-op.  However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 20:05:52 +0000 (06:05 +1000)]
eventscripts: Remove unnecessary variables from killtcp/tickle functions

Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 17:54:17 +0000 (03:54 +1000)]
eventscripts: Clean up ctdb_check_command()

* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command

Signed-off-by: Martin Schwenke <martin@meltin.net>
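
A rough sketch of the calling convention described in the list above. This is not the real ctdb_check_command(): the name check_command and the exact error text are hypothetical. The point is that the command arrives as separate arguments and is run via "$@", so an argument containing spaces stays a single argument, and the captured output is printed only on failure.

```sh
# Hypothetical re-creation of the calling convention only.
# Usage: check_command COMMAND [ARG...]
check_command ()
{
    # Run the command exactly as passed; "$@" preserves the caller's
    # quoting.  Capture stdout and stderr for debugging.
    _out=$("$@" 2>&1) || {
        echo "ERROR: $* returned error"
        echo "$_out"
        return 1
    }
}
```

For example, `check_command test -d "/some/dir with spaces"` passes the whole path as one argument. A version that accepted the command as a single string would need eval to do the same.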
Martin Schwenke [Mon, 29 Apr 2013 17:48:51 +0000 (03:48 +1000)]
eventscripts: Clean up ctdb_check_directories()

Fix the documentation comments, which are wrong, and remove the
optional $service_name argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 17:45:21 +0000 (03:45 +1000)]
eventscripts: Assert that $service_name is set in a few key places

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 30 Apr 2013 05:31:27 +0000 (15:31 +1000)]
eventscripts: counters default to $script_name if $service_name not set

Signed-off-by: Martin Schwenke <martin@meltin.net>
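
The fallback named in the subject line maps naturally onto a default-value parameter expansion. This is a sketch only: counter_scope is a hypothetical name, and how the real functions file sets $script_name is not shown in this log.

```sh
# Hypothetical sketch: choose the name under which counters are kept.
# ${var:-default} yields the default when var is unset or empty.
counter_scope ()
{
    echo "${service_name:-${script_name}}"
}
```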
Martin Schwenke [Mon, 29 Apr 2013 17:32:29 +0000 (03:32 +1000)]
eventscripts: Simplify handling of $service_name in "managed" functions

Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().

Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 17:18:01 +0000 (03:18 +1000)]
eventscripts: Simplify handling of $service_name in start/stop functions

Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 17:13:36 +0000 (03:13 +1000)]
eventscripts: Simplify handling of $service_name in service_management

Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 29 Apr 2013 16:59:41 +0000 (02:59 +1000)]
eventscripts: Simplify handling of $service_name in reconfigure functions

Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 24 Apr 2013 07:14:32 +0000 (17:14 +1000)]
eventscripts: Remove unused function ctdb_check_counter_equal()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 23 Apr 2013 03:56:15 +0000 (13:56 +1000)]
scripts: Fix script_log() regression

5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.

Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
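
The fix can be sketched as follows, assuming a logger(1) implementation that reads standard input when no message argument is given, as the common implementations do. This is not the real script_log(): the name log_with_tag and the tag format are hypothetical.

```sh
# Hypothetical sketch, not the real script_log(): everything that
# identifies the caller goes into the tag.  "$@" expands to nothing
# when no message is passed, so logger sees no message argument and
# logs its standard input instead.
# Usage: log_with_tag TAG [MESSAGE...]
log_with_tag ()
{
    _tag="$1" ; shift
    logger -t "example: ${_tag}" "$@"
}
```

Both `log_with_tag 50.samba "smbd restarted"` and `some_command 2>&1 | log_with_tag 50.samba` then behave as expected, where some_command is a placeholder for any command whose output should be logged.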
Martin Schwenke [Tue, 23 Apr 2013 03:49:28 +0000 (13:49 +1000)]
initscript: Look for tdbtool/tdbdump using which, not in fixed locations

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 22 Apr 2013 04:55:33 +0000 (14:55 +1000)]
ctdbd: Log CTDB startup before creating the PID file

Otherwise the messages are in a stupid order...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Thu, 21 Feb 2013 03:28:13 +0000 (14:28 +1100)]
ctdbd: Remove the "stopped" event

It isn't used, superseded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 21 Feb 2013 03:17:09 +0000 (14:17 +1100)]
eventscripts: Remove use of "stopped" event

Use "ipreallocated" instead.  The "stopped" event pre-dates the
"ipreallocated" event.  The only way of stopping a node is via the
ctdb tool, which explicitly causes a takeover run to occur after the
node is stopped.  The takeover run will generate an "ipreallocated"
event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 21 Feb 2013 02:13:09 +0000 (13:13 +1100)]
recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED

This means "ipreallocated" is now run on stopped nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Apr 2013 03:05:02 +0000 (13:05 +1000)]
ctdbd: New control CTDB_CONTROL_IPREALLOCATED

This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Avoid freeing non-monitor event callback when monitoring is disabled
Martin Schwenke [Tue, 30 Apr 2013 07:22:23 +0000 (17:22 +1000)]
ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled

When running a non-monitor event, a check is made for any active
monitor events.  If there is an active monitor event, then it is
cancelled.  This is done by freeing state->callback, which is
allocated from monitor_context.

When CTDB is stopped or shut down, monitoring is disabled by freeing
monitor_context, which frees the callback, and then the stopped or
shutdown event is run.  This creates a new callback structure, which is
allocated at the exact same memory location as the monitor callback
that was freed.  So the check for active monitor events frees the new
callback for the non-monitor event.  Since the callback function flags
successful completion of that event, the event is never marked complete
and CTDB is stuck in a loop waiting for completion.

Move the monitor cancellation to the top of the function so that this
can't happen.
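
The ordering problem can be modelled with a small, deterministic
sketch.  Everything in it is hypothetical (struct callback, the
one-slot pool, cancel_monitor(), run_event()) and none of it is the
code in server/eventscript.c; the one-slot pool only stands in for the
allocator handing back the same address, and the in_use check is a
simplification that makes cancelling an already-freed callback
harmless in the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event callback; not ctdb's structure. */
struct callback {
    bool in_use;      /* true while the callback is allocated */
    bool is_monitor;  /* true for a monitor event callback */
};

/* One-slot pool: the next allocation always reuses the freed address. */
static struct callback slot;

/* Pointer remembered for the active monitor event. */
static struct callback *active_monitor_cb;

static struct callback *cb_alloc(bool is_monitor)
{
    assert(!slot.in_use);
    slot.in_use = true;
    slot.is_monitor = is_monitor;
    return &slot;
}

/* Disabling monitoring frees the callback but leaves the pointer stale. */
static void disable_monitoring(void)
{
    slot.in_use = false;
}

/* Cancel the active monitor event by freeing whatever the pointer names. */
static void cancel_monitor(void)
{
    if (active_monitor_cb != NULL) {
        if (active_monitor_cb->in_use) {
            active_monitor_cb->in_use = false;
        }
        active_monitor_cb = NULL;
    }
}

/* Run a non-monitor event with either the old or the new ordering. */
static struct callback *run_event(bool cancel_first)
{
    struct callback *cb;

    if (cancel_first) {
        cancel_monitor();   /* new ordering: cancel before allocating */
    }
    cb = cb_alloc(false);
    if (!cancel_first) {
        cancel_monitor();   /* old ordering: may free the new callback */
    }
    return cb;
}

/* The event can only be flagged complete if its callback still exists. */
static bool event_completes(const struct callback *cb)
{
    return cb->in_use;
}
```

In this model, calling run_event(false) after disable_monitoring()
returns a callback that can never complete, while run_event(true)
returns one that can.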

The following log snippet highlights the problem.

2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon.
2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon
2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped
2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847
2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0
2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback
2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback
2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agorecoverd: Interface reference count changes should not cause takeover runs
Martin Schwenke [Wed, 20 Feb 2013 23:43:35 +0000 (10:43 +1100)]
recoverd: Interface reference count changes should not cause takeover runs

At the moment a naive comparison of all the interface data is done.
So, if any IPs move, then the reference counts for the relevant
interfaces change, the interfaces appear to have changed, and another
takeover run is initiated by each node that took/released IPs.

This change stops the spurious takeover runs by changing the interface
comparison to ignore the reference counts.
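
A minimal sketch of such a comparison, assuming a simplified interface
record.  The struct iface_info layout and the ifaces_changed() helper
are hypothetical and are not the recoverd code; the point is only that
the references field is left out of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified interface record; not ctdb's structure. */
struct iface_info {
    char name[16];
    uint16_t link_state;  /* 1 = up, 0 = down */
    uint32_t references;  /* number of public IPs on this interface */
};

/*
 * Return true if the two interface lists differ in anything other
 * than the reference counts.
 */
static bool ifaces_changed(const struct iface_info *a, size_t na,
                           const struct iface_info *b, size_t nb)
{
    size_t i;

    assert(a != NULL && b != NULL);

    if (na != nb) {
        return true;
    }
    for (i = 0; i < na; i++) {
        if (strncmp(a[i].name, b[i].name, sizeof(a[i].name)) != 0) {
            return true;
        }
        if (a[i].link_state != b[i].link_state) {
            return true;
        }
        /* references is deliberately not compared */
    }
    return false;
}
```

With this comparison, an IP moving between interfaces changes only the
reference counts and so does not count as an interface change, whereas
a link going down still does.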

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecover: use CTDB_REC_RO_FLAGS where appropriate
Michael Adam [Fri, 19 Apr 2013 14:24:32 +0000 (16:24 +0200)]
recover: use CTDB_REC_RO_FLAGS where appropriate

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agoctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate
Michael Adam [Fri, 19 Apr 2013 14:23:16 +0000 (16:23 +0200)]
ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agoctdb_call: use CTDB_REC_RO_FLAGS where appropriate
Michael Adam [Fri, 19 Apr 2013 14:22:49 +0000 (16:22 +0200)]
ctdb_call: use CTDB_REC_RO_FLAGS where appropriate

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agovacuum: use CTDB_REC_RO_FLAGS in the vacuuming code
Michael Adam [Fri, 19 Apr 2013 14:09:34 +0000 (16:09 +0200)]
vacuum: use CTDB_REC_RO_FLAGS in the vacuuming code

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agoltdb_server: use CTDB_REC_RO_FLAGS where appropriate
Michael Adam [Fri, 19 Apr 2013 13:55:38 +0000 (15:55 +0200)]
ltdb_server: use CTDB_REC_RO_FLAGS where appropriate

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agoinclude: define CTDB_REC_RO_FLAGS - all read-only related record flags
Michael Adam [Fri, 19 Apr 2013 14:01:45 +0000 (16:01 +0200)]
include: define CTDB_REC_RO_FLAGS - all read-only related record flags

This is used for some checks.
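
A sketch of what such a combined mask and the checks built on it might
look like.  Only the name CTDB_REC_RO_FLAGS comes from this commit; the
individual flag names and values and both helper functions are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Individual read-only flags: names and values assumed for this sketch. */
#define CTDB_REC_RO_HAVE_DELEGATIONS   0x01000000u
#define CTDB_REC_RO_HAVE_READONLY      0x02000000u
#define CTDB_REC_RO_REVOKING_READONLY  0x04000000u
#define CTDB_REC_RO_REVOKE_COMPLETE    0x08000000u

/* All read-only related record flags combined into one mask. */
#define CTDB_REC_RO_FLAGS (CTDB_REC_RO_HAVE_DELEGATIONS | \
                           CTDB_REC_RO_HAVE_READONLY | \
                           CTDB_REC_RO_REVOKING_READONLY | \
                           CTDB_REC_RO_REVOKE_COMPLETE)

/* Hypothetical helper: does the record carry any read-only flag? */
static bool record_has_ro_flags(uint32_t flags)
{
    return (flags & CTDB_REC_RO_FLAGS) != 0;
}

/* Hypothetical helper: strip every read-only flag, keep the rest. */
static uint32_t record_clear_ro_flags(uint32_t flags)
{
    uint32_t out = flags & ~(uint32_t)CTDB_REC_RO_FLAGS;

    assert((out & CTDB_REC_RO_FLAGS) == 0);
    return out;
}
```

A single mask keeps each call site to one test instead of repeating
the list of individual flags.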

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agovacuum: Update (C)
Michael Adam [Fri, 22 Feb 2013 15:12:17 +0000 (16:12 +0100)]
vacuum: Update (C)

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
11 years agovacuum: extend the header comment for ctdb_process_delete_list()
Michael Adam [Sat, 29 Dec 2012 16:23:27 +0000 (17:23 +0100)]
vacuum: extend the header comment for ctdb_process_delete_list()

Describe the (new) process more precisely.
And mention that it is the last step of the vacuuming process
that is performed on the lmaster.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>