Michael Adam [Wed, 26 Jun 2013 08:00:12 +0000 (10:00 +0200)]
TODO(correct?) recoverd: when banning a node, update our local flags
Is it correct to use nodes[pnn].flags?
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Wed, 26 Jun 2013 07:23:22 +0000 (09:23 +0200)]
recoverd: when the recmaster considers itself banned, consider this when forcing an election
When we trigger an election because the recmaster considers itself inactive,
update our local nodemap with the recmaster's flags before calling
force_election(). This way, we don't send the inactive node freeze commands
(e.g.) that may fail and then lead to ourselves getting banned.
The theory is that this should help avoid banning loops.
Signed-off-by: Michael Adam <obnox@samba.org>
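A minimal C sketch of the idea in this commit, assuming a simplified nodemap. The node_entry struct, the FLAG_* values and nodes_to_freeze() are hypothetical, not CTDB's real types or flag values. The recmaster's self-reported flags are copied into the local map first, so an inactive recmaster drops out of the set of nodes that would receive freeze requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not CTDB's real values. */
#define FLAG_BANNED   0x1u
#define FLAG_STOPPED  0x2u
#define FLAG_INACTIVE (FLAG_BANNED | FLAG_STOPPED)

struct node_entry {
    uint32_t pnn;
    uint32_t flags;
};

/* Update our local copy of the recmaster's flags with what the
 * recmaster reported about itself, then count the nodes that would be
 * sent a freeze request.  Inactive nodes are skipped, so a banned
 * recmaster is not asked to freeze (a request that could fail and get
 * the local node banned in turn). */
static size_t nodes_to_freeze(struct node_entry *nodemap, size_t count,
                              uint32_t recmaster, uint32_t recmaster_flags)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (nodemap[i].pnn == recmaster) {
            nodemap[i].flags = recmaster_flags;
        }
    }
    for (size_t i = 0; i < count; i++) {
        if ((nodemap[i].flags & FLAG_INACTIVE) == 0) {
            n++;
        }
    }
    return n;
}
```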
Michael Adam [Wed, 26 Jun 2013 05:11:51 +0000 (07:11 +0200)]
recoverd: fix a comment typo
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Fri, 21 Jun 2013 15:57:37 +0000 (17:57 +0200)]
recoverd: fix a comment in main_loop
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Fri, 21 Jun 2013 12:06:22 +0000 (14:06 +0200)]
recoverd: eliminate some trailing spaces from ctdb_election_win()
Signed-off-by: Michael Adam <obnox@samba.org>
Amitay Isaacs [Wed, 26 Jun 2013 06:02:23 +0000 (16:02 +1000)]
recoverd: Remove old comment as the code corresponding to that has gone away
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 26 Jun 2013 06:01:51 +0000 (16:01 +1000)]
recoverd: Set rec->node_flags as soon as we have the new information
Move this code before checking for rec->node_flags.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Wed, 26 Jun 2013 04:34:47 +0000 (14:34 +1000)]
tools/ctdb: Add "force" option to "recover" command
At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs. This option is added without
documentation because it is dangerous until the bugs are fixed! :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Wed, 26 Jun 2013 05:22:46 +0000 (15:22 +1000)]
recoverd: When updating all the nodes, send the updated flags
Do not send our old copy of flags, but send the flags updated from the node.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 26 Jun 2013 03:44:11 +0000 (13:44 +1000)]
debug: Print modflags in client function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 26 Jun 2013 01:54:50 +0000 (11:54 +1000)]
debug: Print modflags
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 26 Jun 2013 01:40:07 +0000 (11:40 +1000)]
debug: print flag changes at higher debug level
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 25 Jun 2013 07:45:48 +0000 (17:45 +1000)]
banning: Do not set banning credits on a node if current node is inactive
If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
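A sketch of the guard, assuming banning credits are kept in a simple per-node array. The credit_table struct, assign_banning_credit() and the FLAG_* values are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not CTDB's real values. */
#define FLAG_BANNED  0x1u
#define FLAG_STOPPED 0x2u

/* Hypothetical credit bookkeeping for up to 8 nodes. */
struct credit_table {
    uint32_t credits[8];
};

/* Assign `amount` banning credits to node `target`.  Returns true if
 * credits were assigned.  A banned or stopped local node may hold
 * stale flags for its peers, so it refuses to penalise them. */
static bool assign_banning_credit(struct credit_table *t, uint32_t local_flags,
                                  uint32_t target, uint32_t amount)
{
    if (local_flags & (FLAG_BANNED | FLAG_STOPPED)) {
        return false;
    }
    if (target >= sizeof(t->credits) / sizeof(t->credits[0])) {
        return false; /* unknown node */
    }
    t->credits[target] += amount;
    return true;
}
```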
Amitay Isaacs [Mon, 24 Jun 2013 06:58:53 +0000 (16:58 +1000)]
Do an early check if we are recmaster when we get banned or stopped
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 06:44:22 +0000 (16:44 +1000)]
merge with banning change
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 04:33:32 +0000 (14:33 +1000)]
banning: No need to check if banned pnn is for local node
If the banned pnn is not the local node, the function returns early,
so there is no need for an additional check.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 06:31:57 +0000 (16:31 +1000)]
banning: Make ctdb_local_node_got_banned() a void function
When this function is called, we are already committed to banning
and there is no point in failing this function. If freezing of the
databases fails, it will be fixed by the recovery daemon.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 04:31:50 +0000 (14:31 +1000)]
banning: Log ban state changes for other nodes at higher debug level
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 04:31:03 +0000 (14:31 +1000)]
banning: If freezing fails at some priority, stop freezing at higher priority
If the databases at a lower priority cannot be frozen, stop freezing
databases, since freezing them out of order can cause a deadlock.
Recovery mode cannot be set active until all the databases are frozen.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
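The ordering rule can be sketched in C. freeze_fn, fake_freeze() and freeze_in_priority_order() are hypothetical, and the sketch assumes priorities are numbered from 1 upwards. It only shows that freezing stops at the first failing level and that recovery mode is allowed only when every level was frozen.

```c
/* Hypothetical callback: freeze all databases at one priority level.
 * Returns 0 on success, non-zero on failure. */
typedef int (*freeze_fn)(int priority, void *ctx);

/* Example stand-in: fails at the priority stored in *ctx. */
static int fake_freeze(int priority, void *ctx)
{
    return priority == *(int *)ctx ? -1 : 0;
}

/* Freeze priorities 1..max_priority in ascending order and return how
 * many levels were frozen.  On the first failure, stop: continuing at
 * a higher priority would freeze databases out of order.  Recovery mode
 * may be set active only if the result equals max_priority. */
static int freeze_in_priority_order(int max_priority, freeze_fn freeze,
                                    void *ctx)
{
    int frozen = 0;

    for (int prio = 1; prio <= max_priority; prio++) {
        if (freeze(prio, ctx) != 0) {
            break;
        }
        frozen++;
    }
    return frozen;
}
```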
Amitay Isaacs [Mon, 24 Jun 2013 04:18:58 +0000 (14:18 +1000)]
recoverd: Print banning message only after verifying pnn
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 24 Jun 2013 07:37:15 +0000 (17:37 +1000)]
client: Exit with non-zero status when unix socket is closed
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Fri, 21 Jun 2013 04:49:20 +0000 (14:49 +1000)]
doc: Fix ctdb ping entry in manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Jun 2013 04:47:20 +0000 (14:47 +1000)]
doc: Fix documentation for NoIPTakeover in ctdbd manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Jun 2013 04:33:12 +0000 (14:33 +1000)]
doc: Update notification script section in ctdbd manpage
The example notification script is now much more useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Jun 2013 04:32:50 +0000 (14:32 +1000)]
doc: Add nodestatus command to the ctdb manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Jun 2013 00:52:05 +0000 (10:52 +1000)]
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 20 Jun 2013 06:43:10 +0000 (16:43 +1000)]
tests: Integration tests use "ctdb nodestatus" for healthy cluster check
Also check that we're not in recovery mode.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 20 Jun 2013 06:42:30 +0000 (16:42 +1000)]
tests: Integration test infrastructure should do only a single recovery
There is no need for two recoveries after a restart.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sat, 22 Jun 2013 05:44:28 +0000 (15:44 +1000)]
ctdbd: Fix panic on overlapping shutdowns
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the second shutdown. This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.
Signed-off-by: Martin Schwenke <martin@meltin.net>
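A sketch of a guard against overlapping shutdowns, assuming a simplified daemon state. The daemon_state struct, begin_shutdown() and the DRS_* values are hypothetical and do not reproduce the actual ctdbd shutdown code; the point is only that a second request returns early instead of setting SHUTDOWN again.

```c
#include <stdbool.h>

/* Hypothetical runstate values for this sketch. */
enum daemon_runstate { DRS_RUNNING, DRS_SHUTDOWN };

struct daemon_state {
    enum daemon_runstate runstate;
    int teardowns;  /* how many times the shutdown work actually ran */
};

/* Begin shutdown.  An overlapping second request is ignored rather than
 * attempting to set SHUTDOWN twice.  Returns true only for the call that
 * actually started the shutdown. */
static bool begin_shutdown(struct daemon_state *d)
{
    if (d->runstate == DRS_SHUTDOWN) {
        return false;
    }
    d->runstate = DRS_SHUTDOWN;
    d->teardowns++;
    return true;
}
```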
Martin Schwenke [Wed, 19 Jun 2013 00:58:14 +0000 (10:58 +1000)]
ctdbd: Refactor shutdown sequence
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 16 Jun 2013 11:01:43 +0000 (21:01 +1000)]
eventscripts: "setup" event doesn't need to wait for SETUP runstate
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 18 Jun 2013 05:07:26 +0000 (15:07 +1000)]
tests/eventscripts: New tests for 00.ctdb "init" event
These test dropping of IPs and TDB checking.
New stubs for date, tdbdump, tdbtool.
Enhance ip stub to handle "ip addr show to ..."
Tweak some infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 18 Jun 2013 05:02:05 +0000 (15:02 +1000)]
eventscripts: 13.per_ip_routing should not try hard to find public_addresses
This essentially reverts commit d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!
The test code has been fixed instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 18 Jun 2013 05:05:39 +0000 (15:05 +1000)]
tests/eventscripts: setup_ctdb() should always set $CTDB_PUBLIC_ADDRESSES
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Mon, 17 Jun 2013 05:14:53 +0000 (15:14 +1000)]
logging: Notify parent when logging daemon is up
Messages are lost until the logging daemon is really up, because
syslogd_is_started is set too early. Adding a pipe to do the notification allows the parent
to wait and only set syslogd_is_started when the logging daemon is
actually ready.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
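The notification pattern can be sketched with POSIX pipe() and fork(). This is an illustration under assumptions, not the ctdbd logging code: start_child_and_wait(), init_ok() and init_fail() are hypothetical, and the child exits right after signalling where a real logging daemon would keep running.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by callers to reap the child */
#include <unistd.h>

/* Example stand-ins for the child's setup step (hypothetical). */
static int init_ok(void) { return 0; }
static int init_fail(void) { return -1; }

/* Fork a child and block until it reports readiness over a pipe.
 * Returns true only if the child wrote its "ready" byte; EOF without
 * that byte means the child's setup failed. */
static bool start_child_and_wait(int (*child_init)(void), pid_t *child_out)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (pid == 0) {
        /* Child: run setup, then signal readiness on the write end. */
        close(fds[0]);
        if (child_init() == 0) {
            char ready = 1;
            ssize_t w = write(fds[1], &ready, 1);
            (void)w;
        }
        close(fds[1]);
        _exit(0); /* a real logging daemon would keep running here */
    }

    /* Parent: close our write end so EOF is seen if the child fails,
     * then only report "started" once the ready byte has arrived. */
    close(fds[1]);
    char byte = 0;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    *child_out = pid;
    return n == 1 && byte == 1;
}
```

The parent's read() blocks, so nothing is marked as started until the child has explicitly said so.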
Martin Schwenke [Mon, 17 Jun 2013 00:14:24 +0000 (10:14 +1000)]
scripts: Move TDB checking from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 16 Jun 2013 10:29:33 +0000 (20:29 +1000)]
scripts: Move dropping of all IPs from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 18 Jun 2013 04:53:17 +0000 (14:53 +1000)]
scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 16 Jun 2013 10:24:10 +0000 (20:24 +1000)]
scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 16 Jun 2013 09:49:02 +0000 (19:49 +1000)]
ctdbd: "init" event should run earlier in daemon initialisation
It should run before:
* the transport is started;
* databases are attached; and
* configuration files (e.g. nodes, public_addresses) are processed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 18 Jun 2013 04:27:34 +0000 (14:27 +1000)]
tools/ctdb: Do not exit prematurely on control timeout if retrying in a loop
This avoids premature exits from "ctdb stop" and "ctdb continue" due to
intermittent control (e.g. getpnn, getnodemap) timeouts.
This needs a proper fix to distinguish between timeout and failure
conditions and take appropriate action.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
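A sketch of a retry loop that tolerates timeouts. It goes a step beyond this commit by separating a timeout from a hard failure, which the commit text names as the proper fix. It assumes a control reports -ETIME on timeout; retry_control(), flaky_control() and failing_control() are hypothetical.

```c
#include <errno.h>

#ifndef ETIME
#define ETIME ETIMEDOUT /* fallback where ETIME is not defined */
#endif

/* Hypothetical control call: 0 on success, -ETIME on timeout,
 * another negative errno value on failure. */
typedef int (*control_fn)(void *ctx);

/* Retry up to max_tries times.  Timeouts are treated as transient and
 * retried; success or any other failure ends the loop immediately. */
static int retry_control(control_fn ctl, void *ctx, int max_tries)
{
    int ret = -ETIME;

    for (int i = 0; i < max_tries; i++) {
        ret = ctl(ctx);
        if (ret != -ETIME) {
            break;
        }
    }
    return ret;
}

/* Example stand-in: times out until *ctx reaches zero, then succeeds. */
static int flaky_control(void *ctx)
{
    int *timeouts_left = ctx;
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return -ETIME;
    }
    return 0;
}

/* Example stand-in: always fails hard, counting calls in *ctx. */
static int failing_control(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return -EIO;
}
```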
Amitay Isaacs [Thu, 13 Jun 2013 02:55:29 +0000 (12:55 +1000)]
packaging: Update the minimum required library versions
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 7 Jun 2013 01:24:17 +0000 (11:24 +1000)]
build: Enable VERBOSE option to display build command line
Running "make V=1" or "make VERBOSE=1" displays the build commands.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Mathieu Parent [Thu, 6 Jun 2013 19:58:02 +0000 (21:58 +0200)]
build: Fix tdb.h path to enable building with system TDB library
Mathieu Parent [Thu, 6 Jun 2013 19:43:08 +0000 (21:43 +0200)]
libctdb: Include config.h in libctdb/ctdb.c
Bug-Debian: http://bugs.debian.org/703551
Amitay Isaacs [Thu, 6 Jun 2013 06:42:02 +0000 (16:42 +1000)]
ctdbd: Make sure we don't kill init process by mistake
If getpgrp() fails, it returns -1, and that will send the KILL signal to the
init process (PID 1). This does not happen on RHEL, but it does on AIX.
Reported-by: Chris Cowan <cc@us.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
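A sketch of such a guard, assuming the process group is signalled with kill() on the negated group ID (an assumption; the commit text does not show the call). kill_process_group() and kill_own_process_group() are hypothetical helpers. Group IDs of 1 or less are refused outright.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal every process in process group `pgrp`, refusing bogus IDs.
 * kill() with a negative pid targets the process group |pid|, so:
 *   pgrp == -1 (failed lookup) -> kill(1, sig):  signals init;
 *   pgrp ==  1                 -> kill(-1, sig): signals every process
 *                                 the caller is permitted to signal.
 * Returns -1 without sending anything for pgrp <= 1. */
static int kill_process_group(pid_t pgrp, int sig)
{
    if (pgrp <= 1) {
        return -1;
    }
    return kill(-pgrp, sig);
}

/* Convenience wrapper for the caller's own process group. */
static int kill_own_process_group(int sig)
{
    return kill_process_group(getpgrp(), sig);
}
```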
Martin Schwenke [Thu, 13 Jun 2013 06:32:06 +0000 (16:32 +1000)]
tests/eventscripts: Unit tests for $CTDB_NFS_DUMP_STUCK_THREADS
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jun 2013 06:30:45 +0000 (16:30 +1000)]
tests/eventscripts: Fix -X tracing in iterate_test()
... and delete a bogus comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jun 2013 05:50:44 +0000 (15:50 +1000)]
tests/eventscripts: Add unit tests for $CTDB_MONITOR_NFS_THREAD_COUNT
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jun 2013 01:56:25 +0000 (11:56 +1000)]
eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after NFS is shut down during a
restart, then this variable sets the maximum number of threads for
which a stack trace is dumped. This can be useful for trying to
determine why nfsd is stuck.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jun 2013 00:17:20 +0000 (10:17 +1000)]
eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT
Consider the following example:
1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.
Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
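The arithmetic in the example can be checked with a small C model. It assumes, as the numbers imply, that starting NFS only tops the thread pool up to the configured total. threads_after_restart() and nfsd_thread_count_low() are hypothetical helpers, not code from 60.nfs.

```c
#include <stdbool.h>

/* Model the five steps: `configured` nfsd threads, of which `stuck`
 * survive the stop and only exit after NFS has been started again. */
static int threads_after_restart(int configured, int stuck)
{
    int alive = stuck;                 /* step 3: non-stuck threads exit */
    int started = configured - alive;  /* step 4: start tops up to total */
    alive += started;
    alive -= stuck;                    /* step 5: stuck threads exit late */
    return alive;
}

/* Condition a monitor check could look for: fewer threads running
 * than configured. */
static bool nfsd_thread_count_low(int configured, int running)
{
    return configured > 0 && running < configured;
}
```

With the figures from the example, threads_after_restart(256, 200) leaves 56 threads, which the check then flags as too few.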
Martin Schwenke [Fri, 31 May 2013 04:55:07 +0000 (14:55 +1000)]
recoverd: Log node that causes takeover run to fail
Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL. Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
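A sketch of a failure callback that always logs but only does ban bookkeeping when it is given callback data. takeover_fail_cb() and the ban_state struct are hypothetical simplifications; they do not reproduce the real takeover_fail_callback() signature.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ban bookkeeping passed as callback data. */
struct ban_state {
    uint32_t culprit_credits[4];
};

/* Called for each node that fails during a takeover run.  The failing
 * node is always logged; ban processing happens only when the caller
 * supplied ban state, so NULL callback data means "log only". */
static void takeover_fail_cb(uint32_t pnn, int32_t res, void *callback_data)
{
    struct ban_state *bs = callback_data;

    fprintf(stderr, "Node %u failed the takeover run (result %d)\n",
            (unsigned)pnn, (int)res);
    if (bs == NULL) {
        return;
    }
    if (pnn < sizeof(bs->culprit_credits) / sizeof(bs->culprit_credits[0])) {
        bs->culprit_credits[pnn]++;
    }
}
```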
Martin Schwenke [Fri, 24 May 2013 05:38:54 +0000 (15:38 +1000)]
doc: Add release notes for 2.2
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Wed, 29 May 2013 05:14:42 +0000 (15:14 +1000)]
build: Fix extra whitespace
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 29 May 2013 04:12:14 +0000 (14:12 +1000)]
tevent: Sync to tevent 0.9.18 from upstream
Amitay Isaacs [Wed, 29 May 2013 04:44:03 +0000 (14:44 +1000)]
replace: Sync to latest replace from upstream
The latest commits affecting lib/replace remove the autoconf build from
the Samba tree, so the following commit is used as the sync point:
commit 9ddfd7d8784e6f546628f48990b69ee2850be52d
Author: Andrew Bartlett <abartlet@samba.org>
Date: Wed May 22 17:23:30 2013 +1000
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 29 May 2013 04:05:50 +0000 (14:05 +1000)]
tdb: Sync to tdb 1.2.11 from upstream
Amitay Isaacs [Wed, 29 May 2013 03:53:38 +0000 (13:53 +1000)]
talloc: Sync to talloc 2.0.8 from upstream
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 29 May 2013 02:11:49 +0000 (12:11 +1000)]
ctdbd: Log node state transitions at higher debug level
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 29 May 2013 04:17:59 +0000 (14:17 +1000)]
git: Ignore generated ctdb.spec file
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 29 May 2013 04:17:00 +0000 (14:17 +1000)]
git: Ignore ctdb_version.h file
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 05:25:52 +0000 (15:25 +1000)]
build: Use REPLACE_OBJ and CTDB_EXTERNAL_OBJ to simplify build rules
This fixes the build on AIX where libreplace is required to build
ctdb_lock_helper, ctdb_fetch_lock_once, ctdb_fetch_readonly_once.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 05:14:20 +0000 (15:14 +1000)]
build: Support building with the AIX xlc compiler
xlc does not support -fPIC or -Wno-format-zero-length.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 04:44:45 +0000 (23:44 -0500)]
tests: Do not use err() to support AIX
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 04:52:09 +0000 (14:52 +1000)]
tests: Include system/time.h to support building on AIX
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 04:51:46 +0000 (14:51 +1000)]
libctdb: Do not include sys/time.h to support build on AIX
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 24 May 2013 04:42:23 +0000 (23:42 -0500)]
util: Do not stop build if backtracing is not supported
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 28 May 2013 02:01:57 +0000 (12:01 +1000)]
eventscripts: Fix statd-callout update handling
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run. This stops the statd-callout updates from ever being called.
Make this logic self-contained and move it to a new function
nfs_statd_update() in the functions file. Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
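The trigger bug is easiest to see with timestamps. In this hypothetical C model, statd_update_due() stands in for the check inside nfs_statd_update(), and last_update plays the role of the $statd_update_trigger file's modification time. It is an assumption about the logic, not a transcription of the shell function.

```c
#include <stdbool.h>
#include <time.h>

/* Returns true if the statd-callout update should run now.  The
 * timestamp is refreshed only when the update actually runs; refreshing
 * it on every call (the bug) would keep now - last_update below the
 * period forever, so the update would never run. */
static bool statd_update_due(time_t *last_update, time_t now, time_t period)
{
    if (now - *last_update < period) {
        return false;
    }
    *last_update = now;
    return true;
}
```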
Martin Schwenke [Tue, 28 May 2013 01:26:17 +0000 (11:26 +1000)]
tests/integration: Improve debug output for unhealthy cluster after restart
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 27 May 2013 05:16:28 +0000 (15:16 +1000)]
tests/scripts: Delete unused $rows and $ww variables from run_tests
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 May 2013 04:19:32 +0000 (14:19 +1000)]
packaging: Create separate package for pcp pmda
To build ctdb-pcp-pmda package, run packaging/RPM/makerpms.sh script with
"--with pmda" option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 28 May 2013 04:16:02 +0000 (14:16 +1000)]
build: Separate autoconf macros for pmda
The pmda stuff is no longer built by default even if the headers are
available. To build, run "configure --enable-pmda".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 28 May 2013 04:16:25 +0000 (14:16 +1000)]
build: Fix install paths for pcp pmda
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Mon, 27 May 2013 04:43:03 +0000 (14:43 +1000)]
packaging: makerpms.sh can take multiple arguments for rpmbuild
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 27 May 2013 02:56:41 +0000 (12:56 +1000)]
eventscripts: Stop NAT gateway's delete_all() from polluting the log
Every time a node that wasn't the NAT gateway master gets reconfigured,
something like this appears in the log:
ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1
Since this usually fails, it is better to mute the error than to have
it pollute the log.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 27 May 2013 01:29:42 +0000 (11:29 +1000)]
recoverd: Backward compatibility for nodes without IPREALLOCATED control
Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control. If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes. The "old" nodes will fail in a fairly
nondescript way (result == -1).
To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated". Only do this on the failed nodes.
However, do not do this on nodes that timed out (they've probably
implemented the control and we should call the regular fail_callback
to get those nodes banned) or for stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
Signed-off-by: Martin Schwenke <martin@meltin.net>
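The fallback rule can be written as a small decision function. should_fallback_to_eventscript() and FLAG_STOPPED are hypothetical, and the sketch assumes a timeout is reported as -ETIME. It only decides whether to fall back to the EVENTSCRIPT control; it does not model what the regular failure path does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef ETIME
#define ETIME ETIMEDOUT /* fallback where ETIME is not defined */
#endif

/* Hypothetical flag bit for this sketch; not CTDB's real value. */
#define FLAG_STOPPED 0x2u

/* Decide whether a node's IPREALLOCATED failure should be retried via
 * the EVENTSCRIPT control ("ipreallocated" event).
 *  - success: nothing to do;
 *  - timeout: the node probably implements the control, so leave it
 *    to the regular fail callback;
 *  - stopped node: it cannot run the event anyway;
 *  - any other failure: likely an old node, so fall back. */
static bool should_fallback_to_eventscript(int32_t res, uint32_t node_flags)
{
    if (res == 0 || res == -ETIME) {
        return false;
    }
    if (node_flags & FLAG_STOPPED) {
        return false;
    }
    return true;
}
```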
Martin Schwenke [Sat, 25 May 2013 09:57:24 +0000 (19:57 +1000)]
scripts: Provide mktemp function for platforms without mktemp command
This is needed for AIX and possibly others.
Also provide a cheaper mktemp function for use in the run_tests
script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sat, 25 May 2013 09:08:49 +0000 (19:08 +1000)]
tests: Fix integration tests to use real private IPs
192.0.2.x was a typo.
Signed-off-by: Martin Schwenke <martin@meltin.net>
David Disseldorp [Fri, 24 May 2013 14:11:12 +0000 (16:11 +0200)]
pmda: handle new ctdb_statistics format
The ctdb_statistics structure was recently changed. Update the PMDA to
dereference the new structure member names.
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Martin Schwenke [Fri, 5 Apr 2013 09:47:47 +0000 (20:47 +1100)]
tests/takeover: New test with 900 IPs
Martin Schwenke [Fri, 5 Apr 2013 09:45:08 +0000 (20:45 +1100)]
tests/takeover: Takeover tests can use up to 1024 and checks limits
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 8 Apr 2013 04:37:44 +0000 (14:37 +1000)]
tests/takeover: LCP2 tests for weird, unbalanced corner-cases
Two tests show a bad result and a third test checks the fix.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 8 Apr 2013 04:37:08 +0000 (14:37 +1000)]
tests/takeover: Allow takeover runs with differing IP allocations per node
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Fri, 24 May 2013 08:07:39 +0000 (18:07 +1000)]
vacuum: Reduce the priority of non-critical error
Since the complete database is not locked when the receive_records
control is received, we may not be able to obtain a lock on a chain.
We will try again to store this record.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Michael Adam [Fri, 17 May 2013 09:05:44 +0000 (11:05 +0200)]
ctdbd: fix comment explaining redirection of CTDB_REQ_CALL.
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Fri, 17 May 2013 09:01:31 +0000 (11:01 +0200)]
ctdbd: remove a nonempty blank line
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Fri, 17 May 2013 09:00:32 +0000 (11:00 +0200)]
ctdbd: update comment describing ctdb_call_send_redirect()
Signed-off-by: Michael Adam <obnox@samba.org>
Martin Schwenke [Mon, 6 May 2013 10:31:08 +0000 (20:31 +1000)]
tests/takeover: New tests to check runstate handling
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 6 May 2013 05:36:29 +0000 (15:36 +1000)]
recoverd: Nodes can only takeover IPs if they are in runstate RUNNING
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
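Eligibility for hosting IPs then reduces to a runstate check. The node_runstate enum and mark_takeover_candidates() are hypothetical; only the rule that a node must be in RUNNING comes from the commit text.

```c
#include <stddef.h>

/* Hypothetical runstate values for this sketch. */
enum node_runstate {
    NRS_INIT, NRS_SETUP, NRS_STARTUP, NRS_RUNNING, NRS_SHUTDOWN
};

/* Mark which nodes may host public IPs and return how many may.
 * Only nodes in RUNNING qualify, i.e. those that have completed the
 * "startup" event. */
static size_t mark_takeover_candidates(const enum node_runstate *states,
                                       size_t n, int *can_host)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        can_host[i] = (states[i] == NRS_RUNNING);
        count += (size_t)can_host[i];
    }
    return count;
}
```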
Martin Schwenke [Thu, 23 May 2013 09:03:11 +0000 (19:03 +1000)]
recoverd: Handle errors carefully when fetching tunables
If a tunable is not implemented on a remote node then this should not
be fatal. In this case the takeover run can continue using benign
defaults for the tunables.
However, timeouts and any unexpected errors should be fatal. These
should abort the takeover run because they can lead to unexpected IP
movements.
Signed-off-by: Martin Schwenke <martin@meltin.net>
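The error policy described here, combined with the -EINVAL (unknown tunable) and -ETIME (timeout) results introduced by other entries in this log, can be sketched as a small C function. tunable_result() is hypothetical, and how each node's result reaches the recovery daemon is assumed rather than shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef ETIME
#define ETIME ETIMEDOUT /* fallback where ETIME is not defined */
#endif

/* Interpret one node's reply to a get-tunable request.  Returns true
 * if the takeover run may continue, with *value set to the reported
 * value or to the explicit default.  Returns false (abort the run) on
 * a timeout or any unexpected error. */
static bool tunable_result(int res, uint32_t reported, uint32_t dflt,
                           uint32_t *value)
{
    if (res == 0) {
        *value = reported;
        return true;
    }
    if (res == -EINVAL) {
        /* Tunable not implemented on that node: use a benign default. */
        *value = dflt;
        return true;
    }
    return false; /* -ETIME or anything unexpected */
}
```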
Martin Schwenke [Thu, 23 May 2013 09:01:01 +0000 (19:01 +1000)]
recoverd: Set explicit default value when getting tunable from nodes
Both of the current defaults are implicitly 0. It is better to make
the defaults obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 23 May 2013 06:09:38 +0000 (16:09 +1000)]
client: async_callback() sets result to -ETIME if a control times out
Otherwise there is no way of treating a timeout differently from a
general failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 21 May 2013 05:41:56 +0000 (15:41 +1000)]
ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 22 May 2013 07:19:34 +0000 (17:19 +1000)]
recoverd: Whitespace improvements
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 22 May 2013 10:56:03 +0000 (20:56 +1000)]
recoverd: Use talloc_array_length() for simpler code
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 11 Jan 2013 07:02:51 +0000 (18:02 +1100)]
ctdbd: When the "setup" event fails log an error and exit, don't abort
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event. If this occurs then the eventscript should log an
error. The stack trace and core file generated when we abort provide
no useful information.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 11 Jan 2013 05:02:31 +0000 (16:02 +1100)]
eventscripts: 11.natgw should not call ctdb tool in "init" event
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.
Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Thu, 18 Apr 2013 10:30:14 +0000 (20:30 +1000)]
ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Fri, 11 Jan 2013 03:09:14 +0000 (14:09 +1100)]
tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
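The matching rule for the optional arguments can be sketched as follows. runstate_matches() is hypothetical; it assumes run states are compared by name and that, with no arguments, the command accepts any state and just prints it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `current` equals one of the `n` expected run state
 * names.  With no expected names, any run state is accepted. */
static bool runstate_matches(const char *current,
                             const char *const *expected, size_t n)
{
    if (n == 0) {
        return true;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(current, expected[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

An initscript can then poll something like "ctdb runstate startup running" until it succeeds, as described above.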
Martin Schwenke [Fri, 11 Jan 2013 03:07:12 +0000 (14:07 +1100)]
tools/ctdb: New command runstate to print current runstate
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>