metze/samba/wip.git
Mathieu Parent [Thu, 6 Jun 2013 19:58:02 +0000 (21:58 +0200)]
build: Fix tdb.h path to enable building with system TDB library

(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)

Mathieu Parent [Thu, 6 Jun 2013 19:43:08 +0000 (21:43 +0200)]
libctdb: Include config.h in libctdb/ctdb.c

Bug-Debian: http://bugs.debian.org/703551

(This used to be ctdb commit 14a79c0f3967c88f8ffc8200d122f6c5ffdb63a8)

Amitay Isaacs [Thu, 6 Jun 2013 06:42:02 +0000 (16:42 +1000)]
ctdbd: Make sure we don't kill init process by mistake

If getpgrp() fails, it will return -1, which results in a KILL signal being
sent to the init process (PID 1).  This does not happen on RHEL, but does on AIX.
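A minimal C sketch of the kind of guard this commit describes, assuming the daemon signals its whole process group with kill(-getpgrp(), SIGKILL). The helper name kill_process_group() is hypothetical and is not ctdbd's actual code. Because a negative pid targets a process group, a pgrp of -1 turns into kill(1, SIGKILL), so the sketch refuses any group id of 1 or less.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send 'sig' to every process in group 'pgrp'.
 * kill(-pgrp, sig) addresses a process group, so pgrp == -1 would become
 * kill(1, sig) and hit init, and pgrp == 0 would become kill(0, sig).
 * Refuse anything that is not a real, non-init process group.
 */
int kill_process_group(pid_t pgrp, int sig)
{
	if (pgrp <= 1) {
		fprintf(stderr, "Refusing to signal process group %d\n", (int)pgrp);
		errno = EINVAL;
		return -1;
	}
	return kill(-pgrp, sig);
}
```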

Reported-by: Chris Cowan <cc@us.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)

Martin Schwenke [Thu, 13 Jun 2013 06:32:06 +0000 (16:32 +1000)]
tests/eventscripts: Unit tests for $CTDB_NFS_DUMP_STUCK_THREADS

Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cd4358b01c6c3d413b431f5760029d2b163b9c03)

Martin Schwenke [Thu, 13 Jun 2013 06:30:45 +0000 (16:30 +1000)]
tests/eventscripts: Fix -X tracing in iterate_test()

... and delete a bogus comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3)

Martin Schwenke [Thu, 13 Jun 2013 05:50:44 +0000 (15:50 +1000)]
tests/eventscripts: Add unit tests for $CTDB_MONITOR_NFS_THREAD_COUNT

Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14)

Martin Schwenke [Thu, 13 Jun 2013 01:56:25 +0000 (11:56 +1000)]
eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS

If some nfsd threads are still alive after a shutdown during a restart,
this variable sets the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)

Martin Schwenke [Thu, 13 Jun 2013 00:17:20 +0000 (10:17 +1000)]
eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT

Consider the following example:

1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
   underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.

Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.
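The thread arithmetic in steps 1-5 can be modelled in a few lines of C. This is a hypothetical sketch: the real check lives in the 60.nfs shell eventscript, and how it reads the running thread count or corrects the pool is not shown here. The function names are illustrative only.

```c
#include <stdbool.h>

/*
 * Hypothetical model of steps 1-5: 'configured' is the configured nfsd
 * thread count and 'stuck' is the number of threads that survive the
 * stop.  Starting NFS only tops the pool up to 'configured' (step 4), so
 * once the stuck threads exit (step 5) only the newly started ones remain.
 */
int threads_after_restart(int configured, int stuck)
{
	if (stuck > configured) {
		stuck = configured;
	}
	return configured - stuck;
}

/* Monitor-time check: fewer running threads than configured needs fixing. */
bool nfs_thread_count_needs_fix(int configured, int running)
{
	return running < configured;
}
```

With the numbers above, threads_after_restart(256, 200) yields 56, which the monitor check flags for correction.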

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)

Martin Schwenke [Fri, 31 May 2013 04:55:07 +0000 (14:55 +1000)]
recoverd: Log node that causes takeover run to fail

Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL.  Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.
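A minimal C sketch of the "log always, ban only with callback data" behaviour. The signature is simplified (the real callback's parameters are not shown in this log), and struct ban_state is a hypothetical stand-in for whatever ban bookkeeping the recovery daemon passes in.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ban bookkeeping passed as callback data. */
struct ban_state {
	unsigned int ban_credits[8];   /* one counter per node, indexed by PNN */
};

/*
 * Simplified failure callback: always log the failing node, but only do
 * ban processing when callback data was supplied.
 */
void takeover_fail_callback(uint32_t node_pnn, int32_t res, void *callback_data)
{
	struct ban_state *bs = callback_data;

	fprintf(stderr, "Takeover run failed on node %u (result %d)\n",
		(unsigned)node_pnn, (int)res);

	if (bs == NULL) {
		return;   /* log-only mode */
	}
	if (node_pnn < sizeof(bs->ban_credits) / sizeof(bs->ban_credits[0])) {
		bs->ban_credits[node_pnn]++;
	}
}
```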

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea)

Martin Schwenke [Fri, 24 May 2013 05:38:54 +0000 (15:38 +1000)]
doc: Add release notes for 2.2

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ac0892d3a57adb0587a37de0f94fa686bed8970f)

Amitay Isaacs [Wed, 29 May 2013 05:14:42 +0000 (15:14 +1000)]
build: Fix extra whitespaces

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 78cff9d54f241fb6a2943e50346f9c2ad9decc78)

Amitay Isaacs [Wed, 29 May 2013 04:12:14 +0000 (14:12 +1000)]
tevent: Sync to tevent 0.9.18 from upstream

(This used to be ctdb commit 82d61f77c01df0fbb42743593937b175ce22a445)

Amitay Isaacs [Wed, 29 May 2013 04:44:03 +0000 (14:44 +1000)]
replace: Sync to latest replace from upstream

The latest commits affecting lib/replace remove the autoconf build from
the Samba tree, so the following commit is used as the sync point.

  commit 9ddfd7d8784e6f546628f48990b69ee2850be52d
  Author: Andrew Bartlett <abartlet@samba.org>
  Date:   Wed May 22 17:23:30 2013 +1000

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 506b27c944b4031e8a325816bd12abddd442a0bb)

Amitay Isaacs [Wed, 29 May 2013 04:05:50 +0000 (14:05 +1000)]
tdb: Sync to tdb 1.2.11 from upstream

(This used to be ctdb commit bb3a32ec055432afc7225c9fd7504fb187694bda)

Amitay Isaacs [Wed, 29 May 2013 03:53:38 +0000 (13:53 +1000)]
talloc: Sync to talloc 2.0.8 from upstream

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3bffca8c17e441364525df115ee2ac16b5969e24)

Amitay Isaacs [Wed, 29 May 2013 02:11:49 +0000 (12:11 +1000)]
ctdbd: Log node state transitions at higher debug level

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668)

Amitay Isaacs [Wed, 29 May 2013 04:17:59 +0000 (14:17 +1000)]
git: Ignore generated ctdb.spec file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca7ba26362eabfbcc329c66919d9c4da79c3b799)

Amitay Isaacs [Wed, 29 May 2013 04:17:00 +0000 (14:17 +1000)]
git: Ignore ctdb_version.h file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 641f539ffc7dd9542e669a3ec20c004f8bbcbf1e)

Amitay Isaacs [Fri, 24 May 2013 05:25:52 +0000 (15:25 +1000)]
build: Use REPLACE_OBJ and CTDB_EXTERNAL_OBJ to simplify build rules

This fixes the build on AIX where libreplace is required to build
ctdb_lock_helper, ctdb_fetch_lock_once, ctdb_fetch_readonly_once.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit fa757b49374e44c2380d4457e9b0eb3582981fac)

Amitay Isaacs [Fri, 24 May 2013 05:14:20 +0000 (15:14 +1000)]
build: Support building with the AIX xlc compiler

xlc does not support -fPIC or -Wno-format-zero-length.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2cf95741fdab2ee5f724950a0b1ef257d6aeade7)

Amitay Isaacs [Fri, 24 May 2013 04:44:45 +0000 (23:44 -0500)]
tests: Do not use err() to support AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1734562a7b3512853b9e0232880c42d50c1c2e4c)

Amitay Isaacs [Fri, 24 May 2013 04:52:09 +0000 (14:52 +1000)]
tests: Include system/time.h to support building on AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0320bb4f8ca8171812ec7f41556aed847c74bfb4)

Amitay Isaacs [Fri, 24 May 2013 04:51:46 +0000 (14:51 +1000)]
libctdb: Do not include sys/time.h to support build on AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2c19fa78ce0b25c3615b23664df32233bdbdea42)

Amitay Isaacs [Fri, 24 May 2013 04:42:23 +0000 (23:42 -0500)]
util: Do not stop build if backtracing is not supported

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit b091f09ea01482823bd850d1d4e2329e0a19c959)

Martin Schwenke [Tue, 28 May 2013 02:01:57 +0000 (12:01 +1000)]
eventscripts: Fix statd-callout update handling

60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run.  This stops the statd-callout updates from ever being called.

Make this logic self-contained and move it to a new function
nfs_statd_update() in the functions file.  Call this from 60.nfs and
60.ganesha with the appropriate update period as the only argument.
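The rate limiting can be sketched in C as follows. This is a hypothetical model of what the shell function nfs_statd_update() is described as doing: trigger_mtime stands in for the modification time of $statd_update_trigger, and the update itself is elided. The key point is that the trigger is only refreshed when an update actually runs; touching it on every call, as in the bug above, means the period never elapses.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical C model of nfs_statd_update().  Returns true if the
 * statd-callout update should run now, and only then "touches" the
 * trigger by recording the current time.
 */
bool nfs_statd_update(time_t now, time_t *trigger_mtime, time_t update_period)
{
	if (now - *trigger_mtime < update_period) {
		return false;          /* too soon: skip and leave the trigger alone */
	}
	*trigger_mtime = now;          /* touch only when we actually update */
	/* ... run the statd-callout update here ... */
	return true;
}
```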

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)

Martin Schwenke [Tue, 28 May 2013 01:26:17 +0000 (11:26 +1000)]
tests/integration: Improve debug output for unhealthy cluster after restart

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 25a6fd784cde96f3d20a79f70b5589b5c4aca675)

Martin Schwenke [Mon, 27 May 2013 05:16:28 +0000 (15:16 +1000)]
tests/scripts: Delete unused $rows and $ww variables from run_tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 80b3cf2c652c6098390cdd0dbb3edc648f7df487)

Martin Schwenke [Tue, 28 May 2013 04:19:32 +0000 (14:19 +1000)]
packaging: Create separate package for pcp pmda

To build the ctdb-pcp-pmda package, run the packaging/RPM/makerpms.sh
script with the "--with pmda" option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 85e11b9b13b3add88c1b8957be51793cc1db4f2d)

Martin Schwenke [Tue, 28 May 2013 04:16:02 +0000 (14:16 +1000)]
build: Separate autoconf macros for pmda

The pmda stuff is no longer built by default even if the headers are
available.  To build, run "configure --enable-pmda".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 194f7a0dec26d693a5f3e6734b1c82f61f8e4d19)

Martin Schwenke [Tue, 28 May 2013 04:16:25 +0000 (14:16 +1000)]
build: Fix install paths for pcp pmda

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 11af486754bb04899e3dc544157bf70530e66cd1)

Martin Schwenke [Mon, 27 May 2013 04:43:03 +0000 (14:43 +1000)]
packaging: makerpms.sh can take multiple arguments for rpmbuild

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f2ef3510407fbad29908195c58e4160d5a81e8a4)

Martin Schwenke [Mon, 27 May 2013 02:56:41 +0000 (12:56 +1000)]
eventscripts: Stop NAT gateway's delete_all() from polluting the log

Every time a node that wasn't the NAT gateway master gets reconfigured,
something like this appears in the log:

  ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1

Since this usually fails it is better to mute the error than to have
it pollute the log.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)

Martin Schwenke [Mon, 27 May 2013 01:29:42 +0000 (11:29 +1000)]
recoverd: Backward compatibility for nodes without IPREALLOCATED control

Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control.  If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes.  The "old" nodes will fail in a fairly
nondescript way (result == -1).

To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated".  Only do this on the failed nodes.
However, do not do this on nodes that timed out (they've probably
implemented the control and we should call the regular fail_callback
to get those nodes banned) or for stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
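The per-node decision described above can be sketched as a small C function. The enum and function names are hypothetical; the use of -ETIME to mark a timed-out control follows the async_callback() commit later in this log. The order of checks (timeout first, then stopped) is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ETIME
#define ETIME ETIMEDOUT   /* fallback for platforms without ETIME */
#endif

enum ipreallocated_action {
	ACTION_NONE,          /* control succeeded */
	ACTION_FAIL_CALLBACK, /* timed out: regular fail_callback, may ban */
	ACTION_SKIP,          /* stopped node: cannot run the event anyway */
	ACTION_EVENTSCRIPT,   /* probably an old node: retry via EVENTSCRIPT */
};

enum ipreallocated_action ipreallocated_fallback(int result, bool node_stopped)
{
	if (result == 0) {
		return ACTION_NONE;
	}
	if (result == -ETIME) {
		return ACTION_FAIL_CALLBACK;
	}
	if (node_stopped) {
		return ACTION_SKIP;
	}
	return ACTION_EVENTSCRIPT;
}
```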

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)

Martin Schwenke [Sat, 25 May 2013 09:57:24 +0000 (19:57 +1000)]
scripts: Provide mktemp function for platforms without mktemp command

This is needed for AIX and possibly others.

Also provide a cheaper mktemp function for use in the run_tests
script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)

Martin Schwenke [Sat, 25 May 2013 09:08:49 +0000 (19:08 +1000)]
tests: Fix integration tests to use real private IPs

192.0.2.x was a typo.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c9e36f596c63c9af7f80d7cb8d7a5c6dcca4860a)

David Disseldorp [Fri, 24 May 2013 14:11:12 +0000 (16:11 +0200)]
pmda: handle new ctdb_statistics format

The ctdb_statistics structure was recently changed. Update the PMDA to
dereference the new structure member names.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e5a5ab53173d9aa4190ddf68c4ae316d4473eb56)

Martin Schwenke [Fri, 5 Apr 2013 09:47:47 +0000 (20:47 +1100)]
tests/takeover: New test with 900 IPs

(This used to be ctdb commit 75a620c516e384f042b5d675183b3a1b48fd6115)

Martin Schwenke [Fri, 5 Apr 2013 09:45:08 +0000 (20:45 +1100)]
tests/takeover: Takeover tests can use up to 1024 IPs and check limits

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4)

Martin Schwenke [Mon, 8 Apr 2013 04:37:44 +0000 (14:37 +1000)]
tests/takeover: LCP2 tests for weird, unbalanced corner-cases

Two tests that show a bad result and a third test for the fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef35c8889d90220929e48e66eb62da9ea2025ede)

Martin Schwenke [Mon, 8 Apr 2013 04:37:08 +0000 (14:37 +1000)]
tests/takeover: Allow takeover runs with differing IP allocations per node

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 954ae6f84cb06a8dcbc12456d4752280072be5bf)

Amitay Isaacs [Fri, 24 May 2013 08:07:39 +0000 (18:07 +1000)]
vacuum: Reduce the priority of non-critical error

Since the complete database is not locked when the receive_records
control is received, we may not be able to obtain a lock on a chain.
In that case we will try to store this record again.
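A minimal C sketch of "try the chain lock, and if it is busy, log quietly and retry later". POSIX mutexes stand in for tdb's per-chain locks here; this is not the vacuuming code, and NUM_CHAINS and store_record_nonblock() are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_CHAINS 4   /* hypothetical number of hash chains */

static pthread_mutex_t chain_locks[NUM_CHAINS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/*
 * Try to store a record without blocking.  If the chain lock is held
 * elsewhere, log at a low (debug) priority and ask the caller to retry
 * instead of reporting a hard error.
 * Returns 0 on success, EAGAIN if the store should be retried.
 */
int store_record_nonblock(unsigned int key_hash)
{
	unsigned int chain = key_hash % NUM_CHAINS;
	int ret = pthread_mutex_trylock(&chain_locks[chain]);

	if (ret == EBUSY) {
		fprintf(stderr, "DEBUG: chain %u busy, will retry record later\n", chain);
		return EAGAIN;
	}
	if (ret != 0) {
		return ret;
	}
	/* ... write the record while holding the chain lock ... */
	pthread_mutex_unlock(&chain_locks[chain]);
	return 0;
}
```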

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)

Michael Adam [Fri, 17 May 2013 09:05:44 +0000 (11:05 +0200)]
ctdbd: fix comment explaining CTDB_REQ_CALL redirection.

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit b697625b184227dad1be31a41b7a3fd9bd312e29)

Michael Adam [Fri, 17 May 2013 09:01:31 +0000 (11:01 +0200)]
ctdbd: remove a nonempty blank line

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit d9e24782a90d9ce29c0e6584b75d2b186142174d)

Michael Adam [Fri, 17 May 2013 09:00:32 +0000 (11:00 +0200)]
ctdbd: update comment describing ctdb_call_send_redirect()

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 9a21d417c51fb9cad8f2e87e00ca54d379aef860)

Martin Schwenke [Mon, 6 May 2013 10:31:08 +0000 (20:31 +1000)]
tests/takeover: New tests to check runstate handling

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c57430998a3bdedc8a904eb3a9cdfde1421aff50)

Martin Schwenke [Mon, 6 May 2013 05:36:29 +0000 (15:36 +1000)]
recoverd: Nodes can only takeover IPs if they are in runstate RUNNING

Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined.  Both of
these events can (re)start services.

This stops IPs from being hosted before the "startup" event has completed.
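A minimal C sketch of the eligibility rule: a node is a candidate for hosting public IPs only in the RUNNING run state. The enum and struct are simplified, hypothetical stand-ins, not the recovery daemon's real node map.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical run states and node view for this sketch. */
enum node_runstate { NODE_RUNSTATE_SETUP, NODE_RUNSTATE_STARTUP, NODE_RUNSTATE_RUNNING };

struct node_info {
	unsigned int pnn;
	enum node_runstate runstate;
};

/* Only nodes that have reached RUNNING may host public IPs. */
bool node_can_host_ips(const struct node_info *node)
{
	return node->runstate == NODE_RUNSTATE_RUNNING;
}

/* Count the nodes eligible to take over IPs in this run. */
size_t count_ip_hosts(const struct node_info *nodes, size_t nnodes)
{
	size_t count = 0;

	for (size_t i = 0; i < nnodes; i++) {
		if (node_can_host_ips(&nodes[i])) {
			count++;
		}
	}
	return count;
}
```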

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)

Martin Schwenke [Thu, 23 May 2013 09:03:11 +0000 (19:03 +1000)]
recoverd: Handle errors carefully when fetching tunables

If a tunable is not implemented on a remote node then this should not
be fatal.  In this case the takeover run can continue using benign
defaults for the tunables.

However, timeouts and any unexpected errors should be fatal.  These
should abort the takeover run because they can lead to unexpected IP
movements.
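The error handling can be sketched as a single C decision function. It assumes the conventions introduced by related commits in this log: an unknown tunable yields -EINVAL, and a timed-out control yields -ETIME. The function name and parameters are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#ifndef ETIME
#define ETIME ETIMEDOUT   /* fallback for platforms without ETIME */
#endif

/*
 * Interpret the result of fetching one tunable from one node.
 *   0        -> use the fetched value
 *   -EINVAL  -> tunable not implemented there: use the explicit default
 *   other    -> timeout or unexpected error: abort the takeover run
 * Returns 0 and sets *value, or returns the error to abort with.
 */
int tunable_from_result(int result, uint32_t fetched, uint32_t default_value,
			uint32_t *value)
{
	if (result == 0) {
		*value = fetched;
		return 0;
	}
	if (result == -EINVAL) {
		*value = default_value;
		return 0;
	}
	return result;
}
```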

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c0c27762ea728ed86405b29c642ba9e43200f4ae)

Martin Schwenke [Thu, 23 May 2013 09:01:01 +0000 (19:01 +1000)]
recoverd: Set explicit default value when getting tunable from nodes

Both of the current defaults are implicitly 0.  It is better to make
the defaults obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1)

Martin Schwenke [Thu, 23 May 2013 06:09:38 +0000 (16:09 +1000)]
client: async_callback() sets result to -ETIME if a control times out

Otherwise there is no way of treating a timeout differently from a
general failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 40e34773b8063196457746ffe7a048eb87d96d61)

Martin Schwenke [Tue, 21 May 2013 05:41:56 +0000 (15:41 +1000)]
ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable

Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)

Martin Schwenke [Wed, 22 May 2013 07:19:34 +0000 (17:19 +1000)]
recoverd: Whitespace improvements

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 473cfcb019f0cb4a094bf10397f7414f7923ee57)

Martin Schwenke [Wed, 22 May 2013 10:56:03 +0000 (20:56 +1000)]
recoverd: Use talloc_array_length() for simpler code

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f6792f478197774d2f3b2258c969b67c83e017ab)

Martin Schwenke [Fri, 11 Jan 2013 07:02:51 +0000 (18:02 +1100)]
ctdbd: When the "setup" event fails log an error and exit, don't abort

The "setup" event can fail when one of the eventscripts fails to run
its "setup" event.  If this occurs then the eventscript should log an
error.  The stack trace and core file generated when we abort provide
no useful information.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7)

Martin Schwenke [Fri, 11 Jan 2013 05:02:31 +0000 (16:02 +1100)]
eventscripts: 11.natgw should not call ctdb tool in "init" event

The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.

Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)

Martin Schwenke [Thu, 18 Apr 2013 10:30:14 +0000 (20:30 +1000)]
ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY

This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).

Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)

Martin Schwenke [Fri, 11 Jan 2013 03:09:14 +0000 (14:09 +1100)]
tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments

If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.

At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately.  This behaviour isn't very
friendly.

The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.

The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
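The success rule for "ctdb runstate [<state> ...]" can be sketched in C. This is a hypothetical helper that compares run state names as strings for brevity; it is not the tool's actual implementation.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Succeed when no expected states are given (just print the state), or
 * when the current run state matches one of the expected ones.
 */
bool runstate_matches(const char *current, int argc, const char *const argv[])
{
	if (argc == 0) {
		return true;
	}
	for (int i = 0; i < argc; i++) {
		if (strcmp(current, argv[i]) == 0) {
			return true;
		}
	}
	return false;
}
```

For "ctdb runstate startup running", the expected list is {"startup", "running"}: a daemon still in "setup" makes the command fail, so the initscript keeps waiting.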

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)

Martin Schwenke [Fri, 11 Jan 2013 03:07:12 +0000 (14:07 +1100)]
tools/ctdb: New command runstate to print current runstate

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf20c3ab090f75f59097b36186347cedb1c445d4)

Martin Schwenke [Tue, 21 May 2013 06:18:28 +0000 (16:18 +1000)]
ctdbd: New control CTDB_CONTROL_GET_RUNSTATE

Also new client function ctdb_ctrl_get_runstate().

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)

Martin Schwenke [Thu, 10 Jan 2013 05:48:39 +0000 (16:48 +1100)]
ctdbd: Start logging process earlier

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f43fe3a560d5915c1a9893256f4e7bfe3d7e290a)

Martin Schwenke [Thu, 10 Jan 2013 05:33:36 +0000 (16:33 +1100)]
ctdbd: Only start recovery daemon and timed events after setup event

This deconstructs ctdb_start_transport(), which did much more than
starting the transport.

This removes a very unlikely race and adds some clarity.  The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c31feb27dcdb748b5333321c85fe54852dfa1bcf)

Martin Schwenke [Thu, 10 Jan 2013 05:06:25 +0000 (16:06 +1100)]
ctdbd: Replace ctdb->done_startup with ctdb->runstate

This allows states, including startup and shutdown states, to be
clearly tracked.  This doesn't include regular runtime "states", which
are handled by node flags.

Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
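A table-driven C sketch of runstate_to_string() and runstate_from_string(). The signatures and the enum are assumptions: the enum lists only the run states named in this log (including FIRST_RECOVERY from the CTDB_RUNSTATE_FIRST_RECOVERY commit), plus a hypothetical UNKNOWN value for unparseable input. ctdb_set_runstate() is not sketched.

```c
#include <stddef.h>
#include <string.h>

/* Assumed run states, in startup order; not the real header's layout. */
enum ctdb_runstate {
	CTDB_RUNSTATE_UNKNOWN,
	CTDB_RUNSTATE_SETUP,
	CTDB_RUNSTATE_FIRST_RECOVERY,
	CTDB_RUNSTATE_STARTUP,
	CTDB_RUNSTATE_RUNNING,
	CTDB_RUNSTATE_SHUTDOWN,
};

static const struct {
	enum ctdb_runstate runstate;
	const char *label;
} runstate_map[] = {
	{ CTDB_RUNSTATE_UNKNOWN, "unknown" },
	{ CTDB_RUNSTATE_SETUP, "setup" },
	{ CTDB_RUNSTATE_FIRST_RECOVERY, "first_recovery" },
	{ CTDB_RUNSTATE_STARTUP, "startup" },
	{ CTDB_RUNSTATE_RUNNING, "running" },
	{ CTDB_RUNSTATE_SHUTDOWN, "shutdown" },
};

#define RUNSTATE_MAP_LEN (sizeof(runstate_map) / sizeof(runstate_map[0]))

const char *runstate_to_string(enum ctdb_runstate runstate)
{
	for (size_t i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (runstate_map[i].runstate == runstate) {
			return runstate_map[i].label;
		}
	}
	return "unknown";
}

enum ctdb_runstate runstate_from_string(const char *label)
{
	for (size_t i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (strcmp(runstate_map[i].label, label) == 0) {
			return runstate_map[i].runstate;
		}
	}
	return CTDB_RUNSTATE_UNKNOWN;
}
```

Keeping the states ordered lets code compare them (for example, "at least STARTUP") rather than testing a single done_startup flag.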

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)

Martin Schwenke [Thu, 23 May 2013 06:06:47 +0000 (16:06 +1000)]
tools/ctdb: Remove duplicate command definition for "sync"

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9e7b7cd04adc5e66e2ffa4edf463a682aaea379b)

Amitay Isaacs [Wed, 8 May 2013 13:29:55 +0000 (23:29 +1000)]
logging: Make sure ringbuffer messages are terminated with a newline

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit dbb7c550133c92292a7212bdcaaa79f399b0919b)

Amitay Isaacs [Wed, 8 May 2013 06:25:30 +0000 (16:25 +1000)]
tests: Fix output of run_tests usage

(This used to be ctdb commit 29911fa44a480c17c701528ef46919b2a962a366)

Amitay Isaacs [Wed, 8 May 2013 03:45:55 +0000 (13:45 +1000)]
locking: Set lock helper path once

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 80fbe9364350d42658f7f8af250ac87eb1afbc21)

Amitay Isaacs [Wed, 8 May 2013 00:42:08 +0000 (10:42 +1000)]
locking: Remove functions that are not used anymore

These functions were used in the locking child process to do the locking.
With the locking helper, they are not required.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4)

Amitay Isaacs [Tue, 30 Apr 2013 05:13:44 +0000 (15:13 +1000)]
locking: Remove functions that are not used anymore

These functions were used in the locking child process to do the locking.
With the locking helper, they are not required.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6ea3212a7b177c6c06b1484cf9e8b2f4036653d9)

Amitay Isaacs [Tue, 30 Apr 2013 05:07:49 +0000 (15:07 +1000)]
locking: Use separate locking helper binary for locking

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7cde53a6cbe74b1e46f7e1bca298df82c08de866)

Amitay Isaacs [Tue, 30 Apr 2013 04:32:46 +0000 (14:32 +1000)]
locking: Create commandline arguments for locking helper

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f665e3d540c90579952e590caa5828acb581ae61)

Amitay Isaacs [Mon, 22 Apr 2013 05:36:27 +0000 (15:36 +1000)]
locking: Add a standalone helper to lock record/db

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a08b6ac19506160f3fb5925ea025027dce07781d)

Amitay Isaacs [Tue, 30 Apr 2013 04:14:16 +0000 (14:14 +1000)]
locking: Use database iterator for unmarking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7630ca4116b476636c27407748088ea335f1a06c)

Amitay Isaacs [Tue, 30 Apr 2013 04:16:07 +0000 (14:16 +1000)]
locking: Add handler function for unmarking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit adc113055de98fae276f9b501aff5c03cd25ddc8)

Amitay Isaacs [Tue, 30 Apr 2013 04:12:40 +0000 (14:12 +1000)]
locking: Use database iterator for marking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e8ea65b2713417db4a618a9f4633991cfaa93fe6)

Amitay Isaacs [Tue, 30 Apr 2013 04:07:11 +0000 (14:07 +1000)]
locking: Add handler function for marking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f120e40533780e02ff1cdc41cc6d3af1c4c83258)

Amitay Isaacs [Tue, 30 Apr 2013 04:10:06 +0000 (14:10 +1000)]
locking: Use database iterator for unlocking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 187ed83f9701c7fa8d3cc476d47c5d2a87d5c308)

Amitay Isaacs [Tue, 30 Apr 2013 04:06:46 +0000 (14:06 +1000)]
locking: Add handler function for unlocking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 725239535f40ca2cca445bb5bf2e181351b330e9)

Amitay Isaacs [Tue, 30 Apr 2013 04:08:51 +0000 (14:08 +1000)]
locking: Use database iterator for locking databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d2634d72d9ca0ceeb72cbb1adc95017a234480fd)

Amitay Isaacs [Tue, 30 Apr 2013 04:06:27 +0000 (14:06 +1000)]
locking: Add handler function for locking a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2a1c933ef7c78ee071e2a640ea10941f1c12e32a)

Amitay Isaacs [Tue, 30 Apr 2013 03:23:59 +0000 (13:23 +1000)]
locking: Refactor code to iterate over databases based on priority

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a3275854812aca86032704134fdf6a129069c86a)
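The locking commits above follow one pattern: a single iterator over databases of a given priority, with locking, unlocking, marking and unmarking expressed as handler functions. A minimal C sketch of that pattern, assuming a simplified, hypothetical db_info struct; the real iterator's signature is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical, simplified database descriptor. */
struct db_info {
	const char *name;
	unsigned int priority;
	int locked;
};

/* Handler signature: return non-zero to stop the iteration. */
typedef int (*db_handler_fn)(struct db_info *db, void *private_data);

/* Call 'fn' for every database with the given priority. */
int db_prio_iterator(struct db_info *dbs, size_t ndbs, unsigned int priority,
		     db_handler_fn fn, void *private_data)
{
	for (size_t i = 0; i < ndbs; i++) {
		if (dbs[i].priority != priority) {
			continue;
		}
		int ret = fn(&dbs[i], private_data);
		if (ret != 0) {
			return ret;
		}
	}
	return 0;
}

static int db_lock_handler(struct db_info *db, void *private_data)
{
	(void)private_data;
	db->locked = 1;   /* stand-in for taking the real database lock */
	return 0;
}

static int db_unlock_handler(struct db_info *db, void *private_data)
{
	(void)private_data;
	db->locked = 0;   /* stand-in for releasing the real database lock */
	return 0;
}
```

Adding a new per-database operation then only needs a new handler, not another copy of the loop.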

Amitay Isaacs [Wed, 1 May 2013 02:55:22 +0000 (12:55 +1000)]
locking: Add newline to debug logs

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d98a861716d5f8c1f4387d21666396d3164551b3)

Amitay Isaacs [Thu, 23 May 2013 03:04:06 +0000 (13:04 +1000)]
tools/ctdb: Fix racy ipreallocate code

This code tried to find the recovery master and send an ipreallocate
request to that node.  When a node is stopped, this code asked the
stopped node for the recovery master.  A stopped node does not have
up-to-date information on the current recovery master, so ipreallocate
requests were sent to the wrong node, which ignored them because it is
not the recovery master.

Send the ipreallocate request to all active nodes.  That way we guarantee
that the current recovery master will see it and respond to it.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0577ce3c68e4febf49a1ef5093e918db9d5ec636)
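The fix swaps "ask one node who the recovery master is" for a broadcast to every active node. A minimal sketch of choosing the broadcast targets, assuming a simple array of node records; the flag names and values, `struct node_info` and `ipreallocate_targets()` are hypothetical and do not reproduce the tools/ctdb code. Which flags count as inactive is also an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; ctdb defines its own NODE_FLAGS_* values. */
#define FLAG_DISCONNECTED 0x01u
#define FLAG_BANNED       0x04u
#define FLAG_STOPPED      0x08u
#define FLAG_INACTIVE     (FLAG_DISCONNECTED | FLAG_BANNED | FLAG_STOPPED)

struct node_info {
	uint32_t pnn;
	uint32_t flags;
};

/* Collect the PNNs of all active nodes into targets (capacity max_targets).
 * The caller would send the ipreallocate request to each of them, so the
 * current recovery master receives it no matter what a stopped node
 * believes; the other nodes ignore it. Returns the number of targets. */
static size_t ipreallocate_targets(const struct node_info *nodes, size_t count,
				   uint32_t *targets, size_t max_targets)
{
	size_t i, n = 0;

	assert(nodes != NULL || count == 0);
	for (i = 0; i < count && n < max_targets; i++) {
		if (nodes[i].flags & FLAG_INACTIVE) {
			continue; /* skip disconnected, banned and stopped nodes */
		}
		targets[n++] = nodes[i].pnn;
	}
	return n;
}
```

Broadcasting costs a few extra messages but removes the dependency on any single node's view of who the recovery master is.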

Amitay Isaacs [Wed, 22 May 2013 05:37:46 +0000 (15:37 +1000)]
ctdbd: Print version string in the daemon startup

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9d4524d13cbba21bfaf61bd35667984359b379b3)

Amitay Isaacs [Wed, 22 May 2013 04:23:17 +0000 (14:23 +1000)]
build: Rename version.h to ctdb_version.h

This avoids a clash with version.h from the Samba tree.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d18fcfff674e876abde8d51afec92d9c4a090d2f)

Amitay Isaacs [Thu, 9 May 2013 05:43:10 +0000 (15:43 +1000)]
logging: Fix a bug in ringbuffer

When the ringbuffer is full, it does not return any entries.  Simplify
the ringbuffer logic by keeping track of the number of log entries
rather than the last entry.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 939d12b96a0cbebbe6269fa2b14f584058dd6174)
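A common way this symptom arises is that, with only a start index and a last-entry index, a full buffer looks the same as an empty one. A minimal sketch of a count-based ring buffer, assuming fixed-size integer entries for brevity; `struct ringbuffer`, `RB_SIZE`, `rb_add()` and `rb_get()` are hypothetical names, not ctdb's logging code.

```c
#include <assert.h>
#include <stddef.h>

#define RB_SIZE 4 /* illustrative capacity */

/* Storing the number of valid entries means "full" (count == RB_SIZE)
 * and "empty" (count == 0) can never be confused. */
struct ringbuffer {
	int entries[RB_SIZE];
	size_t first; /* index of the oldest entry */
	size_t count; /* number of valid entries, 0..RB_SIZE */
};

static void rb_add(struct ringbuffer *rb, int value)
{
	assert(rb != NULL);
	if (rb->count < RB_SIZE) {
		rb->entries[(rb->first + rb->count) % RB_SIZE] = value;
		rb->count++;
	} else {
		/* Full: overwrite the oldest entry and advance the start. */
		rb->entries[rb->first] = value;
		rb->first = (rb->first + 1) % RB_SIZE;
	}
}

/* Copy entries oldest-first into out (capacity >= RB_SIZE); returns count. */
static size_t rb_get(const struct ringbuffer *rb, int *out)
{
	size_t i;

	assert(rb != NULL && out != NULL);
	for (i = 0; i < rb->count; i++) {
		out[i] = rb->entries[(rb->first + i) % RB_SIZE];
	}
	return rb->count;
}
```

After six additions to a four-slot buffer, `rb_get()` returns the last four values in order instead of nothing.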

Martin Schwenke [Mon, 13 May 2013 05:27:04 +0000 (15:27 +1000)]
recoverd: takeover_run_core() should not use modified node flags

Modifying the node flags with IP-allocation-only flags is not
necessary, and it causes breakage if the flags are not cleared after
use.  ctdb_takeover_run() no longer needs the general node flags; it
only needs the IP flags.

Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags.  As well as being safer, this makes the IP allocation code more
self-contained and a little clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)
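The idea of building a separate IP flags list, rather than writing allocation-only bits into the nodemap, can be sketched as follows. This is an illustration under stated assumptions: the flag values, `struct ipflags`, the per-node `takeover_disabled` setting and `build_ipflags()` are hypothetical, and the rule that disconnected, unhealthy, banned or stopped nodes cannot host IPs is assumed for the example rather than taken from ctdb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative node flag bits; ctdb defines its own NODE_FLAGS_* values. */
#define FLAG_DISCONNECTED 0x01u
#define FLAG_UNHEALTHY    0x02u
#define FLAG_BANNED       0x04u
#define FLAG_STOPPED      0x08u

/* Hypothetical per-node IP allocation flags, separate from the node flags. */
struct ipflags {
	bool noiphost;     /* node may not host any IPs */
	bool noiptakeover; /* node may not take over IPs from other nodes */
};

/* Fill out[0..count-1] from the node flags and a per-node "takeover
 * disabled" setting. node_flags is only read, never modified, so nothing
 * needs clearing after the allocation run: the caller discards the array. */
static void build_ipflags(const uint32_t *node_flags,
			  const bool *takeover_disabled, size_t count,
			  struct ipflags *out)
{
	const uint32_t cannot_host = FLAG_DISCONNECTED | FLAG_UNHEALTHY |
				     FLAG_BANNED | FLAG_STOPPED;
	size_t i;

	assert((node_flags != NULL && takeover_disabled != NULL && out != NULL)
	       || count == 0);
	for (i = 0; i < count; i++) {
		out[i].noiphost = (node_flags[i] & cannot_host) != 0;
		out[i].noiptakeover = takeover_disabled[i];
	}
}
```

Because the allocation code would consult only this array, leftover allocation state cannot leak into other recovery daemon code that reads the node flags.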

Martin Schwenke [Mon, 20 May 2013 00:47:07 +0000 (10:47 +1000)]
ctdbd: Update confusing log message

Inactive can also mean stopped.  To make the message more informative,
print the flags instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a8605f7e06076e7edf84e0cc160fd3d9ab5c4b64)

Martin Schwenke [Fri, 17 May 2013 06:46:41 +0000 (16:46 +1000)]
Packaging: maketarball.sh should be a bash script due to pushd use

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3105f9e291d0792199ac9e689f6d0e0a47ee4b0d)

Martin Schwenke [Fri, 17 May 2013 06:42:25 +0000 (16:42 +1000)]
scripts: Rework notify.sh to use notify.d/ directory

This makes it easier to add notification handlers.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d29e9a420b133088bf23a847c8d1dbce56c25eb0)

Martin Schwenke [Tue, 14 May 2013 06:20:32 +0000 (16:20 +1000)]
ctdbd: Log a message when recovery master changes

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91)

Martin Schwenke [Tue, 14 May 2013 05:38:08 +0000 (15:38 +1000)]
ctdbd: Log add and delete of IPs

At the moment, when someone deletes all the IPs on a node, all we see
are the release IP messages and we have to guess why.

Some would argue that add/delete are more significant than
take/release, so they should be logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3c3df1d6afec7e3e721f9bcd4e8b8e008fd6e50b)

Martin Schwenke [Tue, 14 May 2013 05:30:53 +0000 (15:30 +1000)]
ctdbd: Removed bogus comment in ctdb_find_iface()

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4a8d90d0812a3242f58a2a0e2aa0f528f60f7013)

Martin Schwenke [Tue, 14 May 2013 04:56:26 +0000 (14:56 +1000)]
eventscripts: Fix regression in _loadconfig()

fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)

Martin Schwenke [Thu, 9 May 2013 10:44:11 +0000 (20:44 +1000)]
initscript: If CTDB doesn't become ready, print a message before killing

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e6b6b793f61556c21e8daf34abf89ee7b388ecfb)

Christian Ambach [Wed, 8 May 2013 06:45:09 +0000 (08:45 +0200)]
build: Create sudoers.d dir during make install

Otherwise, make install into a non-standard prefix will fail.

Signed-off-by: Christian Ambach <ambi@samba.org>
(This used to be ctdb commit 0c0752515b66661ffae24be5f138bd2fab4dec5c)

Amitay Isaacs [Tue, 14 May 2013 13:18:32 +0000 (23:18 +1000)]
eventscripts: Do not use bashism for string comparison

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit b0cae7d5a00ef3764bae187affc8e9a252f4b329)

Martin Schwenke [Thu, 9 May 2013 02:53:48 +0000 (12:53 +1000)]
recoverd: Move IP flags into ctdb_takeover.c

These should never be seen outside the IP allocation code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e143abd16ccde2e0edfe103673d31a5fb06b6aef)

Martin Schwenke [Thu, 9 May 2013 02:51:57 +0000 (12:51 +1000)]
recoverd: Clear IP flags after IP allocation algorithm has run

If these flags are left set, they will confuse other recovery daemon
code.

Factor the clearing code into new function clear_ipflags().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 45c776958017ea7001f061842c9e0f60e4a25f23)

Martin Schwenke [Fri, 3 May 2013 10:46:15 +0000 (20:46 +1000)]
recoverd: Remove unused mask argument and initial mask calculation

This has been replaced by set_ipflags() and associated functionality.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)

Martin Schwenke [Fri, 3 May 2013 10:41:32 +0000 (20:41 +1000)]
recoverd: When calculating rebalance candidates don't consider flags

This is really a check to see if a node is already hosting IPs.  If
so, we assume it was previously healthy, so it isn't considered a
rebalance candidate.  There's no need to limit this to healthy nodes,
since this is checked elsewhere.

Because of this, the variable newly_healthy is renamed everywhere to
rebalance_candidates.

The mask argument is now completely unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 65e0ea6c2c0629e19349ba4b9affa221fde2b070)
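The rebalance-candidate rule described above (a node hosting no IPs is a candidate, regardless of its flags) can be sketched as follows, assuming each public IP records the PNN of its current host or -1 if unassigned. `struct public_ip` and `find_rebalance_candidates()` are hypothetical names, not the ctdb_takeover.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public IP record: PNN of the hosting node, or -1. */
struct public_ip {
	int32_t pnn;
};

/* Mark node i as a rebalance candidate if it hosts none of the IPs.
 * Node flags are deliberately not consulted: whether a node may host
 * IPs at all is checked elsewhere in the allocation code. */
static void find_rebalance_candidates(const struct public_ip *ips,
				      size_t num_ips, bool *candidates,
				      size_t num_nodes)
{
	size_t i;

	assert(ips != NULL || num_ips == 0);
	assert(candidates != NULL || num_nodes == 0);
	for (i = 0; i < num_nodes; i++) {
		candidates[i] = true;
	}
	for (i = 0; i < num_ips; i++) {
		int32_t pnn = ips[i].pnn;

		if (pnn >= 0 && (size_t)pnn < num_nodes) {
			candidates[pnn] = false; /* already hosting an IP */
		}
	}
}
```

Keeping the flag checks out of this function is what lets the mask argument become unused, as the commit message notes.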

Martin Schwenke [Fri, 3 May 2013 10:13:40 +0000 (20:13 +1000)]
recoverd: Remove unused mask argument from IP allocation functions

This is a no-op and is in a separate commit to make the previous
commit less cumbersome.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)