ctdb.git
Martin Schwenke [Fri, 23 Oct 2015 03:04:04 +0000 (14:04 +1100)]
scripts: Add support for CTDB_DBDIR in tmpfs

The tmpfs is mounted and unmounted by ctdbd_wrapper.  The format is
CTDB_DBDIR=tmpfs:<tmpfs-options>.  The only default for the tmpfs is
mode=700; to override it, specify a different mode= value in <tmpfs-options>.
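
A minimal C sketch of the option handling described above, for illustration only (ctdbd_wrapper itself is a shell script).  The helper name dbdir_tmpfs_options() is hypothetical, and accepting a bare "tmpfs" with no options is an assumption.  mode=700 is added only when the user has not supplied a mode= option of their own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the CTDB_DBDIR=tmpfs:<tmpfs-options> rules.
 * Returns 1 and fills opts if dbdir requests a tmpfs, 0 if dbdir is a
 * plain directory, and -1 if opts is too small.
 */
static int dbdir_tmpfs_options(const char *dbdir, char *opts, size_t len)
{
    const char *user;
    const char *p;
    int has_mode = 0;
    int n;

    if (strcmp(dbdir, "tmpfs") == 0) {
        user = "";                      /* assumption: bare "tmpfs" allowed */
    } else if (strncmp(dbdir, "tmpfs:", 6) == 0) {
        user = dbdir + 6;               /* user-supplied tmpfs options */
    } else {
        return 0;                       /* ordinary directory path */
    }

    /* Does any comma-separated option already set mode=? */
    for (p = user; p != NULL && *p != '\0'; ) {
        if (strncmp(p, "mode=", 5) == 0) {
            has_mode = 1;
            break;
        }
        p = strchr(p, ',');
        if (p != NULL) {
            p++;                        /* step past the comma */
        }
    }

    if (has_mode) {
        n = snprintf(opts, len, "%s", user);
    } else if (*user != '\0') {
        n = snprintf(opts, len, "mode=700,%s", user);
    } else {
        n = snprintf(opts, len, "mode=700");
    }
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return 1;
}
```

For example, CTDB_DBDIR=tmpfs:size=512M yields mode=700,size=512M, while tmpfs:size=1G,mode=750 is passed through unchanged.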

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Mon Nov  9 10:58:32 CET 2015 on sn-devel-104

(Imported from commit be670ef0103878d8d939de5972b567c4db404082)

Martin Schwenke [Fri, 23 Oct 2015 03:04:04 +0000 (14:04 +1100)]
scripts: Improve CTDB wrapper shutdown code

This will make it easier to run things after CTDB is stopped.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit f05c6d32cce334d29e373f1e74f0b52cab14409d)

Martin Schwenke [Mon, 8 Feb 2016 04:55:17 +0000 (15:55 +1100)]
scripts: Drop use of "smbcontrol winbindd ip-dropped ..."

This is unnecessary in Samba >= 4.0 because winbindd monitors IP
addresses itself and no longer needs to be told when they are dropped.
The smbcontrol commands can hang if a node has recovery mode active,
because smbcontrol is unable to connect to the registry.  Therefore,
the smbcontrol commands should be removed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11719

Signed-off-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 519564bb35a0f840bc4d7c8c5a92441c97b49791)

Martin Schwenke [Thu, 30 Jul 2015 06:49:35 +0000 (16:49 +1000)]
scripts: Improve error handling for 50.samba testparm failure

Also add tests.  Update the testparm stub to fake an error and a
timeout.  Add a timeout stub.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7d04778c82a8f657b6ba0173c29529fa03ab7a25)

Martin Schwenke [Wed, 8 Oct 2014 01:22:06 +0000 (12:22 +1100)]
tests: Run transaction tests with externally imposed timeout

This works around cases where ctdb_transaction gets stuck, which still
needs to be debugged.  However, this change will at least cause
individual tests to fail rather than having whole test runs time out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit f4871b8736f22941b227c19656319033c0c812e8)

Amitay Isaacs [Thu, 2 Apr 2015 02:53:09 +0000 (13:53 +1100)]
daemon: Reset database statistics when resetting statistics

When the CTDB statistics are reset, also reset the per-database
statistics to keep them consistent with the CTDB statistics.
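
A simplified sketch of the idea, assuming hypothetical structures: the daemon-wide counters and each database's counters live in separate structs, so a reset has to walk the database list as well, or the per-database numbers keep counting from before the reset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified statistics structures. */
struct db_stats {
    unsigned long lock_count;
    unsigned long fetch_count;
};

struct db_context {
    struct db_stats stats;
    struct db_context *next;
};

struct daemon_stats {
    unsigned long total_calls;
    unsigned long total_errors;
};

struct daemon_state {
    struct daemon_stats stats;
    struct db_context *db_list;
};

/* Reset the daemon-wide counters and every per-database counter together. */
static void reset_statistics(struct daemon_state *d)
{
    struct db_context *db;

    memset(&d->stats, 0, sizeof(d->stats));
    for (db = d->db_list; db != NULL; db = db->next) {
        memset(&db->stats, 0, sizeof(db->stats));
    }
}
```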

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 7949ce103f2062aa703a24f72e11be96dc497a7a)

Amitay Isaacs [Mon, 3 Aug 2015 05:02:43 +0000 (15:02 +1000)]
system: Remove unused system specific calls

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit d9030d8c10ebe6f95f33cbc691b5756d97395b0f)

Martin Schwenke [Fri, 24 Jul 2015 05:32:42 +0000 (15:32 +1000)]
daemon: Check if updates are in flight when releasing all IPs

Some code involved in releasing IPs is not re-entrant.  Memory
corruption can occur if, for example, overlapping attempts are made to
ban a node.  We haven't been able to reproduce the corruption, but this
should protect against it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 952a50485f68b3cffdf57da84aa9bb9fde630b7e)

Amitay Isaacs [Mon, 27 Jul 2015 06:51:08 +0000 (16:51 +1000)]
banning: If node is already banned, do not run ctdb_local_node_got_banned()

This ensures that release_all_ips() is called only once, on the first ban.
If the node is banned again due to an event script timeout while
release_all_ips() is running, this avoids calling release_all_ips()
re-entrantly.
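
A sketch of the guard, with hypothetical names standing in for ctdb_local_node_got_banned() and release_all_ips().  The stand-in release function deliberately bans the node again, as an event script timeout might.  Because the banned flag is set before the release starts, the second ban returns immediately instead of re-entering the release.

```c
#include <assert.h>
#include <stdbool.h>

struct node {
    bool banned;
    int release_calls;   /* how often the stand-in release ran */
};

static void node_got_banned(struct node *n);

/* Stand-in for the real, non-re-entrant work of releasing IPs. */
static void release_all_ips(struct node *n)
{
    n->release_calls++;
    /* Simulate an event script timeout banning the node again
     * while IPs are still being released. */
    node_got_banned(n);
}

/* Only the transition from "not banned" to "banned" triggers the release. */
static void node_got_banned(struct node *n)
{
    if (n->banned) {
        return;          /* already banned: do not re-enter the release */
    }
    n->banned = true;    /* set before releasing, so nested bans see it */
    release_all_ips(n);
}
```

Without the early return, the nested ban in release_all_ips() would recurse without bound.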

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 8eb04d09b119e234c88150e1dc35fc5057f9c926)

Amitay Isaacs [Thu, 23 Jul 2015 21:39:26 +0000 (07:39 +1000)]
client: Return the correct status sent from the daemon

If a control fails and an error message is set, the returned status of
the control is always set to -1, ignoring the status passed by the
daemon.  Return the status sent by the daemon instead.
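
A sketch of the before and after behaviour, assuming a hypothetical reply structure that carries both the daemon's status and an optional error message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply as received from the daemon. */
struct control_reply {
    int32_t status;        /* status chosen by the daemon */
    const char *errormsg;  /* optional error text, may be NULL */
};

/* Old behaviour: any reply with an error message collapses to -1. */
static int32_t control_status_old(const struct control_reply *r)
{
    if (r->errormsg != NULL) {
        return -1;         /* the daemon's status is lost here */
    }
    return r->status;
}

/* New behaviour: pass the daemon's status through unchanged and hand
 * the error message back separately. */
static int32_t control_status_new(const struct control_reply *r,
                                  const char **errmsg)
{
    if (errmsg != NULL) {
        *errmsg = r->errormsg;
    }
    return r->status;
}
```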

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 1286b02e24a521dafa7061d09fb5c21d1ebb3011)

Amitay Isaacs [Tue, 21 Jul 2015 06:37:04 +0000 (16:37 +1000)]
daemon: Correctly process the exit code from failed eventscripts

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jul 22 15:03:53 CEST 2015 on sn-devel-104

(Imported from commit 00ec3c477eba50206801b451ae4eb64c12aba5db)

Amitay Isaacs [Mon, 20 Jul 2015 06:37:58 +0000 (16:37 +1000)]
tool: Correctly print timed out event scripts output

The timed out error is ignored for certain events (start_recovery,
recovered, takeip, releaseip).  If these events time out, the debug
hung script outputs the following:

 3 scripts were executed last releaseip cycle
 00.ctdb              Status:OK    Duration:4.381 Thu Jul 16 23:45:24 2015
 01.reclock           Status:OK    Duration:13.422 Thu Jul 16 23:45:28 2015
 10.external          Status:DISABLED
 10.interface         Status:OK    Duration:-1437083142.208 Thu Jul 16 23:45:42 2015

The end time for timed out scripts is not set.  Since the status is not
returned as -ETIME for these events, ctdb scriptstatus prints a negative
duration.
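
The bogus duration is consistent with an end time of zero: 1437083142 seconds after the epoch is Thu Jul 16 21:45:42 UTC 2015, which matches the 23:45:42 start time shown if the node's local time was UTC+2, so the tool effectively computed 0 minus the start time.  A minimal sketch of a duration helper that refuses to compute anything from an unset end time, using hypothetical names (it is not the ctdb tool's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/time.h>

struct script_status {
    struct timeval start;
    struct timeval finished;   /* left zeroed if the script timed out */
};

static bool timeval_is_zero(const struct timeval *tv)
{
    return tv->tv_sec == 0 && tv->tv_usec == 0;
}

/*
 * Returns false, rather than a huge negative number, when no end time
 * was recorded.  Otherwise stores the duration in seconds in *secs.
 */
static bool script_duration(const struct script_status *s, double *secs)
{
    if (timeval_is_zero(&s->finished)) {
        return false;
    }
    *secs = (double)(s->finished.tv_sec - s->start.tv_sec) +
            (double)(s->finished.tv_usec - s->start.tv_usec) / 1e6;
    return true;
}
```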

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 71b89b2b7a9768de437347e6678370b2682da892)

Martin Schwenke [Tue, 21 Jul 2015 02:23:27 +0000 (12:23 +1000)]
daemon: Ignore SIGUSR1

There is no point in dying or failing eventscripts if someone sends a
random SIGUSR1.
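
A minimal sketch of ignoring SIGUSR1 with sigaction(); it is not CTDB's code.  One consequence worth knowing: an ignored disposition is inherited by forked children and preserved across execve(), so processes started afterwards (such as event scripts) also ignore SIGUSR1 unless they reset it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Ignore SIGUSR1 so a stray "kill -USR1" neither kills nor interrupts
 * the process.  Returns 0 on success, -1 on error. */
static int ignore_sigusr1(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```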

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 21 11:00:17 CEST 2015 on sn-devel-104

(Imported from commit 65515919142c922fe6ddf63d0f50449eec445b30)

Amitay Isaacs [Tue, 14 Jul 2015 06:54:59 +0000 (16:54 +1000)]
daemon: Return correct sequence number for CONTROL_GET_DB_SEQNUM

Due to a missing cast to uint64_t, CONTROL_GET_DB_SEQNUM always
returned seqnum <= 256.
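
A hedged reconstruction of this class of bug, not the actual CTDB code: if the stored sequence number is read through a byte pointer without a cast to uint64_t, only a single byte is fetched, and one byte can never exceed 255.  The fixed version below copies all eight bytes with memcpy(), which also avoids alignment assumptions about the record data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The record data is assumed to hold a native-endian uint64_t. */
static uint64_t seqnum_buggy(const uint8_t *dptr)
{
    return *dptr;                           /* reads ONE byte: always < 256 */
}

static uint64_t seqnum_fixed(const uint8_t *dptr)
{
    uint64_t seqnum;

    memcpy(&seqnum, dptr, sizeof(seqnum));  /* reads all eight bytes */
    return seqnum;
}
```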

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11398

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 14 13:03:25 CEST 2015 on sn-devel-104

(Imported from commit 1023db2543f7785e4527a4565db91edcde4ca7f1)

Martin Schwenke [Tue, 14 Jul 2015 03:43:14 +0000 (13:43 +1000)]
daemon: Allow a new monitor event to cancel one already in progress

Before commit cbffbb7c2f406fc1d8ebad3c531cc2757232690e this was
possible and some users depend on this behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 182ebc07289c776ca104e648911a53209bcdaf00)

Martin Schwenke [Mon, 6 Jul 2015 02:02:00 +0000 (12:02 +1000)]
daemon: Improve error messages when eventscript control is cancelled

Warn specifically about cancellation instead of printing a generic
error message.  Also pass back an error message for the tool; it
could rely on the status alone, but it already looks at the error
message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 122a4fda7272ec4d63452037f0b838d2bdc5a79a)

Martin Schwenke [Mon, 6 Jul 2015 01:48:28 +0000 (11:48 +1000)]
tools: Avoid printing "(null)" on "ctdb eventscript" error
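
Passing NULL to printf()'s %s is undefined behaviour; glibc happens to print "(null)".  A minimal sketch of the usual guard, with a hypothetical helper and message text:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return s, or fallback when s is NULL. */
static const char *str_or(const char *s, const char *fallback)
{
    return s != NULL ? s : fallback;
}

/* Hypothetical error report that never hands NULL to %s. */
static int print_eventscript_error(FILE *f, const char *errmsg)
{
    return fprintf(f, "Unable to run event script: %s\n",
                   str_or(errmsg, "no error message available"));
}
```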

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b71d18d2dc090e99d67c6bd8552380b44f8db810)

Amitay Isaacs [Fri, 10 Jul 2015 04:02:29 +0000 (14:02 +1000)]
daemon: Avoid double-free during monitor cancellation

The eventscript state should never be freed externally, so it should
never be allocated off a temporary context.  It will either be freed
by the handler or in the cancellation code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit f951ff13838e796cd6661d800daf460247cac60b)

Martin Schwenke [Wed, 8 Jul 2015 12:22:09 +0000 (22:22 +1000)]
tests: Add some 10.interfaces VLAN tests

One without a bond, one with a bond.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8ed0cacaf4aa9fc63b8c8d610a6164c5d01e473a)

Martin Schwenke [Wed, 8 Jul 2015 12:14:51 +0000 (22:14 +1000)]
tests: Add VLAN support to the "ip link" stub

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8e41cb1e4e7b4a7d92628771260649ded4432772)

Martin Schwenke [Wed, 8 Jul 2015 11:39:51 +0000 (21:39 +1000)]
tests: Interface number in "ip link show" stub defaults to 42

It needs to have a default for the standalone case, when it is not run
in a loop inside "ip addr show".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 4f84d42b511a4c9a79bd835eeca0a80082e76227)

Martin Schwenke [Wed, 8 Jul 2015 11:23:48 +0000 (21:23 +1000)]
scripts: Support monitoring of interestingly named VLANs on bonds

VLAN interfaces on bonds with a name other than <iface>.<id>@<iface>,
that is, where the VLAN name isn't based on the underlying bond name,
are not currently supported.  Such VLAN interfaces can be created with
the "ip link" command (as opposed to the "vconfig" command) or by
renaming a VLAN interface.

This is improved by determining the underlying interface name for a
VLAN from the output of "ip link".

No serious attempt is made to support VLANs with '@' in their name,
although this seems to be legal.  Why would you do that?
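
A sketch of the parsing idea in C (the eventscript itself is shell).  It assumes the usual "ip link show" line format, "<index>: <name>@<parent>: <flags> ...", and takes the parent from after the first '@' in the name, so a VLAN with '@' in its own name is not handled.  vlan_parent() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Extract the underlying interface from one line of "ip link show"
 * output, e.g. "6: vlan100@bond0: <BROADCAST,...> mtu 1500 ...".
 * Returns 0 and fills parent on success, -1 if there is no "@parent".
 */
static int vlan_parent(const char *line, char *parent, size_t len)
{
    const char *name, *at, *end;
    size_t n;

    name = strstr(line, ": ");          /* skip the "<index>: " prefix */
    if (name == NULL) {
        return -1;
    }
    name += 2;
    end = strchr(name, ':');            /* the name ends at the next ':' */
    if (end == NULL) {
        return -1;
    }
    at = memchr(name, '@', (size_t)(end - name));
    if (at == NULL) {
        return -1;                      /* not a stacked interface */
    }
    n = (size_t)(end - at - 1);
    if (n == 0 || n >= len) {
        return -1;
    }
    memcpy(parent, at + 1, n);
    parent[n] = '\0';
    return 0;
}
```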

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit bc71251433ce618c95c674d7cbe75b01a94adad9)

Amitay Isaacs [Thu, 9 Jul 2015 04:55:59 +0000 (14:55 +1000)]
daemon: Fix valgrind invalid read error in db_statistics control

  ==20761== Invalid read of size 8
  ==20761==    at 0x11BE30: ctdb_ctrl_dbstatistics (ctdb_client.c:1286)
  ==20761==    by 0x12BA89: control_dbstatistics (ctdb.c:713)
  ==20761==    by 0x1312E0: main (ctdb.c:6543)
  ==20761==  Address 0x713b0d0 is 0 bytes after a block of size 560 alloc'd
  ==20761==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
  ==20761==    by 0x5CB0954: _talloc_memdup (talloc.c:615)
  ==20761==    by 0x11395C: ctdb_control_recv (ctdb_client.c:1146)
  ==20761==    by 0x11BDD7: ctdb_ctrl_dbstatistics (ctdb_client.c:1265)
  ==20761==    by 0x12BA89: control_dbstatistics (ctdb.c:713)
  ==20761==    by 0x1312E0: main (ctdb.c:6543)
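
The report shows an 8-byte read starting exactly at the end of the 560-byte reply buffer.  The actual fix is not shown here; as a general illustration, this sketch parses a hypothetical reply layout (a uint32_t count followed by that many uint64_t values) and checks that the whole payload fits in the received length before reading any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical wire layout: uint32_t count, then count uint64_t values.
 * The reply buffer is exactly len bytes long.  Returns 0 on success.
 */
static int parse_counts(const uint8_t *buf, size_t len,
                        uint64_t *out, uint32_t max_out, uint32_t *n_out)
{
    uint32_t count, i;
    size_t need;

    if (len < sizeof(count)) {
        return -1;
    }
    memcpy(&count, buf, sizeof(count));
    if (count > max_out) {
        return -1;
    }
    /* Validate the whole payload first: reading past len is exactly
     * the "0 bytes after a block" error valgrind reports. */
    need = sizeof(count) + (size_t)count * sizeof(uint64_t);
    if (len < need) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        memcpy(&out[i], buf + sizeof(count) + (size_t)i * sizeof(uint64_t),
               sizeof(uint64_t));
    }
    *n_out = count;
    return 0;
}
```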

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 9aa90482f8ffbddf898eb8a900112f45d82f0930)

Martin Schwenke [Wed, 17 Jun 2015 05:05:30 +0000 (15:05 +1000)]
daemon: Promote debug messages about --start-as-* to NOTICE level

It is important to know when ctdbd is started with --start-as-stopped
or --start-as-disabled.  Given that this only happens once, it makes
sense to promote these debug items to NOTICE level.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit eb159f3ff530de8828631b04e17bf0990aed906e)

Martin Schwenke [Thu, 11 Jun 2015 05:49:25 +0000 (15:49 +1000)]
recoverd: Clear IP assignment tree on election loss

If a node was previously recovery master (say, 20 years ago) and it
becomes recovery master again then, if IP assignments have changed,
verify_remote_ip_allocation() can produce messages like the following
when called during recovery:

  ctdbd: recoverd:Inconsistent IP allocation - node 0 thinks 10.1.1.1 is held by node 0 while it is assigned to node 1

When a node loses an election it should clear all data specific to it
being the recovery master.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b234ae0a900052b03ca22efab8fa1b9e11f44ecc)

Martin Schwenke [Thu, 11 Jun 2015 05:46:27 +0000 (15:46 +1000)]
recoverd: Add new function clear_ip_assignment_tree()

The IP assignment tree needs to be cleared to avoid stale data when a
new recovery master is elected.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 036c2a92438585ab6b99a22fcf67b67890c525f0)

Michael Adam [Fri, 12 Jun 2015 08:59:54 +0000 (10:59 +0200)]
vacuum: revert "Do not delete VACUUM MIGRATED records immediately"

This reverts commit 257311e337065f089df688cbf261d2577949203d.

That commit was due to a misunderstanding, and it
does not fix what it was supposed to fix.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1898200481f64676e596e52dc177c8d70ca1a00c)

Stefan Metzmacher [Fri, 5 Jun 2015 08:30:39 +0000 (10:30 +0200)]
ib: make sure the tevent_fd is removed before the fd is closed
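
With tevent, freeing the tevent_fd removes the descriptor from the event backend.  As a Linux-only analogue using raw epoll (tevent is not used here), the safe order is: deregister, then close.  Per epoll(7), close() only drops an epoll registration once every descriptor referring to the same open file description is closed, so closing first can leave a stale registration behind, and an EPOLL_CTL_DEL issued after close() fails with EBADF.  unwatch_and_close() is a hypothetical helper.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Deregister fd from the epoll set while it is still valid, then close it.
 * Returns 0 on success, -1 if either step failed. */
static int unwatch_and_close(int epfd, int fd)
{
    int ret = 0;

    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) != 0) {
        ret = -1;
    }
    if (close(fd) != 0) {
        ret = -1;
    }
    return ret;
}
```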

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11316

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 53ff3e4f31f3debd98f9293171c023a0a406858d)

Stefan Metzmacher [Tue, 2 Jun 2015 10:43:17 +0000 (12:43 +0200)]
locking: move all auto_mark logic into process_callbacks()

The caller should not dereference lock_ctx after invoking
process_callbacks(); it might already have been destroyed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 12 15:28:57 CEST 2015 on sn-devel-104

(Imported from commit b3a18d66c00dba73a3f56a6f95781b4d34db1fe2)

Stefan Metzmacher [Tue, 2 Jun 2015 10:39:17 +0000 (12:39 +0200)]
locking: make process_callbacks() more robust

We should not dereference lock_ctx after invoking the callback
in the auto_mark == false case. The callback could have destroyed it.
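
A sketch of the pattern with a simplified, hypothetical lock context: anything needed after the callback is copied out first, because in the auto_mark == false case the callback may free the context.  The meaning given to auto_mark here is simplified (the sketch just frees the context itself in that case), and the two example callbacks exist only for the usage below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified lock context. */
struct lock_ctx {
    bool auto_mark;
    void (*callback)(struct lock_ctx *ctx, void *private_data);
    void *private_data;
};

static void process_callbacks(struct lock_ctx *ctx)
{
    /* Read everything needed later while ctx is known to be valid. */
    bool auto_mark = ctx->auto_mark;

    ctx->callback(ctx, ctx->private_data);

    if (!auto_mark) {
        /* The callback owns ctx and may already have freed it:
         * do not touch ctx again, not even ctx->auto_mark. */
        return;
    }

    /* In this sketch, auto_mark means the callback leaves ctx alone
     * and process_callbacks() does the clean-up itself. */
    free(ctx);
}

/* Example callbacks for the usage below. */
static int callbacks_run;

static void callback_that_frees(struct lock_ctx *ctx, void *private_data)
{
    (void)private_data;
    callbacks_run++;
    free(ctx);
}

static void callback_that_keeps(struct lock_ctx *ctx, void *private_data)
{
    (void)ctx;
    (void)private_data;
    callbacks_run++;
}
```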

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a2690bc3f4e28a2ed50ccb47cb404fc8570fde6d)

Amitay Isaacs [Tue, 2 Jun 2015 03:15:37 +0000 (13:15 +1000)]
locking: Add a comment to explain auto_mark usage

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 89849c4d31c0bb0c47864e11abc89efe7d812d87)

Amitay Isaacs [Tue, 2 Jun 2015 01:25:44 +0000 (11:25 +1000)]
locking: Avoid resetting talloc destructor

Let ctdb_lock_request_destructor() take care of the proper cleanup.
If the request is freed from the callback function, then the lock context
should not be freed.  Setting request->lctx to NULL takes care of that
in the destructor.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit bc747030d435447e62262541cf2e74be4c4229d8)

Amitay Isaacs [Tue, 2 Jun 2015 01:15:11 +0000 (11:15 +1000)]
locking: Avoid memory leak in the failure case

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 2b352ff20597b9e34b3777d35deca1bf56209f8a)

Amitay Isaacs [Mon, 1 Jun 2015 14:22:07 +0000 (00:22 +1000)]
locking: Set destructor when lock_context is created

There is already code in the destructor to correctly remove it from the
pending or the active queue.  This also ensures that if the lock request
is freed while the lock context is in the pending queue, the lock
context is correctly removed from the pending queue.

Thanks to Stefan Metzmacher for noticing this and suggesting the fix.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 5ae6a8f2fff5b5f4d46f496fd83f555be4b3d448)

Stefan Metzmacher [Mon, 1 Jun 2015 14:15:11 +0000 (00:15 +1000)]
locking: Set the lock_ctx->request to NULL when request is freed

The code was added to ctdb_lock_context_destructor() to ensure that
if a lock_ctx is freed first, the lock_request does not have a
dangling pointer.  However, the reverse is also true: when a lock_request
is freed, the lock_ctx should not have a dangling pointer.

In commit 374cbc7b0ff68e04ee4e395935509c7df817b3c0, the code for the second
case was dropped, causing a regression.
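
A plain-C sketch of keeping both pointers safe, assuming simplified structures; in CTDB these clean-ups live in talloc destructors, whereas here they are ordinary free functions.  Whichever object is freed first clears the other's pointer to it.

```c
#include <assert.h>
#include <stdlib.h>

struct lock_request;

struct lock_context {
    struct lock_request *request;
};

struct lock_request {
    struct lock_context *lctx;
};

/* Freeing the context clears the request's back-pointer. */
static void lock_context_free(struct lock_context *ctx)
{
    if (ctx->request != NULL) {
        ctx->request->lctx = NULL;
    }
    free(ctx);
}

/* Freeing the request clears the context's pointer: the half of the
 * pair that the regression dropped. */
static void lock_request_free(struct lock_request *req)
{
    if (req->lctx != NULL) {
        req->lctx->request = NULL;
    }
    free(req);
}
```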

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 752ec31bcbbfe9f5b3b1c5dde4179d69f41cb53c)

Stefan Metzmacher [Tue, 26 May 2015 14:45:34 +0000 (16:45 +0200)]
locking: Avoid memory corruption in ctdb_lock_context_destructor

If the lock request is freed from within the callback, then setting
lock_ctx->request to NULL in ctdb_lock_context_destructor will end up
corrupting memory.  In this case, the memory that lock_ctx->request
points to may already have been reallocated for something else, and
overwriting it may cause an unexpected abort when a NULL pointer is
later dereferenced.

So, set lock_ctx->request to NULL before processing callbacks.

This avoids the following valgrind problem.

==3636== Invalid write of size 8
==3636==    at 0x151F3D: ctdb_lock_context_destructor (ctdb_lock.c:276)
==3636==    by 0x58B3618: _talloc_free_internal (talloc.c:993)
==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
==3636==    by 0x15292E: ctdb_lock_handler (ctdb_lock.c:471)
==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
==3636==    by 0x118557: main (ctdbd.c:321)
==3636==  Address 0x9c5b660 is 96 bytes inside a block of size 120 free'd
==3636==    at 0x4C29D17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3636==    by 0x58B32D3: _talloc_free_internal (talloc.c:1063)
==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
==3636==    by 0x11EC30: daemon_incoming_packet (ctdb_daemon.c:844)
==3636==    by 0x136F4A: lock_fetch_callback (ctdb_ltdb_server.c:268)
==3636==    by 0x152489: process_callbacks (ctdb_lock.c:353)
==3636==    by 0x152489: ctdb_lock_handler (ctdb_lock.c:468)
==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
==3636==    by 0x118557: main (ctdbd.c:321)

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11293

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit ee02e40e869fd46f113d016122dd5384b7887228)

Martin Schwenke [Tue, 28 Apr 2015 03:51:00 +0000 (13:51 +1000)]
scripts: Add alternative network family monitoring for NFS

For example, adding a file called nfs-rpc-checks.d/20.nfsd@udp.check
will cause NFS to be checked on UDP as well, using a separate counter.
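
A sketch of how such a file name could be split, assuming a numeric ordering prefix, a ".check" suffix, and TCP as the family when no "@family" part is given.  parse_check_name() is hypothetical; the resulting (service, family) pair is what would key the separate counter.

```c
#include <assert.h>
#include <string.h>

/*
 * Split e.g. "20.nfsd@udp.check" into service "nfsd" and family "udp".
 * Without "@family" the family defaults to "tcp" (an assumption).
 * Returns 0 on success, -1 on a malformed name or small buffers.
 */
static int parse_check_name(const char *fname,
                            char *service, size_t slen,
                            char *family, size_t flen)
{
    const char *suffix = ".check";
    size_t n = strlen(fname), sfx = strlen(suffix);
    const char *start, *end, *at, *svc_end, *fam;
    size_t svc_len, fam_len;

    start = strchr(fname, '.');             /* skip the "20." prefix */
    if (start == NULL || n < sfx || strcmp(fname + n - sfx, suffix) != 0) {
        return -1;
    }
    start++;
    end = fname + n - sfx;                  /* points at ".check" */
    if (end <= start) {
        return -1;
    }
    at = memchr(start, '@', (size_t)(end - start));
    if (at == NULL) {
        svc_end = end;
        fam = "tcp";
        fam_len = 3;
    } else {
        svc_end = at;
        fam = at + 1;
        fam_len = (size_t)(end - fam);
    }
    svc_len = (size_t)(svc_end - start);
    if (svc_len == 0 || svc_len >= slen || fam_len == 0 || fam_len >= flen) {
        return -1;
    }
    memcpy(service, start, svc_len);
    service[svc_len] = '\0';
    memcpy(family, fam, fam_len);
    family[fam_len] = '\0';
    return 0;
}
```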

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Apr 30 09:24:12 CEST 2015 on sn-devel-104

(Imported from commit e359d826a42656bb02ca2ab85f0fa886a046cb58)

Amitay Isaacs [Fri, 27 Mar 2015 01:00:56 +0000 (12:00 +1100)]
tests: Switch to tcp check in rpcinfo stub

Use -T tcp instead of deprecated options -u and -t.  Also, check for
localhost.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Mar 27 09:16:50 CET 2015 on sn-devel-104

(Imported from commit 079575d80f5b28e452abf80efc4d005fb6dac270)

Amitay Isaacs [Fri, 27 Mar 2015 01:04:03 +0000 (12:04 +1100)]
scripts: Use tcp connection for checking RPC services

It's possible for an RPC service to register only for UDP and not TCP.
Since we assume all the NFS operations are over TCP, always check RPC
services over TCP.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 14886ed00c998c2ac4deb70f650584e9b371345d)

Martin Schwenke [Tue, 24 Mar 2015 09:12:51 +0000 (20:12 +1100)]
scripts: Respect $RPCMOUNTDOPTS when restarting rpc.mountd

$RPCMOUNTDOPTS is ignored when rpc.mountd is restarted due to the service
being unresponsive.  This variable can be used to increase the number
of rpc.mountd threads when a lot of clients are reattaching, so
ignoring it can mean that only a single rpc.mountd thread is started.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 130202d635d8712575fa201a12ef257f4278b862)

Amitay Isaacs [Wed, 30 Jul 2014 04:31:54 +0000 (14:31 +1000)]
daemon: Drop tunable that is no longer in use

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 62ba95a9f347d2ac0e4fb53dc62b94f557e17e8b)

Amitay Isaacs [Wed, 30 Jul 2014 02:32:08 +0000 (12:32 +1000)]
recoverd: Fix typo in comment

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 41ed26cbf7b81e372ea0b5cc3d96dfe217a0cf58)

Amitay Isaacs [Mon, 13 Apr 2015 04:17:12 +0000 (14:17 +1000)]
doc: Update NEWS (tag: ctdb-2.5.5)

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 5 Sep 2014 06:09:34 +0000 (16:09 +1000)]
includes: Remove some unnecessary declarations

To accommodate removing file_lines_load() from here, drop the #ifdef
around the declaration in util.h.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9726e17e366382776c87a8aaa63884665c604896)

Martin Schwenke [Sat, 16 Aug 2014 06:17:02 +0000 (16:17 +1000)]
logging: Move variable debug_extra from debug.*

debug_extra is CTDB-specific.  Moving it will help with the
transition to Samba's updated debug.[ch].

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8b39141c46458974d5476b2925f2dd5d51d9180d)

Martin Schwenke [Tue, 9 Sep 2014 03:52:07 +0000 (13:52 +1000)]
logging: Factor out ctdb_logging.h from includes.h

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 97dc127b81c1923755b59aad6e735aa679af3f64)

Martin Schwenke [Fri, 15 Aug 2014 06:18:05 +0000 (16:18 +1000)]
recoverd: Change include of dlinklist.h to contain directory

This makes it consistent with the rest of the code and avoids problems
when some variant of lib/util isn't in the include path.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0c0f323bb3e9146dc584a461b225586670fa9c2e)

Martin Schwenke [Fri, 15 Aug 2014 05:53:03 +0000 (15:53 +1000)]
tools: Move definition of timeval_delta() to tools/ctdb.c

This function is only used in this file.  Samba's lib/util doesn't
have timeval_delta(), so this stages a clean transition.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6e1568149ede06d48b91bbc7ecd8c55da3b41a41)

Martin Schwenke [Fri, 15 Aug 2014 05:55:20 +0000 (15:55 +1000)]
daemon: Drop the argument to fault_setup()

Samba's version doesn't accept an argument, so this aids a smooth
transition.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c5c74e47ee672e9e9605c5c4b96733d899b6f9b6)

Martin Schwenke [Fri, 15 Aug 2014 06:11:45 +0000 (16:11 +1000)]
util: Add extra max_size argument to file_lines_load()

This is part of a migration to Samba's lib/util.  CTDB always passes 0
(i.e. no max_size) so use a simple assert() to enforce this, rather
than changing a lot of code that will be discarded anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a4e76b58a5086e1339dea53b72437ed179e6025a)

Martin Schwenke [Wed, 6 Aug 2014 06:36:58 +0000 (16:36 +1000)]
common: Move hex_decode_talloc() to the lock helper

This is the only place it is used.

After migrating to Samba's lib/util, the lock helper can be changed to
use strhex_to_data_blob().
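
The helper presumably needs this because it receives binary data, such as a record key, as a hex string on its command line; that is an assumption here.  A minimal decoder in plain C (no talloc), with a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode an even-length hex string into out (capacity outlen).
 * Returns the number of bytes written, or -1 on invalid input.
 */
static long hex_decode(const char *hex, uint8_t *out, size_t outlen)
{
    size_t i, len = strlen(hex);

    if (len % 2 != 0 || len / 2 > outlen) {
        return -1;
    }
    for (i = 0; i < len / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);

        if (hi < 0 || lo < 0) {
            return -1;
        }
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return (long)(len / 2);
}
```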

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 94a5e28ffb53a268865666038678e78cbbb39de3)

Martin Schwenke [Thu, 4 Sep 2014 03:33:58 +0000 (13:33 +1000)]
common: Add some missing #includes

To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 751ad4b62561b140b7a33d66e63907411a748501)

Martin Schwenke [Thu, 4 Sep 2014 03:31:15 +0000 (13:31 +1000)]
daemon: Move some inline declarations to header file

To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a81dccf7ad8345a1c44dc7a08e2320bd88e1aaa5)

Martin Schwenke [Thu, 4 Sep 2014 03:30:09 +0000 (13:30 +1000)]
tests: Add missing declarations caused by #define magic

Some declarations get lost because they basically get #define-d away,
so they need to be repeated after the #undef-s.  Also, some functions
are introduced due to the #define-s.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6336b958d61ba6901edbaddac8bc10539c8f30ab)

Martin Schwenke [Thu, 4 Sep 2014 03:28:34 +0000 (13:28 +1000)]
tests: Mark some functions as static

To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6674949317dd4b2c1855571ea378eb6bc3b2e86c)

Martin Schwenke [Thu, 4 Sep 2014 02:34:46 +0000 (12:34 +1000)]
util: Remove util/strlist.c and references to str_util_*()

They're not used in CTDB.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5de4a97fe941c27080061480cdd7ed8f60f4438e)

Martin Schwenke [Thu, 4 Sep 2014 01:21:24 +0000 (11:21 +1000)]
Fix some "declarations after code" problems

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b0f9d3305850bdcce171b53e7bbbc9628a4e3c20)

9 years agoutil: Variables should be declared extern in headers
Martin Schwenke [Thu, 4 Sep 2014 01:20:28 +0000 (11:20 +1000)]
util: Variables should be declared extern in headers

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1d16555fa0ad562dcd8c4bbffaca454e68bcabbf)
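
The sketch below shows the header/source split collapsed into one file. It uses a hypothetical variable, example_debug_level, which is not a real CTDB identifier. Without extern, every file that includes the header gets its own tentative definition. Under -fno-common (the GCC default since version 10), that becomes a multiple-definition link error:

```c
/* ---- Header part (e.g. a hypothetical example_debug.h) ---- */

/* Declaration only: the variable is defined in exactly one .c file. */
extern int example_debug_level;
int example_get_debug_level(void);

/* ---- Source part (e.g. a hypothetical example_debug.c) ---- */

/* The single definition that provides storage and an initial value. */
int example_debug_level = 2;

int example_get_debug_level(void)
{
	return example_debug_level;
}
```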

9 years agoChange default debug level to NOTICE (2)
Martin Schwenke [Mon, 9 Feb 2015 01:04:41 +0000 (12:04 +1100)]
Change default debug level to NOTICE (2)

This was true for the daemon until commit
b4589b954e1090a934fafd3f8e3c2cf1ed785c61.

Defaulting to ERR in the ctdb CLI tool encourages logging notices at
ERR level, so default to NOTICE instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 664d62b61108657d3011cf0bcbe260533c97676f)
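
A minimal sketch of threshold-based filtering. It assumes the ordering ERR=0, WARNING=1, NOTICE=2, INFO=3, DEBUG=4, which is consistent with NOTICE being 2 in the subject line. The EX_DEBUG_* names and example_should_log() are hypothetical. A message is emitted when its level is at or below the threshold, so with a NOTICE default a notice is shown without having to be logged at ERR:

```c
#include <stdbool.h>

/* Assumed numbering: lower values are more severe. */
enum example_debug_level {
	EX_DEBUG_ERR = 0,
	EX_DEBUG_WARNING = 1,
	EX_DEBUG_NOTICE = 2,
	EX_DEBUG_INFO = 3,
	EX_DEBUG_DEBUG = 4,
};

/* Default threshold: NOTICE (2) instead of ERR (0). */
static int example_threshold = EX_DEBUG_NOTICE;

/* Emit a message only if its level is at or below the threshold. */
static bool example_should_log(enum example_debug_level level)
{
	return (int)level <= example_threshold;
}
```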

9 years agotests: Check for readable, not executable, script
Martin Schwenke [Fri, 6 Mar 2015 00:36:18 +0000 (11:36 +1100)]
tests: Check for readable, not executable, script

Scripts in eventscript unit tests are run under an explicitly
specified shell so they do not need to be executable.  Checking that
the script is executable breaks on scripts that are installed without
the execute bit set, such as disabled eventscripts.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar  6 04:40:07 CET 2015 on sn-devel-104

(Imported from commit f6efe0c5c2378f477e528ac9c6571a732aa2c49b)
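
A C analogue of the check, using POSIX access(). The function names are hypothetical. The harness runs each script through an explicit shell, so read permission is the only requirement. An eventscript installed without the execute bit, such as a disabled one, therefore passes script_is_runnable() but fails the old executable check:

```c
#include <stdbool.h>
#include <unistd.h>

/* Old-style check: requires the execute bit. */
static bool script_is_executable(const char *path)
{
	return access(path, X_OK) == 0;
}

/* New-style check: the test runs the script via an explicit shell
 * (e.g. "sh script"), so being readable is enough. */
static bool script_is_runnable(const char *path)
{
	return access(path, R_OK) == 0;
}
```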

9 years agolocking: Back-off from logging every 10 seconds
Amitay Isaacs [Wed, 4 Mar 2015 04:36:05 +0000 (15:36 +1100)]
locking: Back-off from logging every 10 seconds

If ctdb_lock_helper cannot get a lock within 10 seconds, the ctdb daemon
logs a message and invokes an external debug script.  This is repeated
every 10 seconds.

Under contention, or on a loaded system, there can be multiple
ctdb_lock_helper processes waiting to get a lock on record(s).  For each
lock request that takes longer, the ctdb daemon floods the log every
10 seconds.  Instead of logging aggressively every 10 seconds, relax
logging to every 100s and 1000s once the elapsed time has exceeded 100s
and 1000s respectively.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Thu Mar  5 12:06:44 CET 2015 on sn-devel-104

(Imported from commit 3f97be6d0fc166ccc3c97b7f71a01a4f9adb5ddd)
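
The back-off schedule described above can be sketched as follows. lock_log_interval() and count_log_messages() are hypothetical names. The sketch models only the timing, not ctdb_lock_helper or the debug script. Following the commit's wording, the interval grows once the elapsed time has exceeded 100s and 1000s:

```c
/* Seconds until the next "still waiting for the lock" message, given
 * how long the lock request has been waiting so far. */
static unsigned int lock_log_interval(unsigned int elapsed)
{
	if (elapsed > 1000) {
		return 1000;
	}
	if (elapsed > 100) {
		return 100;
	}
	return 10;
}

/* Number of messages logged while waiting `wait` seconds, with the
 * first message after 10 seconds. */
static unsigned int count_log_messages(unsigned int wait)
{
	unsigned int t = 10;
	unsigned int count = 0;

	while (t <= wait) {
		count++;
		t += lock_log_interval(t);
	}
	return count;
}
```

With this schedule, a 1000-second wait produces 19 messages instead of the 100 produced by a fixed 10-second interval.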

9 years agotests: Correctly cascade test failures from the end of pipes
Amitay Isaacs [Thu, 5 Mar 2015 02:11:46 +0000 (13:11 +1100)]
tests: Correctly cascade test failures from the end of pipes

Some eventscript unit test failures get lost because _passed=false is
set in the tail of a pipe.  Add a new function test_fail() and call it
when necessary to ensure the value of _passed is set correctly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar  5 07:16:54 CET 2015 on sn-devel-104

(Imported from commit 956d1dbfd91615032de337b0d84b40c16657b8c1)

9 years agoscripts: Add a 'rm' stub so statd-callout tests work correctly
Amitay Isaacs [Thu, 5 Mar 2015 02:10:32 +0000 (13:10 +1100)]
scripts: Add a 'rm' stub so statd-callout tests work correctly

statd-callout tries to remove global files from /var/lib/nfs/statd and
this causes errors in tests.  Add an rm stub that ignores attempts to
remove these files but invokes /bin/rm for anything else.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 956e51707d7ddcff060352f54d11ff42bdcc51ef)

9 years agoscripts: Remove unused function nfs_statd_update()
Martin Schwenke [Fri, 13 Feb 2015 04:42:20 +0000 (15:42 +1100)]
scripts: Remove unused function nfs_statd_update()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50ddc2c35643389c2f249c6ad4496ab73a1bfc99)

9 years agoscripts: Change statd-callout to be more scalable
Martin Schwenke [Fri, 13 Feb 2015 09:55:43 +0000 (20:55 +1100)]
scripts: Change statd-callout to be more scalable

Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious.  Persistent transactions do not
perform well enough to do this.

Revert to having add-client and del-client create touch files.  Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.

Update testcases to do the "update" and add an extra test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 500c6e194babe06b6aead7a053a9442c94db6e38)

9 years agoscripts: Fix a regression in statd-callout
Martin Schwenke [Thu, 26 Feb 2015 04:34:51 +0000 (15:34 +1100)]
scripts: Fix a regression in statd-callout

Commit 4638010abb116aed0c180207aaa11475277aecb7 changed from using
gensub() to gsub() in awk.  However, it didn't halve the number of
backslashes in the target strings.  This is necessary because
backslash is used in gensub() target strings to allow substitution of
text matching parenthesised subexpressions.  This is not the case with
gsub().

So, halve the number of backslashes in the target string where gsub()
is used in statd-callout.  This is the only target string broken by
changes made by the above commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 032441d9a2974584cde455e4dbd5cc33fe6a23c2)

9 years agotests: Unit tests for statd-callout
Martin Schwenke [Wed, 4 Mar 2015 00:51:20 +0000 (11:51 +1100)]
tests: Unit tests for statd-callout

Includes supporting improvements to the unit test infrastructure, such
as linking the real statd-callout into etc-ctdb/ in place of the
placeholder script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 22602f76bc1ec91e807a8f1cd45ba6fb4c05e622)

9 years agotests: Make setup of public addresses more obvious
Martin Schwenke [Fri, 27 Feb 2015 04:20:56 +0000 (15:20 +1100)]
tests: Make setup of public addresses more obvious

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d98c7ba382189161c5b8cbbebbdfbe36f1456572)

9 years agotests: Extend eventscript unit test infrastructure for other scripts
Martin Schwenke [Fri, 27 Feb 2015 04:19:04 +0000 (15:19 +1100)]
tests: Extend eventscript unit test infrastructure for other scripts

There's so much infrastructure here that it would be a shame not to
use it for testing things like statd-callout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7e7c24ca7a422f2258962216b0184eda8d49827f)

9 years agotests: Support testing scripts that change directory
Martin Schwenke [Fri, 27 Feb 2015 04:17:30 +0000 (15:17 +1100)]
tests: Support testing scripts that change directory

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9317d82c19a0eb51ff6293d00328a5c36b063a2c)

9 years agotests: Extend ctdb stub to support "ip" with and without -X
Martin Schwenke [Fri, 27 Feb 2015 04:15:18 +0000 (15:15 +1100)]
tests: Extend ctdb stub to support "ip" with and without -X

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2aeb518637af29da03984470d874b94dfb18e34e)

9 years agotests: Extend ctdb stub to support "ptrans", "pdelete", "catdb"
Martin Schwenke [Fri, 27 Feb 2015 04:13:23 +0000 (15:13 +1100)]
tests: Extend ctdb stub to support "ptrans", "pdelete", "catdb"

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d057ca04a9eec0f2aa3d792da0a4648e3716685a)

9 years agotest: Remove unused function check_ctdb_logfile()
Martin Schwenke [Tue, 12 Aug 2014 04:29:34 +0000 (14:29 +1000)]
test: Remove unused function check_ctdb_logfile()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 85cc4efbff601dc25a351ec838de168eb3c7d29a)

9 years agoio: Do not use sys_write to write to client sockets
Amitay Isaacs [Mon, 23 Feb 2015 01:38:11 +0000 (12:38 +1100)]
io: Do not use sys_write to write to client sockets

When sending messages to clients, ctdb checks for the EAGAIN error code
and schedules the next write for a subsequent event loop iteration.
Using sys_write in these places causes ctdb to loop hard until a client
is able to read from the socket.  With real-time scheduling, the ctdb
daemon spins, consuming 100% of a CPU trying to write to the client
sockets.  This can be quite harmful when running under VMs or on
machines with a single CPU.

This regression was introduced when all read/write calls were replaced
with the sys_read/sys_write wrappers (c1558adeaa980fb4bd6177d36250ec8262e9b9fe).

The existing code backs off in case of EAGAIN failures and waits for an
event loop to process the write again.  This should give ctdb clients
a chance to get scheduled and to process the ctdb socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Feb 24 12:29:30 CET 2015 on sn-devel-104

(Imported from commit 04a061e4d19d5bdbd8179fb0fab8b0875eec243e)
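
A minimal sketch of the non-spinning write pattern, assuming a non-blocking socket or pipe. queue_write_once() and set_nonblocking() are hypothetical helpers, not CTDB's queue code. Unlike a wrapper that retries write() in a loop on EAGAIN, this helper retries only on EINTR. It reports "would block" to the caller, which keeps the data queued and waits for the event loop to signal that the descriptor is writable:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1) {
		return -1;
	}
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Attempt a single write.  Returns the number of bytes written, 0 if
 * the descriptor is full (EAGAIN/EWOULDBLOCK), or -1 on a real error.
 * The caller keeps unwritten data queued and retries only when the
 * event loop reports the descriptor writable, instead of spinning. */
static ssize_t queue_write_once(int fd, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = write(fd, buf, len);
	} while (n == -1 && errno == EINTR);	/* interrupted: retry now */

	if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
		return 0;	/* would block: leave it to the event loop */
	}
	return n;
}
```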

9 years agoscripts: Improve messages about invalid tunables during "setup"
Martin Schwenke [Sat, 14 Feb 2015 01:53:08 +0000 (12:53 +1100)]
scripts: Improve messages about invalid tunables during "setup"

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 18 08:03:33 CET 2015 on sn-devel-104

(Imported from commit dc32f11b871a7d4e8ea6fd1d01491d89103decf7)

9 years agotool: Print a warning when setting an obsolete tunable variable
Martin Schwenke [Sun, 8 Feb 2015 23:33:35 +0000 (10:33 +1100)]
tool: Print a warning when setting an obsolete tunable variable

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c3706e7fb07bcb35f7d894c4e8e0c12b4a62d0db)

9 years agoclient: Return a value of 1 when setting obsolete tunable variable
Martin Schwenke [Sun, 8 Feb 2015 23:32:47 +0000 (10:32 +1100)]
client: Return a value of 1 when setting obsolete tunable variable

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 54f0c39e5a33871847aa9fe2c070c7f638f54cc4)
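
The sketch below shows the return-value convention from this commit and the tool commit listed just above it. It uses a hypothetical two-entry tunable table. ExampleTunable is made up, and VerifyRecoveryLock is marked obsolete as in the daemon commit further down this log. The warning text is illustrative, not the tool's actual message. Setting an obsolete tunable returns 1 rather than an error, and the caller turns that into a warning:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tunable table. */
struct example_tunable {
	const char *name;
	unsigned int value;
	bool obsolete;
};

static struct example_tunable example_tunables[] = {
	{ "ExampleTunable", 0, false },
	{ "VerifyRecoveryLock", 0, true },
};

#define NUM_EXAMPLE_TUNABLES \
	(sizeof(example_tunables) / sizeof(example_tunables[0]))

/* Returns 0 if the value was set, 1 if the tunable is obsolete (the
 * value is ignored), or -1 if there is no such tunable. */
static int example_set_tunable(const char *name, unsigned int value)
{
	size_t i;

	for (i = 0; i < NUM_EXAMPLE_TUNABLES; i++) {
		struct example_tunable *t = &example_tunables[i];

		if (strcmp(t->name, name) != 0) {
			continue;
		}
		if (t->obsolete) {
			return 1;
		}
		t->value = value;
		return 0;
	}
	return -1;
}

/* Caller side: an obsolete tunable produces a warning, not a failure. */
static int example_setvar(const char *name, unsigned int value)
{
	int ret = example_set_tunable(name, value);

	if (ret == 1) {
		/* Illustrative wording only. */
		fprintf(stderr, "Warning: tunable %s is obsolete\n", name);
		return 0;
	}
	return ret;
}
```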

9 years agotests: New tests for 00.ctdb "setup" event - set tunables from config
Martin Schwenke [Sun, 15 Feb 2015 03:39:51 +0000 (14:39 +1100)]
tests: New tests for 00.ctdb "setup" event - set tunables from config

Includes unit test infrastructure tweaks to support these tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2c7c35377e5452e37925b970253b70875a8d7470)

9 years agoscripts: Fix tunable setup code by making it shell-agnostic
Martin Schwenke [Mon, 16 Feb 2015 03:04:09 +0000 (14:04 +1100)]
scripts: Fix tunable setup code by making it shell-agnostic

All tunables set in configuration are currently set to 0 on systems
where /bin/sh is dash (and perhaps other non-bash shells).  dash puts
single quotes around all values in the output of the "set" builtin
command, whereas bash only quotes values that need quoting.  Tunables
always have a simple integer value, so dash will quote them and bash
won't.  The setup code currently passes the raw value, including any
quotes, to "ctdb setvar ...".  This command does no error checking on
the input, so "'1'" is converted to 0.

Change the code so that the value is determined from the shell
variable and is independent of the "set" output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 39686f45056d942de5ebe3263a533a99ca17c79e)
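
An illustration of the failure mode in C. It assumes the value is parsed with an unchecked strtoul()-style conversion; this is an assumption, since the commit only says that no error checking is done. parse_unchecked() and parse_checked() are hypothetical. The shell-side fix in this commit avoids the quotes altogether, whereas a checked parser would reject "'1'" instead of silently using 0:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Unchecked conversion: stops at the first non-digit, so "'1'"
 * (the quoted form dash prints) silently becomes 0. */
static unsigned long parse_unchecked(const char *s)
{
	return strtoul(s, NULL, 0);
}

/* Checked conversion: the whole string must be a number. */
static bool parse_checked(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno != 0 || end == s || *end != '\0') {
		return false;
	}
	*out = v;
	return true;
}
```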

9 years agorecoverd: Abort when daemon can take recovery lock during recovery
Martin Schwenke [Tue, 27 Jan 2015 01:55:42 +0000 (12:55 +1100)]
recoverd: Abort when daemon can take recovery lock during recovery

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104

(Imported from commit 39d2fd330a60ea590d76213f8cb406a42fa8d680)

9 years agorecoverd: Improve error messages on recovery lock coherence fail
Martin Schwenke [Wed, 17 Dec 2014 09:33:19 +0000 (20:33 +1100)]
recoverd: Improve error messages on recovery lock coherence fail

When the daemon is able to take the recovery lock during recovery we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message.  This will be more helpful to
those trying out cluster filesystems that don't have lock coherence or
that are difficult to set up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 432d6774891eba30a959cd2d8ee8469d189c7872)

9 years agorecoverd: Don't release and re-take the recovery lock
Martin Schwenke [Tue, 9 Dec 2014 02:51:27 +0000 (13:51 +1100)]
recoverd: Don't release and re-take the recovery lock

Just continue to hold it, otherwise a broken node might win an
election and grab the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 48c91407abd5e34463d3a10cb6fce47ec4a0d5f6)

9 years agorecoverd: Simplify ctdb_recovery_lock()
Martin Schwenke [Tue, 9 Dec 2014 03:50:38 +0000 (14:50 +1100)]
recoverd: Simplify ctdb_recovery_lock()

Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).

This means that when it is called we need to keep the old behaviour
and explicitly release the lock.  In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1d6ed91f5518d462ba368bca03be923428710157)

9 years agorecoverd: Remove check_recovery_lock()
Martin Schwenke [Tue, 9 Dec 2014 03:45:08 +0000 (14:45 +1100)]
recoverd: Remove check_recovery_lock()

This has not done anything useful since commit
b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520.  Instead, just check that
the lock is held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit be19a17faf6da97365c425c5b423e9b74f9c9e0c)

9 years agorecoverd: Improve logging when recovery lock file is changed
Martin Schwenke [Tue, 9 Dec 2014 03:09:40 +0000 (14:09 +1100)]
recoverd: Improve logging when recovery lock file is changed

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 668ed5366237b61f0ff618f32555ce29cca5e6f3)

9 years agorecoverd: New function ctdb_recovery_unlock()
Martin Schwenke [Tue, 9 Dec 2014 03:07:20 +0000 (14:07 +1100)]
recoverd: New function ctdb_recovery_unlock()

Unlock the recovery lock file.  This way knowledge of the file
descriptor isn't sprinkled throughout the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit db32a2bce54b9618fe247b33d6de81bd5f7a3b62)

9 years agorecoverd: New function ctdb_recovery_have_lock()
Martin Schwenke [Tue, 9 Dec 2014 02:50:22 +0000 (13:50 +1100)]
recoverd: New function ctdb_recovery_have_lock()

True if this recovery daemon holds the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 72701be663ddb265320a022a22130a3437bbf6bc)
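
A minimal sketch of this helper and of ctdb_recovery_unlock() from the commit listed just above. It assumes the lock state is tracked solely by a file descriptor that is -1 when the lock is not held. The struct and the example_* names are hypothetical stand-ins, not CTDB's real types:

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical, stripped-down recovery daemon state. */
struct example_recd {
	int recovery_lock_fd;	/* -1 when the lock is not held */
};

/* True if this recovery daemon holds the recovery lock. */
static bool example_recovery_have_lock(const struct example_recd *rec)
{
	return rec->recovery_lock_fd != -1;
}

/* Release the lock.  Closing the descriptor releases any fcntl() lock
 * this process holds on the file.  Safe to call when not held. */
static void example_recovery_unlock(struct example_recd *rec)
{
	if (rec->recovery_lock_fd != -1) {
		close(rec->recovery_lock_fd);
		rec->recovery_lock_fd = -1;
	}
}
```

Keeping the descriptor private behind these two calls is what lets the rest of the code avoid touching it directly.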

9 years agodaemon: Log a warning when setting obsolete tunables
Martin Schwenke [Tue, 9 Dec 2014 02:49:06 +0000 (13:49 +1100)]
daemon: Log a warning when setting obsolete tunables

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 4d3b52f1cec46f66f8d0827bc8f458cd8c86b5a5)

9 years agodaemon: Mark tunable VerifyRecoveryLock as obsolete
Martin Schwenke [Tue, 9 Dec 2014 02:47:42 +0000 (13:47 +1100)]
daemon: Mark tunable VerifyRecoveryLock as obsolete

It is pointless having a recovery lock but not sanity checking that it
is working.  Also, the logic that uses this tunable is confusing.  In
some places the recovery lock is released unnecessarily because the
tunable isn't set.

Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.

Update documentation that references this tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d110fe231849d76ecb83378c934627dc64b74c72)

9 years agodoc: Improve documentation of the recovery lock
Martin Schwenke [Tue, 3 Feb 2015 03:27:11 +0000 (14:27 +1100)]
doc: Improve documentation of the recovery lock

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a01744c08ff5b8aca4af99842acfc78a87af9297)

9 years agotests: Add new "ctdb setreclock" test
Martin Schwenke [Mon, 2 Feb 2015 10:21:20 +0000 (21:21 +1100)]
tests: Add new "ctdb setreclock" test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb  4 05:40:55 CET 2015 on sn-devel-104

(Imported from commit 0e89d586b2b7fa6a165a49862d2dce0d13f8b157)

9 years agodaemon: Fix SET_RECLOCK_FILE regression
Martin Schwenke [Wed, 28 Jan 2015 07:51:42 +0000 (18:51 +1100)]
daemon: Fix SET_RECLOCK_FILE regression

If the recovery lock file is unset then this dereferences a NULL
pointer.  The regression is due to commit
6f1ac7af0f87d85402d708231e45a69713bba026.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5e00673f2d95b6257a05324d2ae068004e29ff85)

9 years agoscripts: Call iptables/ip6tables directly from iptables_wrapper
Martin Schwenke [Tue, 30 Dec 2014 05:04:00 +0000 (16:04 +1100)]
scripts: Call iptables/ip6tables directly from iptables_wrapper

Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables.  The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.

This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly.  This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel.  The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.

Making 11.natgw support IPv6 is unnecessary.  Just put a static IPv6
address on each interface - they're plentiful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104

(Imported from commit ab51f283e7a7f4fc82a94d39e7bb3a68e8aac554)

9 years agoscripts: Error message, comment and whitespace cleanups
Martin Schwenke [Tue, 30 Dec 2014 06:07:09 +0000 (17:07 +1100)]
scripts: Error message, comment and whitespace cleanups

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9b67c1fa3748678552400a81172d124e59d5eb79)

9 years agoscripts: iSCSI eventscript should fail when PNN can't be determined
Martin Schwenke [Tue, 30 Dec 2014 06:03:46 +0000 (17:03 +1100)]
scripts: iSCSI eventscript should fail when PNN can't be determined

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1a5414b6d25ed1b1abdafd8594183b84af33a6fb)

9 years agoscripts: Make 70.iscsi IPv6-aware
Martin Schwenke [Tue, 30 Dec 2014 06:01:21 +0000 (17:01 +1100)]
scripts: Make 70.iscsi IPv6-aware

Block the iSCSI port for every address family used by the public
addresses the node is configured to host.

We could instead unconditionally add blocking using ip6tables.
However, this would produce errors when no IPv6 public addresses are
configured and ip6tables is not installed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d1bd26e5eb25aee2ce82ef178692a64073a99aa0)

9 years agoimprove helpfulness of debug message when taking reclock fails
Michael Adam [Thu, 8 Jan 2015 23:10:37 +0000 (00:10 +0100)]
improve helpfulness of debug message when taking reclock fails

Print out the errno if the fcntl call fails.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan  9 04:25:02 CET 2015 on sn-devel-104

(Imported from commit a59fb322d60b7152110cc0638dd9b76dd259ac15)
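
A sketch of taking a recovery lock with fcntl(F_SETLK) and reporting errno on failure. It assumes a write lock on the first byte of an already-open file. example_take_reclock() and its message text are hypothetical. errno is saved immediately because later library calls such as fprintf() may overwrite it. EACCES or EAGAIN indicate a conflicting lock held elsewhere. Other values point to a different problem, such as a descriptor that is not open for writing (EBADF):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on the first byte of the
 * already-open file fd; log errno if that fails. */
static bool example_take_reclock(int fd, const char *path)
{
	struct flock lock;

	memset(&lock, 0, sizeof(lock));
	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = 0;
	lock.l_len = 1;

	if (fcntl(fd, F_SETLK, &lock) != 0) {
		int saved_errno = errno;	/* before any other call */

		fprintf(stderr,
			"Unable to take recovery lock on %s: %s (errno %d)\n",
			path, strerror(saved_errno), saved_errno);
		return false;
	}
	return true;
}
```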

9 years agodaemon: Handle out-of-memory when setting recovery lock file
Martin Schwenke [Tue, 9 Dec 2014 02:40:23 +0000 (13:40 +1100)]
daemon: Handle out-of-memory when setting recovery lock file

Log a message when the reclock file actually changes and avoid a
memory allocation when it doesn't change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit 6f1ac7af0f87d85402d708231e45a69713bba026)

9 years agoscripts: Don't use the GNU awk gensub() function
Martin Schwenke [Fri, 19 Dec 2014 03:19:32 +0000 (14:19 +1100)]
scripts: Don't use the GNU awk gensub() function

This is a gawk extension and can't be used reliably if just running
"awk".  It is simple enough to switch to using the standard sub() and
gsub() functions.

The alternative is to switch to explicitly running "gawk".  However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit 4638010abb116aed0c180207aaa11475277aecb7)

9 years agoscripts: Try to deal with Ubuntu having /usr/sbin/service
Martin Schwenke [Mon, 1 Dec 2014 01:21:16 +0000 (12:21 +1100)]
scripts: Try to deal with Ubuntu having /usr/sbin/service

Falling back to running the initscript doesn't work because it detects
that upstart is being used and fails.  This was observed when trying
to start winbind on Ubuntu 11.04.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit a5c5eee7d186d938c5b458cb6dbf0c78cb548b63)