ctdb.git
10 years agoeventscripts: Use set_proc() to update /proc
Martin Schwenke [Fri, 7 Mar 2014 02:37:21 +0000 (13:37 +1100)]
eventscripts: Use set_proc() to update /proc

In case we want to write some unit tests in the future.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 5519a05766ca9e0a8ace39c49661b9ad52440cac)

10 years agodaemon: Optimise deletion of IPs
Martin Schwenke [Wed, 22 Jan 2014 06:01:19 +0000 (17:01 +1100)]
daemon: Optimise deletion of IPs

Previous commits maintained the ordering between
ctdb_remove_orphaned_ifaces() and ctdb_vnn_unassign_iface().  This
meant that ctdb_remove_orphaned_ifaces() needed to steal the orphaned
interfaces and they would be freed later.

Unassign the interface first and things get simpler.
ctdb_remove_orphaned_ifaces() is now self-contained.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Sun Mar 23 06:20:43 CET 2014 on sn-devel-104

(Imported from commit 20c719677a28afa1d1b912b9fadbf384e9e65de7)

10 years agodaemon: Make delete IP wait until the IP is released
Martin Schwenke [Wed, 22 Jan 2014 02:30:47 +0000 (13:30 +1100)]
daemon: Make delete IP wait until the IP is released

reloadips really expects deleted IPs to be released before completing.
Otherwise the recovery daemon starts failing the local IP check.  The
races that follow can cause a node to be banned.

To make the error handling simple, do the actual deletion in
release_ip_callback().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9b907536fb657fa15c02858caf0ffff633ecd478)
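
The idea of deferring the deletion to the release callback can be sketched as follows.  This is only an illustration: struct toy_vnn and both functions are hypothetical stand-ins, not the daemon's real data structures or the real release_ip_callback().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a public IP entry; not the real struct ctdb_vnn. */
struct toy_vnn {
    bool assigned;        /* IP is currently held on an interface */
    bool delete_pending;  /* deletion requested, waiting for release */
    bool deleted;         /* entry has actually been removed */
};

/* Runs once the release of the IP has finished (status 0 means success). */
static void toy_release_ip_callback(struct toy_vnn *vnn, int status)
{
    if (status != 0) {
        /* Release failed: keep the entry so nothing is lost. */
        return;
    }
    vnn->assigned = false;
    if (vnn->delete_pending) {
        /* Only now is it safe to drop the address. */
        vnn->deleted = true;
    }
}

/* Request deletion; returns true if the entry was removed immediately. */
static bool toy_delete_ip(struct toy_vnn *vnn)
{
    if (!vnn->assigned) {
        vnn->deleted = true;
        return true;
    }
    vnn->delete_pending = true;  /* completed in the callback */
    return false;
}
```

With this shape there is a single place where the entry disappears, so a failed release leaves the entry intact without any extra cleanup path.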

10 years agotests: Improve tickle tests
Martin Schwenke [Fri, 28 Feb 2014 04:54:05 +0000 (15:54 +1100)]
tests: Improve tickle tests

It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.

Factor out check_tickles() into local.bash and give it some
parameters.

Have the NFS test call it first to ensure the tickle has been
registered.  Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes.  Give this a bit of extra
time (double the timeout) just in case we're racing with the update.

Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(Imported from commit bafb9151ccb5722df36f9ba168716f4f4fa01cdc)

10 years agodaemon: Do not disable monitoring when running eventscripts
Martin Schwenke [Tue, 4 Mar 2014 04:06:11 +0000 (15:06 +1100)]
daemon: Do not disable monitoring when running eventscripts

This is racy and cbffbb7c2f406fc1d8ebad3c531cc2757232690e makes it
unnecessary.

The eventscript code still knows that monitor events are special
compared to other events.  However, the general concept of monitoring
is no longer tangled up with running scripts.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit ecafbce1b1cf53ae8c3de9eb5201192f7fe1f67d)

10 years agoeventscripts: Attach to persistent ctdb.tdb in "startup" event
Martin Schwenke [Sun, 2 Mar 2014 18:50:14 +0000 (05:50 +1100)]
eventscripts: Attach to persistent ctdb.tdb in "startup" event

"statd-callout notify" currently complains until an add-client or
del-client is done.

Given that we might use ctdb.tdb for something else in the future it
makes sense to attach to it in the "startup" event.  This could be done
in the background but it should be so lightweight that a timeout will
indicate serious problems.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 87d58fd07b1294688b8fc6dbdf3dbb6cb12d3a80)

10 years agodaemon: Fix tickle updates to recently started nodes
Martin Schwenke [Thu, 13 Mar 2014 05:53:15 +0000 (16:53 +1100)]
daemon: Fix tickle updates to recently started nodes

Commit 0723fedcedd4a97870f7b1224945f1587363c9bf added a cheap
implementation of ctdb_control_startup() that simply flags the recipient
node as needing to send updates for each IP when the tickle update
loop next fires.  Commit 026996550d726836091ff5ebd1ebf925bf237bb0
ensures that a node only sends tickle updates after being flagged to do
so.

CTDB_CONTROL_STARTUP is broadcast to all nodes, so this is a good
start.  However, the tickle updates are only broadcast to connected
nodes.  A recently started node may not yet be considered to be
connected because the keepalive monitoring loop may not yet have
marked the node as connected.  This means that the tickle update loop
races with the keepalive monitoring loop.  If the tickle update loop
wins then updates will not be sent to the recently started node.

The simplest improvement is to stop the tickle update from depending
on whether a node is connected or not.  So instead of broadcasting
tickle updates to connected nodes, they are broadcast to all nodes.
Since no reply is expected, this should work just fine.

While looking at this code, ctdb_ctrl_set_tcp_tickles() is named like
a client function.  It isn't a client function.  Also, 2 of the
arguments are ignored.  So rename this function to
ctdb_send_set_tcp_tickles_for_ip() and remove the ignored arguments.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(Imported from commit 4f79fa6c7c843502fcdaa2dead534ea3719b9f69)
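
The race described above can be illustrated with a small model.  Everything here is hypothetical (struct toy_peer and both send functions are made up for the sketch); it only shows why a node that is up but not yet marked connected misses an update under the old behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a peer node. */
struct toy_peer {
    bool connected;     /* set by the keepalive monitoring loop */
    int updates_seen;   /* tickle updates delivered to this node */
};

/* Old behaviour: skip nodes not yet marked connected. */
static void toy_send_to_connected(struct toy_peer *peers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (peers[i].connected) {
            peers[i].updates_seen++;
        }
    }
}

/* New behaviour: send to every node; no reply is expected. */
static void toy_send_to_all(struct toy_peer *peers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        peers[i].updates_seen++;
    }
}
```

If the second peer has started but the keepalive loop has not yet flagged it, toy_send_to_connected() leaves its counter at zero, while toy_send_to_all() reaches it regardless.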

10 years agotools-ctdb: Parse IP addresses when reading a list from a file
Martin Schwenke [Mon, 3 Mar 2014 02:20:06 +0000 (13:20 +1100)]
tools-ctdb: Parse IP addresses when reading a list from a file

This way this logic is centralised.  It also means that the IP address
comparisons in the NAT gateway code are IPv6 safe.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 441e0998370bfd7b0de5dd9aed7e2abbcf64cf73)
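
To see why parsing matters for IPv6, consider that the same address can be written in several textual forms, so comparing strings is unreliable.  The helper below is a hypothetical sketch (toy_same_ipv6() is not a ctdb function) that uses the standard inet_pton() call to compare the parsed binary form instead.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if both strings name the same IPv6 address. */
static bool toy_same_ipv6(const char *a, const char *b)
{
    struct in6_addr pa, pb;

    if (inet_pton(AF_INET6, a, &pa) != 1 ||
        inet_pton(AF_INET6, b, &pb) != 1) {
        return false;  /* at least one string is not a valid IPv6 address */
    }
    /* Compare the 16-byte binary form, not the text. */
    return memcmp(&pa, &pb, sizeof(pa)) == 0;
}
```

For example, "fe80::1" and "fe80:0:0:0:0:0:0:1" differ as strings but compare equal once parsed.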

10 years agotools-ctdb: Remove redundant filtering of trailing empty lines
Martin Schwenke [Mon, 3 Mar 2014 05:23:42 +0000 (16:23 +1100)]
tools-ctdb: Remove redundant filtering of trailing empty lines

There is a check for empty lines in the loop just below.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fad2b6b074495eb1dc036cce293456857985f8f5)

10 years agotools-ctdb: Use DLIST_ADD_END() to avoid reversing the list
Martin Schwenke [Mon, 3 Mar 2014 02:04:25 +0000 (13:04 +1100)]
tools-ctdb: Use DLIST_ADD_END() to avoid reversing the list

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 555aa06c41e7f77bf241e04ccf771009645e9c27)
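
The effect of adding at the end rather than the front can be shown with a toy singly-linked list.  This is not the DLIST_ADD_END() macro itself (which works on doubly-linked lists); struct toy_item and both helpers are assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical list element. */
struct toy_item {
    int value;
    struct toy_item *next;
};

/* Prepend: cheap, but items come out in reverse insertion order. */
static void toy_add_front(struct toy_item **head, struct toy_item *item)
{
    item->next = *head;
    *head = item;
}

/* Append: walks to the tail so that insertion order is preserved. */
static void toy_add_end(struct toy_item **head, struct toy_item *item)
{
    struct toy_item **p = head;

    while (*p != NULL) {
        p = &(*p)->next;
    }
    item->next = NULL;
    *p = item;
}
```

Reading lines 1, 2 from a file with toy_add_front() yields the list 2, 1, which then has to be reversed; toy_add_end() yields 1, 2 directly.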

10 years agotools-ctdb: Factor out function read_pnn_node_file()
Martin Schwenke [Mon, 3 Mar 2014 01:57:30 +0000 (12:57 +1100)]
tools-ctdb: Factor out function read_pnn_node_file()

Factor it from read_nodes_file().  Use it there and in
read_natgw_nodes_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 91895b33c52c0a81904c3ea36042d4574422f5fd)

10 years agotests: Add "ctdb listnodes" and "ctdb xpnn" stub tests
Martin Schwenke [Mon, 17 Mar 2014 02:42:35 +0000 (13:42 +1100)]
tests: Add "ctdb listnodes" and "ctdb xpnn" stub tests

Tests for xpnn need to implement a stub for ctdb_sys_have_ip().  The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about.  However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state.  However, it can be made available by moving
the relevant code into a new stub for tevent_context_init().  The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 79d28000043bd463beecaeac47855d3a4970eaf2)

10 years agotools-ctdb: Read NAT gateway nodes from a separate function
Martin Schwenke [Mon, 3 Mar 2014 01:45:23 +0000 (12:45 +1100)]
tools-ctdb: Read NAT gateway nodes from a separate function

Now it gets easier to refactor.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9bede494743f0ce13493fe718ed8f0c3c5f2959c)

10 years agotools-ctdb: Add and use function filter_nodemap_by_natgw_nodes()
Martin Schwenke [Mon, 3 Mar 2014 00:41:32 +0000 (11:41 +1100)]
tools-ctdb: Add and use function filter_nodemap_by_natgw_nodes()

Add another filter function, like the ones for capabilities and flags,
for filtering by NAT gateway nodes.  This makes the main
natgw_list function more readable.

Note that this drops the early filtering of disconnected nodes, so
they will now be listed in a NAT gateway group.  This makes sense.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit ba69742ccd822562ca2135d2466e09bf1216644b)

10 years agotools-ctdb: Update LVS commands to use filter_nodemap_by_flags()
Martin Schwenke [Wed, 19 Feb 2014 07:45:18 +0000 (18:45 +1100)]
tools-ctdb: Update LVS commands to use filter_nodemap_by_flags()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e728a35dc19c397cb17e1bf434401df25c35f337)

10 years agotools-ctdb: Update LVS commands to use filter_nodemap_by_capabilities()
Martin Schwenke [Wed, 19 Feb 2014 06:12:08 +0000 (17:12 +1100)]
tools-ctdb: Update LVS commands to use filter_nodemap_by_capabilities()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 26c9a591e539e33dd0896ec1e2958192b3e4efd4)

10 years agotools-ctdb: Fixes for "lvs" and "lvsmaster" commands
Martin Schwenke [Fri, 28 Feb 2014 09:16:34 +0000 (20:16 +1100)]
tools-ctdb: Fixes for "lvs" and "lvsmaster" commands

The index of the nodes array in nodemap isn't the PNN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5fb7e386ac4452786512d077a00b4907ef39cb51)
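
A minimal sketch of the distinction, assuming a simplified node array (struct toy_entry and toy_index_of_pnn() are hypothetical, not the tool's real types): when the array does not hold every PNN in order, a node must be found by searching for its PNN rather than by using the PNN as an index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node entry. */
struct toy_entry {
    uint32_t pnn;
    uint32_t flags;
};

/* Return the array index of the node with the given PNN, or -1 if absent. */
static int toy_index_of_pnn(const struct toy_entry *nodes, size_t n,
                            uint32_t pnn)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].pnn == pnn) {
            return (int)i;
        }
    }
    return -1;
}
```

With an array holding PNNs 1 and 3, the node with PNN 3 sits at index 1, and nodes[3] would be out of bounds.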

10 years agotools-ctdb: Generalise find_natgw() -> filter_nodemap_by_flags()
Martin Schwenke [Tue, 18 Feb 2014 06:11:53 +0000 (17:11 +1100)]
tools-ctdb: Generalise find_natgw() -> filter_nodemap_by_flags()

Instead of just finding the first node that doesn't have any flags in
flag_mask set, change it into a function that filters a nodemap to
exclude nodes with the given flags.

This makes the NATGW code simpler but also provides a function that
can be used in other code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 33b1fcbd7083668cfd58b1cfb1172b6134cd07ca)
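
The filtering described here might look roughly like the following.  The struct and function are simplified assumptions (the real filter_nodemap_by_flags() signature is not shown in this log); the point is only the test against flag_mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node entry. */
struct toy_node {
    uint32_t pnn;
    uint32_t flags;
};

/*
 * Copy into 'out' every node that has none of the bits in flag_mask set.
 * 'out' must have room for n entries.  Returns the number of nodes kept.
 */
static size_t toy_filter_by_flags(const struct toy_node *in, size_t n,
                                  uint32_t flag_mask, struct toy_node *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if ((in[i].flags & flag_mask) == 0) {
            out[kept++] = in[i];
        }
    }
    return kept;
}
```

A "find the first usable node" query then reduces to filtering and taking the first element of the result, and other callers can reuse the same filter with a different mask.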

10 years agotools-ctdb: Update natgwlist to filter nodes by NATGW capability
Martin Schwenke [Tue, 18 Feb 2014 04:52:37 +0000 (15:52 +1100)]
tools-ctdb: Update natgwlist to filter nodes by NATGW capability

Check capabilities once to build a filtered node list instead of
repeatedly checking capabilities.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 798bd58370f6ea7bc70db96edd23ae86caf6bf79)

10 years agotests: New natgwlist tests where nodes capability not set
Martin Schwenke [Tue, 18 Feb 2014 01:29:44 +0000 (12:29 +1100)]
tests: New natgwlist tests where nodes capability not set

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 771005386403acf15a81be5de2a3798384a37d8a)

10 years agotests: Update ctdb stub LVS tests and add some new ones
Martin Schwenke [Tue, 18 Feb 2014 01:12:06 +0000 (12:12 +1100)]
tests: Update ctdb stub LVS tests and add some new ones

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 75cf99b9da1677fa83197d111d757e14041dae05)

10 years agotests: Support fake capabilities in CTDB tool stub
Martin Schwenke [Tue, 18 Feb 2014 00:34:11 +0000 (11:34 +1100)]
tests: Support fake capabilities in CTDB tool stub

... and add a test to make sure it works.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 3af858e6f4990599c23b54d05c42187400fd1426)

10 years agotests: Remove old, unused copy of a CTDB tool unit test
Martin Schwenke [Tue, 18 Feb 2014 00:02:49 +0000 (11:02 +1100)]
tests: Remove old, unused copy of a CTDB tool unit test

This looks to have got left behind a long time ago when things got
moved around...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 263e5eabf8d55c7f53db597b8fcede831c211e45)

10 years agotools-ctdb: Don't close stderr when running without_daemon commands
Martin Schwenke [Mon, 17 Mar 2014 02:28:14 +0000 (13:28 +1100)]
tools-ctdb: Don't close stderr when running without_daemon commands

It looks like the original without_daemon code still tried to
establish a client connection to the daemon.  Closing stderr looks to
be a cheap way of hiding the errors when this failed.

However, later cleanups avoid the client connection altogether, so do
not close stderr.  Now debug output from without_daemon commands
actually appears.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a947cf6c0c3e1453ec833033dcd2edaa9490a55b)

10 years agopmda: Fix metric identifiers
David Disseldorp [Thu, 20 Mar 2014 13:23:01 +0000 (14:23 +0100)]
pmda: Fix metric identifiers

The commit "pmda: Use upstream assigned PCP domain id" updated the
Performance Metrics Namespace (pmns) file, without changing the
corresponding metric identifiers used by the agent.

This change fixes the agent metric identifier values to match the pmns
definitions.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Sat Mar 22 00:07:31 CET 2014 on sn-devel-104

(Imported from commit 7fdb21cc321bc0b9a759393467fe42f26cdc812a)

10 years agovacuum: fix delete list counts in delete_marshall_traverse_first
Michael Adam [Fri, 21 Feb 2014 22:43:17 +0000 (23:43 +0100)]
vacuum: fix delete list counts in delete_marshall_traverse_first

When bumping skipped, decrement left so that the sum stays correct.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Mar  6 03:32:33 CET 2014 on sn-devel-104

(Imported from commit d8e110ed7dacda18860cce0c86e4e44f0b83dd42)
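
The bookkeeping rule can be sketched with a small counter struct.  The field names and the exact invariant are assumptions made for illustration; the log does not show the real counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; the real struct layout is not shown in this log. */
struct toy_delete_counts {
    uint32_t total;     /* records placed on the delete list */
    uint32_t left;      /* records still eligible for deletion */
    uint32_t skipped;   /* records dropped from consideration */
    uint32_t deleted;   /* records actually deleted */
};

/* Moving a record to "skipped" must also take it out of "left". */
static void toy_skip_record(struct toy_delete_counts *c)
{
    c->skipped++;
    c->left--;
}

/* Assumed invariant: every record is counted in exactly one bucket. */
static bool toy_counts_consistent(const struct toy_delete_counts *c)
{
    return c->total == c->left + c->skipped + c->deleted;
}
```

If toy_skip_record() only incremented skipped, the sum would exceed total after the first skipped record and a consistency check at the end of processing would fail.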

10 years agovacuum: fix possible cause for delete_list processing counts left records > 0
Michael Adam [Wed, 19 Feb 2014 23:58:17 +0000 (00:58 +0100)]
vacuum: fix possible cause for delete_list processing counts left records > 0

We need to have left records == 0 at the end of the delete list processing.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5b81848e50b43b7ab7889f5217e05ca42e452c8f)

10 years agovacuum: systematize counters into various structs
Michael Adam [Wed, 19 Feb 2014 23:32:08 +0000 (00:32 +0100)]
vacuum: systematize counters into various structs

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 551e9d791c146473b45f8a9fb1574e0ad7cca6b2)

10 years agovacuum: remove unused counter vdata->total
Michael Adam [Wed, 19 Feb 2014 23:29:47 +0000 (00:29 +0100)]
vacuum: remove unused counter vdata->total

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit ec3f33c314207f93529d7c9a2bfb82bf05e3a08b)

10 years agovacuum: make ctdb_process_delete_list() void
Michael Adam [Sun, 16 Feb 2014 00:08:18 +0000 (01:08 +0100)]
vacuum: make ctdb_process_delete_list() void

The overall return code was not really used anyways.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 49eb503c5d3133e1476a15f9d11ce4269407e6c6)

10 years agovacuum: make ctdb_process_vacuum_fetch_lists() void.
Michael Adam [Sat, 15 Feb 2014 23:37:43 +0000 (00:37 +0100)]
vacuum: make ctdb_process_vacuum_fetch_lists() void.

This constantly returns 0 anyways.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 776d4e88f2a6068016dedd37003cdde10f8090a5)

10 years agovacuum: make ctdb_vacuum_traverse_db() void.
Michael Adam [Sat, 15 Feb 2014 23:35:34 +0000 (00:35 +0100)]
vacuum: make ctdb_vacuum_traverse_db() void.

Failure in traversal of the DB should not
prevent further processing.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 19948702992c94553e1a611540ad398de9f9d8b9)

10 years agovacuum: don't stop in process_vacuum_fetch_lists when sending to one node fails.
Michael Adam [Sat, 15 Feb 2014 23:26:00 +0000 (00:26 +0100)]
vacuum: don't stop in process_vacuum_fetch_lists when sending to one node fails.

We should try to continue vacuuming as much as possible.
Failure to send records to one lmaster doesn't mean the
others will fail too.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7983946680ac0de8f82dfee6f0f849a11653d042)
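
As a rough sketch of the change in control flow (all names are hypothetical, and the send function is a stub that fails for one node), a failure is recorded but the loop carries on to the remaining lmasters:

```c
#include <stddef.h>

/* Hypothetical send hook: returns 0 on success, non-zero on failure. */
typedef int (*toy_send_fn)(size_t lmaster);

/*
 * Try every lmaster; a failure is counted but does not end the loop.
 * Returns the number of failed sends.
 */
static size_t toy_send_fetch_lists(size_t num_lmasters, toy_send_fn send)
{
    size_t failures = 0;

    for (size_t i = 0; i < num_lmasters; i++) {
        if (send(i) != 0) {
            /* An early return here would skip the remaining nodes. */
            failures++;
        }
    }
    return failures;
}

/* Stub for demonstration: counts attempts and fails for lmaster 1 only. */
static size_t toy_attempts;

static int toy_send_fails_for_node_1(size_t lmaster)
{
    toy_attempts++;
    return lmaster == 1 ? -1 : 0;
}
```

With three lmasters and the stub above, all three sends are attempted and exactly one failure is reported.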

10 years agovacuum: catch and log errors to traverse the delete list in ctdb_process_delete_list()
Michael Adam [Sat, 15 Feb 2014 17:06:09 +0000 (18:06 +0100)]
vacuum: catch and log errors to traverse the delete list in ctdb_process_delete_list()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit f3483de240987f05cc20f747ac467c8ed81bb03e)

10 years agovacuum: catch and log error of traverse in ctdb_process_delete_queue()
Michael Adam [Sat, 15 Feb 2014 16:59:22 +0000 (17:59 +0100)]
vacuum: catch and log error of traverse in ctdb_process_delete_queue()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 72ea9759930bd29b59c0f831d2cb2f4f1e2e643d)

10 years agovacuum: use tdb_parse_record instead of tdb_fetch in delete_marshall_traverse_first()
Michael Adam [Sat, 15 Feb 2014 12:03:51 +0000 (13:03 +0100)]
vacuum: use tdb_parse_record instead of tdb_fetch in delete_marshall_traverse_first()

Spare malloc and free.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 83fa09e78c6ca8e08cb2659f013a05b4b340f0aa)
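
The saving comes from the difference between the two access styles: tdb_fetch() hands back an allocated copy that the caller must free, whereas tdb_parse_record() passes the record to a callback.  The toy store below is not the tdb API; it is a self-contained model of the two styles, with all names invented for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Toy single-record store; this is not the tdb API. */
struct toy_store {
    const unsigned char *data;
    size_t len;
};

/* Fetch style: returns a malloc'ed copy that the caller must free. */
static unsigned char *toy_fetch(const struct toy_store *s, size_t *len)
{
    unsigned char *copy = malloc(s->len ? s->len : 1);

    if (copy != NULL) {
        memcpy(copy, s->data, s->len);
        *len = s->len;
    }
    return copy;
}

/* Parse style: the callback sees the stored bytes in place, no copy. */
typedef int (*toy_parser)(const unsigned char *data, size_t len, void *priv);

static int toy_parse(const struct toy_store *s, toy_parser fn, void *priv)
{
    return fn(s->data, s->len, priv);
}

/* Example parser: record the length of the record. */
static int toy_len_parser(const unsigned char *data, size_t len, void *priv)
{
    (void)data;
    *(size_t *)priv = len;
    return 0;
}
```

When the caller only needs to inspect a header or a length, the parse style avoids one allocation and one free per record visited in the traverse.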

10 years agovacuum: simplify delete_marshall_traverse_first: use tdb_null
Michael Adam [Sat, 15 Feb 2014 12:01:33 +0000 (13:01 +0100)]
vacuum: simplify delete_marshall_traverse_first: use tdb_null

We know anyway that the record to store is empty at this point,
so skip the pointer calculations.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 974aa73160d50b7cf63b4a5e6dd7a7b1408ece42)

10 years agovacuum: remove VacuumLimit criterion for triggering a repack
Michael Adam [Fri, 14 Feb 2014 21:05:21 +0000 (22:05 +0100)]
vacuum: remove VacuumLimit criterion for triggering a repack

With the new vacuuming, we consider it an error if there are
records left for deletion after processing the various lists.
All records that can be deleted should have been deleted by
tdb_delete calls.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 16837bc309aa9a86fc21d7f59a8fce0b947428a3)

10 years agovacuum: treat value 0 of tunable VacuumLimit as turning off repacking
Michael Adam [Wed, 12 Feb 2014 16:41:28 +0000 (17:41 +0100)]
vacuum: treat value 0 of tunable VacuumLimit as turning off repacking

I.e. no number of records found to delete will trigger the
repacking.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6fdd25008f384408dfc103b90ab40b8e64ce18b0)

10 years agovacuum: add consistency check for counts at end of process_delete_list()
Michael Adam [Fri, 14 Feb 2014 21:02:41 +0000 (22:02 +0100)]
vacuum: add consistency check for counts at end of process_delete_list()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1b2fc1f096f80db5974eab021f12f0ad9af24882)

10 years agovacuum: log error if records are left for deletion after ctdb_process_delete_list()
Michael Adam [Fri, 14 Feb 2014 21:01:38 +0000 (22:01 +0100)]
vacuum: log error if records are left for deletion after ctdb_process_delete_list()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit cf407d208afdc70b31ce5013591c869b36f588f1)

10 years agovacuum: use tdb_parse_record instead of tdb_fetch in delete_record_traverse()
Michael Adam [Fri, 14 Feb 2014 20:50:59 +0000 (21:50 +0100)]
vacuum: use tdb_parse_record instead of tdb_fetch in delete_record_traverse()

Spare malloc and free.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 23be632449524e74d73fcb6fd3875a6d5a428d89)

10 years agovacuum: update comment for ctdb_process_delete_queue
Michael Adam [Fri, 14 Feb 2014 17:48:02 +0000 (18:48 +0100)]
vacuum: update comment for ctdb_process_delete_queue

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit be2f1a0c790d08571dd757fb7b2941d367175008)

10 years agovacuum: rename ctdb_vacuum_db_fast --> ctdb_process_delete_queue
Michael Adam [Fri, 14 Feb 2014 17:47:25 +0000 (18:47 +0100)]
vacuum: rename ctdb_vacuum_db_fast --> ctdb_process_delete_queue

This describes more precisely what this does.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit cd877b6a21a5080b3d9ae8ee7ac8cf27c4fd9512)

10 years agovacuum: update comment for ctdb_vacuum_traverse_db
Michael Adam [Fri, 14 Feb 2014 17:46:49 +0000 (18:46 +0100)]
vacuum: update comment for ctdb_vacuum_traverse_db

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c4478bd40daee22bbe5696bb17d0a1bff164c7f7)

10 years agovacuum: rename ctdb_vacuum_db_full --> ctdb_vacuum_traverse_db
Michael Adam [Fri, 14 Feb 2014 17:42:37 +0000 (18:42 +0100)]
vacuum: rename ctdb_vacuum_db_full --> ctdb_vacuum_traverse_db

This describes more precisely what it actually is, nowadays.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0309d5eda4f28c7e99e93e0da6c02757ea0cae8f)

10 years agovacuum: change full db traverse vacuuming to fill delete queue
Michael Adam [Fri, 14 Feb 2014 17:38:31 +0000 (18:38 +0100)]
vacuum: change full db traverse vacuuming to fill delete queue

This lets the "fast vacuum" delete queue traverse do the actual work.

On the positive side, we note that this lets the "full vacuuming"
treat the records that have never been migrated with data correctly.
These had previously been added to the delete list for complicated
cross-node deletion instead of directly deleting them.

On the other hand, there might be a slight overhead
since the records are read again in the delete queue traverse,
but this is OK because this change is in preparation for
untangling the db traverse altogether from the vacuum run,
making it independent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit ba49deb2344c0a9a8f76c9fd0136bdeadad6af89)

10 years agovacuum: run the fast vacuum after the db traverse
Michael Adam [Fri, 14 Feb 2014 17:27:14 +0000 (18:27 +0100)]
vacuum: run the fast vacuum after the db traverse

This is in preparation for modifying the db traverse to
fill the delete_queue that is processed by the fast
vacuum run, instead of filling the same lists as the
fast vacuum run for further processing.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d0b7b3882511769b1bfc1d0d4fdc0dba288e6ccd)

10 years agovacuum: rename private->private_data in repack_traverse
Michael Adam [Fri, 14 Feb 2014 17:08:20 +0000 (18:08 +0100)]
vacuum: rename private->private_data in repack_traverse

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 04c2115b606f2346fc7315a503b3dae0189e0737)

10 years agovacuum: rename private->private_data in vacuum_traverse
Michael Adam [Fri, 14 Feb 2014 17:07:55 +0000 (18:07 +0100)]
vacuum: rename private->private_data in vacuum_traverse

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 951efa1097a113910c06ce78d1c9fb70e3f4d75e)

10 years agovacuum: extract check for full vacuum run out of ctdb_vacuum_db_full()
Michael Adam [Fri, 14 Feb 2014 17:03:02 +0000 (18:03 +0100)]
vacuum: extract check for full vacuum run out of ctdb_vacuum_db_full()

This is more consistent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 01f359cafccb5ae3bea312d628dad92746520527)

10 years agovacuum: add consistency check for counts to ctdb_vacuum_db_fast()
Michael Adam [Fri, 14 Feb 2014 16:58:01 +0000 (17:58 +0100)]
vacuum: add consistency check for counts to ctdb_vacuum_db_fast()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c88fd19714b98769887dbff59d8c1d077cf351d5)

10 years agovacuum: use tdb_parse_record instead of tdb_fetch in delete_queue_traverse()
Michael Adam [Fri, 14 Feb 2014 14:28:22 +0000 (15:28 +0100)]
vacuum: use tdb_parse_record instead of tdb_fetch in delete_queue_traverse()

this spares malloc and free

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5d5907c7cf09567e73092578917624c8789c7471)

10 years agovacuum: simplify delete_record_traverse() - free treats NULL
Michael Adam [Fri, 14 Feb 2014 14:35:01 +0000 (15:35 +0100)]
vacuum: simplify delete_record_traverse() - free treats NULL

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fe68b3c4942a4660c9b35c6316856644c32f5631)

10 years agovacuum: simplify delete_queue_traverse() - free treats NULL pointers.
Michael Adam [Fri, 14 Feb 2014 14:34:23 +0000 (15:34 +0100)]
vacuum: simplify delete_queue_traverse() - free treats NULL pointers.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 593bddf2e82fcb9666449c40625b972ff9c7961c)

10 years agovacuum: reduce indentation in delete_queue_traverse
Michael Adam [Fri, 14 Feb 2014 14:30:08 +0000 (15:30 +0100)]
vacuum: reduce indentation in delete_queue_traverse

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 24bec3d31b16c4c83b5ed76ecffccbfda53858fd)

10 years agovacuum: treat value 0 of tunable RepackLimit as turned off.
Michael Adam [Wed, 12 Feb 2014 16:40:31 +0000 (17:40 +0100)]
vacuum: treat value 0 of tunable RepackLimit as turned off.

I.e. when RepackLimit is set to 0, no size of the freelist
should trigger a repack in vacuuming.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 48f2d1158820bfb063ba0a0bbfb6f496a8e7522d)
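
One way to express "0 means turned off" is shown below.  The function name, the >= comparison, and the handling of a negative size are assumptions for illustration; only the meaning of a zero limit comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the freelist size should trigger a repack.
 * A limit of 0 disables the check entirely.
 */
static bool toy_repack_needed(int freelist_size, uint32_t repack_limit)
{
    if (repack_limit == 0) {
        return false;  /* tunable set to 0: never repack on freelist size */
    }
    if (freelist_size < 0) {
        return false;  /* assumed: an error value does not trigger a repack */
    }
    /* The cast is safe because freelist_size is known to be >= 0 here. */
    return (uint32_t)freelist_size >= repack_limit;
}
```

Without the explicit zero check, a limit of 0 would be met by any freelist size and so would trigger a repack on every run, which is the opposite of "off".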

10 years agovacuum: fix treatment of remaining records and statistics in delete_record_traverse()
Michael Adam [Fri, 14 Feb 2014 00:55:39 +0000 (01:55 +0100)]
vacuum: fix treatment of remaining records and statistics in delete_record_traverse()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit af5568b26761dadbb652d92f8c8ced477b38c7cc)

10 years agovacuum: cast freelist_size in comparison.
Michael Adam [Wed, 12 Feb 2014 16:38:56 +0000 (17:38 +0100)]
vacuum: cast freelist_size in comparison.

At this point, it is >= 0 anyways.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b4e0b01a8c8415bec9c7dbbe4494813917dddfe5)

10 years agovacuum: improve output of delete list statistics
Michael Adam [Thu, 13 Feb 2014 23:53:23 +0000 (00:53 +0100)]
vacuum: improve output of delete list statistics

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6a46a255307a070c887525ee1d79810ba12442bb)

10 years agodaemon: Do not support connection tracking if there are no public IPs
Amitay Isaacs [Tue, 11 Feb 2014 07:07:08 +0000 (18:07 +1100)]
daemon: Do not support connection tracking if there are no public IPs

CTDB tracks connections to be able to send tickle ACKs and gratuitous
ARPs.  When there are no public IPs, there is no need for tickle ACKs
and gratuitous ARPs.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar  4 03:01:38 CET 2014 on sn-devel-104

(Imported from commit fb2631f5dfd3ec58fd277dbe155afab58f882202)

10 years agoutil: Do not use mlockall() on AIX
Amitay Isaacs [Tue, 11 Feb 2014 06:57:42 +0000 (17:57 +1100)]
util: Do not use mlockall() on AIX

Memory lockdown causes recovery daemon to crash on AIX.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit af3a168ed3b0dcac4086d2d90bfdef65590b68dc)

10 years agobuild: AIX does not have working C99 vsnprintf, requires libreplace
Amitay Isaacs [Thu, 6 Feb 2014 05:32:42 +0000 (16:32 +1100)]
build: AIX does not have working C99 vsnprintf, requires libreplace

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 44520dcefc226ff1a93f77c8c7cf79d1c5244c3a)
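
The commit does not say which behaviour is broken on AIX.  One C99 guarantee that code commonly relies on, and that a replacement implementation would need to provide, is that vsnprintf() with a zero-sized buffer returns the length the output would have had.  A hypothetical helper using that guarantee:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Return the number of bytes (excluding the terminating NUL) that the
 * formatted output needs.  Relies on C99 semantics for a zero-sized buffer.
 */
static int toy_formatted_length(const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return len;
}
```

On a conforming C library, formatting "%s-%d" with "ab" and 42 reports a length of 5, which a caller can then use to size a buffer exactly.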

10 years agobuild: Remove auto-generated header file in distclean
Amitay Isaacs [Thu, 6 Feb 2014 05:27:09 +0000 (16:27 +1100)]
build: Remove auto-generated header file in distclean

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 96203d9126d77c45ee53e6b536720863851a42aa)

10 years agorecoverd: Check if callback function is registered before calling
Amitay Isaacs [Thu, 27 Feb 2014 01:41:23 +0000 (12:41 +1100)]
recoverd: Check if callback function is registered before calling

Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104

(Imported from commit 7d05baa96b5c49629803a98ec8160d2c5c51c839)
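
The general pattern is a NULL check on the function pointer before the call.  The types and names below are hypothetical and only illustrate that pattern, not the recovery daemon's actual code.

```c
#include <stddef.h>

typedef void (*toy_callback)(void *private_data);

/* Hypothetical handler slot. */
struct toy_handler {
    toy_callback fn;      /* NULL when nothing is registered */
    void *private_data;
};

/* Invoke the callback if one is registered; returns 1 if it was called. */
static int toy_invoke(const struct toy_handler *h)
{
    if (h->fn == NULL) {
        return 0;  /* calling through a NULL pointer would crash */
    }
    h->fn(h->private_data);
    return 1;
}

/* Example callback: bump an int counter. */
static void toy_count(void *private_data)
{
    (*(int *)private_data)++;
}
```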

10 years agodaemon: After updating tickles on other nodes, set update flag to false
Amitay Isaacs [Wed, 29 Jan 2014 04:54:35 +0000 (15:54 +1100)]
daemon: After updating tickles on other nodes, set update flag to false

tcp_update_flag is set to true whenever tickles are added or deleted.
This flag is used to determine whether or not to send the tickles list
to other nodes.  Once the tickles list has been sent to other nodes
successfully, set tcp_update_flag to false, so ctdbd does not keep
sending the same tickles list every TickleUpdateInterval (20 seconds).
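
The flag handling described above can be sketched as follows.  This is
an illustrative Python sketch, not the daemon's C code: TickleList,
periodic_update() and send_to_other_nodes are hypothetical names, and
only the role of tcp_update_flag is taken from the commit message.

```python
from typing import Callable, List, Tuple

Connection = Tuple[str, str]  # (source address, destination address)


class TickleList:
    """Hypothetical stand-in for the tickle list kept for a public IP."""

    def __init__(self) -> None:
        self.connections: List[Connection] = []
        self.update_needed = False  # plays the role of tcp_update_flag

    def add(self, conn: Connection) -> None:
        self.connections.append(conn)
        self.update_needed = True

    def remove(self, conn: Connection) -> None:
        if conn in self.connections:
            self.connections.remove(conn)
            self.update_needed = True


def periodic_update(
    tickles: TickleList,
    send_to_other_nodes: Callable[[List[Connection]], bool],
) -> bool:
    """Run once per update interval; return True if a send was attempted."""
    if not tickles.update_needed:
        return False  # nothing changed since the last successful send
    if send_to_other_nodes(list(tickles.connections)):
        # Clear the flag only after a successful send, so that a failed
        # send is retried on the next interval.
        tickles.update_needed = False
    return True
```

With this shape, an unchanged list is sent once and then skipped on
later intervals, while a failed send leaves the flag set.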

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 026996550d726836091ff5ebd1ebf925bf237bb0)

10 years agodaemon: Implement ctdb_control_startup()
Martin Schwenke [Thu, 27 Feb 2014 02:47:28 +0000 (13:47 +1100)]
daemon: Implement ctdb_control_startup()

This doesn't implement what was recommended.  That would require
careful error handling, probably with a fallback to this code anyway.
This is simple and does no worse than the current code.  That is, the
new node is updated on the next call to tdb_update_tcp_tickles().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0723fedcedd4a97870f7b1224945f1587363c9bf)

10 years agodaemon: Fix whitespaces
Amitay Isaacs [Wed, 22 Jan 2014 04:00:48 +0000 (15:00 +1100)]
daemon: Fix whitespaces

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 75ca1216a63d0a404466bfb94a1fba1e478b80c6)

10 years agodaemon: Always talloc tickle array off vnn instead of ctdb->nodes
Amitay Isaacs [Wed, 22 Jan 2014 04:00:33 +0000 (15:00 +1100)]
daemon: Always talloc tickle array off vnn instead of ctdb->nodes

This fixes ctdb crash reported in bug #10366.
Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit f2cd999189ee841fe81115e0873ab5e6a3fc265d)

10 years agoscripts: Enhancements to hung script debugging
Martin Schwenke [Fri, 7 Feb 2014 06:37:00 +0000 (17:37 +1100)]
scripts: Enhancements to hung script debugging

* Add stack dumps for "interesting" processes that sometimes get
  stuck.  Try to print stack traces for them if they appear in the
  pstree output.

* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
  CTDB_DEBUG_HUNG_SCRIPT_STACKPAT.  These are primarily for testing
  but the latter may be useful for live debugging.

* Load CTDB configuration so that above configuration variables can be
  set/changed without restarting ctdbd.

Add a test that tries to ensure that all of this is working.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2532149f8f9bbe6d3c8f5ac6e5e4bc2ad1681e27)

10 years agoeventscripts: Switch on dumping of stuck nfsd threads
Martin Schwenke [Thu, 20 Feb 2014 04:20:44 +0000 (15:20 +1100)]
eventscripts: Switch on dumping of stuck nfsd threads

This feature was added quite a while ago but was not enabled by
default.  It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.

This can be disabled by setting:

  CTDB_NFS_DUMP_STUCK_THREADS=0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104

(Imported from commit fcf846a795085d24468548165d92762a628ef54d)

10 years agovacuum: move retrieval of freelist to after vacuum run
Michael Adam [Mon, 10 Feb 2014 01:44:56 +0000 (02:44 +0100)]
vacuum: move retrieval of freelist to after vacuum run

The fast vacuum run may have increased the freelist size.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 14 03:15:30 CET 2014 on sn-devel-104

(Imported from commit 0535f73c3abdcd77cb3f5e9f81641fa2a4e1764b)

10 years agovacuum: fix debug message typo in add_record_to_delete_list()
Michael Adam [Thu, 13 Feb 2014 15:44:04 +0000 (16:44 +0100)]
vacuum: fix debug message typo in add_record_to_delete_list()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit bd474985b1db572cb08eff39b25ecae2b9d0dea8)

10 years agotests: Handle interactions with monitor events
Martin Schwenke [Wed, 12 Feb 2014 04:33:19 +0000 (15:33 +1100)]
tests: Handle interactions with monitor events

In the first case, reconfiguration can no longer happen in a monitor
event, so this is no longer a problem.  Drop it.

Running a monitor event by hand no longer cancels the existing monitor
event.  Instead the hand-run event fails.  So do this differently and
just wait for a monitor event before continuing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104

(Imported from commit a9ccdec008ebcb1b286eede4f43167e3e4d4cbe0)

10 years agorecoverd: Fix a bug in the LCP2 rebalancing code
Martin Schwenke [Fri, 7 Feb 2014 06:19:20 +0000 (17:19 +1100)]
recoverd: Fix a bug in the LCP2 rebalancing code

srcimbl gets changed on every iteration of the loop.  The value that
should be stored for the new imbalance of the source node is
minsrcimbl.
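
The general shape of this bug can be sketched as follows.  This is a
simplified Python illustration, not the LCP2 code itself: pick_move()
and the candidate tuples are hypothetical, and only the names srcimbl
and minsrcimbl come from the commit message.

```python
from typing import List, Optional, Tuple

# Each candidate is (ip, srcimbl, dstimbl): the imbalance the source and
# destination nodes would have if this IP were moved (hypothetical layout).
Candidate = Tuple[str, int, int]


def pick_move(candidates: List[Candidate]) -> Tuple[Optional[str], Optional[int]]:
    """Return the chosen IP and the new imbalance of the source node."""
    minip = None
    mindstimbl = None
    minsrcimbl = None
    for ip, srcimbl, dstimbl in candidates:
        # srcimbl is overwritten on every iteration of the loop.
        if mindstimbl is None or dstimbl < mindstimbl:
            # Remember the values that belong to the best candidate so far.
            minip, mindstimbl, minsrcimbl = ip, dstimbl, srcimbl
    # Storing srcimbl here would use the value from the last iteration,
    # which need not belong to the chosen candidate.
    return minip, minsrcimbl
```

For the candidates ("a", 5, 1) and ("b", 9, 3) the sketch chooses "a"
with a source imbalance of 5, whereas storing srcimbl after the loop
would record 9.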

To help diagnose this, added some extra debug that can be left in.

The extra debug changes the output of a couple of tests.  Note that
the resulting IP allocations in those tests are unchanged - only the
debug output is changed.

Also add some new tests that illustrate the bug.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit f1a20d748f6ab4702be5b17047a3fbfa0f3e8d0c)

10 years agotests: New test to ensure "ctdb reloadips" manipulates IPs correctly
Martin Schwenke [Tue, 11 Feb 2014 22:49:11 +0000 (09:49 +1100)]
tests: New test to ensure "ctdb reloadips" manipulates IPs correctly

This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps.  First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct.  Then about 1/2
of the IPs are deleted and the remaining IPs are checked.  Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50fc53d7f11a3c28fd4ef5318d90f842bbc0f19c)

10 years agodaemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script
Amitay Isaacs [Tue, 11 Feb 2014 06:29:26 +0000 (17:29 +1100)]
daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script

If CTDB_DEBUG_HUNG_SCRIPT is set, use that instead of the default
debug script.  This code was dropped by mistake in commit
18c1f432102f1a5093927be9276d001180539e50.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104

(Imported from commit 276b233c0090d51b59dbe06ae66a14ee09cbb4c2)

10 years agoeventscripts: Create extra files for ganesha recovery
Srikrishan Malik [Mon, 10 Feb 2014 05:49:08 +0000 (11:19 +0530)]
eventscripts: Create extra files for ganesha recovery

This adds new files for Ganesha's recovery.  myreleaseip_* are used by
the recovery thread on the node where the IP is released.  The releaseip_*
and takeip_* files are used by the recovery thread where the IP is taken over.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 9a2a5a2f7c7d3d6b4c03bb97e134ca0452a83bb8)

10 years agoeventscripts: Run mmlsconfig only once and use cached results
Srikrishan Malik [Mon, 10 Feb 2014 05:40:48 +0000 (11:10 +0530)]
eventscripts: Run mmlsconfig only once and use cached results

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 6b378f2f76e433023e57dd78bc3f98e0ef1f34f1)

10 years agodoc: Update NEWS (tag: ctdb-2.5.2)
Amitay Isaacs [Fri, 31 Jan 2014 07:30:56 +0000 (18:30 +1100)]
doc: Update NEWS

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

10 years agodoc: Fix usage string for ctdb readkey/writekey
Amitay Isaacs [Fri, 31 Jan 2014 01:46:21 +0000 (12:46 +1100)]
doc: Fix usage string for ctdb readkey/writekey

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104

(Imported from commit 35eb6cb521d54708f0bbba515f645327846b4e70)

10 years agodaemon: Return negative status only if there are known errors
Amitay Isaacs [Thu, 23 Jan 2014 03:57:53 +0000 (14:57 +1100)]
daemon: Return negative status only if there are known errors

If event script does not exist or does not have execute permissions, then
return negative errno to distinguish from the exit errors of event script.
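
The distinction can be sketched as follows.  This is an illustrative
Python sketch rather than the daemon's code: check_event_script() is a
hypothetical helper, and the particular errno values chosen here are
assumptions, since the commit message does not name them.

```python
import errno
import os


def check_event_script(path: str) -> int:
    """Return 0 if the script can be run, otherwise a negative errno.

    A script's own exit status is never negative, so a negative value
    can only mean that the script could not be run at all.
    """
    if not os.path.isfile(path):
        return -errno.ENOENT  # assumed value for "does not exist"
    if not os.access(path, os.X_OK):
        return -errno.EACCES  # assumed value for "not executable"
    return 0
```

A caller can then treat negative values as "could not run" and
non-negative values as the script's exit status.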

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1566790e5a738f12db1dfb519589c1842d74b8e5)

10 years agotests/eventscripts: Avoid errors on broken pipe
Martin Schwenke [Tue, 28 Jan 2014 03:34:15 +0000 (14:34 +1100)]
tests/eventscripts: Avoid errors on broken pipe

ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104

(Imported from commit b112a3317cbedc73a6e17b3f711fec84f0d41d4e)

10 years agotests/eventscripts: Improve ip command stub secondary handling
Martin Schwenke [Tue, 28 Jan 2014 05:07:53 +0000 (16:07 +1100)]
tests/eventscripts: Improve ip command stub secondary handling

It should support primary and secondaries per network instead of per
interface.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1640f36d5831b2575d117fac335f3324ceefa9f8)

10 years agodaemon: reloadips must register state of asynchronous controls
Martin Schwenke [Wed, 22 Jan 2014 05:02:46 +0000 (16:02 +1100)]
daemon: reloadips must register state of asynchronous controls

Otherwise ctdb_client_async_wait() is a no-op.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e5778cc172eb9fab6382f1c600326f6cc99b9162)

10 years agotests: in the stub "ip link show" command use echo instead of cat
Michael Adam [Wed, 27 Nov 2013 22:43:53 +0000 (23:43 +0100)]
tests: in the stub "ip link show" command use echo instead of cat

This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104

(Imported from commit e2db9c524f40f8771ae19b2be47a56f7a9d887af)

10 years agotest: remove unused ip2ipmask from integration.bash
Michael Adam [Wed, 27 Nov 2013 21:28:06 +0000 (22:28 +0100)]
test: remove unused ip2ipmask from integration.bash

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fd5e8905a09875d13ef109133edd361a82cf8e1e)

10 years agotests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.
Michael Adam [Wed, 27 Nov 2013 10:42:28 +0000 (11:42 +0100)]
tests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.

This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e281cfa8db4a2506f9016718373cdc80f4aa9c1f)

10 years agotests:76_ctdb_pdb_recovery: fix a typo in a message
Michael Adam [Wed, 27 Nov 2013 22:28:24 +0000 (23:28 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 30dead171f82b5da31cbcbab88eaa70a896d9c55)

10 years agotests:76_ctdb_pdb_recovery: fix a typo in a message
Michael Adam [Wed, 27 Nov 2013 10:40:53 +0000 (11:40 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 3e083f96ff02cbf419513e16a200e8d4d0c2c227)

10 years agotests: in the stub ip command, avoid broken pipe by using echo instead of cat
Michael Adam [Wed, 27 Nov 2013 11:13:40 +0000 (12:13 +0100)]
tests: in the stub ip command, avoid broken pipe by using echo instead of cat

This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 70f469e05e279e29980df2af10dd89c53001b236)

10 years agotests/integration: Update NFS tickles test and supporting code
Martin Schwenke [Thu, 28 Nov 2013 05:43:55 +0000 (16:43 +1100)]
tests/integration: Update NFS tickles test and supporting code

This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred.  This code is horrible and fragile.

A better way is to just monitor the output of "ctdb scriptstatus".
When it changes then a monitor event has occurred.

Also remove the old code that checks for tickle information in shared
storage.  CTDB hasn't done things this way for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit ef0e8cc1928dbd12c862a5e96710471ce3b4d023)

10 years agoeventscripts: Do not mark node unhealthy if no fs is available
Srikrishan Malik [Fri, 13 Dec 2013 07:35:53 +0000 (13:05 +0530)]
eventscripts: Do not mark node unhealthy if no fs is available

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104

(Imported from commit 164ee000df2a3ffc91690c60d08e4ea7ff1a33f2)

10 years agodaemon: Simplify listing event scripts using scandir
Amitay Isaacs [Thu, 16 Jan 2014 02:05:58 +0000 (13:05 +1100)]
daemon: Simplify listing event scripts using scandir

Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.
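
The ordering problem can be illustrated with a short Python sketch.
This is not the daemon's C code: the two helper functions and the
script names are chosen for illustration only, and plain string
comparison stands in for alphasort, which compares names according to
the current locale.

```python
import re
from typing import List


def leading_number(name: str) -> int:
    """Extract the leading digits of a script name, e.g. 10 for "10.interface"."""
    match = re.match(r"\d+", name)
    return int(match.group()) if match else 0


def sort_by_leading_number(names: List[str]) -> List[str]:
    # Names with the same leading number compare equal, so their relative
    # order is whatever order they happened to be read in.
    return sorted(names, key=leading_number)


def sort_by_name(names: List[str]) -> List[str]:
    # Comparing whole names gives one well-defined order.
    return sorted(names)


names = ["10.interface", "00.ctdb", "10.external", "01.reclock"]
print(sort_by_leading_number(names))
print(sort_by_name(names))
```

Sorting on the whole name places "10.external" before "10.interface"
regardless of the order in which the directory entries were read.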

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104

(Imported from commit eee450fec2f7cb5f45c47162fd5b7c0717978598)

10 years agodaemon: Do not run monitor event if any other event is already running
Amitay Isaacs [Thu, 19 Dec 2013 02:01:25 +0000 (13:01 +1100)]
daemon: Do not run monitor event if any other event is already running

Any currently running monitor events are cancelled if any other events
are scheduled.  However, this does not stop monitor events from being
run when other events are already running.

Keep track of the number of active events and schedule monitor event
only if there are no active events.
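
The counting scheme can be sketched as follows.  This is an
illustrative Python sketch, not the daemon's implementation:
EventScheduler and its method names are hypothetical.

```python
class EventScheduler:
    """Hypothetical scheduler that tracks how many events are running."""

    def __init__(self) -> None:
        self.active_events = 0

    def start_event(self, name: str) -> None:
        # Called when any event (including a monitor event) begins running.
        self.active_events += 1

    def finish_event(self) -> None:
        # Called when an event completes or is cancelled.
        if self.active_events > 0:
            self.active_events -= 1

    def try_schedule_monitor(self) -> bool:
        """Start a monitor event only if nothing else is running."""
        if self.active_events > 0:
            return False  # skip this round; try again at the next interval
        self.start_event("monitor")
        return True
```

In this sketch a monitor event that is refused is simply not started;
the caller is expected to try again later.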

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit cbffbb7c2f406fc1d8ebad3c531cc2757232690e)

10 years agoeventscripts: Move all eventscript state under $CTDB_VARDIR/state
Martin Schwenke [Wed, 18 Dec 2013 06:08:55 +0000 (17:08 +1100)]
eventscripts: Move all eventscript state under $CTDB_VARDIR/state

Services can be flagged for reconfigure when they release IPs at
shutdown.  The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.

$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event.  Just push the service
state subdirectories down a level and put everything else in a
subdirectory.

This way all the eventscript state gets cleaned up every time CTDB
starts up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104

(Imported from commit b7bfe46636d07c71f83daff884ec339c9b4aee72)

10 years agodaemon: Untangle serialisation of 1st recovery -> startup -> monitor
Martin Schwenke [Wed, 18 Dec 2013 04:37:11 +0000 (15:37 +1100)]
daemon: Untangle serialisation of 1st recovery -> startup -> monitor

At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring.  This is untidy and hard to follow.

Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event.  When first recovery is complete, schedule a function
to handle the "startup" event.  When the "startup" event succeeds then
explicitly enable monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e6304d1e1adc86fc9c1199feb7b4802614fbc70f)

10 years agoeventscripts: Print a count if killing TCP connections times out
Martin Schwenke [Mon, 13 Jan 2014 05:34:50 +0000 (16:34 +1100)]
eventscripts: Print a count if killing TCP connections times out

Also update related test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50e00b3e5224d53df0f3cc882e71737f928e01cd)

10 years agoeventscripts: Reconfigure lock should be released quickly
Martin Schwenke [Wed, 18 Dec 2013 02:51:22 +0000 (13:51 +1100)]
eventscripts: Reconfigure lock should be released quickly

Currently the lock is held until the corresponding eventscript
completes, since the process still exists.  If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time.  The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held.  This can cause an unwanted monitor replay.

Change this so that the lock is released immediately after the
reconfiguration is complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8eb20c23476d390bb8a12ba01c9f06e7ac4a1453)

10 years agorecoverd: Do not refuse disabling takeover runs on inactive nodes
Martin Schwenke [Wed, 18 Dec 2013 08:15:39 +0000 (19:15 +1100)]
recoverd: Do not refuse disabling takeover runs on inactive nodes

Failure might be expected when disabling takeover runs on banned
nodes, since they might be suffering from performance problems or
similar.  More broadly, administrators who reconfigure a cluster that
isn't in a happy state aren't necessarily doing something sensible.

However, refusing to disable takeover runs on inactive nodes stops
reconfiguration of stopped nodes.  This is probably an unreasonable
limitation, so drop it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e77d5f99e396d71c1d354b3f8dc5ddf9ba5c5ee9)