ctdb.git
10 years ago vacuum: rename private->private_data in vacuum_traverse
Michael Adam [Fri, 14 Feb 2014 17:07:55 +0000 (18:07 +0100)]
vacuum: rename private->private_data in vacuum_traverse

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 951efa1097a113910c06ce78d1c9fb70e3f4d75e)

10 years ago vacuum: extract check for full vacuum run out of ctdb_vacuum_db_full()
Michael Adam [Fri, 14 Feb 2014 17:03:02 +0000 (18:03 +0100)]
vacuum: extract check for full vacuum run out of ctdb_vacuum_db_full()

This is more consistent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 01f359cafccb5ae3bea312d628dad92746520527)

10 years ago vacuum: add consistency check for counts to ctdb_vacuum_db_fast()
Michael Adam [Fri, 14 Feb 2014 16:58:01 +0000 (17:58 +0100)]
vacuum: add consistency check for counts to ctdb_vacuum_db_fast()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c88fd19714b98769887dbff59d8c1d077cf351d5)

10 years ago vacuum: use tdb_parse_record instead of tdb_fetch in delete_queue_traverse()
Michael Adam [Fri, 14 Feb 2014 14:28:22 +0000 (15:28 +0100)]
vacuum: use tdb_parse_record instead of tdb_fetch in delete_queue_traverse()

This spares a malloc and a free.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5d5907c7cf09567e73092578917624c8789c7471)
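The difference between the two access patterns can be sketched without libtdb. The toy store below is hypothetical (toy_fetch, toy_parse_record and struct toy_record are illustrative names, not the tdb API): a fetch-style call hands back a heap copy that the caller must free, while a parse-style call passes a borrowed view of the record to a callback, so no allocation is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-record store; not the real tdb data structures. */
struct toy_record {
    const char *key;
    const unsigned char *data;
    size_t len;
};

/* Fetch style: returns a heap copy that the caller must free(). */
static unsigned char *toy_fetch(const struct toy_record *rec,
                                const char *key, size_t *len)
{
    unsigned char *copy;

    if (strcmp(rec->key, key) != 0) {
        return NULL;
    }
    copy = malloc(rec->len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, rec->data, rec->len);
    *len = rec->len;
    return copy;
}

/* Parse style: the callback sees the stored bytes in place, no copy. */
typedef int (*toy_parser_fn)(const unsigned char *data, size_t len,
                             void *private_data);

static int toy_parse_record(const struct toy_record *rec, const char *key,
                            toy_parser_fn parser, void *private_data)
{
    if (strcmp(rec->key, key) != 0) {
        return -1;
    }
    return parser(rec->data, rec->len, private_data);
}

/* Example parser: records the length of the data it was shown. */
static int toy_len_parser(const unsigned char *data, size_t len,
                          void *private_data)
{
    (void)data;
    *(size_t *)private_data = len;
    return 0;
}
```

With the parse style, a traverse function that only needs to inspect a record header never owns memory, which removes a cleanup path as well as the allocation.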

10 years ago vacuum: simplify delete_record_traverse() - free treats NULL
Michael Adam [Fri, 14 Feb 2014 14:35:01 +0000 (15:35 +0100)]
vacuum: simplify delete_record_traverse() - free treats NULL

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fe68b3c4942a4660c9b35c6316856644c32f5631)

10 years ago vacuum: simplify delete_queue_traverse() - free treats NULL pointers.
Michael Adam [Fri, 14 Feb 2014 14:34:23 +0000 (15:34 +0100)]
vacuum: simplify delete_queue_traverse() - free treats NULL pointers.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 593bddf2e82fcb9666449c40625b972ff9c7961c)
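The simplification in this commit and the previous one rests on a guarantee of the C standard: free(NULL) does nothing. A minimal sketch, using a hypothetical struct that is not taken from the ctdb sources:

```c
#include <stdlib.h>

/* Hypothetical holder for two optional heap buffers. */
struct toy_buffers {
    char *key;
    char *data;
};

/* free(NULL) is defined to do nothing, so no NULL checks are needed. */
static void toy_buffers_release(struct toy_buffers *b)
{
    free(b->key);
    free(b->data);
    b->key = NULL;
    b->data = NULL;
}
```

Resetting the pointers afterwards makes a second call harmless as well.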

10 years ago vacuum: reduce indentation in delete_queue_traverse
Michael Adam [Fri, 14 Feb 2014 14:30:08 +0000 (15:30 +0100)]
vacuum: reduce indentation in delete_queue_traverse

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 24bec3d31b16c4c83b5ed76ecffccbfda53858fd)

10 years ago vacuum: treat value 0 of tunable RepackLimit as turned off.
Michael Adam [Wed, 12 Feb 2014 16:40:31 +0000 (17:40 +0100)]
vacuum: treat value 0 of tunable RepackLimit as turned off.

I.e. when RepackLimit is set to 0, no size of the freelist
should trigger a repack in vacuuming.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 48f2d1158820bfb063ba0a0bbfb6f496a8e7522d)
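The rule described here can be sketched as a small predicate. The function name, the handling of a negative size, and the direction of the threshold comparison (>=) are assumptions made for illustration; only the "0 means off" behaviour comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: decide whether a freelist of the given size
 * should trigger a repack.  A repack_limit of 0 means "never repack".
 */
static bool toy_repack_needed(int freelist_size, uint32_t repack_limit)
{
    if (repack_limit == 0) {
        return false;
    }
    if (freelist_size < 0) {
        /* Assumed convention: a negative size signals a failed lookup. */
        return false;
    }
    /* freelist_size is >= 0 here, so the cast cannot change its value. */
    return (uint32_t)freelist_size >= repack_limit;
}
```

Checking for 0 first matters: without it, a limit of 0 would make every freelist size meet the threshold.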

10 years ago vacuum: fix treatment of remaining records and statistics in delete_record_traverse()
Michael Adam [Fri, 14 Feb 2014 00:55:39 +0000 (01:55 +0100)]
vacuum: fix treatment of remaining records and statistics in delete_record_traverse()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit af5568b26761dadbb652d92f8c8ced477b38c7cc)

10 years ago vacuum: cast freelist_size in comparison.
Michael Adam [Wed, 12 Feb 2014 16:38:56 +0000 (17:38 +0100)]
vacuum: cast freelist_size in comparison.

At this point, it is >= 0 anyways.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b4e0b01a8c8415bec9c7dbbe4494813917dddfe5)

10 years ago vacuum: improve output of delete list statistics
Michael Adam [Thu, 13 Feb 2014 23:53:23 +0000 (00:53 +0100)]
vacuum: improve output of delete list statistics

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6a46a255307a070c887525ee1d79810ba12442bb)

10 years ago daemon: Do not support connection tracking if there are no public IPs
Amitay Isaacs [Tue, 11 Feb 2014 07:07:08 +0000 (18:07 +1100)]
daemon: Do not support connection tracking if there are no public IPs

CTDB tracks connections to be able to send tickle ACKs and gratuitous
ARPs.  When there are no public IPs, there is no need for tickle ACKs
and gratuitous ARPs.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar  4 03:01:38 CET 2014 on sn-devel-104

(Imported from commit fb2631f5dfd3ec58fd277dbe155afab58f882202)

10 years ago util: Do not use mlockall() on AIX
Amitay Isaacs [Tue, 11 Feb 2014 06:57:42 +0000 (17:57 +1100)]
util: Do not use mlockall() on AIX

Memory lockdown causes recovery daemon to crash on AIX.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit af3a168ed3b0dcac4086d2d90bfdef65590b68dc)

10 years ago build: AIX does not have working C99 vsnprintf, requires libreplace
Amitay Isaacs [Thu, 6 Feb 2014 05:32:42 +0000 (16:32 +1100)]
build: AIX does not have working C99 vsnprintf, requires libreplace

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 44520dcefc226ff1a93f77c8c7cf79d1c5244c3a)

10 years ago build: Remove auto-generated header file in distclean
Amitay Isaacs [Thu, 6 Feb 2014 05:27:09 +0000 (16:27 +1100)]
build: Remove auto-generated header file in distclean

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 96203d9126d77c45ee53e6b536720863851a42aa)

10 years ago recoverd: Check if callback function is registered before calling
Amitay Isaacs [Thu, 27 Feb 2014 01:41:23 +0000 (12:41 +1100)]
recoverd: Check if callback function is registered before calling

Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104

(Imported from commit 7d05baa96b5c49629803a98ec8160d2c5c51c839)

10 years ago daemon: After updating tickles on other nodes, set update flag to false
Amitay Isaacs [Wed, 29 Jan 2014 04:54:35 +0000 (15:54 +1100)]
daemon: After updating tickles on other nodes, set update flag to false

tcp_update_flag is set to true whenever tickles are added or deleted.
This flag is used to determine whether or not to send tickles list to
other nodes.  Once tickles list is sent to other nodes successfully,
set tcp_update_flag to false, so ctdbd does not keep sending same tickles
list every TickleUpdateInterval (20 seconds).

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 026996550d726836091ff5ebd1ebf925bf237bb0)
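The flag handling described above can be sketched as follows. All names other than tcp_update_flag are hypothetical, and the send operation is reduced to a callback so that the example stays self-contained.

```c
#include <stdbool.h>

/* Hypothetical per-address state; only tcp_update_flag is from the text. */
struct toy_vnn {
    bool tcp_update_flag;  /* set whenever tickles are added or deleted */
    int sends;             /* number of send attempts, for illustration */
};

/* Hypothetical sender callback: returns 0 on success. */
typedef int (*toy_send_fn)(struct toy_vnn *vnn);

/* Called once per update interval. */
static void toy_update_tickles(struct toy_vnn *vnn, toy_send_fn send_list)
{
    if (!vnn->tcp_update_flag) {
        return;  /* nothing changed since the last successful send */
    }
    if (send_list(vnn) != 0) {
        return;  /* keep the flag set so the next interval retries */
    }
    vnn->tcp_update_flag = false;
}

/* Sample callbacks that count attempts and succeed or fail. */
static int toy_send_ok(struct toy_vnn *vnn)
{
    vnn->sends++;
    return 0;
}

static int toy_send_fail(struct toy_vnn *vnn)
{
    vnn->sends++;
    return -1;
}
```

Clearing the flag only after a successful send means a failed update is retried, while an unchanged list is not sent again.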

10 years ago daemon: Implement ctdb_control_startup()
Martin Schwenke [Thu, 27 Feb 2014 02:47:28 +0000 (13:47 +1100)]
daemon: Implement ctdb_control_startup()

This doesn't implement what was recommended.  That would require
careful error handling, probably with a fallback to this code anyway.
This is simple and does no worse than the current code.  That is, the
new node is updated on the next call to ctdb_update_tcp_tickles().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0723fedcedd4a97870f7b1224945f1587363c9bf)

10 years ago daemon: Fix whitespaces
Amitay Isaacs [Wed, 22 Jan 2014 04:00:48 +0000 (15:00 +1100)]
daemon: Fix whitespaces

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 75ca1216a63d0a404466bfb94a1fba1e478b80c6)

10 years ago daemon: Always talloc tickle array off vnn instead of ctdb->nodes
Amitay Isaacs [Wed, 22 Jan 2014 04:00:33 +0000 (15:00 +1100)]
daemon: Always talloc tickle array off vnn instead of ctdb->nodes

This fixes ctdb crash reported in bug #10366.
Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit f2cd999189ee841fe81115e0873ab5e6a3fc265d)

10 years ago scripts: Enhancements to hung script debugging
Martin Schwenke [Fri, 7 Feb 2014 06:37:00 +0000 (17:37 +1100)]
scripts: Enhancements to hung script debugging

* Add stack dumps for "interesting" processes.  These sometimes get
  stuck, so try to print stack traces for them if they appear in the
  pstree output.

* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
  CTDB_DEBUG_HUNG_SCRIPT_STACKPAT.  These are primarily for testing
  but the latter may be useful for live debugging.

* Load CTDB configuration so that above configuration variables can be
  set/changed without restarting ctdbd.

Add a test that tries to ensure that all of this is working.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2532149f8f9bbe6d3c8f5ac6e5e4bc2ad1681e27)

10 years ago eventscripts: Switch on dumping of stuck nfsd threads
Martin Schwenke [Thu, 20 Feb 2014 04:20:44 +0000 (15:20 +1100)]
eventscripts: Switch on dumping of stuck nfsd threads

This feature was added quite a while ago but was not enabled by
default.  It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.

This can be disabled by setting:

  CTDB_NFS_DUMP_STUCK_THREADS=0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104

(Imported from commit fcf846a795085d24468548165d92762a628ef54d)

10 years ago vacuum: move retrieval of freelist to after vacuum run
Michael Adam [Mon, 10 Feb 2014 01:44:56 +0000 (02:44 +0100)]
vacuum: move retrieval of freelist to after vacuum run

The fast vacuum run may have increased the freelist size.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 14 03:15:30 CET 2014 on sn-devel-104

(Imported from commit 0535f73c3abdcd77cb3f5e9f81641fa2a4e1764b)

10 years ago vacuum: fix debug message typo in add_record_to_delete_list()
Michael Adam [Thu, 13 Feb 2014 15:44:04 +0000 (16:44 +0100)]
vacuum: fix debug message typo in add_record_to_delete_list()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit bd474985b1db572cb08eff39b25ecae2b9d0dea8)

10 years ago tests: Handle interactions with monitor events
Martin Schwenke [Wed, 12 Feb 2014 04:33:19 +0000 (15:33 +1100)]
tests: Handle interactions with monitor events

In the first case, reconfiguration can no longer happen in a monitor
event, so this is no longer a problem.  Drop it.

Running a monitor event by hand no longer cancels the existing monitor
event.  Instead the hand-run event fails.  So do this differently and
just wait for a monitor event before continuing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104

(Imported from commit a9ccdec008ebcb1b286eede4f43167e3e4d4cbe0)

10 years ago recoverd: Fix a bug in the LCP2 rebalancing code
Martin Schwenke [Fri, 7 Feb 2014 06:19:20 +0000 (17:19 +1100)]
recoverd: Fix a bug in the LCP2 rebalancing code

srcimbl gets changed on every iteration of the loop.  The value that
should be stored for the new imbalance of the source node is
minsrcimbl.

To help diagnose this, added some extra debug that can be left in.

The extra debug changes the output of a couple of tests.  Note that
the resulting IP allocations in those tests are unchanged - only the
debug output is changed.

Also add some new tests that illustrate the bug.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit f1a20d748f6ab4702be5b17047a3fbfa0f3e8d0c)
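The class of bug described above can be shown with a much simplified search loop. This is a sketch, not the LCP2 code: the function and its inputs are hypothetical, and only the variable names srcimbl and minsrcimbl come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical search: candidate_srcimbl[i] is the source node's
 * imbalance if candidate i were moved.  Returns the index of the
 * candidate giving the smallest imbalance and stores that imbalance in
 * *new_srcimbl.  Returns -1 (leaving *new_srcimbl alone) if there are
 * no candidates.
 */
static int toy_pick_move(const uint32_t *candidate_srcimbl, size_t n,
                         uint32_t *new_srcimbl)
{
    int best = -1;
    uint32_t minsrcimbl = 0;
    uint32_t srcimbl;
    size_t i;

    for (i = 0; i < n; i++) {
        srcimbl = candidate_srcimbl[i];  /* overwritten every iteration */
        if (best == -1 || srcimbl < minsrcimbl) {
            best = (int)i;
            minsrcimbl = srcimbl;
        }
    }
    if (best != -1) {
        /* Store the minimum; srcimbl only holds the last candidate. */
        *new_srcimbl = minsrcimbl;
    }
    return best;
}
```

Storing srcimbl after the loop would happen to be correct only when the best candidate is also the last one examined, which is why such a bug can survive for a while.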

10 years ago tests: New test to ensure "ctdb reloadips" manipulates IPs correctly
Martin Schwenke [Tue, 11 Feb 2014 22:49:11 +0000 (09:49 +1100)]
tests: New test to ensure "ctdb reloadips" manipulates IPs correctly

This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps.  First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct.  Then about 1/2
of the IPs are deleted and the remaining IPs are checked.  Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50fc53d7f11a3c28fd4ef5318d90f842bbc0f19c)

10 years ago daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script
Amitay Isaacs [Tue, 11 Feb 2014 06:29:26 +0000 (17:29 +1100)]
daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script

If CTDB_DEBUG_HUNG_SCRIPT is set, use that instead of the default
debug script.  This code was dropped by mistake in commit
18c1f432102f1a5093927be9276d001180539e50.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104

(Imported from commit 276b233c0090d51b59dbe06ae66a14ee09cbb4c2)

10 years ago eventscripts: Create extra files for ganesha recovery
Srikrishan Malik [Mon, 10 Feb 2014 05:49:08 +0000 (11:19 +0530)]
eventscripts: Create extra files for ganesha recovery

This adds new files for Ganesha's recovery.  myreleaseip_* are used by
the recovery thread on the node where IP is released. The releaseip_*
and takeip_* files are used by recovery thread where IP is taken over.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 9a2a5a2f7c7d3d6b4c03bb97e134ca0452a83bb8)

10 years ago eventscripts: Run mmlsconfig only once and use cached results
Srikrishan Malik [Mon, 10 Feb 2014 05:40:48 +0000 (11:10 +0530)]
eventscripts: Run mmlsconfig only once and use cached results

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 6b378f2f76e433023e57dd78bc3f98e0ef1f34f1)

10 years ago doc: Update NEWS (tag: ctdb-2.5.2)
Amitay Isaacs [Fri, 31 Jan 2014 07:30:56 +0000 (18:30 +1100)]
doc: Update NEWS

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

10 years ago doc: Fix usage string for ctdb readkey/writekey
Amitay Isaacs [Fri, 31 Jan 2014 01:46:21 +0000 (12:46 +1100)]
doc: Fix usage string for ctdb readkey/writekey

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104

(Imported from commit 35eb6cb521d54708f0bbba515f645327846b4e70)

10 years ago daemon: Return negative status only if there are known errors
Amitay Isaacs [Thu, 23 Jan 2014 03:57:53 +0000 (14:57 +1100)]
daemon: Return negative status only if there are known errors

If event script does not exist or does not have execute permissions, then
return negative errno to distinguish from the exit errors of event script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1566790e5a738f12db1dfb519589c1842d74b8e5)
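A minimal sketch of such a pre-flight check, assuming a hypothetical helper name and assuming that a missing execute bit is reported as ENOEXEC (the commit message does not say which errno value is used):

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical pre-flight check.  Returns 0 if the script looks
 * runnable, or a negative errno value if it is missing or lacks the
 * owner execute bit.  Script exit codes are non-negative, so a negative
 * return cannot be mistaken for a script's own exit status.
 */
static int toy_check_script(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -errno;  /* for example -ENOENT */
    }
    if (!(st.st_mode & S_IXUSR)) {
        return -ENOEXEC;  /* assumed choice of error code */
    }
    return 0;
}
```

A caller can then treat any negative result as "the script was never run" and any positive result as the script's own failure.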

10 years ago tests/eventscripts: Avoid errors on broken pipe
Martin Schwenke [Tue, 28 Jan 2014 03:34:15 +0000 (14:34 +1100)]
tests/eventscripts: Avoid errors on broken pipe

ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104

(Imported from commit b112a3317cbedc73a6e17b3f711fec84f0d41d4e)

10 years ago tests/eventscripts: Improve ip command stub secondary handling
Martin Schwenke [Tue, 28 Jan 2014 05:07:53 +0000 (16:07 +1100)]
tests/eventscripts: Improve ip command stub secondary handling

It should support primary and secondaries per network instead of per
interface.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1640f36d5831b2575d117fac335f3324ceefa9f8)

10 years ago daemon: reloadips must register state of asynchronous controls
Martin Schwenke [Wed, 22 Jan 2014 05:02:46 +0000 (16:02 +1100)]
daemon: reloadips must register state of asynchronous controls

Otherwise ctdb_client_async_wait() is a no-op.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e5778cc172eb9fab6382f1c600326f6cc99b9162)

10 years ago tests: in the stub "ip link show" command use echo instead of cat
Michael Adam [Wed, 27 Nov 2013 22:43:53 +0000 (23:43 +0100)]
tests: in the stub "ip link show" command use echo instead of cat

This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104

(Imported from commit e2db9c524f40f8771ae19b2be47a56f7a9d887af)

10 years ago test: remove unused ip2ipmask from integration.bash
Michael Adam [Wed, 27 Nov 2013 21:28:06 +0000 (22:28 +0100)]
test: remove unused ip2ipmask from integration.bash

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fd5e8905a09875d13ef109133edd361a82cf8e1e)

10 years ago tests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.
Michael Adam [Wed, 27 Nov 2013 10:42:28 +0000 (11:42 +0100)]
tests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.

This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e281cfa8db4a2506f9016718373cdc80f4aa9c1f)

10 years ago tests:76_ctdb_pdb_recovery: fix a typo in a message
Michael Adam [Wed, 27 Nov 2013 22:28:24 +0000 (23:28 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 30dead171f82b5da31cbcbab88eaa70a896d9c55)

10 years ago tests:76_ctdb_pdb_recovery: fix a typo in a message
Michael Adam [Wed, 27 Nov 2013 10:40:53 +0000 (11:40 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 3e083f96ff02cbf419513e16a200e8d4d0c2c227)

10 years ago tests: in the stub ip command, avoid broken pipe by using echo instead of cat
Michael Adam [Wed, 27 Nov 2013 11:13:40 +0000 (12:13 +0100)]
tests: in the stub ip command, avoid broken pipe by using echo instead of cat

This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 70f469e05e279e29980df2af10dd89c53001b236)

10 years ago tests/integration: Update NFS tickles test and supporting code
Martin Schwenke [Thu, 28 Nov 2013 05:43:55 +0000 (16:43 +1100)]
tests/integration: Update NFS tickles test and supporting code

This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred.  This code is horrible and fragile.

A better way is to just monitor the output of "ctdb scriptstatus".
When it changes then a monitor event has occurred.

Also remove the old code that checks for tickle information in shared
storage.  CTDB hasn't done things this way for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit ef0e8cc1928dbd12c862a5e96710471ce3b4d023)

10 years ago eventscripts: Do not mark node unhealthy if no fs is available
Srikrishan Malik [Fri, 13 Dec 2013 07:35:53 +0000 (13:05 +0530)]
eventscripts: Do not mark node unhealthy if no fs is available

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104

(Imported from commit 164ee000df2a3ffc91690c60d08e4ea7ff1a33f2)

10 years ago daemon: Simplify listing event scripts using scandir
Amitay Isaacs [Thu, 16 Jan 2014 02:05:58 +0000 (13:05 +1100)]
daemon: Simplify listing event scripts using scandir

Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104

(Imported from commit eee450fec2f7cb5f45c47162fd5b7c0717978598)

10 years ago daemon: Do not run monitor event if any other event is already running
Amitay Isaacs [Thu, 19 Dec 2013 02:01:25 +0000 (13:01 +1100)]
daemon: Do not run monitor event if any other event is already running

Any currently running monitor events are cancelled if any other events
are scheduled.  However, this does not stop monitor events from being run
when other events are already running.

Keep track of the number of active events and schedule monitor event
only if there are no active events.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit cbffbb7c2f406fc1d8ebad3c531cc2757232690e)
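The counting scheme can be sketched in a few lines. The struct and function names are hypothetical; the sketch only shows the idea of gating the monitor event on an active-event counter.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for running events. */
struct toy_event_state {
    unsigned int active_events;
};

/* Call when any event starts running. */
static void toy_event_started(struct toy_event_state *s)
{
    s->active_events++;
}

/* Call when an event completes or is cancelled. */
static void toy_event_finished(struct toy_event_state *s)
{
    if (s->active_events > 0) {
        s->active_events--;
    }
}

/* A monitor event may only be scheduled when nothing else is running. */
static bool toy_may_schedule_monitor(const struct toy_event_state *s)
{
    return s->active_events == 0;
}
```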

10 years ago eventscripts: Move all eventscript state under $CTDB_VARDIR/state
Martin Schwenke [Wed, 18 Dec 2013 06:08:55 +0000 (17:08 +1100)]
eventscripts: Move all eventscript state under $CTDB_VARDIR/state

Services can be flagged for reconfigure when they release IPs at
shutdown.  The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.

$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event.  Just push the service
state subdirectories down a level and put everything else in a
subdirectory.

This way all the eventscript state gets cleaned up every time CTDB
starts up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104

(Imported from commit b7bfe46636d07c71f83daff884ec339c9b4aee72)

10 years ago daemon: Untangle serialisation of 1st recovery -> startup -> monitor
Martin Schwenke [Wed, 18 Dec 2013 04:37:11 +0000 (15:37 +1100)]
daemon: Untangle serialisation of 1st recovery -> startup -> monitor

At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring.  This is untidy and hard to follow.

Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event.  When first recovery is complete, schedule a function
to handle the "startup" event.  When the "startup" event succeeds then
explicitly enable monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e6304d1e1adc86fc9c1199feb7b4802614fbc70f)

10 years ago eventscripts: Print a count if killing TCP connections times out
Martin Schwenke [Mon, 13 Jan 2014 05:34:50 +0000 (16:34 +1100)]
eventscripts: Print a count if killing TCP connections times out

Also update the related test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50e00b3e5224d53df0f3cc882e71737f928e01cd)

10 years ago eventscripts: Reconfigure lock should be released quickly
Martin Schwenke [Wed, 18 Dec 2013 02:51:22 +0000 (13:51 +1100)]
eventscripts: Reconfigure lock should be released quickly

Currently the lock is held until the corresponding eventscript
completes, since the process still exists.  If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time.  The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held.  This can cause an unwanted monitor replay.

Change this so that the lock is released immediately after the
reconfiguration is complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8eb20c23476d390bb8a12ba01c9f06e7ac4a1453)

10 years ago recoverd: Do not refuse disabling takeover runs on inactive nodes
Martin Schwenke [Wed, 18 Dec 2013 08:15:39 +0000 (19:15 +1100)]
recoverd: Do not refuse disabling takeover runs on inactive nodes

Failure might be expected when disabling takeover runs on banned
nodes, since they might be suffering from performance problems or
similar.  More broadly, administrators who reconfigure a cluster that
isn't in a happy state aren't necessarily doing something sensible.

However, not allowing takeover runs to be disabled on inactive nodes stops
reconfiguration of stopped nodes.  This is probably an unreasonable
limitation, so drop it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e77d5f99e396d71c1d354b3f8dc5ddf9ba5c5ee9)

10 years ago recoverd: Ignore failed ipreallocated controls to inactive nodes
Martin Schwenke [Tue, 26 Nov 2013 01:35:44 +0000 (12:35 +1100)]
recoverd: Ignore failed ipreallocated controls to inactive nodes

Currently timeouts for controls to inactive nodes can cause banning
credits to be applied.  This should not happen.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a955d0bedce888597633c0c88082f29e1d26e503)

10 years ago daemon: Remove ctdb_fork_with_logging()
Amitay Isaacs [Wed, 18 Dec 2013 03:09:52 +0000 (14:09 +1100)]
daemon: Remove ctdb_fork_with_logging()

This function has been replaced with ctdb_vfork_with_logging().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104

(Imported from commit a92fd11ad1ccc904a999a254d249bbdc74f08f84)

10 years ago tests: Set CTDB_EVENT_HELPER when running with local daemons
Amitay Isaacs [Mon, 13 Jan 2014 04:16:46 +0000 (15:16 +1100)]
tests: Set CTDB_EVENT_HELPER when running with local daemons

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit dd98b9df6651054dabefdf439735042a78cfea2e)

10 years ago daemon: Remove unused code to run eventscripts
Amitay Isaacs [Tue, 17 Dec 2013 08:22:20 +0000 (19:22 +1100)]
daemon: Remove unused code to run eventscripts

Eventscripts are now executed using a helper.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 97575e1ba0b7fecef2b26f2da1c0d8cb769a37a8)

10 years ago daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)
Amitay Isaacs [Wed, 18 Dec 2013 03:07:57 +0000 (14:07 +1100)]
daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)

Use ctdb_event_helper to run debug-hung-script.sh.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 18c1f432102f1a5093927be9276d001180539e50)

10 years ago daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)
Amitay Isaacs [Tue, 17 Dec 2013 08:19:51 +0000 (19:19 +1100)]
daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)

Use ctdb_event_helper to run eventscripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit d86662a925a072eb0374ad7743f4cf95c447bebb)

10 years ago daemon: Add helper process to execute event scripts
Amitay Isaacs [Mon, 16 Dec 2013 04:40:01 +0000 (15:40 +1100)]
daemon: Add helper process to execute event scripts

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 69324b61f0669022c7204ee08a4c7104865d4e9b)

10 years ago daemon: Add ctdb_vfork_with_logging()
Amitay Isaacs [Mon, 16 Dec 2013 04:39:29 +0000 (15:39 +1100)]
daemon: Add ctdb_vfork_with_logging()

This will be used to spawn lightweight helper processes to run
eventscripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 2879404388ed04af199a7e4451605b4435e8cc23)
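The general vfork-then-exec pattern behind a lightweight helper can be sketched as below. This is not the ctdb function: toy_spawn_helper is a hypothetical name, and the logging setup that the real function's name implies is left out entirely.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical spawn helper.  Returns the child's pid, or -1 if vfork()
 * fails.  After vfork() the child shares the parent's memory until it
 * calls exec or _exit, so it must do nothing else.
 */
static pid_t toy_spawn_helper(const char *helper, char *const argv[])
{
    pid_t pid = vfork();

    if (pid == 0) {
        execv(helper, argv);
        _exit(127);  /* only reached if execv() failed */
    }
    return pid;
}
```

The caller collects the exit status with waitpid(). Since the child does not copy the parent's page tables, this is cheaper than fork() for a large daemon that only wants to exec a small program.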

10 years ago daemon: No need to call event scripts with CTDB_CALLED_BY_USER
Amitay Isaacs [Mon, 16 Dec 2013 04:57:42 +0000 (15:57 +1100)]
daemon: No need to call event scripts with CTDB_CALLED_BY_USER

This was added to support external monitoring using CTDB event scripts.
However, it was never used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 7aa20ccb5c747707fca349e9e0847cd0fca8c839)

10 years ago daemon: Deprecate RELOAD and STATUS events
Amitay Isaacs [Mon, 23 Dec 2013 00:46:48 +0000 (11:46 +1100)]
daemon: Deprecate RELOAD and STATUS events

These events have never been used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit bafa467021b7b2f17c61904b9f70f695a4395921)

10 years ago common: mkdir_p should not try to create .
Amitay Isaacs [Tue, 17 Dec 2013 08:48:29 +0000 (19:48 +1100)]
common: mkdir_p should not try to create .

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit b8c6bcc365ce08ddc0ebf51c002d53c08f144981)
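A recursive mkdir_p reaches "." as soon as it is given a relative path, because dirname("a") is ".". The sketch below shows one way to stop there; toy_mkdir_p and the fixed buffer size are assumptions, not the ctdb implementation.

```c
#include <errno.h>
#include <libgen.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define TOY_PATH_MAX 4096  /* assumed upper bound for this sketch */

/*
 * Hypothetical recursive mkdir -p.  "/" and "." always exist, so the
 * recursion stops there instead of calling mkdir() on them.
 */
static int toy_mkdir_p(const char *dir, mode_t mode)
{
    char parent[TOY_PATH_MAX];

    if (strcmp(dir, "/") == 0 || strcmp(dir, ".") == 0) {
        return 0;
    }
    if (strlen(dir) >= sizeof(parent)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* dirname() may modify its argument, so work on a copy. */
    strcpy(parent, dir);
    if (toy_mkdir_p(dirname(parent), mode) != 0) {
        return -1;
    }
    /* An existing entry is accepted; this sketch does not check its type. */
    if (mkdir(dir, mode) != 0 && errno != EEXIST) {
        return -1;
    }
    return 0;
}
```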

10 years ago eventscripts: Do not reconfigure in "monitor" events
Martin Schwenke [Mon, 9 Dec 2013 04:54:52 +0000 (15:54 +1100)]
eventscripts: Do not reconfigure in "monitor" events

"monitor" events can be cancelled.  If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped.  In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).

A long time ago we did service reconfiguration in "monitor" events
following failovers.  Service reconfiguration was then moved to the
"ipreallocated" event.  However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur.  The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated".  Therefore, IPs can be deleted without
running the required service reconfiguration.

The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.

This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.

Also update the associated tests.  Make the first confirm that the
monitor event no longer does reconfiguration.  Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104

(Imported from commit fdccaab2a9a1b9d7eebcd7a4d121dbf68ea48dcd)

10 years agopackaging:RPM: don't run autogen.
Michael Adam [Fri, 6 Dec 2013 00:37:34 +0000 (01:37 +0100)]
packaging:RPM: don't run autogen.

autogen is already run in maketarball.sh which generates
the tarball for the RPM.

This way, we don't have a rpm build dependency on autoconf.
Recent changes introduced a dependency into autoconf
version >= 2.60, so this fix allows the generated
source RPM to be built also on older platforms.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec  9 05:47:00 CET 2013 on sn-devel-104

(Imported from commit c65ad56d40c2ac286dc9d726119d04384981d0b3)

10 years agopackaging:RPM: package the new manpages
Michael Adam [Fri, 6 Dec 2013 00:33:57 +0000 (01:33 +0100)]
packaging:RPM: package the new manpages

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7dbb068aa7e77f34377e762bbd65cb7ca72b85b4)

10 years agobuild: install the new manpages
Michael Adam [Fri, 6 Dec 2013 00:31:11 +0000 (01:31 +0100)]
build: install the new manpages

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0e8340229b0efa6291218a24865e52acb24bb12c)

10 years agoUpdate NEWS ctdb-2.5.1
Martin Schwenke [Mon, 25 Nov 2013 08:28:10 +0000 (19:28 +1100)]
Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoscripts: Be careful when generating unique pids for stack traces
Amitay Isaacs [Tue, 26 Nov 2013 04:41:50 +0000 (15:41 +1100)]
scripts: Be careful when generating unique pids for stack traces

sort expects the data to be line based, so make it so.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoconfig: Simplify the default CTDB configuration file
Amitay Isaacs [Tue, 26 Nov 2013 03:38:58 +0000 (14:38 +1100)]
config: Simplify the default CTDB configuration file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

10 years agoscripts: Replace hard-coded /var/ctdb with CTDB_VARDIR
Amitay Isaacs [Tue, 26 Nov 2013 03:29:52 +0000 (14:29 +1100)]
scripts: Replace hard-coded /var/ctdb with CTDB_VARDIR

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoscripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT
Amitay Isaacs [Tue, 26 Nov 2013 02:27:46 +0000 (13:27 +1100)]
scripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT

If these configuration variables are not defined, then there should be
a default fallback.  This is a workaround until CTDB compile-time
configuration can be accessed at runtime.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoeventscripts: Perform share check before NFS RPC checks in 60.ganesha
Amitay Isaacs [Tue, 26 Nov 2013 00:39:54 +0000 (11:39 +1100)]
eventscripts: Perform share check before NFS RPC checks in 60.ganesha

If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotools/ctdb: Improve error checking when parsing node string
Martin Schwenke [Fri, 22 Nov 2013 02:57:31 +0000 (13:57 +1100)]
tools/ctdb: Improve error checking when parsing node string

If a node isn't numeric then it is silently converted to 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
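
The silent conversion to 0 described above is typical of atoi()-style parsing, which cannot report errors. A minimal sketch of stricter parsing for a single node number follows. The helper name parse_node_number is hypothetical, and the real node string syntax in tools/ctdb may accept more than a plain number; this only shows the strtoul() checks.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse one decimal node number.
 * Returns 0 on success and stores the value in *pnn, -1 otherwise. */
int parse_node_number(const char *s, uint32_t *pnn)
{
    char *end = NULL;
    unsigned long v;

    if (s == NULL || *s < '0' || *s > '9') {
        return -1;  /* empty, signed or non-numeric input */
    }

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX) {
        return -1;  /* overflow or trailing garbage such as "1x" */
    }

    *pnn = (uint32_t)v;
    return 0;
}
```

The point of the design is that "abc" and "0" are now distinguishable: the first is rejected, the second yields node 0.
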
10 years agorecoverd: Only respond to currently queued ipreallocated requests
Martin Schwenke [Fri, 22 Nov 2013 02:57:03 +0000 (13:57 +1100)]
recoverd: Only respond to currently queued ipreallocated requests

Otherwise new requests can come in during the latter parts of the
takeover run when the IP allocation algorithm has already run, and the
new requests will be dequeued even though they haven't really been
processed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
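
One way to picture the behaviour described above is to detach the queue before the takeover run starts, so that late arrivals are left for the next run. The sketch below is an assumption about the general technique, not the recoverd code; all type and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical request record; real requests carry reply addresses. */
struct realloc_request {
    struct realloc_request *next;
    int replied;
};

/* Requests waiting for the next takeover run. */
struct realloc_queue {
    struct realloc_request *head;
};

void queue_add(struct realloc_queue *q, struct realloc_request *r)
{
    r->replied = 0;
    r->next = q->head;
    q->head = r;
}

/* Detach everything queued so far.  Requests added after this call
 * land on the now-empty queue and wait for the next run. */
struct realloc_request *queue_detach(struct realloc_queue *q)
{
    struct realloc_request *current = q->head;
    q->head = NULL;
    return current;
}

/* Reply only to the detached batch; returns how many were answered. */
int reply_to_batch(struct realloc_request *batch)
{
    int count = 0;
    for (; batch != NULL; batch = batch->next) {
        batch->replied = 1;
        count++;
    }
    return count;
}
```

A request that arrives between queue_detach() and reply_to_batch() stays on the queue unanswered, which is the property the commit message asks for.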

10 years agoscripts: Add an early exit to statd-callout's notify case
Martin Schwenke [Tue, 19 Nov 2013 04:40:08 +0000 (15:40 +1100)]
scripts: Add an early exit to statd-callout's notify case

If $statd_state is empty then the loop will run once and print
spurious errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Remove the nfs_statd_update() call from 60.ganesha
Martin Schwenke [Tue, 19 Nov 2013 04:37:58 +0000 (15:37 +1100)]
eventscripts: Remove the nfs_statd_update() call from 60.ganesha

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/integration: Neaten up some of the persistent database tests
Martin Schwenke [Mon, 18 Nov 2013 10:04:49 +0000 (21:04 +1100)]
tests/integration: Neaten up some of the persistent database tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: Fix tstore command to generate ltdb header internally
Amitay Isaacs [Mon, 18 Nov 2013 04:09:27 +0000 (15:09 +1100)]
tools/ctdb: Fix tstore command to generate ltdb header internally

This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.

  sizeof(struct ctdb_ltdb_header) = 20  (32-bit)
                                  = 24  (64-bit)

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
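
The 20 versus 24 byte difference above is what trailing struct padding produces when a 64-bit member is followed by an odd number of 32-bit members. The struct below is an illustrative layout assumed to resemble struct ctdb_ltdb_header; the member names are not taken from this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout, assumed to resemble struct ctdb_ltdb_header. */
struct ltdb_header_sketch {
    uint64_t rsn;
    uint32_t dmaster;
    uint32_t reserved1;
    uint32_t flags;
};

/* Bytes occupied by the members alone, without trailing padding. */
size_t header_payload_size(void)
{
    return offsetof(struct ltdb_header_sketch, flags) + sizeof(uint32_t);
}

/* Bytes the compiler reserves for the whole struct.  Where uint64_t
 * needs 8-byte alignment this is rounded up to 24; where it needs
 * only 4-byte alignment (e.g. 32-bit x86) it stays at 20. */
size_t header_struct_size(void)
{
    return sizeof(struct ltdb_header_sketch);
}
```

A header blob sized with sizeof() on one platform would therefore not necessarily match the size expected on the other, which is consistent with generating the header inside the tool rather than accepting it from outside.
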
10 years agotests/takeover: Fix bogus test description
Martin Schwenke [Fri, 15 Nov 2013 04:31:03 +0000 (15:31 +1100)]
tests/takeover: Fix bogus test description

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/simple: Use sleep_for() instead of sleep
Martin Schwenke [Fri, 15 Nov 2013 04:23:14 +0000 (15:23 +1100)]
tests/simple: Use sleep_for() instead of sleep

Progress...

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/simple: Update persistent DB tests
Martin Schwenke [Fri, 15 Nov 2013 04:21:58 +0000 (15:21 +1100)]
tests/simple: Update persistent DB tests

* Low level DB checks should ignore the sequence number record.

* A restart is needed after messing with the RecoverPDBBySeqNum
  tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: For persistent databases a sequence number of 0 is valid
Martin Schwenke [Fri, 15 Nov 2013 04:20:40 +0000 (15:20 +1100)]
recoverd: For persistent databases a sequence number of 0 is valid

Otherwise recovery ends up done by RSN when it is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
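
A common way to make 0 a valid maximum is to track "found" separately from the value, instead of using 0 as a "nothing found" marker. The sketch below illustrates that idea only; the struct and function names are hypothetical and the actual recoverd logic is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node result of asking for a database's seqnum. */
struct node_seqnum {
    bool ok;            /* the node answered */
    uint64_t seqnum;
};

/* Return the index of the node with the highest sequence number, or -1
 * if no node answered.  A separate "found" flag means that a highest
 * value of 0 still counts as a valid result. */
int pick_highest_seqnum(const struct node_seqnum *nodes, size_t n)
{
    bool found = false;
    uint64_t best = 0;
    int best_idx = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!nodes[i].ok) {
            continue;
        }
        if (!found || nodes[i].seqnum > best) {
            found = true;
            best = nodes[i].seqnum;
            best_idx = (int)i;
        }
    }
    return best_idx;
}
```

Only a return value of -1 would then justify falling back to a different recovery method.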

10 years agolocking: Use vfork instead of fork to exec helpers
Amitay Isaacs [Tue, 19 Nov 2013 04:31:39 +0000 (15:31 +1100)]
locking: Use vfork instead of fork to exec helpers

There is a significant overhead using fork() over vfork(), especially
when the child process execs a helper.  The overhead is in memory space
and time.

    # strace -c ./test_fork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 4879.597000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543321        3304      1375       375 clone
      0.00    0.000071           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543392                  2429       376 total

    # strace -c ./test_vfork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 82.041000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     96.47    0.001204           1      1000           vfork
      3.53    0.000044           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.001248                  2054         1 total

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
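
The vfork-then-exec pattern measured above can be sketched as follows. This is a minimal illustration, not the CTDB locking code; run_helper is a hypothetical name, and it waits for the helper synchronously, which a daemon would normally avoid.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with vfork()+execv() and return its
 * exit status, or -1 if it could not be started or died abnormally. */
int run_helper(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: it borrows the parent's memory until exec, so do
         * nothing here except exec or _exit. */
        execv(argv[0], argv);
        _exit(127);
    }

    /* Parent resumes once the child has called exec or _exit. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Because the child does not get its own copy of the parent's address space, the cost no longer grows with the parent's memory size, which fits the difference between the two strace summaries.
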
10 years agocommon: Refactor code to keep track of child processes
Amitay Isaacs [Tue, 19 Nov 2013 05:13:20 +0000 (16:13 +1100)]
common: Refactor code to keep track of child processes

This code can then be used to track child processes created with vfork().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoscripts: Run a single instance of debug_locks.sh at a given time
Amitay Isaacs [Fri, 15 Nov 2013 07:59:04 +0000 (18:59 +1100)]
scripts: Run a single instance of debug_locks.sh at a given time

This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.

Also, improve the logging format with separators.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Update current lock statistics when lock is scheduled
Amitay Isaacs [Fri, 15 Nov 2013 07:36:09 +0000 (18:36 +1100)]
locking: Update current lock statistics when lock is scheduled

When a child process is created for a lock request, the current lock
statistics should be updated immediately.  This will provide accurate
information on the number of active lock requests.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Do not merge multiple lock requests to avoid unfair scheduling
Amitay Isaacs [Mon, 18 Nov 2013 04:48:22 +0000 (15:48 +1100)]
locking: Do not merge multiple lock requests to avoid unfair scheduling

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Implement active lock requests limit per database
Amitay Isaacs [Fri, 15 Nov 2013 04:58:59 +0000 (15:58 +1100)]
locking: Implement active lock requests limit per database

This limit was previously global rather than per database.  That
prevented database freeze lock requests from getting scheduled once
the global limit was reached.

Only individual record requests should be limited and database freeze
requests should always get scheduled.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
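
The scheduling rule described above can be written as a small predicate. The names below (lock_kind, db_lock_state, max_per_db) are assumptions made for this sketch and do not come from the CTDB sources.

```c
#include <stdbool.h>

/* Hypothetical request kinds and per-database state. */
enum lock_kind { LOCK_RECORD, LOCK_DB_FREEZE };

struct db_lock_state {
    int active_record_locks;    /* record lock requests in progress */
};

/* Decide whether a request may start now.  max_per_db is an assumed
 * tunable; freeze requests bypass the limit entirely. */
bool lock_may_schedule(const struct db_lock_state *db,
                       enum lock_kind kind, int max_per_db)
{
    if (kind == LOCK_DB_FREEZE) {
        return true;
    }
    return db->active_record_locks < max_per_db;
}
```

Keeping the counter in per-database state means a busy database cannot starve record requests on another database, and freeze requests are never held back by record traffic.
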
10 years agoscripts: Rewrite statd-callout to avoid 10 minute lag
Martin Schwenke [Fri, 8 Nov 2013 05:41:11 +0000 (16:41 +1100)]
scripts: Rewrite statd-callout to avoid 10 minute lag

This is naive and assumes no performance problems when updating
persistent DBs.  It also does no error handling.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoclient: Treat empty __db_sequence_number__ record as 0
Amitay Isaacs [Wed, 13 Nov 2013 06:45:25 +0000 (17:45 +1100)]
client: Treat empty __db_sequence_number__ record as 0

This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
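
Treating an empty record as 0 amounts to one extra size check when decoding the stored value. The function below is a hypothetical decoder written for illustration; it assumes the sequence number is stored as a single 64-bit integer in host byte order, which this log does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder for a sequence number record.
 * Returns 0 on success, -1 if the record has an unexpected size. */
int seqnum_from_record(const void *data, size_t size, uint64_t *seqnum)
{
    if (size == 0) {
        /* Empty record, e.g. left by a cancelled transaction. */
        *seqnum = 0;
        return 0;
    }
    if (size != sizeof(uint64_t)) {
        return -1;
    }
    memcpy(seqnum, data, sizeof(uint64_t)); /* assumed host byte order */
    return 0;
}
```

Without the size == 0 branch, an empty record would fall into the error path and a later transaction commit could fail, as described above.
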
10 years agodoc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans
Martin Schwenke [Wed, 13 Nov 2013 05:19:00 +0000 (16:19 +1100)]
doc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans

Also:

* Move <refentryinfo> above <refmeta> to make the XML valid.

* Describe DB argument in introduction and use it for database
  commands.

* Remove unnecessary format="linespecific" from <screen> tags, since
  it will not be allowed in DocBook 5.0.

* Sort the items in "INTERNAL COMMANDS".

* Update/simplify some command descriptions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: New ptrans command
Martin Schwenke [Wed, 6 Nov 2013 02:43:53 +0000 (13:43 +1100)]
tools/ctdb: New ptrans command

Also add test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoonnode: New -i option to stop stdin from being closed
Martin Schwenke [Wed, 13 Nov 2013 03:04:17 +0000 (14:04 +1100)]
onnode: New -i option to stop stdin from being closed

This can be useful for piping data to onnode in certain circumstances.

There are now also enough command-line options that they should
definitely be alphabetically ordered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/integration: try_command_on_node() shouldn't lose onnode options
Martin Schwenke [Wed, 13 Nov 2013 03:13:52 +0000 (14:13 +1100)]
tests/integration: try_command_on_node() shouldn't lose onnode options

Currently it only passes the last (non -v) option seen.  It should
pass them all.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agorecoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN
Martin Schwenke [Tue, 12 Nov 2013 04:16:49 +0000 (15:16 +1100)]
recoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN

When running a mixed version cluster, compatibility with older
versions was broken during recent refactorisation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoscripts: debug_locks.sh should use configuration to find TDB location
Martin Schwenke [Mon, 4 Nov 2013 01:56:39 +0000 (12:56 +1100)]
scripts: debug_locks.sh should use configuration to find TDB location

That is, don't use fixed paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: A node refuses to play against itself
Martin Schwenke [Fri, 1 Nov 2013 03:34:20 +0000 (14:34 +1100)]
recoverd: A node refuses to play against itself

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: Remove duplicate code to update flags during recovery
Martin Schwenke [Thu, 14 Nov 2013 03:25:47 +0000 (14:25 +1100)]
recoverd: Remove duplicate code to update flags during recovery

This also happens earlier in do_recovery() and the nodemap is not
updated after that, so this update is redundant.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agobuild: Update to latest upstream config.guess
Martin Schwenke [Thu, 14 Nov 2013 03:14:10 +0000 (14:14 +1100)]
build: Update to latest upstream config.guess

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: Fix db commands when dbid is given instead of name
Amitay Isaacs [Wed, 13 Nov 2013 04:25:46 +0000 (15:25 +1100)]
tools/ctdb: Fix db commands when dbid is given instead of name

Signed-off-by: Amitay Isaacs <amitay@gmail.com>