Amitay Isaacs [Fri, 31 Jan 2014 07:30:56 +0000 (18:30 +1100)]
doc: Update NEWS
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Fri, 31 Jan 2014 01:46:21 +0000 (12:46 +1100)]
doc: Fix usage string for ctdb readkey/writekey
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104
(Imported from commit
35eb6cb521d54708f0bbba515f645327846b4e70)
Amitay Isaacs [Thu, 23 Jan 2014 03:57:53 +0000 (14:57 +1100)]
daemon: Return negative status only if there are known errors
If event script does not exist or does not have execute permissions, then
return negative errno to distinguish from the exit errors of event script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
1566790e5a738f12db1dfb519589c1842d74b8e5)
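Illustrative sketch only, not code from this commit: one way to keep "could not run the script" apart from "the script ran and failed" is to return a negative errno for the former. The function name run_event_script and the specific errno values (ENOENT, ENOEXEC) are assumptions made for the example.

```python
import errno
import os
import stat
import subprocess


def run_event_script(path, args=()):
    """Return the script's exit status, or a negative errno if it cannot be run."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return -errno.ENOENT      # script does not exist
    if not st.st_mode & stat.S_IXUSR:
        return -errno.ENOEXEC     # no execute permission (errno choice is an assumption)
    # Script exists and is executable: report whatever status it exits with.
    return subprocess.call([path, *args])
```

A caller can then treat any negative value as a known setup error and any positive value as the script's own failure code.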
Martin Schwenke [Tue, 28 Jan 2014 03:34:15 +0000 (14:34 +1100)]
tests/eventscripts: Avoid errors on broken pipe
ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104
(Imported from commit
b112a3317cbedc73a6e17b3f711fec84f0d41d4e)
Martin Schwenke [Tue, 28 Jan 2014 05:07:53 +0000 (16:07 +1100)]
tests/eventscripts: Improve ip command stub secondary handling
It should support primary and secondaries per network instead of per
interface.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
1640f36d5831b2575d117fac335f3324ceefa9f8)
Martin Schwenke [Wed, 22 Jan 2014 05:02:46 +0000 (16:02 +1100)]
daemon: reloadips must register state of asynchronous controls
Otherwise ctdb_client_async_wait() is a no-op.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
e5778cc172eb9fab6382f1c600326f6cc99b9162)
Michael Adam [Wed, 27 Nov 2013 22:43:53 +0000 (23:43 +0100)]
tests: in the stub "ip link show" command use echo instead of cat
This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104
(Imported from commit
e2db9c524f40f8771ae19b2be47a56f7a9d887af)
Michael Adam [Wed, 27 Nov 2013 21:28:06 +0000 (22:28 +0100)]
test: remove unused ip2ipmask from integration.bash
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
fd5e8905a09875d13ef109133edd361a82cf8e1e)
Michael Adam [Wed, 27 Nov 2013 10:42:28 +0000 (11:42 +0100)]
tests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.
This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
e281cfa8db4a2506f9016718373cdc80f4aa9c1f)
Michael Adam [Wed, 27 Nov 2013 22:28:24 +0000 (23:28 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
30dead171f82b5da31cbcbab88eaa70a896d9c55)
Michael Adam [Wed, 27 Nov 2013 10:40:53 +0000 (11:40 +0100)]
tests:76_ctdb_pdb_recovery: fix a typo in a message
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
3e083f96ff02cbf419513e16a200e8d4d0c2c227)
Michael Adam [Wed, 27 Nov 2013 11:13:40 +0000 (12:13 +0100)]
tests: in the stub ip command, avoid broken pipe by using echo instead of cat
This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
70f469e05e279e29980df2af10dd89c53001b236)
Martin Schwenke [Thu, 28 Nov 2013 05:43:55 +0000 (16:43 +1100)]
tests/integration: Update NFS tickles test and supporting code
This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.
A better way is to just monitor the output of "ctdb scriptstatus".
When it changes then a monitor event has occurred.
Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit
ef0e8cc1928dbd12c862a5e96710471ce3b4d023)
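Illustrative sketch only, not the test code itself: the "watch the output until it changes" idea can be expressed as a small polling loop. Here get_status is a hypothetical callable standing in for whatever captures the output of "ctdb scriptstatus"; the timeout and interval values are arbitrary.

```python
import time


def wait_for_change(get_status, timeout=10.0, interval=0.1):
    """Poll get_status() until its result differs from the first reading.

    Returns True if a change was seen before the timeout, else False.
    """
    initial = get_status()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() != initial:
            return True
        time.sleep(interval)
    return False
```

This avoids installing anything on the node under test: the only requirement is a command whose output changes each time a monitor event completes.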
Srikrishan Malik [Fri, 13 Dec 2013 07:35:53 +0000 (13:05 +0530)]
eventscripts: Do not mark node unhealthy if no fs is available
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104
(Imported from commit
164ee000df2a3ffc91690c60d08e4ea7ff1a33f2)
Amitay Isaacs [Thu, 16 Jan 2014 02:05:58 +0000 (13:05 +1100)]
daemon: Simplify listing event scripts using scandir
Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104
(Imported from commit
eee450fec2f7cb5f45c47162fd5b7c0717978598)
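Illustrative sketch only: the difference between ordering on the leading number and ordering on the whole name shows up as soon as two scripts share a prefix. The script names below are examples, and how the old RB-tree code treated equal keys is not stated in the commit message; the sketch only shows that the leading number is not a unique sort key.

```python
import re


def leading_number(name):
    """Leading digits of a script name as an int (0 if there are none)."""
    m = re.match(r"\d+", name)
    return int(m.group()) if m else 0


scripts = ["60.nfs", "11.routing", "60.ganesha", "00.ctdb", "11.natgw"]

# Keyed on the leading number only: names with the same prefix compare
# equal, so their relative order just reflects the input order.
by_number = sorted(scripts, key=leading_number)

# Keyed on the whole name, which matches scandir() with alphasort() for
# plain ASCII names in the C locale.
by_name = sorted(scripts)

print(by_number)
print(by_name)
```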
Amitay Isaacs [Thu, 19 Dec 2013 02:01:25 +0000 (13:01 +1100)]
daemon: Do not run monitor event if any other event is already running
Any currently running monitor events are cancelled if any other events
are scheduled. However, this does not stop monitor events from being run
when other events are already running.
Keep track of the number of active events and schedule monitor event
only if there are no active events.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
cbffbb7c2f406fc1d8ebad3c531cc2757232690e)
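Illustrative sketch only, not the daemon code: the counting scheme described above can be modelled with a single integer. The class and method names are hypothetical.

```python
class EventScheduler:
    """Toy model: a monitor event only starts when nothing else is active."""

    def __init__(self):
        self.active_events = 0

    def start(self, event):
        # Refuse to start a monitor event while any event is still running.
        if event == "monitor" and self.active_events > 0:
            return False
        self.active_events += 1
        return True

    def finish(self):
        # Called when a previously started event completes.
        self.active_events -= 1
```

For example, after start("takeip") a call to start("monitor") returns False until finish() has been called.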
Martin Schwenke [Wed, 18 Dec 2013 06:08:55 +0000 (17:08 +1100)]
eventscripts: Move all eventscript state under $CTDB_VARDIR/state
Services can be flagged for reconfigure when they release IPs at
shutdown. The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.
$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event. Just push the service
state subdirectories down a level and put everything else in a
subdirectory.
This way all the eventscript state gets cleaned up every time CTDB
starts up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
(Imported from commit
b7bfe46636d07c71f83daff884ec339c9b4aee72)
Martin Schwenke [Wed, 18 Dec 2013 04:37:11 +0000 (15:37 +1100)]
daemon: Untangle serialisation of 1st recovery -> startup -> monitor
At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring. This is untidy and hard to follow.
Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event. When first recovery is complete, schedule a function
to handle the "startup" event. When the "startup" event succeeds then
explicitly enable monitoring.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
e6304d1e1adc86fc9c1199feb7b4802614fbc70f)
Martin Schwenke [Mon, 13 Jan 2014 05:34:50 +0000 (16:34 +1100)]
eventscripts: Print a count if killing TCP connections times out
Also update related test
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
50e00b3e5224d53df0f3cc882e71737f928e01cd)
Martin Schwenke [Wed, 18 Dec 2013 02:51:22 +0000 (13:51 +1100)]
eventscripts: Reconfigure lock should be released quickly
Currently the lock is held until the corresponding eventscript
completes, since the process still exists. If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time. The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held. This can cause an unwanted monitor replay.
Change this so that the lock is released immediately after the
reconfiguration is complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
8eb20c23476d390bb8a12ba01c9f06e7ac4a1453)
Martin Schwenke [Wed, 18 Dec 2013 08:15:39 +0000 (19:15 +1100)]
recoverd: Do not refuse disabling takeover runs on inactive nodes
Failure might be expected when disabling takeover runs on banned
nodes, since they might be suffering from performance problems or
similar. More broadly, administrators who reconfigure a cluster that
isn't in a happy state aren't necessarily doing something sensible.
However, refusing to disable takeover runs on inactive nodes stops
reconfiguration of stopped nodes. This is probably an unreasonable
limitation, so drop it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
e77d5f99e396d71c1d354b3f8dc5ddf9ba5c5ee9)
Martin Schwenke [Tue, 26 Nov 2013 01:35:44 +0000 (12:35 +1100)]
recoverd: Ignore failed ipreallocated controls to inactive nodes
Currently timeouts for controls to inactive nodes can cause banning
credits to be applied. This should not happen.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
a955d0bedce888597633c0c88082f29e1d26e503)
Amitay Isaacs [Wed, 18 Dec 2013 03:09:52 +0000 (14:09 +1100)]
daemon: Remove ctdb_fork_with_logging()
This function has been replaced with ctdb_vfork_with_logging().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104
(Imported from commit
a92fd11ad1ccc904a999a254d249bbdc74f08f84)
Amitay Isaacs [Mon, 13 Jan 2014 04:16:46 +0000 (15:16 +1100)]
tests: Set CTDB_EVENT_HELPER when running with local daemons
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
dd98b9df6651054dabefdf439735042a78cfea2e)
Amitay Isaacs [Tue, 17 Dec 2013 08:22:20 +0000 (19:22 +1100)]
daemon: Remove unused code to run eventscripts
Eventscripts are now executed using a helper.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
97575e1ba0b7fecef2b26f2da1c0d8cb769a37a8)
Amitay Isaacs [Wed, 18 Dec 2013 03:07:57 +0000 (14:07 +1100)]
daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)
Use ctdb_event_helper to run debug-hung-script.sh.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
18c1f432102f1a5093927be9276d001180539e50)
Amitay Isaacs [Tue, 17 Dec 2013 08:19:51 +0000 (19:19 +1100)]
daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)
Use ctdb_event_helper to run eventscripts.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
d86662a925a072eb0374ad7743f4cf95c447bebb)
Amitay Isaacs [Mon, 16 Dec 2013 04:40:01 +0000 (15:40 +1100)]
daemon: Add helper process to execute event scripts
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
69324b61f0669022c7204ee08a4c7104865d4e9b)
Amitay Isaacs [Mon, 16 Dec 2013 04:39:29 +0000 (15:39 +1100)]
daemon: Add ctdb_vfork_with_logging()
This will be used to spawn lightweight helper processes to run
eventscripts.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
2879404388ed04af199a7e4451605b4435e8cc23)
Amitay Isaacs [Mon, 16 Dec 2013 04:57:42 +0000 (15:57 +1100)]
daemon: No need to call event scripts with CTDB_CALLED_BY_USER
This was added to support external monitoring using CTDB event scripts.
However, it was never used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
7aa20ccb5c747707fca349e9e0847cd0fca8c839)
Amitay Isaacs [Mon, 23 Dec 2013 00:46:48 +0000 (11:46 +1100)]
daemon: Deprecate RELOAD and STATUS events
These events have never been used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
bafa467021b7b2f17c61904b9f70f695a4395921)
Amitay Isaacs [Tue, 17 Dec 2013 08:48:29 +0000 (19:48 +1100)]
common: mkdir_p should not try to create .
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit
b8c6bcc365ce08ddc0ebf51c002d53c08f144981)
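Illustrative sketch only, not the C implementation: a recursive "mkdir -p" needs a base case so that it stops at "." (or "/") instead of trying to create it. The function below is a stand-in written for the example.

```python
import os


def mkdir_p(path, mode=0o755):
    """Create path and any missing parents, like 'mkdir -p'."""
    if path in ("", ".", "/"):
        # Base case: the current directory and the root already exist,
        # so there is nothing to create.
        return
    parent = os.path.dirname(path.rstrip("/"))
    mkdir_p(parent or ".", mode)
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # Already present: treat as success, as 'mkdir -p' does for directories.
        pass
```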
Martin Schwenke [Mon, 9 Dec 2013 04:54:52 +0000 (15:54 +1100)]
eventscripts: Do not reconfigure in "monitor" events
"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Therefore, IPs can be deleted without
running the required service reconfiguration.
The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.
This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.
Also update the associated tests. Make the first confirm that the
monitor event no longer does reconfiguration. Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
(Imported from commit
fdccaab2a9a1b9d7eebcd7a4d121dbf68ea48dcd)
Michael Adam [Fri, 6 Dec 2013 00:37:34 +0000 (01:37 +0100)]
packaging:RPM: don't run autogen.
autogen is already run in maketarball.sh which generates
the tarball for the RPM.
This way, we don't have a rpm build dependency on autoconf.
Recent changes introduced a dependency into autoconf
version >= 2.60, so this fix allows the generated
source RPM to be built also on older platforms.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec 9 05:47:00 CET 2013 on sn-devel-104
(Imported from commit
c65ad56d40c2ac286dc9d726119d04384981d0b3)
Michael Adam [Fri, 6 Dec 2013 00:33:57 +0000 (01:33 +0100)]
packaging:RPM: package the new manpages
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
7dbb068aa7e77f34377e762bbd65cb7ca72b85b4)
Michael Adam [Fri, 6 Dec 2013 00:31:11 +0000 (01:31 +0100)]
build: install the new manpages
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit
0e8340229b0efa6291218a24865e52acb24bb12c)
Martin Schwenke [Mon, 25 Nov 2013 08:28:10 +0000 (19:28 +1100)]
Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 26 Nov 2013 04:41:50 +0000 (15:41 +1100)]
scripts: Be careful when generating unique pids for stack traces
sort expects the data to be line based, so make it so.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 26 Nov 2013 03:38:58 +0000 (14:38 +1100)]
config: Simplify the default CTDB configuration file
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Tue, 26 Nov 2013 03:29:52 +0000 (14:29 +1100)]
scripts: Replace hard-coded /var/ctdb with CTDB_VARDIR
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 26 Nov 2013 02:27:46 +0000 (13:27 +1100)]
scripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT
If these configuration variables are not defined, then there should be
a default fallback. This is a workaround until CTDB compile-time
configuration can be accessed at runtime.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 26 Nov 2013 00:39:54 +0000 (11:39 +1100)]
eventscripts: Perform share check before NFS RPC checks in 60.ganesha
If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Fri, 22 Nov 2013 02:57:31 +0000 (13:57 +1100)]
tools/ctdb: Improve error checking when parsing node string
If a node isn't numeric then it is silently converted to 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>
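Illustrative sketch only: the failure mode described above is what an atoi-style conversion does with non-numeric input. Both functions below are hypothetical stand-ins; atoi_like is a simplified model (no sign or whitespace handling) and is not claimed to be what the tool used.

```python
DIGITS = "0123456789"


def atoi_like(text):
    """Simplified model of C atoi(): leading digits, or 0 if there are none."""
    digits = ""
    for ch in text:
        if ch not in DIGITS:
            break
        digits += ch
    return int(digits) if digits else 0


def parse_node(text):
    """Parse a node number, rejecting anything that is not a plain decimal."""
    if not text or any(ch not in DIGITS for ch in text):
        raise ValueError(f"invalid node number: {text!r}")
    return int(text)
```

With these, atoi_like("foo") quietly yields node 0, while parse_node("foo") raises an error the caller can report.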
Martin Schwenke [Fri, 22 Nov 2013 02:57:03 +0000 (13:57 +1100)]
recoverd: Only respond to currently queued ipreallocated requests
Otherwise new requests can come in during the latter parts of the
takeover run when the IP allocation algorithm has already run, and the
new requests will be dequeued even though they haven't really been
processed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Tue, 19 Nov 2013 04:40:08 +0000 (15:40 +1100)]
scripts: Add an early exit to statd-callout's notify case
If $statd_state is empty then the loop will run once and print
spurious errors.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 19 Nov 2013 04:37:58 +0000 (15:37 +1100)]
eventscripts: Remove the nfs_statd_update() call from 60.ganesha
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 18 Nov 2013 10:04:49 +0000 (21:04 +1100)]
tests/integration: Neaten up some of the persistent database tests
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Mon, 18 Nov 2013 04:09:27 +0000 (15:09 +1100)]
tools/ctdb: Fix tstore command to generate ltdb header internally
This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.
sizeof(struct ctdb_ltdb_header) = 20 (32-bit)
                                = 24 (64-bit)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
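Illustrative sketch only: a 20 versus 24 byte difference is what trailing padding produces when a struct holding 20 bytes of data contains a 64-bit member. The field layout below is an assumption chosen to total 20 bytes, not a copy of the real header, and the 4-byte case applies to ABIs such as 32-bit x86 where 64-bit integers are aligned to 4 bytes.

```python
import struct

# Assumed layout, for illustration only: one 64-bit field followed by
# 32-bit and 16-bit fields, 20 bytes of data in total.
FIELDS = "QIHHI"


def padded_size(size, alignment):
    """Round size up to a multiple of alignment, as sizeof() does for structs."""
    return (size + alignment - 1) // alignment * alignment


data_size = struct.calcsize("=" + FIELDS)  # "=": standard sizes, no padding
size_align4 = padded_size(data_size, 4)    # 64-bit members aligned to 4 bytes
size_align8 = padded_size(data_size, 8)    # 64-bit members aligned to 8 bytes

print(data_size, size_align4, size_align8)
```

Writing the raw struct from one platform and reading it on the other would therefore disagree on where the record data starts, which is why generating the header in one place is the safer design.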
Martin Schwenke [Fri, 15 Nov 2013 04:31:03 +0000 (15:31 +1100)]
tests/takeover: Fix bogus test description
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 15 Nov 2013 04:23:14 +0000 (15:23 +1100)]
tests/simple: Use sleep_for() instead of sleep
Progress...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 15 Nov 2013 04:21:58 +0000 (15:21 +1100)]
tests/simple: Update persistent DB tests
* Low level DB checks should ignore the sequence number record.
* A restart is needed after messing with the RecoverPDBBySeqNum
tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Fri, 15 Nov 2013 04:20:40 +0000 (15:20 +1100)]
recoverd: For persistent databases a sequence number of 0 is valid
Otherwise recovery ends up done by RSN when it is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 19 Nov 2013 04:31:39 +0000 (15:31 +1100)]
locking: Use vfork instead of fork to exec helpers
There is a significant overhead using fork() over vfork(), especially
when the child process execs a helper. The overhead is in memory space
and time.
# strace -c ./test_fork 1024 200
count=1024, size=204800, total=200M
failed fork=0
time for fork() = 4879.597000 us
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    4.543321        3304      1375       375 clone
  0.00    0.000071           0      1033           mmap
  0.00    0.000000           0         1           read
  0.00    0.000000           0         3           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    4.543392                  2429       376 total
# strace -c ./test_vfork 1024 200
count=1024, size=204800, total=200M
failed fork=0
time for fork() = 82.041000 us
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.47    0.001204           1      1000           vfork
  3.53    0.000044           0      1033           mmap
  0.00    0.000000           0         1           read
  0.00    0.000000           0         3           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.001248                  2054         1 total
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Tue, 19 Nov 2013 05:13:20 +0000 (16:13 +1100)]
common: Refactor code to keep track of child processes
This code can then be used to track child processes created with vfork().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 15 Nov 2013 07:59:04 +0000 (18:59 +1100)]
scripts: Run a single instance of debug_locks.sh at a given time
This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.
Also, improve the logging format with separators.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 15 Nov 2013 07:36:09 +0000 (18:36 +1100)]
locking: Update current lock statistics when lock is scheduled
When a child process is created for a lock request, the current locks
statistics should be updated immediately. This will provide accurate
information on number of active lock requests.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 18 Nov 2013 04:48:22 +0000 (15:48 +1100)]
locking: Do not merge multiple lock requests to avoid unfair scheduling
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Fri, 15 Nov 2013 04:58:59 +0000 (15:58 +1100)]
locking: Implement active lock requests limit per database
This limit is currently global rather than per database. This
prevents any database freeze lock requests from getting scheduled if
the global limit is reached.
Only individual record requests should be limited and database freeze
requests should always get scheduled.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
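Illustrative sketch only, not the locking code: a per-database limit can be modelled with one counter per database, with freeze requests bypassing the check entirely. The class, method, and parameter names are hypothetical.

```python
from collections import defaultdict


class LockScheduler:
    """Toy model of a per-database cap on active record lock requests."""

    def __init__(self, max_active_per_db):
        self.max_active_per_db = max_active_per_db
        self.active = defaultdict(int)  # database name -> active record locks

    def schedule(self, db, is_freeze=False):
        if is_freeze:
            # Database freeze requests are never held back by the limit.
            return True
        if self.active[db] >= self.max_active_per_db:
            return False
        self.active[db] += 1
        return True

    def complete(self, db):
        # A record lock request on db has finished.
        self.active[db] -= 1
```

With a limit of 2, a third record request on one database is deferred, while requests on another database and freeze requests on any database still go ahead.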
Martin Schwenke [Fri, 8 Nov 2013 05:41:11 +0000 (16:41 +1100)]
scripts: Rewrite statd-callout to avoid 10 minute lag
This is naive and assumes no performance problems when updating
persistent DBs. It also does no error handling.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 13 Nov 2013 06:45:25 +0000 (17:45 +1100)]
client: Treat empty __db_sequence_number__ record as 0
This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
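Illustrative sketch only: treating an empty record as sequence number 0 amounts to one extra branch before decoding. The 8-byte host-order encoding and the function name parse_seqnum are assumptions made for the example.

```python
import struct


def parse_seqnum(data):
    """Decode a sequence number record; an empty record counts as 0."""
    if len(data) == 0:
        # Left behind by a cancelled transaction: treat as "never bumped".
        return 0
    if len(data) != 8:
        raise ValueError(f"unexpected record size: {len(data)}")
    # Assumed encoding: unsigned 64-bit integer in host byte order.
    return struct.unpack("=Q", data)[0]
```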
Martin Schwenke [Wed, 13 Nov 2013 05:19:00 +0000 (16:19 +1100)]
doc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans
Also:
* Move <refentryinfo> above <refmeta> to make the XML valid.
* Describe DB argument in introduction and use it for database
commands.
* Remove unnecessary format="linespecific" from <screen> tags, since
it will not be allowed in DocBook 5.0.
* Sort the items in "INTERNAL COMMANDS".
* Update/simplify some command descriptions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 6 Nov 2013 02:43:53 +0000 (13:43 +1100)]
tools/ctdb: New ptrans command
Also add test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Wed, 13 Nov 2013 03:04:17 +0000 (14:04 +1100)]
onnode: New -i option to stop stdin from being closed
This can be useful for piping data to onnode in certain circumstances.
There are now also enough command-line options that they should
definitely be alphabetically ordered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 13 Nov 2013 03:13:52 +0000 (14:13 +1100)]
tests/integration: try_command_on_node() shouldn't lose onnode options
Currently it only passes the last (non -v) option seen. It should
pass them all.
Signed-off-by: Martin Schwenke <martin@meltin.net>
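Illustrative sketch only; the real helper is a shell function. The fix described above is the difference between overwriting a single variable and appending to a list while parsing options. The function name is hypothetical, and handling -v locally is inferred from the "(non -v)" remark.

```python
def split_onnode_options(args):
    """Separate leading options from the remaining arguments."""
    verbose = False
    onnode_options = []
    rest = list(args)
    while rest and rest[0].startswith("-"):
        opt = rest.pop(0)
        if opt == "-v":
            verbose = True              # consumed locally, not passed on
        else:
            onnode_options.append(opt)  # append, so earlier options survive
    return verbose, onnode_options, rest
```

For example, the arguments -v -p -i all ctdb status yield the option list ["-p", "-i"] rather than just the last option seen.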
Martin Schwenke [Tue, 12 Nov 2013 04:16:49 +0000 (15:16 +1100)]
recoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN
When running a mixed version cluster, compatibility with older
versions was broken during recent refactorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Mon, 4 Nov 2013 01:56:39 +0000 (12:56 +1100)]
scripts: debug_locks.sh should use configuration to find TDB location
That is, don't use fixed paths.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Fri, 1 Nov 2013 03:34:20 +0000 (14:34 +1100)]
recoverd: A node refuses to play against itself
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Thu, 14 Nov 2013 03:25:47 +0000 (14:25 +1100)]
recoverd: Remove duplicate code to update flags during recovery
This also happens earlier in do_recovery() and the nodemap is not
updated after that, so this update is redundant.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 14 Nov 2013 03:14:10 +0000 (14:14 +1100)]
build: Update to latest upstream config.guess
Signed-off-by: Martin Schwenke <martin@meltin.net>
Amitay Isaacs [Wed, 13 Nov 2013 04:25:46 +0000 (15:25 +1100)]
tools/ctdb: Fix db commands when dbid is given instead of name
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 13 Nov 2013 03:33:31 +0000 (14:33 +1100)]
tests: CTDB tool should always be invoked as $CTDB instead of ctdb
$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes. For running ctdb command, $CTDB is sufficient.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 13 Nov 2013 03:25:59 +0000 (14:25 +1100)]
tests: No need to run onnode in parallel for single node
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 13 Nov 2013 03:19:43 +0000 (14:19 +1100)]
tests: Remove -q option to try_command_on_node
This option is always passed to onnode by default.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:41:17 +0000 (12:41 +1100)]
tests: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:41:00 +0000 (12:41 +1100)]
tcp: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:40:44 +0000 (12:40 +1100)]
tools/ctdb: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:40:28 +0000 (12:40 +1100)]
common: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:39:48 +0000 (12:39 +1100)]
client: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Mon, 11 Nov 2013 01:39:27 +0000 (12:39 +1100)]
server: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Thu, 7 Nov 2013 05:01:49 +0000 (16:01 +1100)]
tests: Fix calling of ctdb tool from test
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Thu, 7 Nov 2013 04:54:28 +0000 (15:54 +1100)]
Revert "tests: If transaction_start fails, try again"
This reverts commit
ed7d999214ee009e480c26410a04fa105028cb8e.
This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Thu, 7 Nov 2013 04:54:20 +0000 (15:54 +1100)]
client: Make g_lock_lock() wait till lock is obtained
This makes the behaviour of g_lock_lock() similar to that implemented in
Samba. Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Srikrishan Malik [Thu, 31 Oct 2013 06:24:58 +0000 (11:54 +0530)]
eventscript: Fix link creation failure if the link already exists but the target path is missing
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Martin Schwenke [Wed, 16 Oct 2013 00:46:54 +0000 (11:46 +1100)]
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 30 Oct 2013 02:22:21 +0000 (13:22 +1100)]
web: Add links to new manpages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Mon, 23 Sep 2013 06:26:16 +0000 (16:26 +1000)]
doc: Major updates to manual pages
This includes new manpages for ctdb.7, ctdb.conf.5 and ctdb-tunables.7.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Amitay Isaacs [Wed, 30 Oct 2013 01:37:15 +0000 (12:37 +1100)]
tunables: Remove obsolete tunables
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Wed, 30 Oct 2013 01:17:37 +0000 (12:17 +1100)]
recoverd: Rebalancing should be done regardless of tunable
Rebalance target nodes should be set even if a deferred rebalance is
not configured. The user can explicitly cause a takeover run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 30 Oct 2013 00:32:28 +0000 (11:32 +1100)]
recoverd: Improve an error message in the election code
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 29 Oct 2013 05:38:42 +0000 (16:38 +1100)]
Revert "if a new node enters the cluster, that node will already be frozen at start"
This is unnecessary due to
03e2e436db5cfd29a56d13f5d2101e42389bfc94.
Furthermore, if a node doesn't force an election but wins it then it
can fail to record that it is the new recovery master. This can lead
to a reverse split brain where there is no recovery master.
This reverts commit
c5035657606283d2e35bea40992505e84ca8e7be.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Conflicts:
server/ctdb_recoverd.c
Martin Schwenke [Tue, 29 Oct 2013 03:05:41 +0000 (14:05 +1100)]
ctdbd: When a node is connected, log at DEBUG_NOTICE not DEBUG_INFO
This is important enough that we should see it when the log level is
DEBUG_NOTICE.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 28 Oct 2013 05:20:44 +0000 (16:20 +1100)]
tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 28 Oct 2013 05:14:40 +0000 (16:14 +1100)]
tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.
This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 28 Oct 2013 05:00:54 +0000 (16:00 +1100)]
eventscripts: Rewrite the smb.conf cache file handling
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.
If the foreground update fails:
* If there's no available cache file then die.
* If there is a previous cache file then use it and log a warning.
* Do a background update at the end of the monitor event.
Also remove commas in the "smb ports" list before use, since (newer?)
testparm seems to insert commas into the default value. Update the
associated test to add a comma.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
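The fallback logic described above can be sketched as follows. All names here are hypothetical and the generator command is passed as arguments, so the sketch does not depend on testparm. It also assumes the coreutils "timeout" command is available.

```shell
# Hypothetical sketch of a foreground cache update with a timeout.
# Usage: cache_update_foreground <cache-file> <timeout-secs> <command...>
cache_update_foreground ()
{
    _cache="$1" ; _timeout="$2" ; shift 2

    _tmp="${_cache}.tmp.$$"
    if timeout "$_timeout" "$@" >"$_tmp" 2>/dev/null ; then
        # Fresh output: replace the cache in one step.
        mv "$_tmp" "$_cache"
        return 0
    fi
    rm -f "$_tmp"

    if [ -f "$_cache" ] ; then
        # Update failed but an older cache exists: use it, with a warning.
        echo "WARNING: cache update failed, using previous cache" >&2
        return 0
    fi

    # No cache at all: the caller is expected to die.
    echo "ERROR: cache update failed and no previous cache" >&2
    return 1
}

# Remove commas from a port list such as "445, 139".
ports_strip_commas ()
{
    echo "$1" | tr -d ','
}
```

A background refresh at the end of the monitor event could then run the same function with "&" appended, so that a slow update never delays the event itself.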
Martin Schwenke [Fri, 25 Oct 2013 05:25:25 +0000 (16:25 +1100)]
tools/ctdb: Fix documentation string for ban command
Ban time of 0 is not supported.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 24 Oct 2013 00:13:16 +0000 (11:13 +1100)]
Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"
5 minutes is too long to leave the cluster in limbo if the recovery
daemon dies during a takeover run, even though this is quite unlikely.
We need a new recovery master to be able to do takeover runs fairly
quickly.
This reverts commit
71080676bb4acbd0d9b595a30cf7fe6dddbf426f.
Martin Schwenke [Thu, 24 Oct 2013 03:15:53 +0000 (14:15 +1100)]
tools/onnode: Fix healthy/ok node handling
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
test was never updated to reflect this change.
Instead of making sure that all columns are "0", just check that
they're not "1". This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.
Also update associated tests. The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.
This fixes samba bz#8122.
Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup
Signed-off-by: Martin Schwenke <martin@meltin.net>
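The "not 1" check can be sketched with awk. This is an illustration, not the onnode code: the function name is hypothetical, and the input is assumed to be ":"-separated with a header line first, the node number and an IPv4 address in the second and third fields, and flag columns after that.

```shell
# Hypothetical filter: print the address of every node with no flag set.
# Reads "ctdb -Y status"-style lines on stdin (layout assumed, see above).
healthy_nodes ()
{
    awk -F: 'NR > 1 {
        ok = 1
        # Any flag column equal to "1" marks the node as not OK.
        # Values such as "Y", "N" or "" are ignored without special cases.
        for (i = 4; i <= NF; i++) {
            if ($i == "1") { ok = 0 }
        }
        if (ok) { print $3 }
    }'
}
```

Because the test is for "1" rather than for "0", a new non-numeric column does not need any extra handling in the filter.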
Amitay Isaacs [Mon, 28 Oct 2013 07:49:51 +0000 (18:49 +1100)]
daemon: Change the default recovery method for persistent databases
Use sequence numbers to do recovery for persistent databases instead of
RSNs. This fixes the problem of registry corruption during recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
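As a rough illustration of recovery by sequence number, and assuming the idea is to take the database copy from the node that reports the highest sequence number, the selection step could look like this. The function name and the "pnn seqnum" input format are made up for the example.

```shell
# Hypothetical selection step: read "pnn seqnum" pairs on stdin and
# print the pnn with the highest sequence number (first one wins a tie).
pick_source_node ()
{
    awk 'NR == 1 || $2 > max { max = $2; pnn = $1 }
         END { if (NR > 0) print pnn }'
}
```

The whole database is compared by a single number per node here, which is the contrast with per-record RSNs that the commit message draws.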
Amitay Isaacs [Wed, 23 Oct 2013 04:37:41 +0000 (15:37 +1100)]
packaging: Create runtime directories for CTDB
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Martin Schwenke [Wed, 23 Oct 2013 00:28:26 +0000 (11:28 +1100)]
initscript: Update systemd configuration to put PID file in /run/ctdb
Elsewhere we're moving the socket to /var/run/ctdb. We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".
Signed-off-by: Martin Schwenke <martin@meltin.net>