git.samba.org - ctdb.git/log

Amitay Isaacs [Fri, 31 Jan 2014 07:30:56 +0000 (18:30 +1100)]

doc: Update NEWS

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Fri, 31 Jan 2014 01:46:21 +0000 (12:46 +1100)]

doc: Fix usage string for ctdb readkey/writekey

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104

(Imported from commit 35eb6cb521d54708f0bbba515f645327846b4e70)

Amitay Isaacs [Thu, 23 Jan 2014 03:57:53 +0000 (14:57 +1100)]

daemon: Return negative status only if there are known errors

If the event script does not exist or does not have execute permission,
return a negative errno to distinguish this from the exit status of the
event script itself.
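A minimal C sketch of the convention described above. It assumes a hypothetical `run_event_script()` helper and is not the ctdb code. Known problems (a missing or non-executable script, or a fork/wait failure) come back as `-errno`. Anything the script reports itself comes back as a non-negative exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the ctdb implementation).
 *   < 0  : -errno for a known error (missing/non-executable script,
 *          fork or wait failure)
 *   >= 0 : the script's own exit status
 */
static int run_event_script(const char *path)
{
	pid_t pid;
	int status;

	/* Known errors: script missing or not executable. */
	if (access(path, F_OK) != 0) {
		return -errno;		/* typically -ENOENT */
	}
	if (access(path, X_OK) != 0) {
		return -errno;		/* typically -EACCES */
	}

	pid = fork();
	if (pid == -1) {
		return -errno;
	}
	if (pid == 0) {
		execl(path, path, (char *)NULL);
		_exit(127);		/* exec failed despite the checks above */
	}

	if (waitpid(pid, &status, 0) == -1) {
		return -errno;
	}
	if (WIFEXITED(status)) {
		return WEXITSTATUS(status);
	}
	/* Killed by a signal: report it shell-style as 128 + signal number. */
	return 128 + WTERMSIG(status);
}
```

With this split a caller can tell `-ENOENT` (missing script) apart from a script that chose to exit with status 2, which is the same number as `ENOENT` on Linux.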

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1566790e5a738f12db1dfb519589c1842d74b8e5)

Martin Schwenke [Tue, 28 Jan 2014 03:34:15 +0000 (14:34 +1100)]

tests/eventscripts: Avoid errors on broken pipe

ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104

(Imported from commit b112a3317cbedc73a6e17b3f711fec84f0d41d4e)

Martin Schwenke [Tue, 28 Jan 2014 05:07:53 +0000 (16:07 +1100)]

tests/eventscripts: Improve ip command stub secondary handling

It should support primary and secondaries per network instead of per
interface.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1640f36d5831b2575d117fac335f3324ceefa9f8)

Martin Schwenke [Wed, 22 Jan 2014 05:02:46 +0000 (16:02 +1100)]

daemon: reloadips must register state of asynchronous controls

Otherwise ctdb_client_async_wait() is a no-op.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e5778cc172eb9fab6382f1c600326f6cc99b9162)

Michael Adam [Wed, 27 Nov 2013 22:43:53 +0000 (23:43 +0100)]

tests: in the stub "ip link show" command use echo instead of cat

This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104

(Imported from commit e2db9c524f40f8771ae19b2be47a56f7a9d887af)

Michael Adam [Wed, 27 Nov 2013 21:28:06 +0000 (22:28 +0100)]

test: remove unused ip2ipmask from integration.bash

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fd5e8905a09875d13ef109133edd361a82cf8e1e)

Michael Adam [Wed, 27 Nov 2013 10:42:28 +0000 (11:42 +0100)]

tests:76_ctdb_pdb_recovery: change from using ctdb pstore to ctdb ptrans.

This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e281cfa8db4a2506f9016718373cdc80f4aa9c1f)

Michael Adam [Wed, 27 Nov 2013 22:28:24 +0000 (23:28 +0100)]

tests:76_ctdb_pdb_recovery: fix a typo in a message

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 30dead171f82b5da31cbcbab88eaa70a896d9c55)

Michael Adam [Wed, 27 Nov 2013 10:40:53 +0000 (11:40 +0100)]

Michael Adam [Wed, 27 Nov 2013 11:13:40 +0000 (12:13 +0100)]

tests: in the stub ip command, avoid broken pipe by using echo instead of cat

This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 70f469e05e279e29980df2af10dd89c53001b236)

Martin Schwenke [Thu, 28 Nov 2013 05:43:55 +0000 (16:43 +1100)]

tests/integration: Update NFS tickles test and supporting code

This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.

A better way is to just monitor the output of "ctdb scriptstatus".
When it changes, a monitor event has occurred.

Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(Imported from commit ef0e8cc1928dbd12c862a5e96710471ce3b4d023)

Srikrishan Malik [Fri, 13 Dec 2013 07:35:53 +0000 (13:05 +0530)]

eventscripts: Do not mark node unhealthy if no fs is available

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104

(Imported from commit 164ee000df2a3ffc91690c60d08e4ea7ff1a33f2)

Amitay Isaacs [Thu, 16 Jan 2014 02:05:58 +0000 (13:05 +1100)]

daemon: Simplify listing event scripts using scandir

Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.
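A sketch of listing scripts with `scandir()` and `alphasort()`. The `NN.name` filter and the function names are assumptions for illustration, and ctdb's real rules for valid script names may differ. Sorting on the whole name, rather than only on the leading number, keeps two scripts that share a number and puts them in a deterministic order.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical filter: accept names that look like "NN.name". */
static int is_event_script(const struct dirent *de)
{
	const char *n = de->d_name;

	return isdigit((unsigned char)n[0]) && isdigit((unsigned char)n[1]) &&
	       n[2] == '.' && n[3] != '\0';
}

/*
 * Fill *names with the matching entries of dir, sorted by alphasort()
 * on the whole name, so "10.a" and "10.b" are both kept, in order.
 * Returns the count, or -1 on error.  The caller frees each entry and
 * then the array.
 */
static int list_event_scripts(const char *dir, struct dirent ***names)
{
	return scandir(dir, names, is_event_script, alphasort);
}
```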

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104

(Imported from commit eee450fec2f7cb5f45c47162fd5b7c0717978598)

Amitay Isaacs [Thu, 19 Dec 2013 02:01:25 +0000 (13:01 +1100)]

daemon: Do not run monitor event if any other event is already running

Any currently running monitor events are cancelled if any other events
are scheduled. However, this does not stop monitor events from being run
when other events are already running.

Keep track of the number of active events and schedule monitor event
only if there are no active events.
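A minimal sketch of the counting idea, with hypothetical names rather than ctdb's data structures. Each event increments the counter when it starts and decrements it when it finishes. A monitor event is scheduled only while the counter is zero.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping, not ctdb's actual data structures. */
struct event_tracker {
	unsigned int num_active;	/* events currently running */
};

static void event_started(struct event_tracker *t)
{
	t->num_active++;
}

static void event_finished(struct event_tracker *t)
{
	if (t->num_active > 0) {
		t->num_active--;
	}
}

/* A monitor event may only be scheduled when nothing else is running. */
static bool monitor_may_run(const struct event_tracker *t)
{
	return t->num_active == 0;
}
```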

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit cbffbb7c2f406fc1d8ebad3c531cc2757232690e)

Martin Schwenke [Wed, 18 Dec 2013 06:08:55 +0000 (17:08 +1100)]

eventscripts: Move all eventscript state under $CTDB_VARDIR/state

Services can be flagged for reconfigure when they release IPs at
shutdown. The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.

$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event. Just push the service
state subdirectories down a level and put everything else in a
subdirectory.

This way all the eventscript state gets cleaned up every time CTDB
starts up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104

(Imported from commit b7bfe46636d07c71f83daff884ec339c9b4aee72)

Martin Schwenke [Wed, 18 Dec 2013 04:37:11 +0000 (15:37 +1100)]

daemon: Untangle serialisation of 1st recovery -> startup -> monitor

At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring.  This is untidy and hard to follow.

Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event.  When first recovery is complete, schedule a function
to handle the "startup" event.  When the "startup" event succeeds then
explicitly enable monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e6304d1e1adc86fc9c1199feb7b4802614fbc70f)

Martin Schwenke [Mon, 13 Jan 2014 05:34:50 +0000 (16:34 +1100)]

eventscripts: Print a count if killing TCP connections times out

Also update related test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 50e00b3e5224d53df0f3cc882e71737f928e01cd)

Martin Schwenke [Wed, 18 Dec 2013 02:51:22 +0000 (13:51 +1100)]

eventscripts: Reconfigure lock should be released quickly

Currently the lock is held until the corresponding eventscript
completes, since the process still exists.  If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time.  The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held.  This can cause an unwanted monitor replay.

Change this so that the lock is released immediately after the
reconfiguration is complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8eb20c23476d390bb8a12ba01c9f06e7ac4a1453)

Martin Schwenke [Wed, 18 Dec 2013 08:15:39 +0000 (19:15 +1100)]

recoverd: Do not refuse disabling takeover runs on inactive nodes

Failure might be expected when disabling takeover runs on banned
nodes, since they might be suffering from performance problems or
similar. More broadly, administrators who reconfigure a cluster that
isn't in a happy state aren't necessarily doing something sensible.

However, refusing to disable takeover runs on inactive nodes stops
reconfiguration of stopped nodes. This is probably an unreasonable
limitation, so drop it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e77d5f99e396d71c1d354b3f8dc5ddf9ba5c5ee9)

Martin Schwenke [Tue, 26 Nov 2013 01:35:44 +0000 (12:35 +1100)]

recoverd: Ignore failed ipreallocated controls to inactive nodes

Currently timeouts for controls to inactive nodes can cause banning
credits to be applied. This should not happen.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a955d0bedce888597633c0c88082f29e1d26e503)

Amitay Isaacs [Wed, 18 Dec 2013 03:09:52 +0000 (14:09 +1100)]

daemon: Remove ctdb_fork_with_logging()

This function has been replaced with ctdb_vfork_with_logging().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104

(Imported from commit a92fd11ad1ccc904a999a254d249bbdc74f08f84)

Amitay Isaacs [Mon, 13 Jan 2014 04:16:46 +0000 (15:16 +1100)]

tests: Set CTDB_EVENT_HELPER when running with local daemons

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit dd98b9df6651054dabefdf439735042a78cfea2e)

Amitay Isaacs [Tue, 17 Dec 2013 08:22:20 +0000 (19:22 +1100)]

daemon: Remove unused code to run eventscripts

Eventscripts are now executed using a helper.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 97575e1ba0b7fecef2b26f2da1c0d8cb769a37a8)

Amitay Isaacs [Wed, 18 Dec 2013 03:07:57 +0000 (14:07 +1100)]

daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)

Use ctdb_event_helper to run debug-hung-script.sh.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 18c1f432102f1a5093927be9276d001180539e50)

Amitay Isaacs [Tue, 17 Dec 2013 08:19:51 +0000 (19:19 +1100)]

daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)

Use ctdb_event_helper to run eventscripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit d86662a925a072eb0374ad7743f4cf95c447bebb)

Amitay Isaacs [Mon, 16 Dec 2013 04:40:01 +0000 (15:40 +1100)]

daemon: Add helper process to execute event scripts

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 69324b61f0669022c7204ee08a4c7104865d4e9b)

Amitay Isaacs [Mon, 16 Dec 2013 04:39:29 +0000 (15:39 +1100)]

daemon: Add ctdb_vfork_with_logging()

This will be used to spawn lightweight helper processes to run
eventscripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 2879404388ed04af199a7e4451605b4435e8cc23)

Amitay Isaacs [Mon, 16 Dec 2013 04:57:42 +0000 (15:57 +1100)]

daemon: No need to call event scripts with CTDB_CALLED_BY_USER

This was added to support external monitoring using CTDB event scripts.
However, it was never used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 7aa20ccb5c747707fca349e9e0847cd0fca8c839)

Amitay Isaacs [Mon, 23 Dec 2013 00:46:48 +0000 (11:46 +1100)]

daemon: Deprecate RELOAD and STATUS events

These events have never been used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit bafa467021b7b2f17c61904b9f70f695a4395921)

Amitay Isaacs [Tue, 17 Dec 2013 08:48:29 +0000 (19:48 +1100)]

common: mkdir_p should not try to create .
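A recursive `mkdir_p()` sketch in the spirit of this change. It is not the ctdb source. Missing ancestors are created via `dirname()`, and `"/"` and `"."` are treated as base cases so `mkdir()` is never called on them. `dirname()` of a relative name such as `"a"` is `"."`, which is where the recursion would otherwise try to create the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <libgen.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

static int mkdir_p(const char *dir, mode_t mode)
{
	char parent[PATH_MAX];
	size_t len;

	/* Base cases: "/" and "." always exist, so never try to create them. */
	if (strcmp(dir, "/") == 0 || strcmp(dir, ".") == 0) {
		return 0;
	}

	if (mkdir(dir, mode) == 0 || errno == EEXIST) {
		return 0;
	}
	if (errno != ENOENT) {
		return -1;	/* a real error, e.g. EACCES */
	}

	/* A parent is missing.  dirname() may modify its argument, so copy. */
	len = strlen(dir);
	if (len >= sizeof(parent)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(parent, dir, len + 1);
	if (mkdir_p(dirname(parent), mode) != 0) {
		return -1;
	}

	/* Ancestors exist now; create the directory itself. */
	if (mkdir(dir, mode) == 0 || errno == EEXIST) {
		return 0;
	}
	return -1;
}
```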

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit b8c6bcc365ce08ddc0ebf51c002d53c08f144981)

Martin Schwenke [Mon, 9 Dec 2013 04:54:52 +0000 (15:54 +1100)]

eventscripts: Do not reconfigure in "monitor" events

"monitor" events can be cancelled.  If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped.  In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).

A long time ago we did service reconfiguration in "monitor" events
following failovers.  Service reconfiguration was then moved to the
"ipreallocated" event.  However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur.  The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated".  Therefore, IPs can be deleted without
running the required service reconfiguration.

The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.

This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.

Also update the associated tests.  Make the first confirm that the
monitor event no longer does reconfiguration.  Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104

(Imported from commit fdccaab2a9a1b9d7eebcd7a4d121dbf68ea48dcd)

Michael Adam [Fri, 6 Dec 2013 00:37:34 +0000 (01:37 +0100)]

packaging:RPM: don't run autogen.

autogen is already run in maketarball.sh which generates
the tarball for the RPM.

This way, we don't have a rpm build dependency on autoconf.
Recent changes introduced a dependency on autoconf
version >= 2.60, so this fix allows the generated
source RPM to be built also on older platforms.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec 9 05:47:00 CET 2013 on sn-devel-104

(Imported from commit c65ad56d40c2ac286dc9d726119d04384981d0b3)

Michael Adam [Fri, 6 Dec 2013 00:33:57 +0000 (01:33 +0100)]

packaging:RPM: package the new manpages

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7dbb068aa7e77f34377e762bbd65cb7ca72b85b4)

Michael Adam [Fri, 6 Dec 2013 00:31:11 +0000 (01:31 +0100)]

build: install the new manpages

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 0e8340229b0efa6291218a24865e52acb24bb12c)

Martin Schwenke [Mon, 25 Nov 2013 08:28:10 +0000 (19:28 +1100)]

Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 26 Nov 2013 04:41:50 +0000 (15:41 +1100)]

scripts: Be careful when generating unique pids for stack traces

sort expects the data to be line based, so make it so.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 26 Nov 2013 03:38:58 +0000 (14:38 +1100)]

config: Simplify the default CTDB configuration file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Tue, 26 Nov 2013 03:29:52 +0000 (14:29 +1100)]

scripts: Replace hard-coded /var/ctdb with CTDB_VARDIR

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 26 Nov 2013 02:27:46 +0000 (13:27 +1100)]

scripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT

If these configuration variables are not defined, then there should
be a default fallback. This is a workaround until CTDB compile-time
configuration can be accessed at runtime.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 26 Nov 2013 00:39:54 +0000 (11:39 +1100)]

eventscripts: Perform share check before NFS RPC checks in 60.ganesha

If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 22 Nov 2013 02:57:31 +0000 (13:57 +1100)]

tools/ctdb: Improve error checking when parsing node string

If a node isn't numeric then it is silently converted to 0.
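The log does not show how the parsing was fixed. One common approach, sketched here with a hypothetical `parse_pnn()`, replaces `atoi()` with `strtoul()` and checks the end pointer. With that check, `"abc"` or `"1x"` is rejected instead of becoming node 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical strict parser for one node number.  atoi("abc") returns
 * 0, which silently selects node 0; this rejects anything that is not a
 * plain decimal number that fits in 32 bits.
 */
static bool parse_pnn(const char *s, uint32_t *pnn)
{
	char *end;
	unsigned long v;

	/* strtoul() accepts leading spaces and a '-' sign, so require a digit. */
	if (s == NULL || !isdigit((unsigned char)s[0])) {
		return false;
	}
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT32_MAX) {
		return false;
	}
	*pnn = (uint32_t)v;
	return true;
}
```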

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Fri, 22 Nov 2013 02:57:03 +0000 (13:57 +1100)]

recoverd: Only respond to currently queued ipreallocated requests

Otherwise new requests can come in during the latter parts of the
takeover run, after the IP allocation algorithm has already run, and
the new requests will be dequeued even though they haven't really been
processed.
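A sketch of the idea using a hypothetical singly linked request queue, not ctdb's data structures. The takeover run detaches the queue before running the allocation algorithm and replies only to that snapshot. Requests that arrive later stay queued for the next run.

```c
#include <stddef.h>

/* Hypothetical request list; not ctdb's data structures. */
struct request {
	int replied;
	struct request *next;
};

struct request_queue {
	struct request *head;
};

static void enqueue(struct request_queue *q, struct request *r)
{
	r->next = q->head;	/* push-front keeps the sketch short */
	q->head = r;
}

/* Detach everything queued right now; later arrivals stay queued. */
static struct request *take_queued(struct request_queue *q)
{
	struct request *snapshot = q->head;

	q->head = NULL;
	return snapshot;
}

/* Reply only to the requests in the snapshot; returns how many. */
static int respond_to(struct request *list)
{
	int n = 0;

	for (; list != NULL; list = list->next) {
		list->replied = 1;
		n++;
	}
	return n;
}
```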

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Tue, 19 Nov 2013 04:40:08 +0000 (15:40 +1100)]

scripts: Add an early exit to statd-callout's notify case

If $statd_state is empty then the loop will run once and print
spurious errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Tue, 19 Nov 2013 04:37:58 +0000 (15:37 +1100)]

eventscripts: Remove the nfs_statd_update() call from 60.ganesha

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 18 Nov 2013 10:04:49 +0000 (21:04 +1100)]

tests/integration: Neaten up some of the persistent database tests

Signed-off-by: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Mon, 18 Nov 2013 04:09:27 +0000 (15:09 +1100)]

tools/ctdb: Fix tstore command to generate ltdb header internally

This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.

sizeof(struct ctdb_ltdb_header) = 20 (32-bit)
                                = 24 (64-bit)
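A likely source of such a difference is tail padding after a 64-bit member. The struct below is hypothetical, with field widths chosen only to reproduce the 20/24 figures above. The real `struct ctdb_ltdb_header` is not shown in this log. Having the tool build the header itself means its size always matches the binary that reads it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct with 20 bytes of members (an assumption). */
struct example_header {
	uint64_t rsn;
	uint32_t dmaster;
	uint32_t reserved;
	uint32_t flags;
};

static void print_layout(void)
{
	/*
	 * On i386 Linux a uint64_t inside a struct is 4-byte aligned, so no
	 * tail padding is needed (20 bytes).  On x86_64 it is 8-byte aligned,
	 * so the struct is padded to a multiple of 8 (24 bytes).
	 */
	printf("sizeof = %zu, offsetof(flags) = %zu\n",
	       sizeof(struct example_header),
	       offsetof(struct example_header, flags));
}
```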

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 15 Nov 2013 04:31:03 +0000 (15:31 +1100)]

tests/takeover: Fix bogus test description

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Fri, 15 Nov 2013 04:23:14 +0000 (15:23 +1100)]

tests/simple: Use sleep_for() instead of sleep

Progress...

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Fri, 15 Nov 2013 04:21:58 +0000 (15:21 +1100)]

tests/simple: Update persistent DB tests

* Low level DB checks should ignore the sequence number record.

* A restart is needed after messing with the RecoverPDBBySeqNum
tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 15 Nov 2013 04:20:40 +0000 (15:20 +1100)]

recoverd: For persistent databases a sequence number of 0 is valid

Otherwise recovery ends up being done by RSN when that is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 19 Nov 2013 04:31:39 +0000 (15:31 +1100)]

locking: Use vfork instead of fork to exec helpers

There is a significant overhead in using fork() instead of vfork(),
especially when the child process execs a helper.  The overhead is in
both memory space and time.

    # strace -c ./test_fork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 4879.597000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543321        3304      1375       375 clone
      0.00    0.000071           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543392                  2429       376 total

    # strace -c ./test_vfork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 82.041000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     96.47    0.001204           1      1000           vfork
      3.53    0.000044           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.001248                  2054         1 total
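A minimal sketch of spawning a helper with `vfork()` and `execv()`. It assumes a hypothetical `run_helper()` wrapper and is not ctdb's locking code. Until it execs, the child runs in the parent's memory, so it may do nothing except call an exec function or `_exit()`.

```c
#define _DEFAULT_SOURCE		/* vfork() is not in current POSIX */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn a helper with vfork() + exec.  No page tables are copied for the
 * child; in return the child may only exec or _exit().
 * Returns the helper's exit status, or -1 on error.
 */
static int run_helper(const char *path, char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		execv(path, argv);
		_exit(127);	/* exec failed: never return from a vfork() child */
	}
	if (waitpid(pid, &status, 0) == -1) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```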

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Tue, 19 Nov 2013 05:13:20 +0000 (16:13 +1100)]

common: Refactor code to keep track of child processes

This code can then be used to track child processes created with vfork().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Fri, 15 Nov 2013 07:59:04 +0000 (18:59 +1100)]

scripts: Run a single instance of debug_locks.sh at a given time

This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.
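One way to get single-instance behaviour is to take a non-blocking `flock()` on a lock file and give up if another instance already holds it. The sketch below does this in C with a hypothetical `try_single_instance()`. The script itself is shell and may do this differently.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns an fd holding the lock (keep it open while running), or -1 if
 * the file cannot be opened or another instance already holds the lock.
 */
static int try_single_instance(const char *lockfile)
{
	int fd = open(lockfile, O_RDWR | O_CREAT, 0600);

	if (fd == -1) {
		return -1;
	}
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		close(fd);	/* EWOULDBLOCK: another instance is running */
		return -1;
	}
	return fd;
}
```

Closing the returned descriptor (or exiting) releases the lock, so the next run can proceed.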

Also, improve the logging format with separators.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Fri, 15 Nov 2013 07:36:09 +0000 (18:36 +1100)]

locking: Update current lock statistics when lock is scheduled

When a child process is created for a lock request, the current lock
statistics should be updated immediately. This will provide accurate
information on the number of active lock requests.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 18 Nov 2013 04:48:22 +0000 (15:48 +1100)]

locking: Do not merge multiple lock requests to avoid unfair scheduling

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Fri, 15 Nov 2013 04:58:59 +0000 (15:58 +1100)]

locking: Implement active lock requests limit per database

This limit was previously a global limit and not per database. This
prevented any database freeze lock requests from getting scheduled if
the global limit was reached.

Only individual record requests should be limited and database freeze
requests should always get scheduled.
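A sketch of the scheduling rule with hypothetical names. Record lock requests are counted and limited per database. Database freeze requests bypass the limit entirely.

```c
#include <stdbool.h>

/* Hypothetical names; not ctdb's actual types. */
enum lock_request_type {
	LOCK_REQUEST_RECORD,
	LOCK_REQUEST_DB_FREEZE,
};

struct db_lock_state {
	unsigned int active_record_requests;	/* counted per database */
};

/*
 * Record requests are limited per database, so one busy database does
 * not use up a shared budget; freeze requests are never limited.
 */
static bool lock_request_can_start(const struct db_lock_state *db,
				   enum lock_request_type type,
				   unsigned int limit)
{
	if (type == LOCK_REQUEST_DB_FREEZE) {
		return true;
	}
	return db->active_record_requests < limit;
}
```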

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 8 Nov 2013 05:41:11 +0000 (16:41 +1100)]

scripts: Rewrite statd-callout to avoid 10 minute lag

This is naive and assumes no performance problems when updating
persistent DBs. It also does no error handling.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 13 Nov 2013 06:45:25 +0000 (17:45 +1100)]

client: Treat empty __db_sequence_number__ record as 0

This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.
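A sketch of the decoding rule. It assumes the record value is a host-order `uint64_t`; the actual encoding is not described in this log. An empty value is read as 0 instead of being treated as an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical decoder for the value of a __db_sequence_number__ record.
 * An empty value (left by a cancelled transaction) means sequence
 * number 0; any other length except 8 bytes is treated as malformed.
 */
static bool decode_seqnum(const unsigned char *data, size_t len,
			  uint64_t *seqnum)
{
	if (len == 0) {
		*seqnum = 0;
		return true;
	}
	if (len != sizeof(uint64_t)) {
		return false;
	}
	memcpy(seqnum, data, sizeof(uint64_t));
	return true;
}
```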

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Wed, 13 Nov 2013 05:19:00 +0000 (16:19 +1100)]

doc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans

Also:

* Move <refentryinfo> above <refmeta> to make the XML valid.

* Describe DB argument in introduction and use it for database
commands.

* Remove unnecessary format="linespecific" from <screen> tags, since
it will not be allowed in DocBook 5.0.

* Sort the items in "INTERNAL COMMANDS".

* Update/simplify some command descriptions.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Wed, 6 Nov 2013 02:43:53 +0000 (13:43 +1100)]

tools/ctdb: New ptrans command

Also add test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Wed, 13 Nov 2013 03:04:17 +0000 (14:04 +1100)]

onnode: New -i option to stop stdin from being closed

This can be useful for piping data to onnode in certain circumstances.

There are now also enough command-line options that they should
definitely be alphabetically ordered.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Wed, 13 Nov 2013 03:13:52 +0000 (14:13 +1100)]

tests/integration: try_command_on_node() shouldn't lose onnode options

Currently it only passes the last (non -v) option seen. It should
pass them all.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Tue, 12 Nov 2013 04:16:49 +0000 (15:16 +1100)]

recoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN

When running a mixed version cluster, compatibility with older
versions was broken during recent refactorisation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Mon, 4 Nov 2013 01:56:39 +0000 (12:56 +1100)]

scripts: debug_locks.sh should use configuration to find TDB location

That is, don't use fixed paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 1 Nov 2013 03:34:20 +0000 (14:34 +1100)]

recoverd: A node refuses to play against itself

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Thu, 14 Nov 2013 03:25:47 +0000 (14:25 +1100)]

recoverd: Remove duplicate code to update flags during recovery

This also happens earlier in do_recovery() and the nodemap is not
updated after that, so this update is redundant.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Thu, 14 Nov 2013 03:14:10 +0000 (14:14 +1100)]

build: Update to latest upstream config.guess

Signed-off-by: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Wed, 13 Nov 2013 04:25:46 +0000 (15:25 +1100)]

tools/ctdb: Fix db commands when dbid is given instead of name

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 13 Nov 2013 03:33:31 +0000 (14:33 +1100)]

tests: CTDB tool should always be invoked as $CTDB instead of ctdb

$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes. For running the ctdb command, $CTDB is sufficient.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 13 Nov 2013 03:25:59 +0000 (14:25 +1100)]

tests: No need to run onnode in parallel for single node

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 13 Nov 2013 03:19:43 +0000 (14:19 +1100)]

tests: Remove -q option to try_command_on_node

This option is always passed to onnode by default.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:41:17 +0000 (12:41 +1100)]

tests: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:41:00 +0000 (12:41 +1100)]

tcp: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:40:44 +0000 (12:40 +1100)]

tools/ctdb: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:40:28 +0000 (12:40 +1100)]

common: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:39:48 +0000 (12:39 +1100)]

client: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Mon, 11 Nov 2013 01:39:27 +0000 (12:39 +1100)]

server: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Thu, 7 Nov 2013 05:01:49 +0000 (16:01 +1100)]

tests: Fix calling of ctdb tool from test

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Thu, 7 Nov 2013 04:54:28 +0000 (15:54 +1100)]

Revert "tests: If transaction_start fails, try again"

This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.

This is not necessary since ctdb_transaction_start() now returns NULL
only when there is a failure, not when another transaction is currently
active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Thu, 7 Nov 2013 04:54:20 +0000 (15:54 +1100)]

client: Make g_lock_lock() wait until the lock is obtained

This makes the behaviour of g_lock_lock() similar to that implemented in
Samba. Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Srikrishan Malik [Thu, 31 Oct 2013 06:24:58 +0000 (11:54 +0530)]

eventscript: Fix link creation failure if the link already exists but the target path is missing

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>

Martin Schwenke [Wed, 16 Oct 2013 00:46:54 +0000 (11:46 +1100)]

doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 30 Oct 2013 02:22:21 +0000 (13:22 +1100)]

web: Add links to new manpages

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Mon, 23 Sep 2013 06:26:16 +0000 (16:26 +1000)]

doc: Major updates to manual pages

This includes new manpages for ctdb.7, ctdb.conf.5 and ctdb-tunables.7.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 30 Oct 2013 01:37:15 +0000 (12:37 +1100)]

tunables: Remove obsolete tunables

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Wed, 30 Oct 2013 01:17:37 +0000 (12:17 +1100)]

recoverd: Rebalancing should be done regardless of tunable

Rebalance target nodes should be set even if a deferred rebalance is
not configured. The user can explicitly cause a takeover run.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Wed, 30 Oct 2013 00:32:28 +0000 (11:32 +1100)]

recoverd: Improve an error message in the election code

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Tue, 29 Oct 2013 05:38:42 +0000 (16:38 +1100)]

Revert "if a new node enters the cluster, that node will already be frozen at start"

This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94.
Furthermore, if a node doesn't force an election but wins it then it
can fail to record that it is the new recovery master. This can lead
to a reverse split brain where there is no recovery master.

This reverts commit c5035657606283d2e35bea40992505e84ca8e7be.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Conflicts:
server/ctdb_recoverd.c

Martin Schwenke [Tue, 29 Oct 2013 03:05:41 +0000 (14:05 +1100)]

ctdbd: When a node is connected, log at DEBUG_NOTICE, not DEBUG_INFO

This is important enough that we should see it when the log level is
DEBUG_NOTICE.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 28 Oct 2013 05:20:44 +0000 (16:20 +1100)]

tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test

This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 28 Oct 2013 05:14:40 +0000 (16:14 +1100)]

tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test

This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.

This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 28 Oct 2013 05:00:54 +0000 (16:00 +1100)]

eventscripts: Rewrite the smb.conf cache file handling

The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.

If the foreground update fails:

* If there's no available cache file then die.

* If there is a previous cache file then use it and log a warning.

* Do a background update at the end of the monitor event.

Also remove commas from the "smb ports" list before use, since (newer?)
testparm seems to insert commas into the default value. Update the
associated test to add a comma.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Fri, 25 Oct 2013 05:25:25 +0000 (16:25 +1100)]

tools/ctdb: Fix documentation string for ban command

Ban time of 0 is not supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Thu, 24 Oct 2013 00:13:16 +0000 (11:13 +1100)]

Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"

5 minutes is too long to leave the cluster in limbo if the recovery
daemon dies during a takeover run, even though this is quite unlikely.
We need a new recovery master to be able to do takeover runs fairly
quickly.

This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f.

Martin Schwenke [Thu, 24 Oct 2013 03:15:53 +0000 (14:15 +1100)]

tools/onnode: Fix healthy/ok node handling

This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output.  The fake "ctdb -Y status" output in the
test was never updated to reflect this change.

Instead of making sure that all columns are "0", just check that
they're not "1".  This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.

Also update associated tests.  The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.

This fixes samba bz#8122.

Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup

Signed-off-by: Martin Schwenke <martin@meltin.net>

Amitay Isaacs [Mon, 28 Oct 2013 07:49:51 +0000 (18:49 +1100)]

daemon: Change the default recovery method for persistent databases

Use sequence numbers to do recovery for persistent databases instead of
RSNs. This fixes the problem of registry corruption during recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Amitay Isaacs [Wed, 23 Oct 2013 04:37:41 +0000 (15:37 +1100)]

packaging: Create runtime directories for CTDB

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

Martin Schwenke [Wed, 23 Oct 2013 00:28:26 +0000 (11:28 +1100)]

initscript: Update systemd configuration to put PID file in /run/ctdb

Elsewhere we're moving the socket to /var/run/ctdb. We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".

Signed-off-by: Martin Schwenke <martin@meltin.net>

CTDB repository