git.samba.org - sahlberg/ctdb.git/log

Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking.

Update the init script to allow starting without a reclock file.
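
A minimal sketch of how an init script might treat an empty recovery lock setting. The variable names CTDB_RECOVERY_LOCK and CTDB_OPTIONS, the --reclock option and the helper reclock_option are assumptions for illustration and are not taken from this commit.

```shell
# Hypothetical helper: print the daemon option for the recovery lock,
# or print nothing when the setting is empty ("" means no reclock file).
reclock_option() {
    if [ -n "$1" ]; then
        printf '%s\n' "--reclock=$1"
    fi
}

# Assumed variable names; an empty or unset value adds no option.
CTDB_OPTIONS="${CTDB_OPTIONS:-} $(reclock_option "${CTDB_RECOVERY_LOCK:-}")"
```

With an empty setting the daemon is started without any reclock argument, so no file is opened or checked.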

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 25 Jun 2009 01:41:18 +0000 (11:41 +1000)]

Don't access the reclock file at all if VerifyRecoveryLock is zero and also
make sure the reclock file is closed if the variable is cleared at runtime

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 23 Jun 2009 01:21:37 +0000 (11:21 +1000)]

Merge root@10.1.1.27:/shared/ctdb/ctdb-git

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 23 Jun 2009 01:30:25 +0000 (11:30 +1000)]

new version 1.0.85

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 23 Jun 2009 01:29:26 +0000 (11:29 +1000)]

rename 99.routing to 11.routing so that it is executed before the service scripts

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 23 Jun 2009 01:23:54 +0000 (11:23 +1000)]

new version 1.0.85

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 23 Jun 2009 01:01:04 +0000 (11:01 +1000)]

rename 99.routing to 11.routing so the eventscript is processed before
NFS and LVS
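
A small illustration of why the rename changes the order, assuming eventscripts are run in sorted order of their file names, as the numeric prefixes suggest. The names 60.nfs and 91.lvs and the helper eventscript_order are assumptions used only for this sketch.

```shell
# Print the given eventscript names in the order they would run,
# assuming a plain byte-wise sort of the file names.
eventscript_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Before the rename the routing script sorts last, after it sorts first.
eventscript_order 60.nfs 91.lvs 99.routing
eventscript_order 60.nfs 91.lvs 11.routing
```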

commit | commitdiff | tree

Martin Schwenke [Tue, 2 Jun 2009 05:54:04 +0000 (15:54 +1000)]

Fix minor problem in previous initscript commit.

The valgrind start case should not use daemon, since this is specific
to Red Hat.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 2 Jun 2009 00:01:50 +0000 (10:01 +1000)]

Initscript fixes, mostly for "stop" action.

Use a local variable $ctdbd so that we always run ctdbd from the
same place and so that we know what to kill.  This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.

In the important cases use "pkill -0 -f" to check if ctdbd is
running.  Also, remove the special case for killing ctdbd when running
under valgrind.  The regular case will handle this just fine.
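
A rough sketch of the approach described above. The default path /usr/sbin/ctdbd and the function names are assumptions; only $ctdbd, $CTDBD and "pkill -0 -f" come from the commit message.

```shell
# Resolve the daemon path once, honouring $CTDBD if it is set.
# /usr/sbin/ctdbd is an assumed default location.
ctdbd_path() {
    printf '%s\n' "${CTDBD:-/usr/sbin/ctdbd}"
}

ctdbd=$(ctdbd_path)

# Signal 0 delivers nothing; pkill only reports whether a process
# whose full command line (-f) matches "$ctdbd" exists.
ctdbd_is_running() {
    pkill -0 -f "$ctdbd" >/dev/null 2>&1
}
```

Because the same $ctdbd value is used to start, check and kill the daemon, a daemon started from an alternative location or under valgrind is still matched by its command line.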

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 19 Jun 2009 01:40:09 +0000 (11:40 +1000)]

Clean up the handling of CTDB restarts in testcases.

Glitches during restarts of the CTDB cluster have been causing some
tests to fail. This is because restarts are initiated in the body of
many tests. This adds a simple function ctdb_restart_when_done, which
schedules a restart using an existing hook in the test exit code.
This function is now used in tests that need to restart CTDB.
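
The idea can be sketched as follows. Only the name ctdb_restart_when_done comes from the commit message; the hook variable, ctdb_test_exit and the restart_ctdb stand-in are hypothetical.

```shell
# Hypothetical exit hook: a command string run when the test finishes.
ctdb_test_exit_hook=""

# Stand-in for the real restart logic.
restart_ctdb() {
    echo "restarting ctdb"
}

# Schedule a restart instead of restarting in the body of the test.
ctdb_restart_when_done() {
    ctdb_test_exit_hook="restart_ctdb"
}

# Hypothetical exit code path: run the hook, if any was scheduled.
ctdb_test_exit() {
    eval "$ctdb_test_exit_hook"
}
```

A test that disturbs the cluster calls ctdb_restart_when_done early on; the restart then happens only after the test body has produced its result.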

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 19 Jun 2009 02:12:39 +0000 (12:12 +1000)]

Fix minor onnode bugs relating to local daemons.

Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression.  Due to the subtlety, this description is much longer than
the 1 line patch that fixes it!  The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:

1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).

In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).

The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr.  A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed.  This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell.  It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.

The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
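
The blocking behaviour and the fix can be reproduced with a small sketch. The function names below are illustrative stand-ins, not the actual onnode code; the only detail taken from the description is that file descriptor 3 is redirected to /dev/null.

```shell
# Stand-ins for onnode's fakessh: the command runs in a local shell.
fakessh_leaky() { sh -c "$1"; }               # fd 3 is inherited
fakessh_fixed() { sh -c "$1" 3>/dev/null; }   # fd 3 redirected away

# Run a command that backgrounds a process inside $(...), with fd 3
# duplicating stdout as in the stdout/stderr filtering code.
# Prints the elapsed time in whole seconds.
time_substitution() {
    start=$(date +%s)
    out=$( { "$1" 'sleep 2 >/dev/null 2>&1 & echo started'; } 3>&1 )
    end=$(date +%s)
    echo $((end - start))
}

time_substitution fakessh_leaky   # waits until the background sleep exits
time_substitution fakessh_fixed   # returns as soon as the shell exits
```

In the leaky case the backgrounded sleep still holds the write end of the command substitution pipe through fd 3, so $(...) cannot see end-of-file until it exits.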

Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination.  The code uses the node
name as a suffix for the output filename(s).  Usually this is an IP
address.  However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename.  3 possible
fixes were considered:

1. Replace all '/'s in the node name by '_'s.  Nice and simple.
2. Use the basename of the node name.  However, sockets may be in
   different directories but have the same basename.
3. Create all required directories before redirecting.  This is a
   little more complex and probably doesn't meet the user's
   expectations.

Option (1) is implemented here.
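
Option (1) amounts to something like the following sketch. The function name onnode_output_file is hypothetical; the <prefix>.<node> naming follows the description of the -o option.

```shell
# Build the output filename for one node: <prefix>.<node>, with every
# '/' in the node name replaced by '_' so no directories are needed.
onnode_output_file() {
    prefix=$1
    node=$2
    printf '%s.%s\n' "$prefix" "$(printf '%s' "$node" | tr '/' '_')"
}

onnode_output_file out 10.0.0.1
onnode_output_file out /tmp/ctdb/node.0/ctdb.socket
```

Two sockets with the same basename in different directories still map to different filenames, which is the advantage over option (2).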

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 19 Jun 2009 05:55:13 +0000 (15:55 +1000)]

don't log an error if waitpid returns -1 and errno is ECHILD

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 19 Jun 2009 04:58:06 +0000 (14:58 +1000)]

don't leak file descriptors when set recmode times out

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 19 Jun 2009 04:54:22 +0000 (14:54 +1000)]

don't leak file descriptors

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 19 Jun 2009 04:44:26 +0000 (14:44 +1000)]

in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process.
This allows us to time out the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery.

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 19 Jun 2009 03:09:11 +0000 (13:09 +1000)]

reduce the timeout we wait for the reclock child process to finish to 5 seconds
before we log an error and abort

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 17 Jun 2009 23:20:18 +0000 (09:20 +1000)]

increase the timeout before we shut down when the recovery daemon is hung

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 17 Jun 2009 23:11:46 +0000 (09:11 +1000)]

rename 99.routing to 11.routing
so it is executed before any of the service scripts

commit | commitdiff | tree

Martin Schwenke [Tue, 16 Jun 2009 02:47:59 +0000 (12:47 +1000)]

New tests for NFS and CIFS tickles.

New tests/complex/ subdirectory contains 2 new tests to ensure that
NFS and CIFS connections are tracked by CTDB and that tickle resets
are sent when a node is disabled.

Changes to ctdb_test_functions.bash to support these tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 16 Jun 2009 02:42:29 +0000 (12:42 +1000)]

Increase threshold in 51_ctdb_bench from 2% to 5%.

The threshold for the difference in the number of messages sent in either
direction around the ring of nodes was set to 2%. Something
environmental is causing this difference to sometimes be as high as 3%.
We're confident it isn't a CTDB issue so we're increasing the
threshold to 5%.
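
A sketch of such a threshold check. How the test actually computes the difference is not stated here, so measuring it relative to the larger count is an assumption, as is the function name within_threshold.

```shell
# Succeed if the two message counts differ by at most pct percent
# of the larger count. Integer arithmetic only.
within_threshold() {
    pos=$1
    neg=$2
    pct=$3
    if [ "$pos" -ge "$neg" ]; then
        diff=$((pos - neg))
        max=$pos
    else
        diff=$((neg - pos))
        max=$neg
    fi
    [ $((diff * 100)) -le $((pct * max)) ]
}
```

For counts of 1000 and 970 the difference is 3%, which fails a 2% threshold but passes a 5% one.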

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 10 Jun 2009 00:28:47 +0000 (10:28 +1000)]

When we ban a node, only drop the IPs on the node being banned, not on every node

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 9 Jun 2009 00:58:46 +0000 (10:58 +1000)]

remove unused variable

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 9 Jun 2009 00:57:46 +0000 (10:57 +1000)]

don't require particular values for NoIPFailback and DeterministicIPs when
using ctdb moveip

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 9 Jun 2009 00:56:50 +0000 (10:56 +1000)]

improve ctdb moveip so that it does not always trigger a recovery.

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 5 Jun 2009 07:57:14 +0000 (17:57 +1000)]

try to avoid causing a recovery when deleting a public ip from a node

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 5 Jun 2009 07:00:47 +0000 (17:00 +1000)]

when adding an ip, try manually adding and taking over the ip instead of triggering a full recovery to do the same thing

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 4 Jun 2009 03:25:58 +0000 (13:25 +1000)]

don't list DELETED nodes in the ctdb listnodes output

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 4 Jun 2009 03:21:25 +0000 (13:21 +1000)]

make it possible to run 'ctdb listnodes' even if the daemon is not running.
in this case, read the nodes file directly instead of asking the local daemon for the list.

add an option -Y to provide machine-readable output to listnodes

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 3 Jun 2009 23:41:05 +0000 (09:41 +1000)]

From William Jojo <w.jojo[AT]hvcc.edu>

AIX doesn't have getopt.h by default.
Don't try including this file when building on AIX

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 2 Jun 2009 05:05:41 +0000 (15:05 +1000)]

new version 1.0.84

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 2 Jun 2009 05:03:44 +0000 (15:03 +1000)]

teach ONNODE about deleted nodes

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 2 Jun 2009 03:13:03 +0000 (13:13 +1000)]

new version 1.0.83

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 2 Jun 2009 02:43:11 +0000 (12:43 +1000)]

document how to remove a node from an existing cluster using 'ctdb
reloadnodes'

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 05:43:30 +0000 (15:43 +1000)]

hide all DELETED nodes from the ctdb command output

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 05:29:36 +0000 (15:29 +1000)]

lower the loglevel when we log that we skip an eventscript because it is not executable

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 04:56:19 +0000 (14:56 +1000)]

don't try to queue packets for sending to (recently) deleted nodes since these nodes do not have a queue.

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 04:44:15 +0000 (14:44 +1000)]

when building the initial vnnmap, make sure to skip any deleted nodes

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 04:39:34 +0000 (14:39 +1000)]

use num_nodes and the nodes array instead of walking the vnnmap
when counting the number of active nodes

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 1 Jun 2009 04:18:34 +0000 (14:18 +1000)]

add a new node state : DELETED.

This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.

This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like

   1.0.0.1
   #1.0.0.2
   1.0.0.3

After removing 1.0.0.2 from the cluster,  the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2

Any line in the nodes file that is commented out represents a DELETED pnn
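
The numbering rule can be illustrated with a short sketch that derives pnn's from the position of each line. The function name list_active_pnns is hypothetical, and the sketch assumes the nodes file contains no blank lines.

```shell
# Print "<pnn> <address>" for every node that is not commented out.
# The pnn is the zero-based line position, so commented-out (DELETED)
# lines still consume a number.
list_active_pnns() {
    awk '!/^#/ { print NR - 1, $1 }' "$@"
}

printf '1.0.0.1\n#1.0.0.2\n1.0.0.3\n' | list_active_pnns
```

For the example nodes file above this lists 1.0.0.1 as pnn 0 and 1.0.0.3 as pnn 2, with pnn 1 left unused.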

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 29 May 2009 08:16:13 +0000 (18:16 +1000)]

don't remove the socket when the daemon stops. This can race if the
service is immediately restarted

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 07:04:42 +0000 (17:04 +1000)]

New attempt at TDB transaction nesting allow/disallow.

Make the default be that transaction nesting is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING.

If an application can cope with transaction nesting and the implicit
semantics of tdb_transaction_commit(), it can enable transaction nesting
by using the TDB_ALLOW_NESTING flag.

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 06:55:27 +0000 (16:55 +1000)]

Revert "we only need to have transaction nesting disabled when we start the new transaction for the recovery"

This reverts commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be.

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 06:55:02 +0000 (16:55 +1000)]

Revert "set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery"

This reverts commit 1b2029dbb055ff07367ebc1f307f5241320227b2.

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 06:54:25 +0000 (16:54 +1000)]

Revert "add TDB_NO_NESTING. When this flag is set tdb will not allow any nested transactions and tdb_transaction_start() will implicitely _cancel() any pending transactions before starting any new ones."

This reverts commit 459e4ee135bd1cd24c15e5325906eb4ecfd550ec.

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 02:33:52 +0000 (12:33 +1000)]

remove the obsolete ipmux component.
this was replaced by LVS a long time ago

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 02:15:13 +0000 (12:15 +1000)]

fix the git path to the repository

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 02:02:36 +0000 (12:02 +1000)]

install the 31.clamd script as 644 by default

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 01:46:47 +0000 (11:46 +1000)]

add 31.clamd to the install and the rpm

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 02:10:29 +0000 (12:10 +1000)]

From Flavio Carmo Junior <carmo.flavio@gmail.com>

Add an eventscript to manage ClamAV

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 02:08:50 +0000 (12:08 +1000)]

From Flavio Carmo Junior <carmo.flavio@gmail.com>
(with modifications)

Add a webpage about CLAMAV support in CTDB

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 25 May 2009 01:44:27 +0000 (11:44 +1000)]

document the new support for ClamAV

commit | commitdiff | tree

Sumit Bose [Thu, 21 May 2009 11:43:41 +0000 (13:43 +0200)]

fix re pattern to accept the new recovery lock times in the statistics output

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 21 May 2009 04:10:45 +0000 (14:10 +1000)]

change the socket we use for sending gratuitous ARPs from AF_INET/SOCK_PACKET to AF_PACKET/SOCK_RAW

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 21 May 2009 01:49:16 +0000 (11:49 +1000)]

Whitespace changes and using the CTDB_NO_MEMORY() macro changes to
the previous patch.

commit | commitdiff | tree

Sumit Bose [Wed, 20 May 2009 10:08:13 +0000 (12:08 +0200)]

add missing checks on so far ignored return values

Most of these were found during a review by Jim Meyering <meyering@redhat.com>

commit | commitdiff | tree

Sumit Bose [Wed, 20 May 2009 10:02:27 +0000 (12:02 +0200)]

structure member node_list_file is not used anywhere

commit | commitdiff | tree

Sumit Bose [Wed, 20 May 2009 09:47:34 +0000 (11:47 +0200)]

structure member logfile is not used anywhere

commit | commitdiff | tree

Sumit Bose [Wed, 20 May 2009 07:17:01 +0000 (09:17 +0200)]

fix a configure warning while checking for netfilter.h

commit | commitdiff | tree

Sumit Bose [Wed, 20 May 2009 06:59:00 +0000 (08:59 +0200)]

added a missing dependency

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 18 May 2009 22:55:42 +0000 (08:55 +1000)]

Change the loglevel of "registered tcp client for ..." to INFO
instead of ERR

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 18 May 2009 22:47:19 +0000 (08:47 +1000)]

From : Flavio Carmo Junior <carmo.flavio@gmail.com>

Add a helper function that checks whether a unix domain socket exists
and there is a daemon LISTENING to it similar to the existing function
to check for a daemon LISTENING to a tcp/ip socket.
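
One possible shape for such a helper, as a sketch only. The function names and the use of netstat output are assumptions, not the code added by this commit.

```shell
# Read "netstat --unix -a -n" style output on stdin and succeed if a
# listening unix socket is bound to the path given as $1.
unix_socket_listed() {
    awk -v path="$1" '
        $1 == "unix" && /LISTENING/ && $NF == path { found = 1 }
        END { exit !found }
    '
}

# Hypothetical wrapper: the socket file must exist and some daemon
# must be listening on it.
check_unix_socket() {
    [ -S "$1" ] && netstat --unix -a -n | unix_socket_listed "$1"
}
```

Keeping the parsing separate from the netstat call makes the matching rule easy to try against captured output.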

commit | commitdiff | tree

Volker Lendecke [Fri, 15 May 2009 20:08:21 +0000 (22:08 +0200)]

Fix http://ctdb.samba.org/download.html

commit | commitdiff | tree

Christian Ambach [Wed, 6 May 2009 17:01:58 +0000 (19:01 +0200)]

Remove error messages about a non-existing /var/log/log.ctdb when running ctdb with logging to syslog

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 14 May 2009 08:25:00 +0000 (18:25 +1000)]

add additional log info to track if/why we can't switch to client mode.

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 14 May 2009 00:33:25 +0000 (10:33 +1000)]

Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 13 May 2009 22:55:40 +0000 (08:55 +1000)]

new version 1.0.82

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 13 May 2009 22:55:05 +0000 (08:55 +1000)]

use scope host when adding the interface to loopback so we don't respond to ARPs for this ip

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 13 May 2009 22:12:48 +0000 (08:12 +1000)]

change the prefix NATGW_ to CTDB_NATGW_

commit | commitdiff | tree

Michael Adam [Tue, 12 May 2009 05:56:23 +0000 (07:56 +0200)]

ping pong: fix logic for mmap reads vs. preads

Michael

commit | commitdiff | tree

Michael Adam [Tue, 12 May 2009 20:59:35 +0000 (22:59 +0200)]

maketarball.sh: add GPL license header

Michael

commit | commitdiff | tree

Michael Adam [Tue, 12 May 2009 20:59:08 +0000 (22:59 +0200)]

makerpms.sh: add GPL license header

Michael

commit | commitdiff | tree

Michael Adam [Thu, 26 Mar 2009 18:03:03 +0000 (19:03 +0100)]

Remove generated binary files.

Noted by Mathieu Parent <math.parent@gmail.com>

Michael

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 12 May 2009 08:21:26 +0000 (18:21 +1000)]

remove NATGW_PRIVATE_IFACE from the documentation since we do not need
it any more.

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 12 May 2009 08:42:13 +0000 (18:42 +1000)]

assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 12 May 2009 08:39:34 +0000 (18:39 +1000)]

add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging.

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 12 May 2009 08:32:41 +0000 (18:32 +1000)]

check that a node is banned before trying to unban it.

commit | commitdiff | tree

Martin Schwenke [Fri, 3 Apr 2009 01:54:26 +0000 (12:54 +1100)]

51_ctdb_bench.sh now allows a 2% difference between positive and
negative. ctdb_bench.c checks to ensure the timer has advanced from 0
before dividing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 21 Apr 2009 06:50:37 +0000 (16:50 +1000)]

Avoid floating point divide by 0 in ctdb_fetch.c's bench_fetch().

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 1 May 2009 07:40:45 +0000 (17:40 +1000)]

Bug fixes for tests: simple/12_ctdb_getdebug.sh and scripts/test_wrap.

simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.

Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory containing it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 11 May 2009 22:59:49 +0000 (08:59 +1000)]

From: Sumit Bose <sbose@redhat.com>

fix handling of AC_INIT

commit | commitdiff | tree

Martin Schwenke [Mon, 11 May 2009 04:43:17 +0000 (14:43 +1000)]

Fix lvsmaster and natgwlist nodespecs.

They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 11 May 2009 04:14:11 +0000 (14:14 +1000)]

Updated onnode docs to reflect recent changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 11 May 2009 03:39:31 +0000 (13:39 +1000)]

New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.

Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Wed, 6 May 2009 03:17:34 +0000 (13:17 +1000)]

New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 5 May 2009 06:02:30 +0000 (16:02 +1000)]

Use ctdb_fetch_lock rather than ctdb_call.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 11 May 2009 04:50:28 +0000 (14:50 +1000)]

41.httpd event script workaround for RHEL5-ism.

RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores.  This means that eventually a node runs out of semaphores
and httpd can't be started.  So, before we attempt to start httpd we
clean up any semaphores owned by apache.  We also try to restart httpd
in the monitor event if httpd has gone away.
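
A sketch of the semaphore cleanup step. The function names are hypothetical, and the sketch assumes the usual Linux "ipcs -s" layout with the semaphore id in column 2 and the owner in column 3.

```shell
# Read "ipcs -s" style output on stdin and print the semid of every
# semaphore array owned by the given user.
semids_owned_by() {
    awk -v owner="$1" '$3 == owner { print $2 }'
}

# Hypothetical cleanup run before starting httpd: remove each
# semaphore array owned by apache.
cleanup_apache_semaphores() {
    ipcs -s | semids_owned_by apache | while read -r id; do
        ipcrm -s "$id"
    done
}
```

Header lines in the ipcs output are skipped naturally because their third field never equals the owner name.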

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 11 May 2009 04:44:59 +0000 (14:44 +1000)]

Add a -Y machine-readable flag to "lvsmaster"

commit | commitdiff | tree

Ronnie Sahlberg [Mon, 11 May 2009 03:56:28 +0000 (13:56 +1000)]

in the "lvsmaster" command, return -1 if there is no lvsmaster

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 8 May 2009 07:29:57 +0000 (17:29 +1000)]

new version 1.0.81

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 6 May 2009 10:32:39 +0000 (20:32 +1000)]

From: Sumit Bose <sbose@redhat.com>

fix handling of AC_INIT and read version from ctdb.spec