Ronnie Sahlberg [Mon, 29 Jun 2009 23:09:06 +0000 (09:09 +1000)]
New version 1.0.86
Ronnie Sahlberg [Thu, 25 Jun 2009 04:45:57 +0000 (14:45 +1000)]
update the man pages with the "getreclock" and "setreclock" commands.
Ronnie Sahlberg [Thu, 25 Jun 2009 04:45:17 +0000 (14:45 +1000)]
Do not allow the "VerifyRecoveryLock" tunable to be changed if there is no reclock file
Ronnie Sahlberg [Thu, 25 Jun 2009 04:34:21 +0000 (14:34 +1000)]
disable VerifyRecoveryLock when the user modifies the filename
Ronnie Sahlberg [Thu, 25 Jun 2009 04:25:18 +0000 (14:25 +1000)]
add a control to set the reclock file
Ronnie Sahlberg [Thu, 25 Jun 2009 02:55:43 +0000 (12:55 +1000)]
update the recovery daemon to read the recovery lock file name from the main daemon and handle the cases where the file is changed, enabled or disabled
Ronnie Sahlberg [Thu, 25 Jun 2009 02:26:14 +0000 (12:26 +1000)]
return NULL rather than "" when no reclock file is returned from the server
Ronnie Sahlberg [Thu, 25 Jun 2009 02:17:19 +0000 (12:17 +1000)]
add a control to read the current reclock file from a node
Ronnie Sahlberg [Thu, 25 Jun 2009 01:59:21 +0000 (11:59 +1000)]
Document that you can run ctdb without a reclock file in the sysconfig file
Ronnie Sahlberg [Thu, 25 Jun 2009 01:50:45 +0000 (11:50 +1000)]
Allow setting the recovery lock file to "", which means that we do not use a file and that we implicitly also disable the recovery lock checking.
Update the init script to allow starting without a reclock file.
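A minimal C sketch of the reclock semantics described in the entries above: changing the reclock file disables VerifyRecoveryLock, an empty setting is normalized to NULL (no file), and VerifyRecoveryLock cannot be changed while no file is configured. The struct and function names (struct reclock_config, reclock_set, reclock_set_verify, reclock_should_verify) are hypothetical and do not reflect CTDB's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration state; not CTDB's real structures. */
struct reclock_config {
    const char *reclock_file;      /* NULL means "no recovery lock file" */
    unsigned verify_recovery_lock; /* models the VerifyRecoveryLock tunable */
};

/* An empty string means "no file": return NULL rather than "". */
static const char *reclock_normalize(const char *path)
{
    if (path == NULL || path[0] == '\0') {
        return NULL;
    }
    return path;
}

/* Changing the reclock file disables verification; "" means no file. */
static void reclock_set(struct reclock_config *cfg, const char *path)
{
    cfg->reclock_file = reclock_normalize(path);
    cfg->verify_recovery_lock = 0;
}

/* Refuse to change VerifyRecoveryLock when there is no reclock file.
 * Returns 0 on success, -1 if the change is not allowed. */
static int reclock_set_verify(struct reclock_config *cfg, unsigned value)
{
    if (cfg->reclock_file == NULL) {
        return -1;
    }
    cfg->verify_recovery_lock = value;
    return 0;
}

/* Only touch the reclock file if one is configured and checking is on. */
static bool reclock_should_verify(const struct reclock_config *cfg)
{
    return cfg->reclock_file != NULL && cfg->verify_recovery_lock != 0;
}
```

Keeping "no file" as NULL gives callers a single test (reclock_file == NULL) instead of having to check for both NULL and "".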
Ronnie Sahlberg [Thu, 25 Jun 2009 01:41:18 +0000 (11:41 +1000)]
Don't access the reclock file at all if VerifyRecoveryLock is zero, and also
make sure the reclock file is closed if the variable is cleared at runtime
Ronnie Sahlberg [Tue, 23 Jun 2009 01:21:37 +0000 (11:21 +1000)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Ronnie Sahlberg [Tue, 23 Jun 2009 01:30:25 +0000 (11:30 +1000)]
new version 1.0.85
Ronnie Sahlberg [Tue, 23 Jun 2009 01:29:26 +0000 (11:29 +1000)]
rename 99.routing to 11.routing so that it is executed before the service scripts
Ronnie Sahlberg [Tue, 23 Jun 2009 01:23:54 +0000 (11:23 +1000)]
new version 1.0.85
Ronnie Sahlberg [Tue, 23 Jun 2009 01:01:04 +0000 (11:01 +1000)]
rename 99.routing to 11.routing so the eventscript is processed before
NFS and LVS
Martin Schwenke [Tue, 2 Jun 2009 05:54:04 +0000 (15:54 +1000)]
Fix minor problem in previous initscript commit.
The valgrind start case should not use daemon, since this is specific
to Red Hat.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 2 Jun 2009 00:01:50 +0000 (10:01 +1000)]
Initscript fixes, mostly for "stop" action.
Use a local variable $ctdbd so that we always run ctdbd from the
same place and so that we know what to kill. This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.
In the important cases use "pkill -0 -f" to check if ctdbd is
running. Also, remove the special case for killing ctdbd when running
under valgrind. The regular case will handle this just fine.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Jun 2009 01:40:09 +0000 (11:40 +1000)]
Clean up the handling of CTDB restarts in testcases.
Glitches during restarts of the CTDB cluster have been causing some
tests to fail. This is because restarts are initiated in the body of
many tests. This adds a simple function ctdb_restart_when_done, which
schedules a restart using an existing hook in the test exit code.
This function is now used in tests that need to restart CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 19 Jun 2009 02:12:39 +0000 (12:12 +1000)]
Fix minor onnode bugs relating to local daemons.
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression. Due to the subtlety, this description is much longer than
the 1 line patch that fixes it! The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:
1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).
In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2) and (3)).
The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr. A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed. This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell. It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.
The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
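onnode itself is a shell script, so the actual fix is a shell redirection (something like 3>/dev/null) inside fakessh. As a rough C analogue, the hypothetical helper below points a descriptor at /dev/null with dup2(). That drops whatever the descriptor referred to before, for example the write end of the pipe a $(...) substitution is reading, so a process backgrounded afterwards can no longer hold that pipe open.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point descriptor fd at /dev/null, the C equivalent of a shell
 * redirection such as 3>/dev/null for fd 3. dup2() closes whatever fd
 * referred to before, so a child started after this call cannot keep
 * that old file (e.g. a pipe) open. Returns 0 on success, -1 on error. */
static int redirect_fd_to_devnull(int fd)
{
    int nullfd = open("/dev/null", O_RDWR);
    if (nullfd == -1) {
        return -1;
    }
    if (nullfd != fd) {
        if (dup2(nullfd, fd) == -1) {
            close(nullfd);
            return -1;
        }
        close(nullfd);
    }
    return 0;
}
```

Calling redirect_fd_to_devnull(3) just before exec'ing the command would mirror the fakessh fix.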
Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination. The code uses the node
name as a suffix for the output filename(s). Usually this is an IP
address. However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename. 3 possible
fixes were considered:
1. Replace all '/'s in the node name by '_'s. Nice and simple.
2. Use the basename of the node name. However, sockets may be in
different directories but have the same basename.
3. Create all required directories before redirecting. This is a
little more complex and probably doesn't meet the user's
expectations.
Option (1) is implemented here.
Signed-off-by: Martin Schwenke <martin@meltin.net>
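Option (1), combined with the "-o <prefix>" naming scheme described further down (<prefix>.<ip>), could look roughly like the following C sketch. The function name output_filename is hypothetical; onnode implements this in shell.

```c
#include <stdio.h>
#include <string.h>

/* Build "<prefix>.<node>" into buf, replacing every '/' that comes from
 * the node name with '_', so a socket path such as "/tmp/ctdb/sock.0"
 * cannot introduce unexpected directories. Slashes in the user-supplied
 * prefix are left alone. Returns 0 on success, -1 if buf is too small. */
static int output_filename(char *buf, size_t len,
                           const char *prefix, const char *node)
{
    int n = snprintf(buf, len, "%s.%s", prefix, node);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    /* Skip "<prefix>." and rewrite only the node-name part. */
    for (char *p = buf + strlen(prefix) + 1; *p != '\0'; p++) {
        if (*p == '/') {
            *p = '_';
        }
    }
    return 0;
}
```

For example, prefix "out" and node "/tmp/ctdb/sock.0" give "out._tmp_ctdb_sock.0", while an ordinary IP address passes through unchanged.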
Ronnie Sahlberg [Fri, 19 Jun 2009 05:55:13 +0000 (15:55 +1000)]
don't log an error if waitpid returns -1 and errno is ECHILD
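A small C sketch of this pattern, using a hypothetical reap_child() wrapper: ECHILD means there is no such child left to wait for (for example, because it has already been reaped), so it is treated as "nothing to do" rather than logged as an error.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap a child. Returns the pid reaped, 0 if there was no such child
 * (ECHILD, which is not logged), or -1 on any other error. */
static pid_t reap_child(pid_t pid, int *status)
{
    pid_t ret;
    do {
        ret = waitpid(pid, status, 0);
    } while (ret == -1 && errno == EINTR);  /* retry if interrupted */

    if (ret == -1) {
        if (errno == ECHILD) {
            return 0;  /* already reaped elsewhere: nothing to log */
        }
        fprintf(stderr, "waitpid(%d) failed: errno %d\n", (int)pid, errno);
        return -1;
    }
    return ret;
}
```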
Ronnie Sahlberg [Fri, 19 Jun 2009 04:58:06 +0000 (14:58 +1000)]
don't leak file descriptors when set recmode times out
Ronnie Sahlberg [Fri, 19 Jun 2009 04:54:22 +0000 (14:54 +1000)]
don't leak file descriptors
Ronnie Sahlberg [Fri, 19 Jun 2009 04:44:26 +0000 (14:44 +1000)]
in the recovery daemon, use a child process to check that the recovery master can access the recovery lock file and to verify that it is not stale.
This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery.
Ronnie Sahlberg [Fri, 19 Jun 2009 03:09:11 +0000 (13:09 +1000)]
reduce the timeout we wait for the reclock child process to finish to 5 seconds
before we log an error and abort
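The idea of doing the reclock check in a child, so that the parent can give up after a fixed time, can be sketched in C as follows. The check itself (open the file and read one byte) is a hypothetical stand-in for CTDB's real check, the 10 ms polling interval is arbitrary, and elapsed time is only approximated by counting polls; a caller matching the entry above would pass 5000 ms.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum { CHECK_OK = 0, CHECK_FAILED = -1, CHECK_TIMEOUT = -2 };

/* Hypothetical stand-in for the real check; runs only in the child.
 * Returns the child's exit code: 0 if the file could be read. */
static int child_check(const char *path)
{
    char c;
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return 1;
    }
    ssize_t n = read(fd, &c, 1);
    close(fd);
    return n < 0 ? 1 : 0;
}

/* Run the check in a forked child so a hung filesystem can only block
 * the child. Poll for roughly timeout_ms, then kill and reap the child. */
static int check_reclock_with_timeout(const char *path, int timeout_ms)
{
    pid_t pid = fork();
    if (pid == -1) {
        return CHECK_FAILED;
    }
    if (pid == 0) {
        _exit(child_check(path));
    }

    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? CHECK_OK : CHECK_FAILED;
        }
        if (r == -1) {
            return CHECK_FAILED;
        }
        nanosleep(&tick, NULL);  /* child still running: wait a little */
    }
    kill(pid, SIGKILL);     /* give up on the (possibly hung) child */
    waitpid(pid, NULL, 0);  /* reap it so no zombie is left behind */
    return CHECK_TIMEOUT;
}
```

Only the child can block on the filesystem; the parent always returns within about timeout_ms plus the time needed to reap a killed child.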
Ronnie Sahlberg [Wed, 17 Jun 2009 23:20:18 +0000 (09:20 +1000)]
increase the timeout before we shut down when the recovery daemon is hung
Ronnie Sahlberg [Wed, 17 Jun 2009 23:11:46 +0000 (09:11 +1000)]
rename 99.routing to 11.routing
so it is executed before any of the service scripts
Martin Schwenke [Tue, 16 Jun 2009 02:47:59 +0000 (12:47 +1000)]
New tests for NFS and CIFS tickles.
New tests/complex/ subdirectory contains 2 new tests to ensure that
NFS and CIFS connections are tracked by CTDB and that tickle resets
are sent when a node is disabled.
Changes to ctdb_test_functions.bash to support these tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Jun 2009 02:42:29 +0000 (12:42 +1000)]
Increase threshold in 51_ctdb_bench from 2% to 5%.
The threshold for the difference in the number of messages sent in either
direction around the ring of nodes was set to 2%. Something
environmental is causing this difference to sometimes be as high as 3%.
We're confident it isn't a CTDB issue so we're increasing the
threshold to 5%.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Wed, 10 Jun 2009 00:28:47 +0000 (10:28 +1000)]
When we ban a node, only drop the IPs on the node being banned, not on every node
Ronnie Sahlberg [Tue, 9 Jun 2009 00:58:46 +0000 (10:58 +1000)]
remove unused variable
Ronnie Sahlberg [Tue, 9 Jun 2009 00:57:46 +0000 (10:57 +1000)]
don't require particular values for NoIPFailback and DeterministicIPs when
using ctdb moveip
Ronnie Sahlberg [Tue, 9 Jun 2009 00:56:50 +0000 (10:56 +1000)]
improve ctdb moveip so that it does not always trigger a recovery.
Ronnie Sahlberg [Fri, 5 Jun 2009 07:57:14 +0000 (17:57 +1000)]
try to avoid causing a recovery when deleting a public IP from a node
Ronnie Sahlberg [Fri, 5 Jun 2009 07:00:47 +0000 (17:00 +1000)]
when adding an IP, try manually adding and taking over the IP instead of triggering a full recovery to do the same thing
Ronnie Sahlberg [Thu, 4 Jun 2009 03:25:58 +0000 (13:25 +1000)]
don't list DELETED nodes in the ctdb listnodes output
Ronnie Sahlberg [Thu, 4 Jun 2009 03:21:25 +0000 (13:21 +1000)]
make it possible to run 'ctdb listnodes' also if the daemon is not running.
in this case, read the nodes file directly instead of asking the local daemon for the list.
add an option -Y to provide machine-readable output from listnodes
Ronnie Sahlberg [Wed, 3 Jun 2009 23:41:05 +0000 (09:41 +1000)]
From William Jojo <w.jojo[AT]hvcc.edu>
AIX doesn't have getopt.h by default.
Don't try to include this file when building on AIX
Ronnie Sahlberg [Tue, 2 Jun 2009 05:05:41 +0000 (15:05 +1000)]
new version 1.0.84
Ronnie Sahlberg [Tue, 2 Jun 2009 05:03:44 +0000 (15:03 +1000)]
teach onnode about deleted nodes
Ronnie Sahlberg [Tue, 2 Jun 2009 03:13:03 +0000 (13:13 +1000)]
new version 1.0.83
Ronnie Sahlberg [Tue, 2 Jun 2009 02:43:11 +0000 (12:43 +1000)]
document how to remove a node from an existing cluster using 'ctdb reloadnodes'
Ronnie Sahlberg [Mon, 1 Jun 2009 05:43:30 +0000 (15:43 +1000)]
hide all DELETED nodes from the ctdb command output
Ronnie Sahlberg [Mon, 1 Jun 2009 05:29:36 +0000 (15:29 +1000)]
lower the loglevel when we log that we skip an eventscript because it is not executable
Ronnie Sahlberg [Mon, 1 Jun 2009 04:56:19 +0000 (14:56 +1000)]
don't try to queue packets for sending to (recently) deleted nodes since these nodes do not have a queue.
Ronnie Sahlberg [Mon, 1 Jun 2009 04:44:15 +0000 (14:44 +1000)]
when building the initial vnnmap, make sure to skip any deleted nodes
Ronnie Sahlberg [Mon, 1 Jun 2009 04:39:34 +0000 (14:39 +1000)]
use num_nodes and the nodes array instead of walking the vnnmap
when counting the number of active nodes
Ronnie Sahlberg [Mon, 1 Jun 2009 04:18:34 +0000 (14:18 +1000)]
add a new node state: DELETED.
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out, like this:
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnns from prior to the deletion, namely 0 and 2.
Any line in the nodes file that is commented out represents a DELETED pnn.
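A minimal C sketch of the numbering rule: a node's pnn is its line index in the nodes file, and a commented-out line becomes a DELETED placeholder rather than being dropped. Counting active nodes then walks the node array and skips DELETED entries, in the spirit of the nearby entries about num_nodes and the vnnmap. The struct and function names are hypothetical, and the sketch ignores blank lines and whitespace handling that a real parser would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node table entry; CTDB's own structures differ. */
struct node_entry {
    char address[64];
    bool deleted;  /* true for a commented-out (DELETED) line */
};

/* Fill nodes[] from the lines of a nodes file. A node's pnn is its line
 * index, so a commented-out line becomes a DELETED placeholder and the
 * nodes after it keep their numbers. Returns the number of entries. */
static size_t parse_nodes(const char *const lines[], size_t nlines,
                          struct node_entry *nodes, size_t max)
{
    size_t i;
    for (i = 0; i < nlines && i < max; i++) {
        const char *line = lines[i];
        nodes[i].deleted = (line[0] == '#');
        if (nodes[i].deleted) {
            line++;  /* this sketch keeps the address, minus the '#' */
        }
        strncpy(nodes[i].address, line, sizeof(nodes[i].address) - 1);
        nodes[i].address[sizeof(nodes[i].address) - 1] = '\0';
    }
    return i;
}

/* Count active nodes by walking the node array, skipping DELETED ones. */
static size_t count_active(const struct node_entry *nodes, size_t n)
{
    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].deleted) {
            active++;
        }
    }
    return active;
}
```

With the example file above, 1.0.0.3 stays at index 2 (pnn 2) and two nodes are counted as active.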
Ronnie Sahlberg [Fri, 29 May 2009 08:16:13 +0000 (18:16 +1000)]
don't remove the socket when the daemon stops. This can race if the
service is immediately restarted
Ronnie Sahlberg [Mon, 25 May 2009 07:04:42 +0000 (17:04 +1000)]
New attempt at TDB transaction nesting allow/disallow.
Make the default be that transaction nesting is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING.
If an application can cope with transaction nesting and the implicit
semantics of tdb_transaction_commit(), it can enable transaction nesting
by using the TDB_ALLOW_NESTING flag.
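The nesting policy can be illustrated with a toy model in C. This is not the tdb API: all names are hypothetical, TOY_ALLOW_NESTING and TOY_ERR_NESTING only stand in for TDB_ALLOW_NESTING and TDB_ERR_NESTING, and the assumption that only the outermost commit really writes is our reading of the "implicit semantics of tdb_transaction_commit()" mentioned above.

```c
#include <stdbool.h>

/* Toy model of the nesting policy only; this is NOT the tdb API. */
#define TOY_ALLOW_NESTING 0x1  /* stands in for TDB_ALLOW_NESTING */

enum toy_err { TOY_OK = 0, TOY_ERR_NESTING = -1, TOY_ERR_NO_TRANS = -2 };

struct toy_db {
    unsigned flags;
    int depth;    /* current transaction nesting depth */
    int commits;  /* number of real (outermost) commits performed */
};

/* By default a nested start fails; with TOY_ALLOW_NESTING it nests. */
static int toy_transaction_start(struct toy_db *db)
{
    if (db->depth > 0 && !(db->flags & TOY_ALLOW_NESTING)) {
        return TOY_ERR_NESTING;
    }
    db->depth++;
    return TOY_OK;
}

/* Inner commits only pop a level; only the outermost commit "writes".
 * A caller that enables nesting has to accept this behaviour. */
static int toy_transaction_commit(struct toy_db *db)
{
    if (db->depth == 0) {
        return TOY_ERR_NO_TRANS;
    }
    if (--db->depth == 0) {
        db->commits++;
    }
    return TOY_OK;
}
```

The point of the model is that an inner commit that returns success has not yet made anything durable. That is why nesting is opt-in rather than the default.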
Ronnie Sahlberg [Mon, 25 May 2009 06:55:27 +0000 (16:55 +1000)]
Revert "we only need to have transaction nesting disabled when we start the new transaction for the recovery"
This reverts commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be.
Ronnie Sahlberg [Mon, 25 May 2009 06:55:02 +0000 (16:55 +1000)]
Revert "set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery"
This reverts commit 1b2029dbb055ff07367ebc1f307f5241320227b2.
Ronnie Sahlberg [Mon, 25 May 2009 06:54:25 +0000 (16:54 +1000)]
Revert "add TDB_NO_NESTING. When this flag is set tdb will not allow any nested transactions and tdb_transaction_start() will implicitely _cancel() any pending transactions before starting any new ones."
This reverts commit 459e4ee135bd1cd24c15e5325906eb4ecfd550ec.
Ronnie Sahlberg [Mon, 25 May 2009 02:33:52 +0000 (12:33 +1000)]
remove the obsolete ipmux component.
it has long since been replaced by LVS
Ronnie Sahlberg [Mon, 25 May 2009 02:15:13 +0000 (12:15 +1000)]
fix the git path to the repository
Ronnie Sahlberg [Mon, 25 May 2009 02:02:36 +0000 (12:02 +1000)]
install the 31.clamd script as 644 by default
Ronnie Sahlberg [Mon, 25 May 2009 01:46:47 +0000 (11:46 +1000)]
add 31.clamd to the install and the rpm
Ronnie Sahlberg [Mon, 25 May 2009 02:10:29 +0000 (12:10 +1000)]
From Flavio Carmo Junior <carmo.flavio@gmail.com>
Add an eventscript to manage ClamAV
Ronnie Sahlberg [Mon, 25 May 2009 02:08:50 +0000 (12:08 +1000)]
From Flavio Carmo Junior <carmo.flavio@gmail.com>
(with modifications)
Add a webpage about ClamAV support in CTDB
Ronnie Sahlberg [Mon, 25 May 2009 01:44:27 +0000 (11:44 +1000)]
document the new support for ClamAV
Sumit Bose [Thu, 21 May 2009 11:43:41 +0000 (13:43 +0200)]
fix the regular expression pattern to accept the new recovery lock times in the statistics output
Ronnie Sahlberg [Thu, 21 May 2009 04:10:45 +0000 (14:10 +1000)]
change the socket we use for sending gratuitous ARPs from AF_INET/SOCK_PACKET to AF_PACKET/SOCK_RAW
Ronnie Sahlberg [Thu, 21 May 2009 01:49:16 +0000 (11:49 +1000)]
Whitespace changes, and use of the CTDB_NO_MEMORY() macro, applied to
the previous patch.
Sumit Bose [Wed, 20 May 2009 10:08:13 +0000 (12:08 +0200)]
add missing checks on previously ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>
Sumit Bose [Wed, 20 May 2009 10:02:27 +0000 (12:02 +0200)]
structure member node_list_file is not used anywhere
Sumit Bose [Wed, 20 May 2009 09:47:34 +0000 (11:47 +0200)]
structure member logfile is not used anywhere
Sumit Bose [Wed, 20 May 2009 07:17:01 +0000 (09:17 +0200)]
fix a configure warning while checking for netfilter.h
Sumit Bose [Wed, 20 May 2009 06:59:00 +0000 (08:59 +0200)]
added a missing dependency
Ronnie Sahlberg [Mon, 18 May 2009 22:55:42 +0000 (08:55 +1000)]
Change the loglevel of "registered tcp client for ..." to INFO
instead of ERR
Ronnie Sahlberg [Mon, 18 May 2009 22:47:19 +0000 (08:47 +1000)]
From: Flavio Carmo Junior <carmo.flavio@gmail.com>
Add a helper function that checks whether a unix domain socket exists
and there is a daemon LISTENING to it similar to the existing function
to check for a daemon LISTENING to a tcp/ip socket.
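One way to check for a daemon listening on a unix domain socket is simply to try connect(). The C sketch below is only an illustration: the helper in the entry above may be implemented differently, unix_socket_is_listening is a hypothetical name, and the sketch assumes a stream socket.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Return true if path is a unix domain (stream) socket with a process
 * listening on it. connect() fails with ENOENT if the socket file is
 * missing and with ECONNREFUSED if nothing is listening, so success
 * means "listening". */
static bool unix_socket_is_listening(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof(addr.sun_path)) {
        return false;  /* path too long for sun_path */
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return false;
    }
    bool listening = connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(fd);
    return listening;
}
```

Unlike a plain existence check on the path, this also catches a stale socket file left behind by a daemon that has exited.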
Volker Lendecke [Fri, 15 May 2009 20:08:21 +0000 (22:08 +0200)]
Fix http://ctdb.samba.org/download.html
Christian Ambach [Wed, 6 May 2009 17:01:58 +0000 (19:01 +0200)]
Remove error messages about a non-existing /var/log/log.ctdb when running ctdb with logging to syslog
Ronnie Sahlberg [Thu, 14 May 2009 08:25:00 +0000 (18:25 +1000)]
add additional log info to track if/why we can't switch to client mode.
Ronnie Sahlberg [Thu, 14 May 2009 00:33:25 +0000 (10:33 +1000)]
Track how long it takes to take out the recovery lock from both the main daemon and the recovery daemon.
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.
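A hedged C sketch of such a latency check: take a timestamp before and after accessing the reclock file, then compare the elapsed time with the limit. The function names are hypothetical; only the RecLockLatencyMs name comes from the entry above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Milliseconds between two timestamps, e.g. from CLOCK_MONOTONIC. */
static double elapsed_ms(const struct timespec *start,
                         const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Log an error if accessing the reclock took longer than limit_ms
 * (modelled on RecLockLatencyMs). Returns true if the limit was exceeded. */
static bool check_reclock_latency(const struct timespec *start,
                                  const struct timespec *end,
                                  unsigned limit_ms)
{
    double ms = elapsed_ms(start, end);
    if (ms > (double)limit_ms) {
        fprintf(stderr, "ERROR: reclock access took %.1f ms (limit %u ms)\n",
                ms, limit_ms);
        return true;
    }
    return false;
}
```

In use, start and end would come from clock_gettime(CLOCK_MONOTONIC, ...) calls around the lock operation. A monotonic clock is not affected if the wall-clock time is adjusted while the lock is being taken.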
Ronnie Sahlberg [Wed, 13 May 2009 22:55:40 +0000 (08:55 +1000)]
new version 1.0.82
Ronnie Sahlberg [Wed, 13 May 2009 22:55:05 +0000 (08:55 +1000)]
use scope host when adding the interface to loopback so we don't respond to ARPs for this IP
Ronnie Sahlberg [Wed, 13 May 2009 22:12:48 +0000 (08:12 +1000)]
change the prefix NATGW_ to CTDB_NATGW_
Michael Adam [Tue, 12 May 2009 05:56:23 +0000 (07:56 +0200)]
ping pong: fix logic for mmap reads vs. preads
Michael
Michael Adam [Tue, 12 May 2009 20:59:35 +0000 (22:59 +0200)]
maketarball.sh: add GPL license header
Michael
Michael Adam [Tue, 12 May 2009 20:59:08 +0000 (22:59 +0200)]
makerpms.sh: add GPL license header
Michael
Michael Adam [Thu, 26 Mar 2009 18:03:03 +0000 (19:03 +0100)]
Remove generated binary files.
Noted by Mathieu Parent <math.parent@gmail.com>
Michael
Ronnie Sahlberg [Tue, 12 May 2009 08:21:26 +0000 (18:21 +1000)]
remove NATGW_PRIVATE_IFACE from the documentation since we do not need
it any more.
Ronnie Sahlberg [Tue, 12 May 2009 08:42:13 +0000 (18:42 +1000)]
assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same
Ronnie Sahlberg [Tue, 12 May 2009 08:39:34 +0000 (18:39 +1000)]
add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging.
Ronnie Sahlberg [Tue, 12 May 2009 08:32:41 +0000 (18:32 +1000)]
check that a node is banned before trying to unban it.
Martin Schwenke [Fri, 3 Apr 2009 01:54:26 +0000 (12:54 +1100)]
51_ctdb_bench.sh now allows a 2% difference between positive and
negative. ctdb_bench.c checks that the timer has advanced from 0
before dividing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 21 Apr 2009 06:50:37 +0000 (16:50 +1000)]
Avoid floating point divide by 0 in ctdb_fetch.c's bench_fetch().
Signed-off-by: Martin Schwenke <martin@meltin.net>
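The guard described in the two entries above amounts to checking the elapsed time before dividing. A minimal C sketch, with a hypothetical helper name:

```c
/* Rate in operations per second. If the timer has not advanced
 * (elapsed_sec <= 0), return 0.0 instead of dividing by zero. */
static double rate_per_sec(unsigned long count, double elapsed_sec)
{
    if (elapsed_sec <= 0.0) {
        return 0.0;
    }
    return (double)count / elapsed_sec;
}
```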
Martin Schwenke [Fri, 1 May 2009 07:40:45 +0000 (17:40 +1000)]
Bug fixes for tests: simple/12_ctdb_getdebug.sh and scripts/test_wrap.
simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.
Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory that it is in.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 11 May 2009 22:59:49 +0000 (08:59 +1000)]
From: Sumit Bose <sbose@redhat.com>
fix handling of AC_INIT
Martin Schwenke [Mon, 11 May 2009 04:43:17 +0000 (14:43 +1000)]
Fix lvsmaster and natgwlist nodespecs.
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 04:14:11 +0000 (14:14 +1000)]
Updated onnode docs to reflect recent changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 03:39:31 +0000 (13:39 +1000)]
New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.
Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 6 May 2009 03:17:34 +0000 (13:17 +1000)]
New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 May 2009 06:02:30 +0000 (16:02 +1000)]
Use ctdb_fetch_lock rather than ctdb_call.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 04:50:28 +0000 (14:50 +1000)]
41.httpd event script workaround for RHEL5-ism.
RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores. This means that eventually a node runs out of semaphores
and httpd can't be started. So, before we attempt to start httpd we
clean up any semaphores owned by apache. We also try to restart httpd
in the monitor event if httpd has gone away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 11 May 2009 04:44:59 +0000 (14:44 +1000)]
Add a -Y machine-readable flag to "lvsmaster"
Ronnie Sahlberg [Mon, 11 May 2009 03:56:28 +0000 (13:56 +1000)]
in the "lvsmaster" command, return -1 if there is no lvsmaster
Ronnie Sahlberg [Fri, 8 May 2009 07:29:57 +0000 (17:29 +1000)]
new version 1.0.81
Ronnie Sahlberg [Wed, 6 May 2009 10:32:39 +0000 (20:32 +1000)]
From: Sumit Bose <sbose@redhat.com>
fix handling of AC_INIT and read version from ctdb.spec
Michael Adam [Tue, 5 May 2009 11:16:38 +0000 (13:16 +0200)]
ping_pong: add GPL comment header with Tridge's copyright
Michael