git.samba.org - sahlberg/ctdb.git/log

Ronnie Sahlberg [Wed, 29 Jul 2009 03:25:43 +0000 (13:25 +1000)]

initial part of new vacuuming patch.

create some new fields for ctdb_db and tunables

Martin Schwenke [Wed, 30 Sep 2009 11:21:56 +0000 (21:21 +1000)]

Minor fixes to 01.reclock eventscript.

test -z really needs its argument to be quoted. Simplified a status
test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
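
A minimal sketch of why the quoting matters. The variable name reclock and the helper is_empty are hypothetical and not taken from the eventscript. When the value contains spaces and is left unquoted, it expands to several words, so test sees the wrong number of arguments and fails instead of answering the question.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a value is empty, quoting it correctly.
is_empty() {
    if test -z "$1"; then
        echo "empty"
    else
        echo "set"
    fi
}

reclock="/some path/with spaces"   # illustrative value only

# Unquoted: the value is split into several words, so test is handed extra
# arguments and returns a non-zero status (usually with an error message).
if test -z $reclock 2>/dev/null; then
    unquoted="empty"
else
    unquoted="not-empty-or-error"
fi
echo "unquoted test: $unquoted"

# Quoted: the value is always passed as exactly one argument.
is_empty "$reclock"
is_empty ""
```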

Ronnie Sahlberg [Mon, 28 Sep 2009 04:12:59 +0000 (14:12 +1000)]

change the reclock fail count to 19 monitor intervals before we shut down ctdbd

Ronnie Sahlberg [Mon, 28 Sep 2009 04:06:40 +0000 (14:06 +1000)]

add a new eventscript 01.reclock

    if the reclock file has been set, then this script will test that the
    reclock file can actually be accessed.
    if the file does not exist, or if the attempts to stat the file hangs,
    the node will be marked unhealthy after the third failed monitoring event
    and after the tenth failure, ctdb itself will shutdown.
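
The escalation described above can be sketched as a small shell function. This is an illustration only, not the contents of 01.reclock: the function name reclock_action, the way the failure count is passed in, and the returned action names are assumptions. The thresholds come from the commit message, reading "after the third" and "after the tenth" as "from that failure onward".

```shell
#!/bin/sh
# Hypothetical helper: maps a count of consecutive failed reclock checks
# to the action described in the commit message.
reclock_action() {
    failures=$1
    if [ "$failures" -ge 10 ]; then
        echo "shutdown"      # tenth failure: ctdb itself shuts down
    elif [ "$failures" -ge 3 ]; then
        echo "unhealthy"     # third failure: node is marked unhealthy
    else
        echo "ok"            # still within tolerance
    fi
}

for n in 0 2 3 9 10; do
    echo "$n failures -> $(reclock_action "$n")"
done
```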

Ronnie Sahlberg [Mon, 27 Jul 2009 03:10:32 +0000 (13:10 +1000)]

new version 1.0.82-7

Ronnie Sahlberg [Tue, 30 Jun 2009 02:17:05 +0000 (12:17 +1000)]

don't try sending a keepalive if the transport is down

Ronnie Sahlberg [Tue, 30 Jun 2009 02:16:13 +0000 (12:16 +1000)]

Don't even try allocating and sending a CALL packet if the transport is down

Ronnie Sahlberg [Tue, 30 Jun 2009 02:14:58 +0000 (12:14 +1000)]

failing a dmaster send due to the transport being down is fatal

Ronnie Sahlberg [Tue, 30 Jun 2009 02:13:15 +0000 (12:13 +1000)]

if we fail a dmaster migration due to the transport being down, then that is a fatal condition.

Ronnie Sahlberg [Tue, 30 Jun 2009 02:10:27 +0000 (12:10 +1000)]

don't try to send error packets if the transport is down

Ronnie Sahlberg [Tue, 30 Jun 2009 02:09:28 +0000 (12:09 +1000)]

don't even try to send a message from the main daemon if the transport is down

Ronnie Sahlberg [Tue, 30 Jun 2009 02:03:12 +0000 (12:03 +1000)]

Don't try to allocate and send packets if the transport is down

Ronnie Sahlberg [Tue, 30 Jun 2009 01:55:42 +0000 (11:55 +1000)]

don't even try to allocate a packet if the transport is down since it will fail

Ronnie Sahlberg [Tue, 23 Jun 2009 01:29:26 +0000 (11:29 +1000)]

rename 99.routing to 11.routing so that it is executed before the service scripts

Ronnie Sahlberg [Tue, 14 Jul 2009 00:54:05 +0000 (10:54 +1000)]

new version 1.0.82-6

Ronnie Sahlberg [Mon, 18 May 2009 22:55:42 +0000 (08:55 +1000)]

Change the loglevel of "registered tcp client for ..." to INFO
instead of ERR

Ronnie Sahlberg [Wed, 10 Jun 2009 00:35:32 +0000 (10:35 +1000)]

new version 1.0.82-5

Ronnie Sahlberg [Wed, 10 Jun 2009 00:28:47 +0000 (10:28 +1000)]

When we ban a node, only drop the IPs on the node being banned, not on every node

Ronnie Sahlberg [Tue, 9 Jun 2009 02:33:06 +0000 (12:33 +1000)]

new version 1.0.82-4

Ronnie Sahlberg [Tue, 9 Jun 2009 02:31:36 +0000 (12:31 +1000)]

don't remove the socket when the daemon stops. This can race if the
service is immediately restarted

Conflicts:

server/ctdb_daemon.c

Ronnie Sahlberg [Tue, 2 Jun 2009 09:44:51 +0000 (19:44 +1000)]

new version 1.0.82-3

Ronnie Sahlberg [Tue, 2 Jun 2009 09:43:47 +0000 (19:43 +1000)]

make ctdb statistics machine-readable

Ronnie Sahlberg [Tue, 2 Jun 2009 07:59:03 +0000 (17:59 +1000)]

new version 1.0.82-2

Ronnie Sahlberg [Tue, 2 Jun 2009 07:56:20 +0000 (17:56 +1000)]

Add -Y machine-readable output to ctdb listvars and ctdb getvar

Ronnie Sahlberg [Thu, 14 May 2009 00:33:25 +0000 (10:33 +1000)]

Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

Ronnie Sahlberg [Wed, 13 May 2009 22:55:40 +0000 (08:55 +1000)]

new version 1.0.82

Ronnie Sahlberg [Wed, 13 May 2009 22:55:05 +0000 (08:55 +1000)]

use scope host when adding the interface to loopback so we don't respond to ARPs for this ip

Ronnie Sahlberg [Wed, 13 May 2009 22:12:48 +0000 (08:12 +1000)]

change the prefix NATGW_ to CTDB_NATGW_

Michael Adam [Tue, 12 May 2009 05:56:23 +0000 (07:56 +0200)]

ping pong: fix logic for mmap reads vs. preads

Michael

Michael Adam [Tue, 12 May 2009 20:59:35 +0000 (22:59 +0200)]

maketarball.sh: add GPL license header

Michael

Michael Adam [Tue, 12 May 2009 20:59:08 +0000 (22:59 +0200)]

makerpms.sh: add GPL license header

Michael

Michael Adam [Thu, 26 Mar 2009 18:03:03 +0000 (19:03 +0100)]

Remove generated binary files.

Noted by Mathieu Parent <math.parent@gmail.com>

Michael

Ronnie Sahlberg [Tue, 12 May 2009 08:21:26 +0000 (18:21 +1000)]

remove NATGW_PRIVATE_IFACE from the documentation since we do not need
it any more.

Ronnie Sahlberg [Tue, 12 May 2009 08:42:13 +0000 (18:42 +1000)]

assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same

Ronnie Sahlberg [Tue, 12 May 2009 08:39:34 +0000 (18:39 +1000)]

add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging.

Ronnie Sahlberg [Tue, 12 May 2009 08:32:41 +0000 (18:32 +1000)]

check that a node is banned before trying to unban it.

Martin Schwenke [Fri, 3 Apr 2009 01:54:26 +0000 (12:54 +1100)]

51_ctdb_bench.sh now allows a 2% difference between positive and
negative. ctdb_bench.c checks to ensure the timer has advanced from 0
before dividing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Tue, 21 Apr 2009 06:50:37 +0000 (16:50 +1000)]

Avoid floating point divide by 0 in ctdb_fetch.c's bench_fetch().

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Fri, 1 May 2009 07:40:45 +0000 (17:40 +1000)]

Bug fixes for tests: simple/12_ctdb_getdebug.sh and scripts/test_wrap.

simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.

Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory that contains it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Ronnie Sahlberg [Mon, 11 May 2009 22:59:49 +0000 (08:59 +1000)]

From: Sumit Bose <sbose@redhat.com>

fix handling of AC_INIT

Martin Schwenke [Mon, 11 May 2009 04:43:17 +0000 (14:43 +1000)]

Fix lvsmaster and natgwlist nodespecs.

They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 11 May 2009 04:14:11 +0000 (14:14 +1000)]

Updated onnode docs to reflect recent changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 11 May 2009 03:39:31 +0000 (13:39 +1000)]

New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.

Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Wed, 6 May 2009 03:17:34 +0000 (13:17 +1000)]

New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.

Signed-off-by: Martin Schwenke <martin@meltin.net>
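
The <prefix>.<ip> naming can be sketched as follows. This is not onnode's code: the helper save_node_output is hypothetical, the addresses are made up, and the command runs locally rather than on a remote node so that the sketch stays self-contained.

```shell
#!/bin/sh
# Hypothetical stand-in for the per-node step: run a command and save its
# stdout to <prefix>.<ip>. The real tool would run the command on the
# remote node; here it runs locally.
save_node_output() {
    prefix=$1
    ip=$2
    shift 2
    "$@" > "$prefix.$ip"
}

# Example: one output file per (made-up) node address.
outdir=$(mktemp -d)
for ip in 10.0.0.1 10.0.0.2; do
    save_node_output "$outdir/out" "$ip" echo "hello from $ip"
done
ls "$outdir"
```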

Martin Schwenke [Tue, 5 May 2009 06:02:30 +0000 (16:02 +1000)]

Use ctdb_fetch_lock rather than ctdb_call.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Martin Schwenke [Mon, 11 May 2009 04:50:28 +0000 (14:50 +1000)]

41.httpd event script workaround for RHEL5-ism.

RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores.  This means that eventually a node runs out of semaphores
and httpd can't be started.  So, before we attempt to start httpd we
clean up any semaphores owned by apache.  We also try to restart httpd
in the monitor event if httpd has gone away.

Signed-off-by: Martin Schwenke <martin@meltin.net>
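
The cleanup step described above might look roughly like this. It is a sketch, not the 41.httpd script: both function names are hypothetical, and the column layout (key, semid, owner, perms, nsems) is the usual Linux `ipcs -s` output, which is assumed here rather than taken from the commit.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipcs -s`-style lines on stdin and prints the
# semaphore IDs whose owner column matches the given user.
# Assumed column layout: key semid owner perms nsems
sem_ids_owned_by() {
    awk -v owner="$1" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Sketch of the cleanup itself: remove every semaphore array owned by apache.
# Only defined here, not called, because it needs suitable privileges.
cleanup_apache_semaphores() {
    ipcs -s | sem_ids_owned_by apache | while read -r id; do
        ipcrm -s "$id"
    done
}
```

Keeping the parsing in its own function means it can be checked against sample ipcs output without touching any real semaphores.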

Ronnie Sahlberg [Mon, 11 May 2009 04:44:59 +0000 (14:44 +1000)]

Add a -Y machine-readable flag to "lvsmaster"

Ronnie Sahlberg [Mon, 11 May 2009 03:56:28 +0000 (13:56 +1000)]

in the "lvsmaster" command, return -1 if there is no lvsmaster

Ronnie Sahlberg [Fri, 8 May 2009 07:29:57 +0000 (17:29 +1000)]

new version 1.0.81

Ronnie Sahlberg [Wed, 6 May 2009 10:32:39 +0000 (20:32 +1000)]

From: Sumit Bose <sbose@redhat.com>

fix handling of AC_INIT and read version from ctdb.spec

Michael Adam [Tue, 5 May 2009 11:16:38 +0000 (13:16 +0200)]

ping_pong: add GPL comment header with Tridge's copyright

Michael

Michael Adam [Wed, 29 Apr 2009 22:35:55 +0000 (00:35 +0200)]

ping_pong: get pread/pwrite prototypes from unistd.h

by defining _XOPEN_SOURCE to be 500 before including headers

Michael

Michael Adam [Wed, 29 Apr 2009 16:03:03 +0000 (18:03 +0200)]

ping_pong: reduce a couple of prototype warnings

Michael

Michael Adam [Wed, 29 Apr 2009 15:58:17 +0000 (17:58 +0200)]

packaging: also package ping_pong

Michael

Michael Adam [Wed, 29 Apr 2009 15:57:43 +0000 (17:57 +0200)]

build: also build and install ping_pong

Michael

Michael Adam [Wed, 29 Apr 2009 15:50:38 +0000 (17:50 +0200)]

add tridge's ping_pong.c to the utils folder

Michael

Ronnie Sahlberg [Wed, 6 May 2009 00:29:07 +0000 (10:29 +1000)]

From Sumit Bose <sbose@redhat.com>

add more 64bit platforms to configure.ac and preserve cli settings

Andrew Tridgell [Tue, 5 May 2009 06:06:58 +0000 (16:06 +1000)]

added link to michaels sambaxp papers

Andrew Tridgell [Tue, 5 May 2009 06:49:05 +0000 (16:49 +1000)]

allow pages in subdirs

Andrew Tridgell [Tue, 5 May 2009 06:52:24 +0000 (16:52 +1000)]

more subdir html support

Andrew Tridgell [Tue, 5 May 2009 22:18:21 +0000 (08:18 +1000)]

use less intrusive smbstatus call in periodic connections cleanup

root [Tue, 5 May 2009 06:33:21 +0000 (16:33 +1000)]

change the talloc hierarchy for the main transaction_start context and the individual transaction_all handles

root [Tue, 5 May 2009 21:32:25 +0000 (07:32 +1000)]

fixed a problem with clients disconnecting during a traverse

When a client (such as smbstatus) is killed, it may have outstanding
traverse children on remote nodes. We need to catch the client
disconnect in ctdbd and send a control to all nodes telling them to
kill those outstanding traverse children.

root [Fri, 1 May 2009 02:37:52 +0000 (12:37 +1000)]

new version 1.0.80

root [Fri, 1 May 2009 02:30:26 +0000 (12:30 +1000)]

when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0

Otherwise there is a chance that we reset the statistics to zero after the counter has been incremented (client connects), and when the client disconnects we decrement it to a negative number.

This is a purely cosmetic patch with no operational impact on ctdb.
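
The guard can be illustrated with a few lines of shell. The real change lives in the daemon's C code; the function dec_if_positive below is a hypothetical stand-in that only shows the "decrement only while above zero" rule.

```shell
#!/bin/sh
# Hypothetical helper: prints the counter decremented by one, but never
# lets it drop below zero.
dec_if_positive() {
    if [ "$1" -gt 0 ]; then
        echo $(( $1 - 1 ))
    else
        echo "$1"
    fi
}

num_clients=1
num_clients=$(dec_if_positive "$num_clients")   # ordinary disconnect
num_clients=$(dec_if_positive "$num_clients")   # disconnect after a reset
echo "num_clients=$num_clients"
```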

root [Thu, 30 Apr 2009 15:18:27 +0000 (01:18 +1000)]

Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery

Ronnie Sahlberg [Thu, 30 Apr 2009 07:38:30 +0000 (17:38 +1000)]

don't unconditionally kill/restart ctdb when given "service ctdb start". Only start ctdb if it is not already running, and print an error message otherwise.
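
A sketch of the start logic, assuming nothing about the actual init script: the function names, the use of pgrep to detect a running daemon, and the message texts are all illustrative.

```shell
#!/bin/sh
# Hypothetical check; a real init script would more likely consult a pid
# file or ask the daemon itself.
ctdb_is_running() {
    pgrep -x ctdbd >/dev/null 2>&1
}

# Start only if not already running; otherwise report an error and leave
# the running daemon alone.
start_ctdb() {
    if ctdb_is_running; then
        echo "ctdb is already running" >&2
        return 1
    fi
    echo "starting ctdb"
    # the real daemon start would go here
}
```

The point of the change is the early return: a second "start" no longer kills and restarts a healthy daemon.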

Ronnie Sahlberg [Sat, 25 Apr 2009 22:47:38 +0000 (08:47 +1000)]

we only need to have transaction nesting disabled when we start the new transaction for the recovery

Ronnie Sahlberg [Sat, 25 Apr 2009 22:42:54 +0000 (08:42 +1000)]

set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery

Ronnie Sahlberg [Sat, 25 Apr 2009 22:38:37 +0000 (08:38 +1000)]

add TDB_NO_NESTING. When this flag is set tdb will not allow any nested transactions and tdb_transaction_start() will implicitly _cancel() any pending transactions before starting any new ones.

Ronnie Sahlberg [Fri, 24 Apr 2009 08:23:48 +0000 (18:23 +1000)]

add a tunable RecoveryDropAllIPs so it is possible to control how long a node that has been stuck in recovery will wait before it yields all public addresses.

this now defaults to 60 seconds

This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips and thus avoiding a duplicate ip situation for the public addresses

Ronnie Sahlberg [Fri, 24 Apr 2009 08:09:51 +0000 (18:09 +1000)]

increase the loglevel for the message we print when we automatically release all ips when we have been in recovery for too long

Ronnie Sahlberg [Fri, 24 Apr 2009 04:41:21 +0000 (14:41 +1000)]

tweak some timeouts so that we do trigger a banning even if the control hangs/times out

Ronnie Sahlberg [Fri, 24 Apr 2009 03:58:32 +0000 (13:58 +1000)]

If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned.

Andrew Tridgell [Thu, 23 Apr 2009 01:35:42 +0000 (11:35 +1000)]

change shutdown level for ctdb to be 01

We want ctdb to shutdown first, as it manages many other
services. With the old level of 32 the NFS service would shutdown
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shutdown a few seconds later, which causes a lot of error
messages in the other nodes' logs

Andrew Tridgell [Thu, 23 Apr 2009 01:00:16 +0000 (11:00 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Wed, 8 Apr 2009 02:56:52 +0000 (12:56 +1000)]

new version 1.0.79

Ronnie Sahlberg [Wed, 8 Apr 2009 02:49:28 +0000 (12:49 +1000)]

create a function "remote_ip" which can be used from scripts to remove a single ip from an interface.

use this function from the natgw eventscript

Ronnie Sahlberg [Wed, 8 Apr 2009 00:45:00 +0000 (10:45 +1000)]

set libdir to ../lib64 on x86-64 platforms

Ronnie Sahlberg [Tue, 7 Apr 2009 23:34:20 +0000 (09:34 +1000)]

install ctdb.pc from the RPM

Ronnie Sahlberg [Tue, 7 Apr 2009 23:21:11 +0000 (09:21 +1000)]

From Mathieu Parent <math.parent@gmail.com>

Install the pkgconfig file

Mathieu Parent [Tue, 7 Apr 2009 23:14:20 +0000 (09:14 +1000)]

Ronnie Sahlberg [Tue, 7 Apr 2009 22:48:55 +0000 (08:48 +1000)]

install /etc/ctdb/notify.sh as executable.

this addresses bug 6250

Andrew Tridgell [Tue, 7 Apr 2009 07:07:41 +0000 (17:07 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Mon, 6 Apr 2009 04:03:09 +0000 (14:03 +1000)]

we only need to switch into client mode from the eventscript child if we are running the monitor event

Ronnie Sahlberg [Mon, 6 Apr 2009 04:00:41 +0000 (14:00 +1000)]

increase the listen queue. Now that the eventscripts may become clients and connect back to the server we do get a lot more concurrent connection attempts (takeip/releaseip are performed in parallel)

Ronnie Sahlberg [Mon, 6 Apr 2009 03:16:36 +0000 (13:16 +1000)]

use _exit() and not exit() when we terminate a failed eventscript child process

Ronnie Sahlberg [Mon, 6 Apr 2009 02:00:22 +0000 (12:00 +1000)]

We don't need to verify the nodemap on remote nodes that are banned

Ronnie Sahlberg [Thu, 2 Apr 2009 03:50:43 +0000 (14:50 +1100)]

if we can't pull the remote nodemap off a node we should mark it as a culprit so it eventually becomes banned.

Ronnie Sahlberg [Wed, 1 Apr 2009 06:21:38 +0000 (17:21 +1100)]

Change the (dodgy) seqnumfrequency variable to have ms resolution instead of second resolution.

Rename the variable to SeqnumInterval for
1, it is an interval and not a 1/interval unit
2, so that we catch when people use this old variable and can update the sysconfig file instead of silently changing semantics of this variable

this is a real dodgy variable

Ronnie Sahlberg [Wed, 1 Apr 2009 06:13:48 +0000 (17:13 +1100)]

remove a prototype for a function no longer used

Ronnie Sahlberg [Tue, 31 Mar 2009 09:04:45 +0000 (20:04 +1100)]

new release 1.0.78

Ronnie Sahlberg [Tue, 31 Mar 2009 09:00:00 +0000 (20:00 +1100)]

we should also install the 11.natgw eventscript if we want to be able to use it

Ronnie Sahlberg [Tue, 31 Mar 2009 03:38:52 +0000 (14:38 +1100)]

install a default /etc/ctdb/notify.sh script as an example of how to use
snmptrap/email to notify that a node has changed health status

Ronnie Sahlberg [Tue, 31 Mar 2009 03:23:31 +0000 (14:23 +1100)]

add a mechanism where the ctdb daemon will run a user-controlled script when the node status changes to/from UNHEALTHY state.

This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.
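
A sketch of what such a script could do. Everything here is an assumption made for illustration: that the new state arrives as the first argument, the state names "unhealthy" and "healthy", the function name notify_message, and the message wording. No mail or trap is actually sent.

```shell
#!/bin/sh
# Hypothetical notify handler: builds a one-line message for a state change.
# Arguments: $1 = new state (assumed "unhealthy" or "healthy"), $2 = node name.
notify_message() {
    event=$1
    node=$2
    case "$event" in
        unhealthy) echo "ctdb: node $node changed to UNHEALTHY" ;;
        healthy)   echo "ctdb: node $node changed to HEALTHY" ;;
        *)         return 1 ;;   # unknown event: nothing to report
    esac
}

# A real script might hand the message to mail or snmptrap, for example:
#   notify_message "$1" "$(hostname)" | mail -s "ctdb status" root
notify_message unhealthy node1
```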

Ronnie Sahlberg [Tue, 31 Mar 2009 00:42:10 +0000 (11:42 +1100)]

new version 1.0.77

Ronnie Sahlberg [Tue, 31 Mar 2009 00:33:28 +0000 (11:33 +1100)]

we must also try to set the routes when we release an ip since during the release/10.interfaces there can actually be a window where the kernel decides to remove all addresses (before we manually add them back in 10.interfaces) during which the kernel may also decide to delete all routes since there are no gateways reachable through this interface anymore.

Ronnie Sahlberg [Wed, 25 Mar 2009 03:52:08 +0000 (14:52 +1100)]

new version 1.0.76

Ronnie Sahlberg [Wed, 25 Mar 2009 03:46:05 +0000 (14:46 +1100)]

change the ctdb command table to allow us to describe commands which can be run independently of the ctdb daemon.

create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running

Ronnie Sahlberg [Wed, 25 Mar 2009 02:46:41 +0000 (13:46 +1100)]

update the documentation for NATGW to reflect that you can now use
multiple natgw groups in one cluster
