git.samba.org - sahlberg/ctdb.git/log

From Alexander Saupp.
If we use VLAN tagging and bonding, we must strip the VLAN part off the name
so we can check the main bonded device for status.

I.e. check bond0 instead of bond0.<VLANTAG>
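
A minimal sketch of the name handling described above, written in Python purely for illustration. It is not the code from this commit; the helper name `base_interface` and the rule that a VLAN tag is purely numeric are assumptions.

```python
def base_interface(iface: str) -> str:
    """Strip a trailing .<VLANTAG> so the underlying device can be checked."""
    name, sep, tag = iface.partition(".")
    # Assumption: only a purely numeric suffix is treated as a VLAN tag.
    return name if sep and tag.isdigit() else iface
```

With this sketch, `base_interface("bond0.100")` yields `bond0`, while an untagged name such as `bond0` is returned unchanged.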

Andrew Tridgell [Wed, 23 Jul 2008 05:36:23 +0000 (15:36 +1000)]

run the testparm commands in 50.samba in the background, only running
in the foreground if something fails

Andrew Tridgell [Wed, 23 Jul 2008 05:35:46 +0000 (15:35 +1000)]

allow for probing of directories without raising an error

Andrew Tridgell [Wed, 23 Jul 2008 05:25:52 +0000 (15:25 +1000)]

fixed buffering in ctdb logging code to handle multiple lines
correctly

Ronnie Sahlberg [Mon, 21 Jul 2008 23:07:42 +0000 (09:07 +1000)]

From Michael Adams,
change one element from private to private_data

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

Ronnie Sahlberg [Fri, 18 Jul 2008 03:49:05 +0000 (13:49 +1000)]

new version 1.0.50

Ronnie Sahlberg [Fri, 18 Jul 2008 03:42:39 +0000 (13:42 +1000)]

Merge git://git.samba.org/tridge/ctdb

Andrew Tridgell [Fri, 18 Jul 2008 03:46:01 +0000 (13:46 +1000)]

fixed a bug where we would look for a signal past the end of the
signal arrays

This could cause the events code to get into a loop chewing CPU

Ronnie Sahlberg [Fri, 18 Jul 2008 02:07:25 +0000 (12:07 +1000)]

if a new node enters the cluster, that node will already be frozen at start
but the rest of the nodes are not frozen.

at this stage an election is called by the new node.

Since in this case the nodes are not frozen, we cannot modify the recmaster
of the nodes, so it is expected that this control would fail.

Add a boolean to send_election_request() to make it not
try to set the recmaster locally for the case where we are in an election phase
while not frozen.

Ronnie Sahlberg [Fri, 18 Jul 2008 00:59:34 +0000 (10:59 +1000)]

We can not assume that just because we could complete a TCP handshake
to the remote node that
1, we are in fact talking to a CTDB daemon
2, that IF we are talking to a ctdb daemon, it is operational.

So, we can not blindly mark the node as CONNECTED just because
we can open a TCP connection.

Instead we rely on "If we did get a KEEPALIVE from the remote node,
it is connected"
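
The rule above can be sketched as a small state holder, shown here in Python for illustration only. The class and method names are hypothetical, and resetting the flags when the TCP connection closes is an assumption that the commit message does not state.

```python
class RemoteNode:
    """Hypothetical per-node state; the names are illustrative only."""

    def __init__(self) -> None:
        self.tcp_established = False
        self.connected = False

    def on_tcp_established(self) -> None:
        # A completed handshake proves nothing about what is listening,
        # so the connected flag is deliberately left untouched here.
        self.tcp_established = True

    def on_keepalive(self) -> None:
        # Only a KEEPALIVE from the peer marks the node as connected.
        self.connected = True

    def on_tcp_closed(self) -> None:
        # Assumption: losing the transport clears both flags.
        self.tcp_established = False
        self.connected = False
```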

Ronnie Sahlberg [Fri, 18 Jul 2008 00:41:18 +0000 (10:41 +1000)]

lower a debug statement

Ronnie Sahlberg [Fri, 18 Jul 2008 00:38:51 +0000 (10:38 +1000)]

lower a debug message

Ronnie Sahlberg [Thu, 17 Jul 2008 08:53:54 +0000 (18:53 +1000)]

Allow the fix-to-make-persistent-writes-safer to work with unpatched samba versions

Ronnie Sahlberg [Thu, 17 Jul 2008 08:47:20 +0000 (18:47 +1000)]

Only decrement the "number of persistent writes in flight" if
it is > 0, otherwise we will break when used against an unpatched samba server
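
A minimal sketch of the guard described above, in Python for illustration. The function name is hypothetical and the counter is modelled as a plain integer rather than ctdb's real data structure.

```python
def finish_persistent_write(in_flight: int) -> int:
    """Return the new in-flight count after one persistent write completes."""
    # Guard against going below zero: per the commit message, an unpatched
    # samba server can otherwise break the bookkeeping.
    return in_flight - 1 if in_flight > 0 else in_flight
```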

Andrew Tridgell [Thu, 17 Jul 2008 08:45:15 +0000 (18:45 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Thu, 17 Jul 2008 03:56:17 +0000 (13:56 +1000)]

new version 1.0.48

Ronnie Sahlberg [Thu, 17 Jul 2008 03:50:55 +0000 (13:50 +1000)]

Add two new controls to start and cancel a persistent update.
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is kill -9ed before it has ensured the update is distributed clusterwide.

Ronnie Sahlberg [Wed, 16 Jul 2008 23:04:15 +0000 (09:04 +1000)]

Do not allow "ctdb eventscript" to start new eventscripts while we are in recovery mode

Andrew Tridgell [Wed, 16 Jul 2008 06:58:16 +0000 (16:58 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Wed, 16 Jul 2008 06:51:37 +0000 (16:51 +1000)]

Merge git://git.samba.org/tridge/ctdb

Andrew Tridgell [Wed, 16 Jul 2008 02:46:43 +0000 (12:46 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Wed, 16 Jul 2008 02:23:18 +0000 (12:23 +1000)]

change how we filter out "empty" records in the traverse code
so that we output the same list of keys in "catdb" as "tdbdump".

when traversing a persistent database, as an optimization, only
traverse on the local node (and thus skip checking if we are
dmaster or not). If the local node is not part of the vnnmap, and thus
is not guaranteed to have an up-to-date persistent database,
we instead traverse it on one of the other nodes that are in the vnnmap.
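
The node selection for persistent databases can be sketched as follows, in Python for illustration. The function name and the choice of the first vnnmap entry as the fallback are assumptions; the commit message only says "one of the other nodes".

```python
from typing import Sequence


def pick_traverse_node(local_pnn: int, vnnmap: Sequence[int]) -> int:
    """Choose the node on which to traverse a persistent database."""
    if local_pnn in vnnmap:
        # The local copy is expected to be up to date, so traverse locally.
        return local_pnn
    if not vnnmap:
        raise ValueError("vnnmap is empty, no node to traverse on")
    # Assumption: any vnnmap member will do; this sketch takes the first.
    return vnnmap[0]
```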

Andrew Tridgell [Wed, 16 Jul 2008 02:23:05 +0000 (12:23 +1000)]

fixed postun script to prevent corrupting RPM database

Ronnie Sahlberg [Tue, 15 Jul 2008 01:03:35 +0000 (11:03 +1000)]

Add two new options

CTDB_SAMBA_SKIP_CONF_CHECK and CTDB_SAMBA_CHECK_PORTS.
The first tells ctdb to stop monitoring whether the smb.conf file is consistent.

The second specifies which ports to check that smb is listening on
instead of using testparm to figure this out.

Since the net, testparm and smbstatus commands may block indefinitely in some configurations,
we must have a way to configure ctdb to NOT use any of these three commands
in the scripts. These commands should thus never be used in scripts.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

Ronnie Sahlberg [Mon, 14 Jul 2008 01:22:41 +0000 (11:22 +1000)]

remove a debugging echo statement

Andrew Tridgell [Sun, 13 Jul 2008 23:19:22 +0000 (09:19 +1000)]

fixed up exit status for onnode

Andrew Tridgell [Fri, 11 Jul 2008 09:21:39 +0000 (19:21 +1000)]

Merge commit 'ronnie/master'

Ronnie Sahlberg [Fri, 11 Jul 2008 01:48:41 +0000 (11:48 +1000)]

new version 1.0.47

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

Ronnie Sahlberg [Fri, 11 Jul 2008 00:33:46 +0000 (10:33 +1000)]

Fix a very subtle race where we could get a double free of talloced
memory if ctdb_run_eventscript() was called
during processing of ctdb_event_script_timeout() for
user-invoked eventscripts (eventscripts invoked by "ctdb eventscript ...").

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

Martin Schwenke [Thu, 10 Jul 2008 23:24:21 +0000 (09:24 +1000)]

Update packaging/RPM/ctdb.spec to reflect onnode changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Ronnie Sahlberg [Thu, 10 Jul 2008 07:06:52 +0000 (17:06 +1000)]

Revert "Yip yip yip!"

This reverts commit f7bdf96843a7e4ad61ad378786922d6281de9d93.

Martin Schwenke [Thu, 10 Jul 2008 06:56:30 +0000 (16:56 +1000)]

Yip yip yip!

Martin Schwenke [Thu, 10 Jul 2008 04:19:52 +0000 (14:19 +1000)]

When in verbose mode with -p, each line is prefixed with the node
address/name.  To implement this, stderr is redirected to stdout;
this doesn't need to be done but is the simplest implementation.
Remove -t option since it doesn't seem to accomplish much but causes
spurious messages to be displayed by ssh.  Add explicit -h and --help
options.  Make style of usage message consistent with documentation.
Document new features in doc/onnode.1.xml.

Martin Schwenke [Wed, 9 Jul 2008 04:23:02 +0000 (14:23 +1000)]

Update Makefile.in for new version of onnode.

Martin Schwenke [Wed, 9 Jul 2008 04:18:15 +0000 (14:18 +1000)]

Complete rewrite of tools/onnode. Remove old tools/onnode.ssh,
tools/onnode.rsh.

Ronnie Sahlberg [Thu, 10 Jul 2008 03:40:00 +0000 (13:40 +1000)]

explain why you have to have a real ip address as well as the "virtual"
ip address for lvs

Ronnie Sahlberg [Thu, 10 Jul 2008 03:00:50 +0000 (13:00 +1000)]

new version 1.0.46

Ronnie Sahlberg [Thu, 10 Jul 2008 02:50:16 +0000 (12:50 +1000)]

add documentation for both LVS:single-ip and CAPABILITIES:wan-accelerator

Ronnie Sahlberg [Thu, 10 Jul 2008 01:42:37 +0000 (11:42 +1000)]

Update to the LVS eventscript.
Do not assume all nodes are members of LVS; always deciding that the recmaster will be the lvsmaster won't work.

Instead,
Create the set of active LVS nodes as those nodes that are LVS capable and
also HEALTHY.
Except if ALL LVS capable nodes are unhealthy in which case we allow the unhealthy
nodes to be part of the active set.

In the active set, pick one of the active nodes as being the lvsmaster
which will receive all incoming traffic and distribute it across
the active lvs nodes in the cluster.
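
The selection rules above can be sketched as follows, in Python for illustration. This is not the eventscript itself: the `Node` type and function names are hypothetical, and picking the lowest pnn as lvsmaster is an assumption, since the commit message only says "pick one of the active nodes".

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    pnn: int
    lvs_capable: bool
    healthy: bool


def active_lvs_nodes(nodes: List[Node]) -> List[Node]:
    """LVS-capable healthy nodes, or all LVS-capable nodes if none is healthy."""
    capable = [n for n in nodes if n.lvs_capable]
    healthy = [n for n in capable if n.healthy]
    return healthy if healthy else capable


def pick_lvsmaster(nodes: List[Node]) -> Optional[Node]:
    """Pick one active node to receive and distribute incoming traffic."""
    active = active_lvs_nodes(nodes)
    # Assumption: choose the lowest pnn so every node picks the same master.
    return min(active, key=lambda n: n.pnn) if active else None
```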

Ronnie Sahlberg [Thu, 10 Jul 2008 01:12:58 +0000 (11:12 +1000)]

Add three more commands to the CTDB tool.

lvs: which shows which nodes are active LVS servers
lvsmaster: which shows which node is the lvs master multiplex node
pnn: which prints the pnn of the local node

Ronnie Sahlberg [Thu, 10 Jul 2008 00:37:22 +0000 (10:37 +1000)]

make LVS a capability so that we can see which nodes are configured with
LVS and which are not using LVS.

"ctdb getcapabilities"

Ronnie Sahlberg [Wed, 9 Jul 2008 22:56:33 +0000 (08:56 +1000)]

add an option to skip checking that all the samba shares are ok
when monitoring the node health.
this might be useful to skip for environments with thousands of shares

Ronnie Sahlberg [Wed, 9 Jul 2008 22:05:34 +0000 (08:05 +1000)]

remove the attempts to restart NFS.

nfs should never stop spontaneously so trying to restart it is
just counterproductive and at best a workaround to
hide real bugs.

Ronnie Sahlberg [Wed, 9 Jul 2008 05:17:27 +0000 (15:17 +1000)]

if we have enabled LVS but we don't have all the required packages,
just log it to the messages;
don't stop ctdb from starting

Ronnie Sahlberg [Wed, 9 Jul 2008 04:02:54 +0000 (14:02 +1000)]

proper waitpid() fix.
remove all waitpid() calls and use the event system to trap sigchld

Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:47 +0000 (13:14 +1000)]

Revert "pull the development files out into their own package"

This reverts commit 36be210bbc5e0af75c5fd6e57863272bfa0e942e.

Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:34 +0000 (13:14 +1000)]

Revert "add spec file for development rpm"

This reverts commit bd7b254b81dda4d9d62516abf32f93f2503eb9bb.

Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:07 +0000 (13:14 +1000)]

Revert "copy ctdb-dev to the spec directory"

This reverts commit 8814997c1b9623397058088dd0e1775cecfe371b.

Ronnie Sahlberg [Wed, 9 Jul 2008 03:07:17 +0000 (13:07 +1000)]

copy ctdb-dev to the spec directory

Ronnie Sahlberg [Wed, 9 Jul 2008 01:37:02 +0000 (11:37 +1000)]

add spec file for development rpm

Ronnie Sahlberg [Wed, 9 Jul 2008 01:32:19 +0000 (11:32 +1000)]

pull the development files out into their own package

Ronnie Sahlberg [Wed, 9 Jul 2008 01:08:44 +0000 (11:08 +1000)]

install the readme in /usr/share/doc/ctdb/ instead of under /etc

Ronnie Sahlberg [Wed, 9 Jul 2008 00:24:19 +0000 (10:24 +1000)]

mark /etc/ctdb/functions as a config file to keep rpmlint happy

Ronnie Sahlberg [Wed, 9 Jul 2008 00:17:39 +0000 (10:17 +1000)]

From Chris Cowan, patch to make aix compile again

Ronnie Sahlberg [Wed, 9 Jul 2008 00:03:21 +0000 (10:03 +1000)]

Replace \s with [[:space:]] in our regexps we use for egrep.

Kevin Collins noticed that RHEL5 grep-2.5.1-54.2.el5 built for
x86 does not handle \s while the exact same RHEL5 package for amd64
does!

[[:space:]] is more portable. Even across the same package version (different architecture) from the same vendor :-)

Ronnie Sahlberg [Tue, 8 Jul 2008 07:41:31 +0000 (17:41 +1000)]

Revert "waitpid() can block if it takes a long time before the child terminates"

This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.

revert the waitpid changes. we need to waitpid for some children, so we should
refactor the approach completely

Ronnie Sahlberg [Tue, 8 Jul 2008 07:40:53 +0000 (17:40 +1000)]

Revert "set sigchild to SIG_IGN instead of SIG_DFL"

This reverts commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76.

Ronnie Sahlberg [Tue, 8 Jul 2008 06:31:23 +0000 (16:31 +1000)]

set sigchild to SIG_IGN instead of SIG_DFL

Ronnie Sahlberg [Tue, 8 Jul 2008 00:03:57 +0000 (10:03 +1000)]

new version 1.0.45

Ronnie Sahlberg [Mon, 7 Jul 2008 23:58:10 +0000 (09:58 +1000)]

update the monitor event for nfs to track how many times in a row it has failed
to "ping" the local nfs daemon.

Once it has failed more than 3 times in a row it will attempt to restart the nfs service.
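
A minimal sketch of the consecutive-failure tracking described above, in Python for illustration. The class and method names are hypothetical, and resetting the counter after a restart attempt is an assumption the commit message does not state.

```python
class NfsMonitor:
    """Track consecutive failed pings of the local NFS daemon."""

    MAX_FAILURES = 3  # restart only after more than this many in a row

    def __init__(self) -> None:
        self.failures = 0

    def record_ping(self, ok: bool) -> bool:
        """Record one ping result; return True if a restart should be tried."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures > self.MAX_FAILURES:
            # Assumption: start counting afresh after a restart attempt.
            self.failures = 0
            return True
        return False
```

In this sketch the fourth consecutive failure is the first one that triggers a restart attempt, and any successful ping resets the count.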

Ronnie Sahlberg [Mon, 7 Jul 2008 17:48:11 +0000 (03:48 +1000)]

waitpid() can block if it takes a long time before the child terminates
so we should not call it from the main daemon.

1, set SIGCHLD to SIG_DFL to make sure we ignore this signal

2, get rid of all waitpid() calls

3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.

Ronnie Sahlberg [Mon, 7 Jul 2008 10:38:59 +0000 (20:38 +1000)]

use more liberal handling of event scripts timing out.

If the event script that timed out was for the "monitor" event, then
even if it timed out we still return SUCCESS back to the guy invoking the eventscript.
Only consider the eventscript for "monitor" to have failed with an error
IFF it actually terminated with an error, or if it timed out 5 times in a row and hung.
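
The timeout policy above can be sketched as follows, in Python for illustration. The names `SUCCESS`, `FAILURE` and `MonitorTimeouts` are hypothetical, as is treating a timeout of any non-monitor event as a plain failure; only the "monitor" behaviour is taken from the commit message.

```python
SUCCESS = 0
FAILURE = 1


class MonitorTimeouts:
    """Decide what to report when an event script finishes or times out."""

    MAX_TIMEOUTS = 5  # this many timeouts in a row counts as hung

    def __init__(self) -> None:
        self.timeouts_in_a_row = 0

    def result(self, event: str, timed_out: bool, exit_status: int) -> int:
        if not timed_out:
            if event == "monitor":
                self.timeouts_in_a_row = 0
            # A script that really ran reports its own exit status.
            return exit_status
        if event != "monitor":
            # Assumption: timeouts of other events are still failures.
            return FAILURE
        self.timeouts_in_a_row += 1
        if self.timeouts_in_a_row >= self.MAX_TIMEOUTS:
            return FAILURE
        return SUCCESS
```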

Ronnie Sahlberg [Sun, 6 Jul 2008 23:07:49 +0000 (09:07 +1000)]

new version .44

Ronnie Sahlberg [Sun, 6 Jul 2008 22:53:22 +0000 (08:53 +1000)]

zero out the sockaddr_in structure before we store the ipv4 data in it to make sure that all data is initialized. Otherwise valgrind will complain about uninitialized data when we write this structure out on the wire

Ronnie Sahlberg [Sun, 6 Jul 2008 22:52:04 +0000 (08:52 +1000)]

we need a 'case x:' in our ugly 'encode the control opcode as a linenumber in valgrind output' hack to make it work

Ronnie Sahlberg [Sun, 6 Jul 2008 22:51:05 +0000 (08:51 +1000)]

If a transaction commit fails, log this error and cancel all pending transactions to the
databases instead of calling ctdb_fatal()

Ronnie Sahlberg [Sun, 6 Jul 2008 22:50:12 +0000 (08:50 +1000)]

in the destructor for the lock-wait child, make sure that we cancel any pending
transactions.

Andrew Tridgell [Fri, 4 Jul 2008 08:03:24 +0000 (18:03 +1000)]

fixed a case statement

Andrew Tridgell [Fri, 4 Jul 2008 08:00:24 +0000 (18:00 +1000)]

an extraordinarily ugly patch!

This is a hack to allow backtraces under valgrind to show what opcode
is getting uninitialised bytes

Andrew Tridgell [Fri, 4 Jul 2008 07:40:25 +0000 (17:40 +1000)]

ensure pad bytes in the ltdb_header are initialised

Andrew Tridgell [Fri, 4 Jul 2008 07:32:21 +0000 (17:32 +1000)]

don't use mmap in tdb if --nosetsched is set. That makes valgrind
happier (it doesn't like the mmap/msync calls in tdb)

Andrew Tridgell [Fri, 4 Jul 2008 07:15:06 +0000 (17:15 +1000)]

prevent valgrind errors where we print uninitialised values on control errors

Andrew Tridgell [Fri, 4 Jul 2008 07:04:37 +0000 (17:04 +1000)]

fixed a warning

Andrew Tridgell [Fri, 4 Jul 2008 07:04:26 +0000 (17:04 +1000)]

fixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the
_VOID variant

Andrew Tridgell [Fri, 4 Jul 2008 06:58:29 +0000 (16:58 +1000)]

CTDB_NO_MEMORY_VOID() needs to return on error

Andrew Tridgell [Fri, 4 Jul 2008 06:58:14 +0000 (16:58 +1000)]

added option to start ctdb under valgrind

Just add CTDB_VALGRIND=yes in /etc/sysconfig/ctdb, and look at the
logs in /var/log/ctdb_valgrind.*

Andrew Tridgell [Fri, 4 Jul 2008 06:05:04 +0000 (16:05 +1000)]

zero out the ctdb->freeze_handle when we free it

This prevents heap corruption when a freeze child dies

Ronnie Sahlberg [Thu, 3 Jul 2008 02:46:09 +0000 (12:46 +1000)]

we don't need to explicitly thaw the databases from the recovery daemon
since this is already done implicitly when we change the recovery mode
back to normal

Ronnie Sahlberg [Wed, 2 Jul 2008 03:55:59 +0000 (13:55 +1000)]

track both when we last started and ended a recovery.
make ctdb uptime print how long the recovery took

in the recovery daemon when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we dont have any we shouldnt have) use ctdb uptime and check the
recovery start/stop times and make sure we dont check for ip allocation
inconsistencies during a recovery where the ip address allocation is in flux.
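
One plausible reading of the check above, sketched in Python for illustration. The function name and the comparison rule are assumptions: a recovery is treated as in progress when the last recorded start is later than the last recorded finish. The real recovery daemon may apply further conditions.

```python
def should_check_ip_allocation(last_recovery_started: float,
                               last_recovery_finished: float) -> bool:
    """Return True if it is safe to verify the public IP allocation."""
    # Assumption: a start time newer than the finish time means a recovery
    # is still running, so the IP allocation may be in flux.
    return last_recovery_finished >= last_recovery_started
```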

Ronnie Sahlberg [Wed, 2 Jul 2008 02:21:53 +0000 (12:21 +1000)]

print the opcode when an async callback detects an error

Ronnie Sahlberg [Wed, 2 Jul 2008 02:01:19 +0000 (12:01 +1000)]

update a comment to reflect that this is not always a real recovery
it can also be printed when we just do an ip reallocation

Ronnie Sahlberg [Mon, 30 Jun 2008 23:34:43 +0000 (09:34 +1000)]

new version

Ronnie Sahlberg [Thu, 26 Jun 2008 23:31:18 +0000 (09:31 +1000)]

init.d/ctdb is not a config file