sahlberg/ctdb.git
15 years agoMerge git://git.samba.org/tridge/ctdb
Ronnie Sahlberg [Wed, 16 Jul 2008 06:51:37 +0000 (16:51 +1000)]
Merge git://git.samba.org/tridge/ctdb

15 years agochange how we filter out "empty" records in the traverse code
Ronnie Sahlberg [Wed, 16 Jul 2008 02:23:18 +0000 (12:23 +1000)]
change how we filter out "empty" records in the traverse code
so that we output the same list of keys in "catdb" as "tdbdump".

when traversing a persistent database, as an optimization, only
traverse on the local node (and thus skip checking if we are
dmaster or not). If the local node is not part of the vnnmap and thus
would not be guaranteed to have an up-to-date persistent database
we instead traverse it on one of the other nodes that are in the vnnmap.

15 years agofixed postun script to prevent corrupting RPM database
Andrew Tridgell [Wed, 16 Jul 2008 02:23:05 +0000 (12:23 +1000)]
fixed postun script to prevent corrupting RPM database

15 years agoAdd two new options
Ronnie Sahlberg [Tue, 15 Jul 2008 01:03:35 +0000 (11:03 +1000)]
Add two new options

CTDB_SAMBA_SKIP_CONF_CHECK and CTDB_SAMBA_CHECK_PORTS.
The first is used to tell ctdb to no longer monitor whether the smb.conf file is consistent.

The second specifies which ports to check that smb is listening on
instead of using testparm to figure this out.

Since the net, testparm and smbstatus may block indefinitely in some configurations
we must have a way to configure ctdb to NOT use any of these three commands
in the scripts. These commands should thus never be used in scripts.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
15 years agoremove a debugging echo statement
Ronnie Sahlberg [Mon, 14 Jul 2008 01:22:41 +0000 (11:22 +1000)]
remove a debugging echo statement

15 years agofixed up exit status for onnode
Andrew Tridgell [Sun, 13 Jul 2008 23:19:22 +0000 (09:19 +1000)]
fixed up exit status for onnode

15 years agoMerge commit 'ronnie/master'
Andrew Tridgell [Fri, 11 Jul 2008 09:21:39 +0000 (19:21 +1000)]
Merge commit 'ronnie/master'

15 years agonew version 1.0.47
Ronnie Sahlberg [Fri, 11 Jul 2008 01:48:41 +0000 (11:48 +1000)]
new version 1.0.47

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
15 years agoFix a very subtle race where we could get a double free of a talloced 1.0.47 obnox/1.0.47 origin/1.0.47
Ronnie Sahlberg [Fri, 11 Jul 2008 00:33:46 +0000 (10:33 +1000)]
Fix a very subtle race where we could get a double free of a talloced
memory if ctdb_run_eventscript() was called
during processing of ctdb_event_script_timeout() for
user-invoked eventscripts (eventscripts invoked by "ctdb eventscript ...").

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
15 years agoSigned-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 10 Jul 2008 23:24:21 +0000 (09:24 +1000)]
Signed-off-by: Martin Schwenke <martin@meltin.net>
Update packaging/RPM/ctdb.spec to reflect onnode changes.

15 years agoRevert "Yip yip yip!"
Ronnie Sahlberg [Thu, 10 Jul 2008 07:06:52 +0000 (17:06 +1000)]
Revert "Yip yip yip!"

This reverts commit f7bdf96843a7e4ad61ad378786922d6281de9d93.

15 years agoYip yip yip!
Martin Schwenke [Thu, 10 Jul 2008 06:56:30 +0000 (16:56 +1000)]
Yip yip yip!

15 years agoWhen in verbose mode with -p, each line is prefixed with the node
Martin Schwenke [Thu, 10 Jul 2008 04:19:52 +0000 (14:19 +1000)]
When in verbose mode with -p, each line is prefixed with the node
address/name.  To implement this, stderr is redirected to stdout -
this doesn't need to be done but is the simplest implementation.
Remove -t option since it doesn't seem to accomplish much but causes
spurious messages to be displayed by ssh.  Add explicit -h and --help
options.  Make style of usage message consistent with documentation.
Document new features in doc/onnode.1.xml.

15 years agoUpdate Makefile.in for new version of onnode.
Martin Schwenke [Wed, 9 Jul 2008 04:23:02 +0000 (14:23 +1000)]
Update Makefile.in for new version of onnode.

15 years agoComplete rewrite of tools/onnode. Remove old tools/onnode.ssh,
Martin Schwenke [Wed, 9 Jul 2008 04:18:15 +0000 (14:18 +1000)]
Complete rewrite of tools/onnode.  Remove old tools/onnode.ssh,
tools/onnode.rsh.

15 years agoexplain why you have to have a real ip address as well as the "virtual"
Ronnie Sahlberg [Thu, 10 Jul 2008 03:40:00 +0000 (13:40 +1000)]
explain why you have to have a real ip address as well as the "virtual"
ip address for lvs

15 years agonew version 10.0.46
Ronnie Sahlberg [Thu, 10 Jul 2008 03:00:50 +0000 (13:00 +1000)]
new version 10.0.46

15 years agoadd documentation for both LVS:single-ip and CAPABILITIES:wan-accelerator
Ronnie Sahlberg [Thu, 10 Jul 2008 02:50:16 +0000 (12:50 +1000)]
add documentation for both LVS:single-ip and CAPABILITIES:wan-accelerator

15 years agoUpdate to the LVS eventscript.
Ronnie Sahlberg [Thu, 10 Jul 2008 01:42:37 +0000 (11:42 +1000)]
Update to the LVS eventscript.
Do not assume all nodes are members of LVS; always choosing the recmaster as lvsmaster won't work.

Instead, create the set of active LVS nodes from those nodes that are LVS capable and
also HEALTHY.
The exception is when ALL LVS capable nodes are unhealthy, in which case we allow the unhealthy
nodes to be part of the active set.

In the active set, pick one of the active nodes as being the lvsmaster
which will receive all incoming traffic and distribute it across
the active lvs nodes in the cluster.
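
The selection rule described in this commit can be sketched roughly as below. This is an illustrative Python sketch, not the eventscript's actual code: the `Node` type and both helper functions are hypothetical names, and choosing the lowest pnn as lvsmaster is an assumption (the commit only says that one of the active nodes is picked).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    pnn: int            # node number (hypothetical field names)
    lvs_capable: bool   # node is configured with the LVS capability
    healthy: bool       # node is currently HEALTHY


def active_lvs_nodes(nodes: List[Node]) -> List[Node]:
    """LVS-capable and healthy nodes; if none are healthy, all LVS-capable nodes."""
    capable = [n for n in nodes if n.lvs_capable]
    healthy = [n for n in capable if n.healthy]
    return healthy if healthy else capable


def pick_lvsmaster(nodes: List[Node]) -> Optional[Node]:
    """Pick one active node as lvsmaster (assumption: the lowest pnn wins)."""
    active = active_lvs_nodes(nodes)
    return min(active, key=lambda n: n.pnn) if active else None
```

With this rule, a cluster whose LVS-capable nodes are all unhealthy still gets an lvsmaster, while a cluster with no LVS-capable nodes gets none.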

15 years agoAdd three more commands to the CTDB tool.
Ronnie Sahlberg [Thu, 10 Jul 2008 01:12:58 +0000 (11:12 +1000)]
Add three more commands to the CTDB tool.

lvs: which shows which nodes are active LVS servers
lvsmaster: which shows which node is the lvs master multiplex node
pnn: which prints the pnn of the local node

15 years agomake LVS a capability so that we can see which nodes are configured with
Ronnie Sahlberg [Thu, 10 Jul 2008 00:37:22 +0000 (10:37 +1000)]
make LVS a capability so that we can see which nodes are configured with
LVS and which are not using LVS.

"ctdb getcapabilities"

15 years agoadd an option to skip checking that all the samba shares are ok
Ronnie Sahlberg [Wed, 9 Jul 2008 22:56:33 +0000 (08:56 +1000)]
add an option to skip checking that all the samba shares are ok
when monitoring the node health.
this might be useful to skip for environments with thousands of shares

15 years agoremove the attempts to restart NFS.
Ronnie Sahlberg [Wed, 9 Jul 2008 22:05:34 +0000 (08:05 +1000)]
remove the attempts to restart NFS.

nfs should never stop spontaneously so trying to restart it is
just counterproductive and at best a workaround to
hide real bugs.

15 years agoif we have enabled LVS but we don't have all the required packages
Ronnie Sahlberg [Wed, 9 Jul 2008 05:17:27 +0000 (15:17 +1000)]
if we have enabled LVS but we don't have all the required packages
just log it to the messages
don't stop ctdb from starting

15 years agoproper waitpid() fix.
Ronnie Sahlberg [Wed, 9 Jul 2008 04:02:54 +0000 (14:02 +1000)]
proper waitpid() fix.
remove all waitpid() calls and use the event system to trap sigchld

15 years agoRevert "pull the development files out into their own package"
Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:47 +0000 (13:14 +1000)]
Revert "pull the development files out into their own package"

This reverts commit 36be210bbc5e0af75c5fd6e57863272bfa0e942e.

15 years agoRevert "add spec file for development rpm"
Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:34 +0000 (13:14 +1000)]
Revert "add spec file for development rpm"

This reverts commit bd7b254b81dda4d9d62516abf32f93f2503eb9bb.

15 years agoRevert "copy ctdb-dev to the spec directory"
Ronnie Sahlberg [Wed, 9 Jul 2008 03:14:07 +0000 (13:14 +1000)]
Revert "copy ctdb-dev to the spec directory"

This reverts commit 8814997c1b9623397058088dd0e1775cecfe371b.

15 years agocopy ctdb-dev to the spec directory
Ronnie Sahlberg [Wed, 9 Jul 2008 03:07:17 +0000 (13:07 +1000)]
copy ctdb-dev to the spec directory

15 years agoadd spec file for development rpm
Ronnie Sahlberg [Wed, 9 Jul 2008 01:37:02 +0000 (11:37 +1000)]
add spec file for development rpm

15 years agopull the development files out into their own package
Ronnie Sahlberg [Wed, 9 Jul 2008 01:32:19 +0000 (11:32 +1000)]
pull the development files out into their own package

15 years agoinstall the readme in /usr/share/doc/ctdb/ instead of under /etc
Ronnie Sahlberg [Wed, 9 Jul 2008 01:08:44 +0000 (11:08 +1000)]
install the readme in /usr/share/doc/ctdb/ instead of under /etc

15 years ago mark /etc/ctdb/functions as a config file to keep rpmlint happy
Ronnie Sahlberg [Wed, 9 Jul 2008 00:24:19 +0000 (10:24 +1000)]
 mark /etc/ctdb/functions as a config file to keep rpmlint happy

15 years agoFrom Chris Cowan, patch to make aix compile again
Ronnie Sahlberg [Wed, 9 Jul 2008 00:17:39 +0000 (10:17 +1000)]
From Chris Cowan,  patch to make aix compile again

15 years agoReplace \s with [[:space:]] in our regexps we use for egrep.
Ronnie Sahlberg [Wed, 9 Jul 2008 00:03:21 +0000 (10:03 +1000)]
Replace \s with [[:space:]] in our regexps we use for egrep.

Kevin Collins noticed that RHEL5 grep-2.5.1-54.2.el5 built for
x86 does not handle \s while the exact same RHEL5 package for amd64
does!

[[:space:]] is more portable, even across the same package version (different architecture) from the same vendor :-)

15 years agoRevert "waitpid() can block if it takes a long time before the child terminates" 1.0.45 obnox/1.0.45 origin/1.0.45
Ronnie Sahlberg [Tue, 8 Jul 2008 07:41:31 +0000 (17:41 +1000)]
Revert "waitpid() can block if it takes a long time before the child terminates"

This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.

revert the waitpid changes. we need to waitpid for some children so should
refactor the approach completely

15 years agoRevert "set sigchild to SIG_IGN instead of SIG_DFL"
Ronnie Sahlberg [Tue, 8 Jul 2008 07:40:53 +0000 (17:40 +1000)]
Revert "set sigchild to SIG_IGN instead of SIG_DFL"

This reverts commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76.

15 years agoset sigchild to SIG_IGN instead of SIG_DFL
Ronnie Sahlberg [Tue, 8 Jul 2008 06:31:23 +0000 (16:31 +1000)]
set sigchild to SIG_IGN instead of SIG_DFL

15 years agonew version 1.0.45
Ronnie Sahlberg [Tue, 8 Jul 2008 00:03:57 +0000 (10:03 +1000)]
new version 1.0.45

15 years agoupdate the monitor event for nfs to track how many times in a row it has failed
Ronnie Sahlberg [Mon, 7 Jul 2008 23:58:10 +0000 (09:58 +1000)]
update the monitor event for nfs to track how many times in a row it has failed
to "ping" the local nfs daemon.

Once it has failed more than 3 times in a row it will attempt to restart the nfs service.

15 years agowaitpid() can block if it takes a long time before the child terminates
Ronnie Sahlberg [Mon, 7 Jul 2008 17:48:11 +0000 (03:48 +1000)]
waitpid() can block if it takes a long time before the child terminates
so we should not call it from the main daemon.

1, set SIGCHLD to SIG_DFL to make sure we ignore this signal

2, get rid of all waitpid() calls

3, change reporting of event script status code from _exit()/waitpid() to write()/read() of one byte across the pipe.

15 years agouse more liberal handling of event scripts timing out.
Ronnie Sahlberg [Mon, 7 Jul 2008 10:38:59 +0000 (20:38 +1000)]
use more liberal handling of event scripts timing out.

If the event script that timed out was for the "monitor" event, then
even if it timed out we still return SUCCESS back to the guy invoking the eventscript.
Only consider the eventscript for "monitor" to have failed with an error
IFF it actually terminated with an error, or if it timed out 5 times in a row and hung.
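
The policy above can be sketched as a small state machine. This is a hypothetical Python illustration rather than ctdbd's implementation: the class and method names are invented, and resetting the counter whenever the script runs to completion is an assumption drawn from the phrase "in a row".

```python
class MonitorTimeoutPolicy:
    """Decide whether a "monitor" event run counts as a success."""

    def __init__(self, max_consecutive_timeouts: int = 5):
        # 5 matches the threshold named in the commit message.
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.consecutive_timeouts = 0

    def result(self, timed_out: bool, exit_status: int = 0) -> bool:
        """Return True for SUCCESS, False for failure."""
        if timed_out:
            self.consecutive_timeouts += 1
            # A timeout is tolerated until it has happened N times in a row.
            return self.consecutive_timeouts < self.max_consecutive_timeouts
        # The script ran to completion, so the run of timeouts is broken
        # (assumption) and only its exit status decides the outcome.
        self.consecutive_timeouts = 0
        return exit_status == 0
```

A script that really exits with an error still fails immediately; only timeouts get the benefit of the doubt.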

15 years agonew version .44 1.0.44 obnox/1.0.44 origin/1.0.44
Ronnie Sahlberg [Sun, 6 Jul 2008 23:07:49 +0000 (09:07 +1000)]
new version .44

15 years agozero out the sockaddr_in structure before we store the ipv4 data in it to make sure...
Ronnie Sahlberg [Sun, 6 Jul 2008 22:53:22 +0000 (08:53 +1000)]
zero out the sockaddr_in structure before we store the ipv4 data in it to make sure that all data is initialized. Otherwise valgrind will complain about uninitialized data when we write this structure out on the wire

15 years agowe need a 'case x:' in our ugly 'encode the control opcode as a linenumber in valgrin...
Ronnie Sahlberg [Sun, 6 Jul 2008 22:52:04 +0000 (08:52 +1000)]
we need a 'case x:' in our ugly 'encode the control opcode as a linenumber in valgrind output' hack to make it work

15 years agoIf a transaction commit fails. Log this error and cancel all pending transactions...
Ronnie Sahlberg [Sun, 6 Jul 2008 22:51:05 +0000 (08:51 +1000)]
If a transaction commit fails, log this error and cancel all pending transactions to the
databases instead of calling ctdb_fatal()

15 years ago in the destructor for the lock-wait child, make sure that we cancel any pending
Ronnie Sahlberg [Sun, 6 Jul 2008 22:50:12 +0000 (08:50 +1000)]
 in the destructor for the lock-wait child, make sure that we cancel any pending
transactions.

15 years agofixed a case statement
Andrew Tridgell [Fri, 4 Jul 2008 08:03:24 +0000 (18:03 +1000)]
fixed a case statement

15 years agoan extraordinarily ugly patch!
Andrew Tridgell [Fri, 4 Jul 2008 08:00:24 +0000 (18:00 +1000)]
an extraordinarily ugly patch!

This is a hack to allow backtraces under valgrind to show what opcode
is getting uninitialised bytes

15 years agoensure pad bytes in the ltdb_header are initialised
Andrew Tridgell [Fri, 4 Jul 2008 07:40:25 +0000 (17:40 +1000)]
ensure pad bytes in the ltdb_header are initialised

15 years agodon't use mmap in tdb if --nosetsched is set. That makes valgrind
Andrew Tridgell [Fri, 4 Jul 2008 07:32:21 +0000 (17:32 +1000)]
don't use mmap in tdb if --nosetsched is set. That makes valgrind
happier (it doesn't like the mmap/msync calls in tdb)

15 years agoprevent valgrind errors where we print uninitialised values on control errors
Andrew Tridgell [Fri, 4 Jul 2008 07:15:06 +0000 (17:15 +1000)]
prevent valgrind errors where we print uninitialised values on control errors

15 years agofixed a warning
Andrew Tridgell [Fri, 4 Jul 2008 07:04:37 +0000 (17:04 +1000)]
fixed a warning

15 years agofixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the
Andrew Tridgell [Fri, 4 Jul 2008 07:04:26 +0000 (17:04 +1000)]
fixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the
_VOID variant

15 years agoCTDB_NO_MEMORY_VOID() needs to return on error
Andrew Tridgell [Fri, 4 Jul 2008 06:58:29 +0000 (16:58 +1000)]
CTDB_NO_MEMORY_VOID() needs to return on error

15 years agoadded option to start ctdb under valgrind
Andrew Tridgell [Fri, 4 Jul 2008 06:58:14 +0000 (16:58 +1000)]
added option to start ctdb under valgrind

Just add CTDB_VALGRIND=yes in /etc/sysconfig/ctdb, and look at the
logs in /var/log/ctdb_valgrind.*

15 years agozero out the ctdb->freeze_handle when we free it
Andrew Tridgell [Fri, 4 Jul 2008 06:05:04 +0000 (16:05 +1000)]
zero out the ctdb->freeze_handle when we free it

This prevents heap corruption when a freeze child dies

15 years agowe don't need to explicitly thaw the databases from the recovery daemon
Ronnie Sahlberg [Thu, 3 Jul 2008 02:46:09 +0000 (12:46 +1000)]
we don't need to explicitly thaw the databases from the recovery daemon
since this is already done implicitly when we change recovery mode
back to normal

15 years agotrack both when we last started and ended a recovery.
Ronnie Sahlberg [Wed, 2 Jul 2008 03:55:59 +0000 (13:55 +1000)]
track both when we last started and ended a recovery.
make ctdb uptime print how long the recovery took

in the recovery daemon, when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we don't have any we shouldn't have), use ctdb uptime to check the
recovery start/stop times and make sure we don't check for ip allocation
inconsistencies during a recovery, when the ip address allocation is in flux.

15 years agoprint the opcode when an async callback detects an error
Ronnie Sahlberg [Wed, 2 Jul 2008 02:21:53 +0000 (12:21 +1000)]
print the opcode when an async callback detects an error

15 years agoupdate a comment to reflect that this is not always a real recovery
Ronnie Sahlberg [Wed, 2 Jul 2008 02:01:19 +0000 (12:01 +1000)]
update a comment to reflect that this is not always a real recovery
it can also be printed when we just do an ip reallocation

15 years agonew version
Ronnie Sahlberg [Mon, 30 Jun 2008 23:34:43 +0000 (09:34 +1000)]
new version

15 years agoinit.d/ctdb is not a config file
Ronnie Sahlberg [Thu, 26 Jun 2008 23:31:18 +0000 (09:31 +1000)]
init.d/ctdb is not a config file

15 years agomake /etc/ctdb/functions executable and add a hashbang to it so
Ronnie Sahlberg [Thu, 26 Jun 2008 23:29:38 +0000 (09:29 +1000)]
make /etc/ctdb/functions executable and add a hashbang to it so
rpmlint won't complain

15 years agotest
Ronnie Sahlberg [Thu, 26 Jun 2008 04:14:37 +0000 (14:14 +1000)]
test

15 years agoRevert "test"
Ronnie Sahlberg [Thu, 26 Jun 2008 04:00:36 +0000 (14:00 +1000)]
Revert "test"

This reverts commit f71287a28d66db202fe52f9a43b6daf2389d7f66.

15 years agotest
Ronnie Sahlberg [Thu, 26 Jun 2008 03:51:18 +0000 (13:51 +1000)]
test

15 years agoreduce loglevel of the info message we are updating the flags on all nodes
Ronnie Sahlberg [Thu, 26 Jun 2008 03:15:41 +0000 (13:15 +1000)]
reduce loglevel of the info message we are updating the flags on all nodes

15 years agoforce an update of the flags from the recmaster after each monitoring run
Ronnie Sahlberg [Thu, 26 Jun 2008 03:08:37 +0000 (13:08 +1000)]
force an update of the flags from the recmaster after each monitoring run

15 years ago/etc/ctdb/functions should not be executable
Ronnie Sahlberg [Thu, 26 Jun 2008 02:43:30 +0000 (12:43 +1000)]
/etc/ctdb/functions should not be executable

15 years agothird attempt for fixing a freeze child writing to the socket
Ronnie Sahlberg [Thu, 26 Jun 2008 01:52:26 +0000 (11:52 +1000)]
third attempt for fixing a freeze child writing to the socket

15 years agoverify that the recmaster has the correct flags for us and if not tell the recmaste...
Ronnie Sahlberg [Thu, 26 Jun 2008 01:08:09 +0000 (11:08 +1000)]
verify that the recmaster has the correct flags for us, and if not tell the recmaster what the flags should be

15 years agoonly loop over the write if the write failed
Ronnie Sahlberg [Thu, 26 Jun 2008 01:02:08 +0000 (11:02 +1000)]
only loop over the write if the write failed

15 years agothe write() from the freeze child process can fail
Ronnie Sahlberg [Wed, 25 Jun 2008 23:54:27 +0000 (09:54 +1000)]
the write() from the freeze child process can fail
try writing many times and log an error if the write failed

15 years agoit is 2008 not 2008 right now :-)
Ronnie Sahlberg [Fri, 13 Jun 2008 03:53:05 +0000 (13:53 +1000)]
it is 2008   not 2008 right now :-)

15 years agoupdate to 1.0.42
Ronnie Sahlberg [Fri, 13 Jun 2008 03:50:28 +0000 (13:50 +1000)]
update to 1.0.42

15 years agoban the node after 3 failed scripts by default
Ronnie Sahlberg [Fri, 13 Jun 2008 03:45:23 +0000 (13:45 +1000)]
ban the node after 3 failed scripts by default

15 years agoif the event scripts hang EventScriptsBanCount times in a row
Ronnie Sahlberg [Fri, 13 Jun 2008 03:18:06 +0000 (13:18 +1000)]
if the event scripts hang EventScriptsBanCount times in a row
the node will ban itself for the default recovery ban period

15 years agowhen an eventscript has timed out, log the event options (e.g. "monitor" "takeip 1...
Ronnie Sahlberg [Fri, 13 Jun 2008 02:18:00 +0000 (12:18 +1000)]
when an eventscript has timed out, log the event options (e.g. "monitor" "takeip 1.2..." etc)
to the log

15 years agomake it possible to re-start a recovery without marking the current node as
Ronnie Sahlberg [Fri, 13 Jun 2008 01:47:42 +0000 (11:47 +1000)]
make it possible to re-start a recovery without marking the current node as
the culprit.

15 years agoadd a callback for failed nodes to the async control helper.
Ronnie Sahlberg [Thu, 12 Jun 2008 06:53:36 +0000 (16:53 +1000)]
add a callback for failed nodes to the async control helper.

this callback is called for every node where the control failed (or timed out)

when we issue the start recovery control from recovery master,
set any node that fails as a culprit so it will eventually be banned

15 years agofirst cut to convert takeover_callback_state{}
Ronnie Sahlberg [Wed, 4 Jun 2008 07:12:57 +0000 (17:12 +1000)]
first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

15 years agofix a comment
Ronnie Sahlberg [Wed, 4 Jun 2008 05:23:06 +0000 (15:23 +1000)]
fix a comment

note that we don't actually send the ipv6 "gratuitous arp" on the wire just yet
(since ipv6 doesn't use arp),
but all the infrastructure is there for when we implement sending raw neighbor discovery packets

15 years agoconvert handling of gratuitous arps and their controls and helpers to
Ronnie Sahlberg [Wed, 4 Jun 2008 05:13:00 +0000 (15:13 +1000)]
convert handling of gratuitous arps and their controls and helpers to
use the ctdb_sock_addr structure so they work for both ipv4 and ipv6

15 years agoadd a parameter for the tdb-flags to the client function
Ronnie Sahlberg [Wed, 4 Jun 2008 00:46:20 +0000 (10:46 +1000)]
add a parameter for the tdb-flags to the client function
ctdb_attach() so that we can pass TDB_NOSYNC when we attach to
a persistent database and want fast unsafe writes instead of
slow but safe tdb_transaction writes.

enhance the ctdb_persistent test suite to test both safe and unsafe writes

15 years agorun the persistent write test with 4 nodes by default
Ronnie Sahlberg [Tue, 3 Jun 2008 08:19:48 +0000 (18:19 +1000)]
run the persistent write test with 4 nodes by default

use the timelimit argument to the persistent writer to run the test for
30 seconds by default

15 years agoredesign the test of persistent writes
Ronnie Sahlberg [Tue, 3 Jun 2008 08:18:28 +0000 (18:18 +1000)]
redesign the test of persistent writes
so that we have n persistent writers on n nodes,
all writers writing persistently to the same record.

each writer on a node has its own "counter" in this record that is incremented by one in each iteration.
the persistent writer on node 0 also checks that all the counters in the record are increasing monotonically and, if they are not, flags it as an ERROR.
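
The check performed on node 0 can be illustrated with a minimal sketch. This is hypothetical Python, not the test suite's code: the record is modelled as a dict mapping node number to counter value, and a counter is treated as broken only if it goes backwards between two reads (equal values are allowed, on the assumption that node 0 may read the record twice before another writer increments it).

```python
from typing import Dict, List


def regressed_counters(previous: Dict[int, int], current: Dict[int, int]) -> List[int]:
    """Return the node numbers whose counter decreased since the previous read."""
    return sorted(
        node
        for node, value in current.items()
        # A node not seen before is compared against 0.
        if value < previous.get(node, 0)
    )
```

An empty result means all counters are still monotonic; any returned node number would be reported as an ERROR.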

15 years agocreate the nodes file in a 'test' subdirectory and not the current directory
Ronnie Sahlberg [Tue, 3 Jun 2008 08:14:54 +0000 (18:14 +1000)]
create the nodes file in a 'test' subdirectory and not the current directory

delete all persistent databases when the test starts
(the tests only use test databases in a special test directory)

do not set up any public addresses in the tests

wait until there are no disconnected or unhealthy nodes when starting the
test daemons instead of waiting for the recovery mode to change.
we do want to wait until the system has recovered and ALL nodes are ok.

16 years agodebuglevels can now be negative so print their value using %d instead of %u
Ronnie Sahlberg [Wed, 28 May 2008 22:19:35 +0000 (08:19 +1000)]
debuglevels can now be negative so print their value using %d instead of %u

16 years agoupdate to .41
Ronnie Sahlberg [Wed, 28 May 2008 04:51:46 +0000 (14:51 +1000)]
update to .41

16 years agodon't bother casting to a void* private_data pointer,
Ronnie Sahlberg [Wed, 28 May 2008 03:40:12 +0000 (13:40 +1000)]
don't bother casting to a void* private_data pointer,
just pass it as 'state' structure

16 years agoremove another field we don't need in the childwrite_handle structure
Ronnie Sahlberg [Wed, 28 May 2008 03:31:58 +0000 (13:31 +1000)]
remove another field we don't need in the childwrite_handle structure

16 years agoremove a comment that is no longer relevant
Ronnie Sahlberg [Wed, 28 May 2008 03:30:22 +0000 (13:30 +1000)]
remove a comment that is no longer relevant

remove a field in the childwrite_handle structure we don't need

16 years agodo persistent writes in a child process
Ronnie Sahlberg [Wed, 28 May 2008 03:04:25 +0000 (13:04 +1000)]
do persistent writes in a child process

16 years agoupdate to .40
Ronnie Sahlberg [Mon, 26 May 2008 22:23:46 +0000 (08:23 +1000)]
update to .40

16 years agoread the samba sysconfig from the samba eventscript
Ronnie Sahlberg [Mon, 26 May 2008 22:21:18 +0000 (08:21 +1000)]
read the samba sysconfig from the samba eventscript

16 years agodisable transactions for now, there are more situations where there are conflicting...
Ronnie Sahlberg [Thu, 22 May 2008 08:33:54 +0000 (18:33 +1000)]
disable transactions for now, there are more situations where there are conflicting locks and the "net" command is not prepared for the persistent store failing.

16 years agorestore a timeout value to the default settings instead of the hardcoded 3 second...
Ronnie Sahlberg [Thu, 22 May 2008 06:33:36 +0000 (16:33 +1000)]
restore a timeout value to the default settings instead of the hardcoded 3 second test value

16 years agofix some memory hierarchy bugs in allocation of the state structure for persistent...
Ronnie Sahlberg [Thu, 22 May 2008 06:29:46 +0000 (16:29 +1000)]
fix some memory hierarchy bugs in allocation of the state structure for persistent writes.

since these two controls (UPDATE_RECORD and PERSISTENT_STORE) can respond
asynchronously to the control, we cannot allocate the state variable as a child of ctdb_req_control; instead we must allocate state as a child of ctdb itself
and steal ctdb_req_control so it becomes a child of state.

otherwise both ctdb_req_control and state will be released immediately after we have finished setting up the async reply and returned.

16 years agocleanup of the previous patch.
Ronnie Sahlberg [Thu, 22 May 2008 03:12:53 +0000 (13:12 +1000)]
cleanup of the previous patch.

With these patches, ctdbd will enforce and (by default) always use
tdb_transactions when updating/writing records to a persistent database.

This might come with a small performance degradation since transactions
are slower than no transactions at all.

If a client, such as samba, wants to use a persistent database but does NOT
want to pay the performance penalty, it can specify TDB_NOSYNC as the
srvid parameter in the ctdb_control() for CTDB_CONTROL_DB_ATTACH_PERSISTENT.

In this case CTDBD will remember that "this database is not that important"
so I can use unsafe (no transaction) tdb_stores to write the updates.
It will be faster than the default (always use transaction) but less crash safe.