sahlberg/ctdb.git
14 years agoversion 1.0.108-3 1.0.108
Ronnie Sahlberg [Thu, 7 Jan 2010 02:48:05 +0000 (13:48 +1100)]
version 1.0.108-3

14 years agoFrom Volker:
Ronnie Sahlberg [Thu, 7 Jan 2010 02:47:00 +0000 (13:47 +1100)]
From Volker:

This fixes the following condition: On a cluster with many nodes a single node
was running as the only node. Any transaction that was attempted against the
persistent databases hung. From a former run of the cluster
"__transaction_lock__" existed in the persistent databases, but with a dmaster
entry in the ctdb header that was not the local node. When the
transaction_start code tried to acquire this, ctdb queued the dmaster request
to a node that does not exist, hanging forever.

I thought -- wait a second, why has nobody found this yet with non-persistent
databases? Answer: Non-persistent databases are opened with CLEAR_IF_FIRST,
which means that all records are locally deleted when ctdb attaches to it.

This wipe does not happen for persistent databases, but we have this one
__transaction_lock__ record around that is treated like a non-persistent
database. This patch treats the __transaction_lock__ for persistent db's
specially: It deletes it locally when ctdbd attaches to the db.
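
A minimal sketch of the attach behaviour described in this message, modelling a database as a plain Python dict. The function name attach_database and the record layout are illustrative assumptions, not ctdb code:

```python
TRANSACTION_LOCK_KEY = b"__transaction_lock__"


def attach_database(records, persistent):
    """Return the records that survive a local attach (hypothetical model)."""
    if not persistent:
        # Non-persistent databases: all records are deleted locally on attach.
        return {}
    # Persistent databases keep their data, except the stale lock record.
    return {k: v for k, v in records.items() if k != TRANSACTION_LOCK_KEY}


# A leftover lock whose dmaster (node 7) no longer exists in the cluster.
stale = {TRANSACTION_LOCK_KEY: {"dmaster": 7}, b"config": {"dmaster": 0}}
survivors = attach_database(stale, persistent=True)
```

With the stale lock gone, the next transaction start creates the lock record locally instead of asking a missing node for it.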

14 years agoversion 1.0.108-2
Ronnie Sahlberg [Tue, 5 Jan 2010 23:50:39 +0000 (10:50 +1100)]
version 1.0.108-2

14 years agouse --ping instead of --ping-dc for winbind for now
Ronnie Sahlberg [Tue, 5 Jan 2010 23:42:41 +0000 (10:42 +1100)]
use --ping instead of --ping-dc for winbind for now

14 years agoversion 1.0.108 ctdb-1.0.108
Ronnie Sahlberg [Mon, 7 Dec 2009 08:04:41 +0000 (19:04 +1100)]
version 1.0.108

14 years agoUse wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determ...
Ronnie Sahlberg [Mon, 7 Dec 2009 07:27:46 +0000 (18:27 +1100)]
Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state.

14 years agopackaging: package tests/bin/ctdb_transaction under /usr/share/doc/tests/bin obnox/master-rebase-wip-trans
Michael Adam [Fri, 4 Dec 2009 22:18:12 +0000 (23:18 +0100)]
packaging: package tests/bin/ctdb_transaction under /usr/share/doc/tests/bin

For testing/diagnostic purposes.

Michael

14 years agoclient: improve two error messages in ctdb_transaction_commit().
Michael Adam [Thu, 3 Dec 2009 23:19:44 +0000 (00:19 +0100)]
client: improve two error messages in ctdb_transaction_commit().

Michael

14 years agoserver:trans2_commit: move the check for active recovery down.
Michael Adam [Thu, 3 Dec 2009 23:06:34 +0000 (00:06 +0100)]
server:trans2_commit: move the check for active recovery down.

This needs to be done after the control-dispatcher:
In the TRANS2_COMMIT control, the client->db_id needs
to be set before bailing out, since otherwise the
next TRANS2_COMMIT_RETRY will fail...

Michael

14 years agoclient: increase the number of commit retries 10-->100
Michael Adam [Wed, 2 Dec 2009 23:28:32 +0000 (00:28 +0100)]
client: increase the number of commit retries 10-->100

To cope with timeouts when recoveries and transactions collide.
Maybe 100 is too high.

Michael

14 years agoclient: untangle checks and produce more detailed error messages
Michael Adam [Wed, 2 Dec 2009 23:27:34 +0000 (00:27 +0100)]
client: untangle checks and produce more detailed error messages

in ctdb_transaction_fetch_start

Michael

14 years agoclient: increase the rsn of the __transaction_lock__ when storing
Michael Adam [Wed, 2 Dec 2009 23:26:52 +0000 (00:26 +0100)]
client: increase the rsn of the __transaction_lock__ when storing

So that it is correctly handled by recoveries.
Also explicitly set the dmaster field to the current node's pnn.

Michael

14 years agorecovery: add special pull-logic for persistent databases
Michael Adam [Fri, 4 Dec 2009 10:21:29 +0000 (11:21 +0100)]
recovery: add special pull-logic for persistent databases

The decision mechanism which records of a persistent db
are to be pulled into the recdb during recovery is now
as follows:

* Usually a record with a higher rsn than the one already
  stored is taken. (Just as for normal tdbs.)

* If a transaction is running on some node, then that
  node's copies of all records are taken and are not
  overwritten later by other nodes' copies.

In order to keep track of whether a record's copy was obtained
from a node with a transaction running, the recovery mechanism
misuses the ctdb tdb header field 'lacount' in the recdb.
It is cleared later when pushing out the recdb database to the
other nodes.

This way, an incomplete transaction is not spoiled when
a recovery interrupts and the replay should usually succeed
(possibly after a few retries).

Michael
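
The decision rule above can be sketched roughly as follows. This is a simplified model under stated assumptions: the recdb is a dict, and the "pinned" flag stands in for the reused 'lacount' header field. The names pull_record and push_view are hypothetical:

```python
def pull_record(recdb, key, data, rsn, from_transaction_node):
    """Merge one remote copy of a record into recdb; return True if taken."""
    existing = recdb.get(key)
    if existing is not None and existing["pinned"]:
        # Already holds a copy from the node running a transaction: keep it.
        return False
    if from_transaction_node or existing is None or rsn > existing["rsn"]:
        recdb[key] = {"data": data, "rsn": rsn, "pinned": from_transaction_node}
        return True
    return False


def push_view(recdb):
    """Clear the marker before the records are pushed to the other nodes."""
    return {k: {"data": v["data"], "rsn": v["rsn"]} for k, v in recdb.items()}


recdb = {}
pull_record(recdb, b"k", b"old", 5, False)   # first copy is taken
pull_record(recdb, b"k", b"txn", 3, True)    # transaction node wins despite lower rsn
pull_record(recdb, b"k", b"new", 9, False)   # later higher-rsn copy is ignored
```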

14 years agomake ctdb_ctrl_transaction_active public.
Michael Adam [Wed, 2 Dec 2009 23:25:16 +0000 (00:25 +0100)]
make ctdb_ctrl_transaction_active public.

Michael

14 years agorecovery: for persistent db's don't set the dmaster to the recmaster node number
Michael Adam [Sun, 29 Nov 2009 10:17:18 +0000 (11:17 +0100)]
recovery: for persistent db's don't set the dmaster to the recmaster node number

It is important to keep track of the dmaster (i.e. the node that last committed
a transaction containing changes to this record).

Michael

14 years agorecovery: pass the persistent flag to recover_database()
Michael Adam [Sun, 29 Nov 2009 10:14:31 +0000 (11:14 +0100)]
recovery: pass the persistent flag to recover_database()

and further down to pull_remote_database(), pull_one_remote_database(),
and push_recdb_database().

This is in preparation of special handling of persistent databases
during recoveries.

Michael

14 years agotests:ctdb_transaction: print extra counters when a commit fails
Michael Adam [Sun, 29 Nov 2009 10:07:36 +0000 (11:07 +0100)]
tests:ctdb_transaction: print extra counters when a commit fails

Michael

14 years agoclient: in catdb, print the keyname first, and separate records by a blank line
Michael Adam [Sun, 29 Nov 2009 09:38:33 +0000 (10:38 +0100)]
client: in catdb, print the keyname first, and separate records by a blank line

Michael

14 years agopackaging: remove the lib/popt from the tarball in debian mode
Michael Adam [Tue, 1 Dec 2009 22:54:12 +0000 (23:54 +0100)]
packaging: remove the lib/popt from the tarball in debian mode

Debian CTDB packaging fails when this is included.

Michael

14 years agopackaging: rework maketarball.sh to accept an arbitrary githash to pack
Michael Adam [Tue, 1 Dec 2009 22:51:51 +0000 (23:51 +0100)]
packaging: rework maketarball.sh to accept an arbitrary githash to pack

The githash can be specified through the environment variable "GITHASH"
that can contain a commit hash or a tag name, e.g.

The call syntax is now

[GITHASH=xyz] [USE_GITHASH=yes/no] [DEBIAN_MODE=yes/no] maketarball.sh

Michael

14 years agoctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb
Michael Adam [Sun, 29 Nov 2009 03:05:03 +0000 (04:05 +0100)]
ctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb

Michael

14 years agotests: turn printfs into DEBUG statements in the ctdb_transaction test
Michael Adam [Thu, 29 Oct 2009 21:40:50 +0000 (22:40 +0100)]
tests: turn printfs into DEBUG statements in the ctdb_transaction test

Michael

14 years agoMerge branch 'status-test-2'
Martin Schwenke [Fri, 4 Dec 2009 03:44:46 +0000 (14:44 +1100)]
Merge branch 'status-test-2'

14 years agoDon't store debug level DEBUG_DEBUG in the in-memory ringbuffer.
Ronnie Sahlberg [Fri, 4 Dec 2009 00:45:37 +0000 (11:45 +1100)]
Don't store debug level DEBUG_DEBUG in the in-memory ringbuffer.

It is unlikely we will need something this verbose for normal troubleshooting.
This allows us to keep a significantly longer time interval of log messages
in the 500k slots available in the ringbuffer.
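
A rough sketch of a fixed-size log ringbuffer that skips the most verbose level. The class name LogRing and the numeric value given to DEBUG_DEBUG are assumptions for illustration. The slot list is allocated once and reused:

```python
DEBUG_DEBUG = 4  # assumed numeric value; higher means more verbose


class LogRing:
    """Fixed number of slots, allocated once; the oldest entry is overwritten."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.next = 0     # index of the slot that will be written next
        self.count = 0    # number of valid entries

    def add(self, level, message):
        if level >= DEBUG_DEBUG:
            return False  # too verbose: not stored
        self.slots[self.next] = (level, message)
        self.next = (self.next + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))
        return True

    def entries(self):
        """Return the stored entries, oldest first."""
        start = (self.next - self.count) % len(self.slots)
        return [self.slots[(start + i) % len(self.slots)]
                for i in range(self.count)]


ring = LogRing(3)
for text in ("a", "b", "c", "d"):
    ring.add(2, text)
stored = ring.add(DEBUG_DEBUG, "noise")
```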

14 years agoUse statically allocated ringbuffer to store the last 500k log entries
Ronnie Sahlberg [Fri, 4 Dec 2009 00:36:27 +0000 (11:36 +1100)]
Use statically allocated ringbuffer to store the last 500k log entries
in memory instead of dynamically allocated ones so that we reduce the pressure
on malloc/free.

14 years agoDocument the procedure to remove/change the NATGW configuration at
Ronnie Sahlberg [Thu, 3 Dec 2009 21:33:56 +0000 (08:33 +1100)]
Document the procedure to remove/change the NATGW configuration at
runtime without restarting the ctdb service

14 years agolower the loglevel for the message that a client has attached to a persistent database
Ronnie Sahlberg [Wed, 2 Dec 2009 03:53:21 +0000 (14:53 +1100)]
lower the loglevel for the message that a client has attached to a persistent database

14 years agolower the loglevel for the message that a client has attached through a domain socket
Ronnie Sahlberg [Wed, 2 Dec 2009 03:51:57 +0000 (14:51 +1100)]
lower the loglevel for the message that a client has attached through a domain socket

14 years agoAdd a proper function to process a process-exist control in the daemon.
Ronnie Sahlberg [Wed, 2 Dec 2009 02:58:27 +0000 (13:58 +1100)]
Add a proper function to process a process-exist control in the daemon.

This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.

If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.

bz58185
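
A minimal sketch of the decision this control makes, with the liveness check and the kill passed in as functions so the logic can be read in isolation. All names here are hypothetical:

```python
def process_exists(pid, node_inactive, is_alive, kill):
    """Answer a process-exist query for a client pid (illustrative model)."""
    if node_inactive:
        # Node is banned or stopped: kill the process and report it as gone
        # so that the caller can reclaim its subrecords.
        kill(pid)
        return False
    return is_alive(pid)


killed = []
on_banned = process_exists(42, True, lambda pid: True, killed.append)
on_healthy = process_exists(42, False, lambda pid: True, killed.append)
```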

14 years agoAdd a double linked list to the ctdb_context to store a mapping between client pids...
Ronnie Sahlberg [Wed, 2 Dec 2009 02:41:04 +0000 (13:41 +1100)]
Add a double linked list to the ctdb_context to store a mapping between client pids and client structures.

Add the mapping to the list every time we accept() a new client connection
and set it up to be removed in the destructor when the client structure is freed.

14 years agoUse the PID we pick up from the domain socket when a client connects
Ronnie Sahlberg [Wed, 2 Dec 2009 02:17:12 +0000 (13:17 +1100)]
Use the PID we pick up from the domain socket when a client connects
and store this in the client structure.

There is no longer any need to rely on the hack where samba sends special message
handler registrations that encode the pid in the srvid.

This might not work on AIX since I recall some issues with getting the pid
this way on that platform.

14 years agoversion 1.0.107 ctdb-1.0.107
Ronnie Sahlberg [Wed, 2 Dec 2009 00:28:42 +0000 (11:28 +1100)]
version 1.0.107

14 years agoctdb_io: fix use-after-free on invalid packets
Rusty Russell [Tue, 1 Dec 2009 22:27:42 +0000 (08:57 +1030)]
ctdb_io: fix use-after-free on invalid packets

Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb.
His fix was to remove the talloc_free() in that function, which causes
loops when a socket is closed (as it does not get removed from the event
system), eg:
netcat 192.168.1.2 4379 < /dev/null

The real bug is that when we have more than one pending packet in the
queue, we loop calling the callback without any safeguards should that
callback free the queue (as it tends to do on invalid packets).  This
can be reproduced by sending more than one bogus packet at once:
# Length word at start: 4 == empty packet (assumed little endian)
/usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt
netcat 192.168.1.2 4379 < /tmp/pkt

Using a destructor we can check if the callback frees us, and exit
immediately.  Elsewhere, we return after the callback anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
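
A simplified sketch of the safeguard described above: after each callback the loop checks whether the callback destroyed the queue and stops if so. The Queue class and its "destroyed" flag are illustrative stand-ins for the destructor check, not the ctdb_io code:

```python
import struct


class Queue:
    def __init__(self, callback):
        self.callback = callback
        self.buf = b""
        self.destroyed = False

    def destroy(self):
        self.destroyed = True

    def feed(self, data):
        """Dispatch complete packets; return how many callbacks ran."""
        self.buf += data
        handled = 0
        while len(self.buf) >= 4:
            (length,) = struct.unpack_from("<I", self.buf)
            if length < 4 or len(self.buf) < length:
                break  # malformed length or incomplete packet: stop here
            packet, self.buf = self.buf[:length], self.buf[length:]
            self.callback(self, packet)
            handled += 1
            if self.destroyed:
                break  # the callback freed us: do not touch the queue again
        return handled


def reject(queue, packet):
    queue.destroy()  # invalid packet: tear the queue down


bogus = struct.pack("<II", 4, 4)  # two empty packets, as in the printf above
q = Queue(reject)
handled = q.feed(bogus)
```
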
14 years agoversion 1.0.106 ctdb-1.0.106
Ronnie Sahlberg [Wed, 2 Dec 2009 00:26:51 +0000 (11:26 +1100)]
version 1.0.106

14 years agoEventscripts: Fix syntax error in 00.ctdb. martins/status-test-2
Martin Schwenke [Tue, 1 Dec 2009 07:08:57 +0000 (18:08 +1100)]
Eventscripts: Fix syntax error in 00.ctdb.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agopackaging:maketarball.sh: add a DEBIAN_MODE to the tarball creation
Michael Adam [Thu, 26 Nov 2009 07:35:20 +0000 (08:35 +0100)]
packaging:maketarball.sh: add a DEBIAN_MODE to the tarball creation

It is triggered by setting DEBIAN_MODE=yes in the environment.
This creates a tarball suitable for use in debian packages.
The differences from the standard tarball are these:

* The tar ball file is called ctdb_VERSION.orig.tar.gz
* The base directory in the tar ball is ctdb-VERSION.orig/

Michael

14 years agoconfigure:maketarball.sh: call autogen.sh and include configure in the tarball
Michael Adam [Thu, 26 Nov 2009 07:34:44 +0000 (08:34 +0100)]
configure:maketarball.sh: call autogen.sh and include configure in the tarball

Michael

14 years agopackaging:maketarball.sh: create the specfile from the ctdb.spec.in
Michael Adam [Thu, 26 Nov 2009 07:32:24 +0000 (08:32 +0100)]
packaging:maketarball.sh: create the specfile from the ctdb.spec.in

Michael

14 years agoEventscripts: Remove executable bit accidentally set on some scripts.
Martin Schwenke [Tue, 1 Dec 2009 06:54:45 +0000 (17:54 +1100)]
Eventscripts: Remove executable bit accidentally set on some scripts.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoEventscript argument cleanups and introduction of ctdb_standard_event_handler.
Martin Schwenke [Tue, 1 Dec 2009 06:43:47 +0000 (17:43 +1100)]
Eventscript argument cleanups and introduction of ctdb_standard_event_handler.

The functions file no longer causes a side-effect by doing a shift.
It also doesn't set a convenience variable for $1.

All eventscripts now explicitly use "$1" in their case statement, as
does the initscript.  The absence of a shift means that the
takeip/releaseip events now explicitly reference $2-$4 rather than
$1-$3.

New function ctdb_standard_event_handler handles the status and
setstatus events, and exits for either of those events.  It is called
via a default case in each eventscript, replacing an explicit status
case where applicable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agowhen we detect a ip-allocation mismatch, just force a new ip reassignment
Ronnie Sahlberg [Tue, 1 Dec 2009 05:06:59 +0000 (16:06 +1100)]
when we detect a ip-allocation mismatch, just force a new ip reassignment
instead of a full blown recovery

14 years agoWhen starting up ctdbd, wait until all initial recoveries have finished
Ronnie Sahlberg [Tue, 1 Dec 2009 02:19:58 +0000 (13:19 +1100)]
When starting up ctdbd, wait until all initial recoveries have finished
and until we have gone through a full re-recovery timeout without triggering
any pending recoveries before we start up the services and start monitoring
the node.

14 years agoMerge commit 'martins/status-test-2'
Ronnie Sahlberg [Mon, 30 Nov 2009 23:53:18 +0000 (10:53 +1100)]
Merge commit 'martins/status-test-2'

Conflicts:

server/eventscript.c

14 years agoEvent scripts: functions file now intercepts status and setstatus.
Martin Schwenke [Fri, 27 Nov 2009 04:57:33 +0000 (15:57 +1100)]
Event scripts: functions file now intercepts status and setstatus.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoremove a stray ) so we compile
Ronnie Sahlberg [Fri, 27 Nov 2009 02:35:39 +0000 (13:35 +1100)]
remove a stray )   so we compile

14 years agodon't use talloc_steal() on an object that is already a child of ctdb.
Ronnie Sahlberg [Fri, 27 Nov 2009 02:28:31 +0000 (13:28 +1100)]
don't use talloc_steal() on an object that is already a child of ctdb.

14 years agoMerge commit 'martins/status-test' into status-test-2
Ronnie Sahlberg [Fri, 27 Nov 2009 01:50:45 +0000 (12:50 +1100)]
Merge commit 'martins/status-test' into status-test-2

14 years agoMerge commit 'martins-svart/status-test-2' into status-test
Martin Schwenke [Fri, 27 Nov 2009 01:49:31 +0000 (12:49 +1100)]
Merge commit 'martins-svart/status-test-2' into status-test

14 years agoEvent script infrastructure: add reload event to check_options().
Martin Schwenke [Fri, 27 Nov 2009 01:04:02 +0000 (12:04 +1100)]
Event script infrastructure: add reload event to check_options().

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoMerge commit 'martins/status-test' into status-test-2
Ronnie Sahlberg [Thu, 26 Nov 2009 05:26:25 +0000 (16:26 +1100)]
Merge commit 'martins/status-test' into status-test-2

14 years agoMerge commit 'martins-svart/status-test-2' into status-test
Martin Schwenke [Thu, 26 Nov 2009 05:25:15 +0000 (16:25 +1100)]
Merge commit 'martins-svart/status-test-2' into status-test

14 years agoAdd flag to ctdb_event_script_callback indicating when called by client.
Martin Schwenke [Thu, 26 Nov 2009 04:49:49 +0000 (15:49 +1100)]
Add flag to ctdb_event_script_callback indicating when called by client.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoresolve some conflicts from merging from martins branch
Ronnie Sahlberg [Thu, 26 Nov 2009 02:42:12 +0000 (13:42 +1100)]
resolve some conflicts from merging from martins branch

14 years agochange the lock wait child handling to use a pipe instead of a socketpair
Ronnie Sahlberg [Thu, 26 Nov 2009 01:08:35 +0000 (12:08 +1100)]
change the lock wait child handling to use a pipe instead of a socketpair

remove a stray alarm(30) that caused databases to be unlocked after 30 seconds.

14 years agoMerge commit 'martins-svart/status-test-2' into status-test
Martin Schwenke [Wed, 25 Nov 2009 23:49:47 +0000 (10:49 +1100)]
Merge commit 'martins-svart/status-test-2' into status-test

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoEvent scripts: use $script_name rather than $service name for status.
Martin Schwenke [Wed, 25 Nov 2009 05:42:14 +0000 (16:42 +1100)]
Event scripts: use $script_name rather than $service name for status.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoEvent scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat.
Martin Schwenke [Wed, 25 Nov 2009 05:34:49 +0000 (16:34 +1100)]
Event scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoMore eventscript cleanups. Initial smoke testing seems OK.
Martin Schwenke [Fri, 20 Nov 2009 05:45:36 +0000 (16:45 +1100)]
More eventscript cleanups.  Initial smoke testing seems OK.

Apart from lots of cleanup work, this also fixes a bug where the share
checks did not cope with directory names containing spaces.
The previous commit also loaded the config incorrectly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agouse a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes.
Ronnie Sahlberg [Wed, 25 Nov 2009 00:54:40 +0000 (11:54 +1100)]
use a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes.
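
A small sketch of putting a mixed list of IPv4 and IPv6 addresses into one stable order before they are handed out. The ordering used here (IPv4 before IPv6, then numeric) is an assumption; the commit only states that the addresses are sorted:

```python
import ipaddress


def sort_addresses(addrs):
    """Return the addresses sorted by IP version, then numerically."""
    parsed = [ipaddress.ip_address(a) for a in addrs]
    return [str(a) for a in sorted(parsed, key=lambda a: (a.version, int(a)))]


ordered = sort_addresses(["10.0.0.10", "fe80::1", "10.0.0.2"])
```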

14 years agoeventscript: check that ctdb forced script events correct
Rusty Russell [Wed, 25 Nov 2009 00:32:29 +0000 (11:02 +1030)]
eventscript: check that ctdb forced script events correct

Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoIt is better to plainly disallow clients from connecting here
Ronnie Sahlberg [Tue, 24 Nov 2009 21:03:42 +0000 (08:03 +1100)]
It is better to plainly disallow clients from connecting here
if the node is BANNED.
Don't even let them attach to the database at all.

Revert "temporarily try allowing clients to attach to databases even if
the node is banned/stopped or inactive in any other way."

This reverts commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0.

14 years agoMerge commit 'origin/status-test' into status-test
Martin Schwenke [Tue, 24 Nov 2009 05:14:54 +0000 (16:14 +1100)]
Merge commit 'origin/status-test' into status-test

14 years agoeventscript: check that ctdb forced script events correct
Rusty Russell [Tue, 24 Nov 2009 00:54:22 +0000 (11:24 +1030)]
eventscript: check that ctdb forced script events correct

Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: check that internal script events are being invoked correctly
Rusty Russell [Tue, 24 Nov 2009 00:53:13 +0000 (11:23 +1030)]
eventscript: check that internal script events are being invoked correctly

This is not as good as a compile-time check, but at least we check that the
number of arguments is correct.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
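
A minimal sketch of such a run-time argument count check. The table is hypothetical: the three arguments for takeip and releaseip follow from the $2-$4 usage in the eventscripts, while the count for monitor is an assumption:

```python
# Expected number of extra arguments per event (illustrative values).
EXPECTED_ARGS = {"monitor": 0, "takeip": 3, "releaseip": 3}


def check_event_args(event, args):
    """Return True if the event is known and has the expected argument count."""
    expected = EXPECTED_ARGS.get(event)
    return expected is not None and len(args) == expected


good = check_event_args("takeip", ["eth0", "10.0.0.1", "24"])
bad = check_event_args("takeip", ["eth0"])
```
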
14 years agoeventscript: remove call name from state->options
Rusty Russell [Tue, 24 Nov 2009 00:52:46 +0000 (11:22 +1030)]
eventscript: remove call name from state->options

Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.

This is clearer, and mainly involves adding some %s to debug statements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: put call type into state struct.
Rusty Russell [Tue, 24 Nov 2009 00:49:58 +0000 (11:19 +1030)]
eventscript: put call type into state struct.

This means we can get rid of more strcmp; they can simply use the
state->call value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: introduce enum for different event script calls.
Rusty Russell [Tue, 24 Nov 2009 00:46:49 +0000 (11:16 +1030)]
eventscript: introduce enum for different event script calls.

Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
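
The idea of passing an explicit enum and keeping "options" for extra arguments only can be sketched as follows. The enum members listed and the helper names are illustrative assumptions, not the actual ctdb definitions:

```python
from enum import Enum


class EventCall(Enum):
    MONITOR = "monitor"
    START = "start"
    TAKEIP = "takeip"
    RELEASEIP = "releaseip"
    UNKNOWN = "unknown"   # forced scripts from the ctdb tool


def parse_event(name):
    """Map an event name to the enum; anything unrecognised is UNKNOWN."""
    try:
        return EventCall(name)
    except ValueError:
        return EventCall.UNKNOWN


def command_line(script, call, options=""):
    """Build the script invocation; options holds extra arguments only."""
    return " ".join(part for part in (script, call.value, options) if part)


plain = command_line("00.ctdb", EventCall.START)
with_args = command_line("10.interface", EventCall.TAKEIP, "eth0 10.0.0.1 24")
```
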
14 years agoeventscript: put timeout inside ctdb_event_script_callback_v
Rusty Russell [Tue, 24 Nov 2009 00:39:46 +0000 (11:09 +1030)]
eventscript: put timeout inside ctdb_event_script_callback_v

Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: cleanup ctdb_event_script_v
Rusty Russell [Tue, 24 Nov 2009 00:39:01 +0000 (11:09 +1030)]
eventscript: cleanup ctdb_event_script_v

ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
a better name, and fix comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: typo cleanups
Rusty Russell [Tue, 24 Nov 2009 00:38:39 +0000 (11:08 +1030)]
eventscript: typo cleanups

1) ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
   a better name, and fix comment.
2) Fix indentation on allowed_scripts.
3) Comment on run_eventscripts_callback is wrong; it's the callback
   for any ctdb forced event.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agoeventscript: fix bug in timeouts on forced eventscripts. Again.
Rusty Russell [Tue, 24 Nov 2009 00:36:53 +0000 (11:06 +1030)]
eventscript: fix bug in timeouts on forced eventscripts.  Again.

In 15bc66ae801b0c69, Ronnie fixed a double-free race.  The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback.  This particularly
hurt in ctdb_event_script_timeout.

There's nothing wrong with this, but obviously we should make the callback
call last of all.  At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.

This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again.  But the same change introduced
a direct use-after-free bug which caused an occasional oops.

So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.

But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix!  Let's try to fix this in a more permanent way, by always doing
the callback from the destructor.  This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.

Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
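
A rough model of the approach described above: the state object holds the status, and the callback is invoked only from the single teardown path, as the very last step. Python has no talloc destructors, so free() stands in for one. The class, the "freed" guard and the timeout status value are assumptions:

```python
class ScriptState:
    def __init__(self, callback, child_pid, kill):
        self.callback = callback
        self.child = child_pid
        self.kill = kill
        self.status = None
        self.freed = False

    def finished(self, status):
        """The child exited on its own: remember the status, then tear down."""
        self.status = status
        self.child = 0          # nothing left to kill
        self.free()

    def timed_out(self):
        """The script ran too long: tear down, which kills the child."""
        self.status = -1        # placeholder timeout status (assumed value)
        self.free()

    def free(self):
        """Stand-in for the destructor: kill if needed, callback last of all."""
        if self.freed:
            return
        self.freed = True
        if self.child:
            self.kill(self.child)
        self.callback(self.status)


kills, results = [], []
done = ScriptState(results.append, 1234, kills.append)
done.finished(0)
done.free()                     # a second free is a no-op
late = ScriptState(results.append, 4321, kills.append)
late.timed_out()
```
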
14 years agoeventscript: clean up forked handler event code
Rusty Russell [Tue, 24 Nov 2009 00:30:13 +0000 (11:00 +1030)]
eventscript: clean up forked handler event code

Write the whole int through the pipe, rather than quietly cutting it
off.  Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.

Add a comment about why we don't need to check the write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
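
A small sketch of passing a whole int through a pipe and mapping a failed or short read to -2. The helper names are hypothetical and the native int layout via the struct module is an assumption about the encoding:

```python
import os
import struct

INT_SIZE = struct.calcsize("i")


def send_result(write_fd, result):
    """Write the full native int, not just its low byte."""
    os.write(write_fd, struct.pack("i", result))


def read_result(read_fd):
    """Read one int; return -2 if the read fails or comes up short."""
    try:
        data = os.read(read_fd, INT_SIZE)
    except OSError:
        return -2
    if len(data) != INT_SIZE:
        return -2
    return struct.unpack("i", data)[0]


r, w = os.pipe()
send_result(w, 256)   # a single byte would have truncated this to 0
os.close(w)
full = read_result(r)
os.close(r)

r, w = os.pipe()
os.close(w)           # child died before writing anything
failed = read_result(r)
os.close(r)
```
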
14 years agorework and simplify the eventscript handling
Ronnie Sahlberg [Wed, 25 Nov 2009 00:30:11 +0000 (11:00 +1030)]
rework and simplify the eventscript handling

This version has no trailing whitespace, and fixed

14 years agoreduce the log level for three vacuuming related log messages
Ronnie Sahlberg [Mon, 23 Nov 2009 22:27:22 +0000 (09:27 +1100)]
reduce the log level for three vacuuming related log messages

14 years agorework and simplify the eventscript handling
Ronnie Sahlberg [Mon, 23 Nov 2009 20:40:51 +0000 (07:40 +1100)]
rework and simplify the eventscript handling

14 years agoNow vaguely tested initscript updates.
Martin Schwenke [Thu, 19 Nov 2009 05:48:19 +0000 (16:48 +1100)]
Now vaguely tested initscript updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoMore untested eventscript factorisation.
Martin Schwenke [Thu, 19 Nov 2009 04:00:17 +0000 (15:00 +1100)]
More untested eventscript factorisation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agoTest suite: Make the CIFS tickle test wait until it sees the required tickle.
Martin Schwenke [Thu, 19 Nov 2009 03:54:05 +0000 (14:54 +1100)]
Test suite: Make the CIFS tickle test wait until it sees the required tickle.

The test depended on the exit code of "ctdb gettickles", which always
succeeds.  This change wraps the command in a function that checks
whether the tickle we're interested in is registered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agonew version 1.0.105 ctdb-1.0.105
Ronnie Sahlberg [Thu, 19 Nov 2009 00:08:14 +0000 (11:08 +1100)]
new version 1.0.105

14 years agodon't reset the event script context every time we start a new "ctdb eventscript ..."
Ronnie Sahlberg [Thu, 19 Nov 2009 00:03:51 +0000 (11:03 +1100)]
don't reset the event script context every time we start a new "ctdb eventscript ..."
command.
Use the existing context used for non-monitor events

Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV

14 years agomake the ringbuffer logging more efficient and marshall the data by writing to a...
Ronnie Sahlberg [Wed, 18 Nov 2009 08:10:50 +0000 (19:10 +1100)]
make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob

14 years agoadd an in memory ringbuffer where we store the last 500000 log entries regardless...
Ronnie Sahlberg [Wed, 18 Nov 2009 01:44:18 +0000 (12:44 +1100)]
add an in memory ringbuffer where we store the last 500000 log entries regardless of log level.

add commands to extract this in-memory buffer and to clear it

14 years agocreate a new event context for the syslog daemon
Ronnie Sahlberg [Tue, 17 Nov 2009 01:07:10 +0000 (12:07 +1100)]
create a new event context for the syslog daemon

14 years agoset up a pipe between the main daemon and the child we use for syslogging so that...
Ronnie Sahlberg [Mon, 16 Nov 2009 04:17:32 +0000 (15:17 +1100)]
set up a pipe between the main daemon and the child we use for syslogging so that we can clean up the child process when we stop ctdbd

14 years agoEventscripts: Untested factorisations and introduction of status event.
Martin Schwenke [Fri, 13 Nov 2009 07:28:25 +0000 (18:28 +1100)]
Eventscripts: Untested factorisations and introduction of status event.

This is the first stage of an experimental change to eventscripts.
Ronnie and I did a few hours of factorisation of 40.vsftpd and applied
many of the changes to 41.httpd.  Other eventscripts were also
modified.

At this stage this is completely untested.

Signed-off-by: Martin Schwenke <martin@meltin.net>
14 years agotest of a change to make ctdbd use "status" event instead of the "monitor" event.
Ronnie Sahlberg [Fri, 13 Nov 2009 01:37:55 +0000 (12:37 +1100)]
test of a change to make ctdbd use "status" event instead of the "monitor" event.

This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.

14 years agoMerge commit 'martins/master'
Ronnie Sahlberg [Fri, 13 Nov 2009 01:25:31 +0000 (12:25 +1100)]
Merge commit 'martins/master'

14 years agoTest suite: Fix the NFS and CIFS tickle tests.
Martin Schwenke [Thu, 12 Nov 2009 22:44:34 +0000 (09:44 +1100)]
Test suite: Fix the NFS and CIFS tickle tests.

The NFS test sleeps for MonitorInterval to give CTDB time to record an
NFS tickle.  However, this isn't always long enough.  This changes the
test to wait until a monitor event has actually occurred.

The CIFS test assumes that Samba is able to register a tickle with
CTDB before it notices that netstat has registered the tickle and can
use onnode to ask CTDB about it.  That is an incorrect assumption -
sometimes we can get to the point of asking CTDB about the tickle
before Samba and CTDB have processed it.  This adds a timeout loop
that makes the CIFS test wait until the tickle has been registered or
fail after 10 seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>
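
The timeout loop described above can be sketched as a generic polling helper. The function name wait_until and its parameters are illustrative; the test suite itself is shell, so this only shows the pattern:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate until it returns true or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


attempts = []


def tickle_registered():
    # Stand-in for checking whether the tickle is registered; succeeds on the third poll.
    attempts.append(1)
    return len(attempts) >= 3


found = wait_until(tickle_registered, timeout=1.0, interval=0)
```

Polling for the condition instead of sleeping for a fixed interval removes the race between the test and the daemons it is observing.
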
14 years agoMerge commit 'origin/master'
Martin Schwenke [Wed, 11 Nov 2009 01:16:30 +0000 (12:16 +1100)]
Merge commit 'origin/master'

14 years agoFix bashism in events.d/11.natgw rusty/master-rebase
Mathieu Parent [Tue, 10 Nov 2009 11:04:13 +0000 (12:04 +0100)]
Fix bashism in events.d/11.natgw

Signed-off-by: Michael Adam <obnox@samba.org>
14 years agoversion 1.0.104 ctdb-1.0.104
Ronnie Sahlberg [Fri, 6 Nov 2009 00:16:05 +0000 (11:16 +1100)]
version 1.0.104

14 years agosuggestion from metze,
Ronnie Sahlberg [Thu, 5 Nov 2009 22:54:03 +0000 (09:54 +1100)]
suggestion from metze,
use killtcp and kill both directions of the nfs connections.
we used to kill only one direction since the other direction was unkillable
but recent kernels allow us to kill both