Ronnie Sahlberg [Thu, 11 Mar 2010 06:57:30 +0000 (17:57 +1100)]
New version 1.0.82-20
* Wed Mar 11 2010 : Version 1.0.82-20
- From Christian Ambach : reduce loglevel for vacuuming message
Christian Ambach [Wed, 10 Mar 2010 17:46:15 +0000 (18:46 +0100)]
adjust a vacuum log level
made the severity of the decreasing interval log level the same as for the increasing,
they are both just info logs because they don't report errors
Ronnie Sahlberg [Mon, 11 Jan 2010 22:49:45 +0000 (09:49 +1100)]
version 1.0.82-19
Christian Ambach [Tue, 8 Dec 2009 18:23:19 +0000 (19:23 +0100)]
reduce vacuuming lognoise
syslog.h says:
LOG_NOTICE 5 normal but significant condition
LOG_INFO 6 informational
several vacuuming related logs logged at NOTICE level although I don't see
any real significance, these are just informational messages for me
Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>
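The effect of demoting a message from NOTICE to INFO can be sketched as follows. This is a minimal illustration, assuming a threshold-style filter in which a message is emitted only when its numeric level is at or below the configured level. The should_log() helper is hypothetical and is not ctdb's logging code; only the two numeric values come from syslog.h as quoted above.

```python
# Numeric priorities as quoted from syslog.h in the commit message.
LOG_NOTICE = 5  # normal but significant condition
LOG_INFO = 6    # informational


def should_log(message_level, configured_level):
    """Hypothetical filter: lower numbers are more severe, so a message is
    emitted only if it is at least as severe as the configured level."""
    return message_level <= configured_level


# With the log level set to NOTICE, a NOTICE message is emitted...
print(should_log(LOG_NOTICE, LOG_NOTICE))
# ...while the same message demoted to INFO is filtered out.
print(should_log(LOG_INFO, LOG_NOTICE))
```

Under that assumption, a system running at NOTICE no longer sees the vacuuming messages, while one running at INFO or lower severity still does.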
Ronnie Sahlberg [Mon, 23 Nov 2009 22:27:22 +0000 (09:27 +1100)]
reduce the log level for three vacuuming related log messages (cherry picked from commit
fbc453733d53359b9eba34a7ca9123237a7ecca5)
Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>
Ronnie Sahlberg [Thu, 5 Nov 2009 05:07:23 +0000 (16:07 +1100)]
don't use the pointer after it has been talloc_free()d. (cherry picked from commit
1cbf06a126621b3e932925cdad2ef9c009f93d4e)
Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>
Ronnie Sahlberg [Sun, 25 Oct 2009 22:35:18 +0000 (09:35 +1100)]
lower the log level of a debug message (cherry picked from commit
496dc2e80b714811c6e69dc928deaad61cf603b1)
Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>
Ronnie Sahlberg [Tue, 20 Oct 2009 02:01:15 +0000 (13:01 +1100)]
From Wolfgang Mueller
make sure to always create the vactun database and get rid of some annoying log messages
(cherry picked from commit
54f9c314a0354f1039208fe6ac7dc159b6db8750)
Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>
Ronnie Sahlberg [Tue, 8 Dec 2009 00:54:22 +0000 (11:54 +1100)]
version 1.0.82-18
Rusty Russell [Tue, 1 Dec 2009 22:27:42 +0000 (08:57 +1030)]
ctdb_io: fix use-after-free on invalid packets
Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb.
His fix was to remove the talloc_free() in that function, which causes
loops when a socket is closed (as it does not get removed from the event
system), eg:
netcat 192.168.1.2 4379 < /dev/null
The real bug is that when we have more than one pending packet in the
queue, we loop calling the callback without any safeguards should that
callback free the queue (as it tends to do on invalid packets). This
can be reproduced by sending more than one bogus packet at once:
# Length word at start: 4 == empty packet (assumed little endian)
/usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt
netcat 192.168.1.2 4379 < /tmp/pkt
Using a destructor we can check if the callback frees us, and exit
immediately. Elsewhere, we return after the callback anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
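The guard described above can be sketched in Python. This is a toy model, not the ctdb_io code: PacketQueue, feed(), drain() and reject_empty() are hypothetical names, and the _on_free hook only stands in for a talloc destructor. The packet bytes match the printf reproducer (two little-endian length words of 4).

```python
import struct

# Two length-only packets, as in the printf reproducer: each is a 4-byte
# little-endian length word whose value is 4 (header only, no payload).
BOGUS = struct.pack("<I", 4) * 2


class PacketQueue:
    """Toy stand-in for a packet queue (hypothetical, for illustration)."""

    def __init__(self, callback):
        self.callback = callback
        self.pending = []
        self._on_free = None  # plays the role of a talloc destructor

    def feed(self, data):
        # Split the byte stream into packets using the leading length word.
        while len(data) >= 4:
            (length,) = struct.unpack_from("<I", data)
            if length < 4 or length > len(data):
                break  # malformed or incomplete; the toy simply stops
            self.pending.append(data[:length])
            data = data[length:]

    def free(self):
        # After this, touching self.pending is the analogue of use-after-free.
        self.pending = None
        if self._on_free is not None:
            self._on_free()

    def drain(self):
        freed = []
        self._on_free = lambda: freed.append(True)
        handled = 0
        while len(self.pending) > 0:
            packet = self.pending.pop(0)
            self.callback(self, packet)
            handled += 1
            if freed:
                # The callback freed the queue: exit without touching it.
                return handled
        self._on_free = None
        return handled


def reject_empty(queue, packet):
    # Treat a header-only packet as invalid and tear the queue down.
    if len(packet) <= 4:
        queue.free()


queue = PacketQueue(reject_empty)
queue.feed(BOGUS)
print(queue.drain())
```

Without the `if freed` check, the loop condition would evaluate len(None) after the first callback and raise, which is this model's equivalent of the talloc complaint.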
Ronnie Sahlberg [Tue, 8 Dec 2009 00:52:20 +0000 (11:52 +1100)]
Revert "prevent doubly freeing memory on invalid packet"
This reverts commit
1975b53b5ea608d60b8b3e527a435a7c817a97ea.
Ronnie Sahlberg [Fri, 6 Nov 2009 08:48:12 +0000 (19:48 +1100)]
version 1.0.82-16
Ronnie Sahlberg [Fri, 6 Nov 2009 08:45:07 +0000 (19:45 +1100)]
From Wolfgang Mueller
Backport a patch from head for updating nodes flags
also make sure that when a node reports inconsistent node flags, we mark it as a culprit so that if it insists on retaining an opinion not accepted by the cluster groupthink, we ban it quickly.
root [Thu, 5 Nov 2009 15:53:26 +0000 (16:53 +0100)]
prevent doubly freeing memory on invalid packet
Ronnie Sahlberg [Thu, 29 Oct 2009 03:13:49 +0000 (14:13 +1100)]
version 1.0.82-15
Ronnie Sahlberg [Thu, 29 Oct 2009 03:11:57 +0000 (14:11 +1100)]
1.0.82 does not have the getreclock command, so just use the variable from the sysconfig file
Ronnie Sahlberg [Thu, 29 Oct 2009 03:10:09 +0000 (14:10 +1100)]
From Wolfgang M
Add stronger tests for valid filenames when opening all persistent databases
Wolfgang Mueller-Friedt [Wed, 28 Oct 2009 10:01:27 +0000 (13:01 +0300)]
vacuuming needed additional check before getting rid of the record; there is a gap between selecting the records and deleting them, therefore we have to check if the records still can be deleted when we actually are about to delete them
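A minimal sketch of the select-then-recheck idea, assuming each record carries a sequence number and that an empty data field marks a deletion candidate. The dictionary layout and the "rsn" and "data" field names are illustrative assumptions, not the ctdb record format.

```python
def select_candidates(db):
    """Read pass: remember each empty record and the sequence number seen."""
    return {key: rec["rsn"] for key, rec in db.items() if rec["data"] == b""}


def delete_candidates(db, candidates):
    """Write pass: re-check every record right before deleting it."""
    deleted = []
    for key, seen_rsn in candidates.items():
        rec = db.get(key)
        if rec is None:
            continue  # already gone
        if rec["data"] != b"" or rec["rsn"] != seen_rsn:
            continue  # changed since selection, so it must be kept
        del db[key]
        deleted.append(key)
    return deleted


db = {
    "a": {"rsn": 1, "data": b""},
    "b": {"rsn": 1, "data": b""},
    "c": {"rsn": 1, "data": b"x"},
}
candidates = select_candidates(db)
# Record "b" is rewritten in the gap between the two passes.
db["b"] = {"rsn": 2, "data": b"new"}
print(delete_candidates(db, candidates))
```

Only "a" is removed here; "b" survives because it no longer matches what the read pass saw.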
Ronnie Sahlberg [Thu, 29 Oct 2009 02:58:33 +0000 (13:58 +1100)]
Revert "From Wolfgang Mueller,"
This reverts commit
b74ba9a75db431a59b0e68a4b57c48d8c28221d5.
Ronnie Sahlberg [Tue, 13 Oct 2009 21:25:44 +0000 (08:25 +1100)]
New version 1.0.82-3
Ronnie Sahlberg [Tue, 13 Oct 2009 21:23:49 +0000 (08:23 +1100)]
From Wolfgang Mueller,
when we detect a dmaster migration error, trigger a recovery to repair the databases instead of calling ctdb_fatal()
Ronnie Sahlberg [Tue, 13 Oct 2009 09:58:40 +0000 (20:58 +1100)]
version 1.0.82-12
Ronnie Sahlberg [Tue, 13 Oct 2009 09:57:03 +0000 (20:57 +1100)]
From Volker L
A less intrusive deadlock prevention workaround
Ronnie Sahlberg [Tue, 13 Oct 2009 09:56:08 +0000 (20:56 +1100)]
Revert "add a control to set a database priority. Let newly created databases default to priority 1."
This reverts commit
808b4a6ed9f6122a958146ef6ac6665f0e75fe32.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:56:00 +0000 (20:56 +1100)]
Revert "add a control to read the db priority from a database"
This reverts commit
17516689570e352d5df1b5a6bae3d7c7e4bb5662.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:55:52 +0000 (20:55 +1100)]
Revert "during recovery, update all remote nodes so they use the same priorities"
This reverts commit
9bbce3e37e213080e974afea551e9147a43a44af.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:55:43 +0000 (20:55 +1100)]
Revert "update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon."
This reverts commit
a92210bc9572851df327862b325376d56470823c.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:55:36 +0000 (20:55 +1100)]
Revert "initial attempt at freezing databases in priority order"
This reverts commit
499e781b065f5195e021f33d428503b59b2189b8.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:55:28 +0000 (20:55 +1100)]
Revert "allow setting the recmode even when not completely frozen."
This reverts commit
575210997f7e9aebc721d584b00bf7def15ab600.
Ronnie Sahlberg [Tue, 13 Oct 2009 09:55:18 +0000 (20:55 +1100)]
Revert "new version 1.0.82-11"
This reverts commit
5b470e788f9c40f5b510e62fbfd1b990779f2c57.
Ronnie Sahlberg [Mon, 12 Oct 2009 02:56:27 +0000 (13:56 +1100)]
new version 1.0.82-11
Ronnie Sahlberg [Mon, 12 Oct 2009 02:06:16 +0000 (13:06 +1100)]
allow setting the recmode even when not completely frozen.
we sometimes have to do this when we want to trigger a recovery
Ronnie Sahlberg [Mon, 12 Oct 2009 01:08:39 +0000 (12:08 +1100)]
initial attempt at freezing databases in priority order
Ronnie Sahlberg [Sun, 11 Oct 2009 22:22:17 +0000 (09:22 +1100)]
update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon.
this is encoded in the srvid field of the request header
Ronnie Sahlberg [Sat, 10 Oct 2009 05:28:20 +0000 (16:28 +1100)]
during recovery, update all remote nodes so they use the same priorities
for the databases as this node.
Ronnie Sahlberg [Sat, 10 Oct 2009 04:04:18 +0000 (15:04 +1100)]
add a control to read the db priority from a database
Ronnie Sahlberg [Sat, 10 Oct 2009 03:26:09 +0000 (14:26 +1100)]
add a control to set a database priority. Let newly created databases default to priority 1.
database priorities will be used to control the order in which databases are locked during recovery.
Ronnie Sahlberg [Thu, 8 Oct 2009 09:03:54 +0000 (20:03 +1100)]
version 1.0.82-10
Ronnie Sahlberg [Thu, 8 Oct 2009 05:45:25 +0000 (16:45 +1100)]
if a node fails to become frozen during recovery, mark it as a culprit so it will soon get banned
Ronnie Sahlberg [Fri, 2 Oct 2009 03:47:54 +0000 (13:47 +1000)]
new version 1.0.82-9
Ronnie Sahlberg [Fri, 2 Oct 2009 03:41:54 +0000 (13:41 +1000)]
we should close this file on exec
Ronnie Sahlberg [Fri, 2 Oct 2009 02:11:26 +0000 (12:11 +1000)]
new version 1.0.82-8
Ronnie Sahlberg [Tue, 29 Sep 2009 03:20:18 +0000 (13:20 +1000)]
From Wolfgang Mueller-Friedt
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.
Combine vacuuming and repacking into one cheap read traverse that enumerates all candidate records
and one write traverse that both repacks the database and deletes records locally where we are lmaster and where the records have already been deleted remotely.
This code also adds initial autotuning heuristics for the vacuum intervals and for how many records to delete in each iteration.
Minor style changes made by ronnie s
Ronnie Sahlberg [Wed, 29 Jul 2009 03:31:12 +0000 (13:31 +1000)]
change the defaults for repacking to repack once every 120 seconds and let it work for 30 seconds before timing out.
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 20:09:28 +0000 (23:09 +0300)]
repack limit tunable
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 14:49:41 +0000 (17:49 +0300)]
remove repack from eventscript
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 14:45:31 +0000 (17:45 +0300)]
added event repacking
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
Ronnie Sahlberg [Thu, 23 Jul 2009 06:03:39 +0000 (16:03 +1000)]
vacuum event framework
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
Ronnie Sahlberg [Wed, 29 Jul 2009 03:25:43 +0000 (13:25 +1000)]
initial part of new vacuuming patch.
create some new fields for ctdb_db and tunables
Martin Schwenke [Wed, 30 Sep 2009 11:21:56 +0000 (21:21 +1000)]
Minor fixes to 01.reclock eventscript.
test -z really needs its argument to be quoted. Simplified a status
test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 28 Sep 2009 04:12:59 +0000 (14:12 +1000)]
change the reclock fail count to 19 monitor intervals before we shut down ctdbd
Ronnie Sahlberg [Mon, 28 Sep 2009 04:06:40 +0000 (14:06 +1000)]
add a new eventscript 01.reclock
if the reclock file has been set, then this script will test that the
reclock file can actually be accessed.
if the file does not exist, or if the attempt to stat the file hangs,
the node will be marked unhealthy after the third failed monitoring event,
and after the tenth failure ctdb itself will shut down.
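The escalation described above can be sketched as a small decision function. This is an illustration only, assuming the thresholds are reached on the Nth consecutive failure; reclock_action() and its parameter names are hypothetical, and the real logic lives in the 01.reclock eventscript.

```python
def reclock_action(consecutive_failures, unhealthy_after=3, shutdown_after=10):
    """Map a count of failed reclock checks to the action to take."""
    if consecutive_failures >= shutdown_after:
        return "shutdown"
    if consecutive_failures >= unhealthy_after:
        return "unhealthy"
    return "ok"


for failures in (1, 3, 10):
    print(failures, reclock_action(failures))
```

Keeping the thresholds as parameters makes it easy to model a different failure count without touching the logic.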
Ronnie Sahlberg [Mon, 27 Jul 2009 03:10:32 +0000 (13:10 +1000)]
new version 1.0.82-7
Ronnie Sahlberg [Tue, 30 Jun 2009 02:17:05 +0000 (12:17 +1000)]
don't try sending a keepalive if the transport is down
Ronnie Sahlberg [Tue, 30 Jun 2009 02:16:13 +0000 (12:16 +1000)]
Don't even try allocating and sending a CALL packet if the transport is down
Ronnie Sahlberg [Tue, 30 Jun 2009 02:14:58 +0000 (12:14 +1000)]
failing a dmaster send due to the transport being down is fatal
Ronnie Sahlberg [Tue, 30 Jun 2009 02:13:15 +0000 (12:13 +1000)]
if we fail a dmaster migration due to the transport being down, then that is a fatal condition.
Ronnie Sahlberg [Tue, 30 Jun 2009 02:10:27 +0000 (12:10 +1000)]
don't try to send error packets if the transport is down
Ronnie Sahlberg [Tue, 30 Jun 2009 02:09:28 +0000 (12:09 +1000)]
don't even try to send a message from the main daemon if the transport is down
Ronnie Sahlberg [Tue, 30 Jun 2009 02:03:12 +0000 (12:03 +1000)]
Don't try to allocate and send packets if the transport is down
Ronnie Sahlberg [Tue, 30 Jun 2009 01:55:42 +0000 (11:55 +1000)]
don't even try to allocate a packet if the transport is down since it will fail
Ronnie Sahlberg [Tue, 23 Jun 2009 01:29:26 +0000 (11:29 +1000)]
rename 99.routing to 11.routing so that it is executed before the service scripts
Ronnie Sahlberg [Tue, 14 Jul 2009 00:54:05 +0000 (10:54 +1000)]
new version 1.0.82-6
Ronnie Sahlberg [Mon, 18 May 2009 22:55:42 +0000 (08:55 +1000)]
Change the loglevel of "registered tcp client for ..." to INFO
instead of ERR
Ronnie Sahlberg [Wed, 10 Jun 2009 00:35:32 +0000 (10:35 +1000)]
new version 1.0.82-5
Ronnie Sahlberg [Wed, 10 Jun 2009 00:28:47 +0000 (10:28 +1000)]
When we ban a node, only drop the IPs on the node being banned, not on every node
Ronnie Sahlberg [Tue, 9 Jun 2009 02:33:06 +0000 (12:33 +1000)]
new version 1.0.82-4
Ronnie Sahlberg [Tue, 9 Jun 2009 02:31:36 +0000 (12:31 +1000)]
don't remove the socket when the daemon stops. This can race if the
service is immediately restarted
Conflicts:
server/ctdb_daemon.c
Ronnie Sahlberg [Tue, 2 Jun 2009 09:44:51 +0000 (19:44 +1000)]
new version 1.0.82-3
Ronnie Sahlberg [Tue, 2 Jun 2009 09:43:47 +0000 (19:43 +1000)]
make ctdb statistics machinereadable
Ronnie Sahlberg [Tue, 2 Jun 2009 07:59:03 +0000 (17:59 +1000)]
new version 1.0.82-2
Ronnie Sahlberg [Tue, 2 Jun 2009 07:56:20 +0000 (17:56 +1000)]
Add -Y machinereadable output to ctdb listvars and ctdb getvar
Ronnie Sahlberg [Thu, 14 May 2009 00:33:25 +0000 (10:33 +1000)]
Track how long it takes to take out the recovery lock from both the main daemon and the recovery daemon.
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.
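A rough sketch of the latency check, assuming the threshold is compared against wall-clock time spent in a single access. timed_call() and its clock and log parameters are hypothetical helpers for illustration; only the RecLockLatencyMs name comes from the commit message.

```python
import time


def timed_call(func, threshold_ms, clock=time.monotonic, log=print):
    """Run func, measure how long it took, and log when it exceeds the limit."""
    start = clock()
    result = func()
    latency_ms = (clock() - start) * 1000.0
    if latency_ms > threshold_ms:
        log("reclock access took %.1f ms (limit %d ms)" % (latency_ms, threshold_ms))
    return result, latency_ms


# Stand-in for taking the recovery lock; the limit plays the role of
# RecLockLatencyMs.
result, latency_ms = timed_call(lambda: "locked", threshold_ms=1000)
print(result)
```

The clock is injectable so the check can be exercised without actually waiting.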
Ronnie Sahlberg [Wed, 13 May 2009 22:55:40 +0000 (08:55 +1000)]
new version 1.0.82
Ronnie Sahlberg [Wed, 13 May 2009 22:55:05 +0000 (08:55 +1000)]
use scope host when adding the interface to loopback so we don't respond to ARPs for this IP
Ronnie Sahlberg [Wed, 13 May 2009 22:12:48 +0000 (08:12 +1000)]
change the prefix NATGW_ to CTDB_NATGW_
Michael Adam [Tue, 12 May 2009 05:56:23 +0000 (07:56 +0200)]
ping pong: fix logic for mmap reads vs. preads
Michael
Michael Adam [Tue, 12 May 2009 20:59:35 +0000 (22:59 +0200)]
maketarball.sh: add GPL license header
Michael
Michael Adam [Tue, 12 May 2009 20:59:08 +0000 (22:59 +0200)]
makerpms.sh: add GPL license header
Michael
Michael Adam [Thu, 26 Mar 2009 18:03:03 +0000 (19:03 +0100)]
Remove generated binary files.
Noted by Mathieu Parent <math.parent@gmail.com>
Michael
Ronnie Sahlberg [Tue, 12 May 2009 08:21:26 +0000 (18:21 +1000)]
remove NATGW_PRIVATE_IFACE from the documentation since we do not need
it any more.
Ronnie Sahlberg [Tue, 12 May 2009 08:42:13 +0000 (18:42 +1000)]
assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same
Ronnie Sahlberg [Tue, 12 May 2009 08:39:34 +0000 (18:39 +1000)]
add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging.
Ronnie Sahlberg [Tue, 12 May 2009 08:32:41 +0000 (18:32 +1000)]
check that a node is banned before trying to unban it.
Martin Schwenke [Fri, 3 Apr 2009 01:54:26 +0000 (12:54 +1100)]
51_ctdb_bench.sh now allows a 2% difference between positive and
negative. ctdb_bench.c checks to ensure the timer has advanced from 0
before dividing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 21 Apr 2009 06:50:37 +0000 (16:50 +1000)]
Avoid floating point divide by 0 in ctdb_fetch.c's bench_fetch().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 1 May 2009 07:40:45 +0000 (17:40 +1000)]
Bug fixes for tests: simple/12_ctdb_getdebug.sh and scripts/test_wrap.
simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.
Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory containing it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 11 May 2009 22:59:49 +0000 (08:59 +1000)]
From: Sumit Bose <sbose@redhat.com>
fix handling of AC_INIT
Martin Schwenke [Mon, 11 May 2009 04:43:17 +0000 (14:43 +1000)]
Fix lvsmaster and natgwlist nodespecs.
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 04:14:11 +0000 (14:14 +1000)]
Updated onnode docs to reflect recent changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 03:39:31 +0000 (13:39 +1000)]
New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.
Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 6 May 2009 03:17:34 +0000 (13:17 +1000)]
New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 May 2009 06:02:30 +0000 (16:02 +1000)]
Use ctdb_fetch_lock rather than ctdb_call.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 11 May 2009 04:50:28 +0000 (14:50 +1000)]
41.httpd event script workaround for RHEL5-ism.
RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores. This means that eventually a node runs out of semaphores
and httpd can't be started. So, before we attempt to start httpd we
clean up any semaphores owned by apache. We also try to restart httpd
in the monitor event if httpd has gone away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 11 May 2009 04:44:59 +0000 (14:44 +1000)]
Add a -Y machinereadable flag to "lvsmaster"
Ronnie Sahlberg [Mon, 11 May 2009 03:56:28 +0000 (13:56 +1000)]
in the "lvsmaster" command, return -1 if there is no lvsmaster
Ronnie Sahlberg [Fri, 8 May 2009 07:29:57 +0000 (17:29 +1000)]
new version 1.0.81
Ronnie Sahlberg [Wed, 6 May 2009 10:32:39 +0000 (20:32 +1000)]
From: Sumit Bose <sbose@redhat.com>
fix handling of AC_INIT and read version from ctdb.spec
Michael Adam [Tue, 5 May 2009 11:16:38 +0000 (13:16 +0200)]
ping_pong: add GPL comment header with Tridge's copyright
Michael
Michael Adam [Wed, 29 Apr 2009 22:35:55 +0000 (00:35 +0200)]
ping_pong: get pread/pwrite prototypes from unistd.h
by defining _XOPEN_SOURCE to be 500 before including headers
Michael