Mathieu Parent [Thu, 27 Aug 2009 21:36:07 +0000 (23:36 +0200)]
Fix bashisms in samba event script.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
0310a6b17d6167c46482a07c6cd96bcabda6ffbc)
Mathieu Parent [Thu, 27 Aug 2009 21:35:41 +0000 (23:35 +0200)]
Fix bashisms in multipathd event script.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
13b81b6c8e01aa52a31756ecffa797a4761115db)
Mathieu Parent [Thu, 27 Aug 2009 21:35:03 +0000 (23:35 +0200)]
Fix bashism in natgw eventscript.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
4fad47c1af8503385b090be281ffbd284021279c)
Ronnie Sahlberg [Wed, 9 Sep 2009 02:50:55 +0000 (12:50 +1000)]
allow the transaction commit to fail
(This used to be ctdb commit
7a6134e684c9ac4763bf198ef1410867b6082c94)
Ronnie Sahlberg [Wed, 9 Sep 2009 02:50:21 +0000 (12:50 +1000)]
Merge commit 'martins/master'
(This used to be ctdb commit
12e14a09dd28ed005c8eb8fca7cd38a96aab938e)
Martin Schwenke [Wed, 9 Sep 2009 02:48:40 +0000 (12:48 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
e978b274a6af94ea7734675243ec65c5b17a583d)
Ronnie Sahlberg [Wed, 9 Sep 2009 02:48:21 +0000 (12:48 +1000)]
don't check if commit failed, we do allow the commit to fail sometimes
(This used to be ctdb commit
affa6f47432507e84b7e76b88a2c27fff8e6e2e4)
Ronnie Sahlberg [Wed, 9 Sep 2009 00:57:39 +0000 (10:57 +1000)]
don't force an election just because the ban flag differs across the cluster.
a simple push to resync this flag is sufficient
(This used to be ctdb commit
8903b858ddd3a016d9cf765187839814443a67ca)
Martin Schwenke [Tue, 8 Sep 2009 05:19:24 +0000 (15:19 +1000)]
Document onnode "onnode any".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
f6cf586d8f6384e48215b7d2c20fb83e98504878)
Martin Schwenke [Tue, 8 Sep 2009 05:10:20 +0000 (15:10 +1000)]
onnode: add "any" nodespec to select any node with running CTDB.
In testing and other situations (e.g. eventscripts) it is necessary to
select a node where a ctdb command can be run. The whole idea here is
to avoid nodes where ctdbd is not running and where most ctdb commands
would fail. This implements a standard way of doing this involving a
recursive onnode command.
There is still a small window for a race, where the selected node is
suddenly shut down, but this is unavoidable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
fb47cce86c0edae5caaf485f13ae7a151b6cb00d)
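The selection rule described in the commit above can be sketched as follows. This is only an illustration of "pick the first node where ctdbd answers"; the function name and the `ctdbd_responds` probe are hypothetical, and the sketch does not reproduce the recursive onnode invocation that the real script uses.

```python
def pick_any_node(nodes, ctdbd_responds):
    """Return the first node on which ctdbd appears to be running.

    `nodes` is an ordered list of node identifiers and `ctdbd_responds`
    is a hypothetical probe (for example, one that runs a cheap ctdb
    command on the node and reports whether it succeeded).
    Returns None when no node qualifies.
    """
    for node in nodes:
        if ctdbd_responds(node):
            return node
    return None
```

As the commit message notes, the chosen node can still shut down between the probe and the real command, so callers have to tolerate a failure afterwards.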
Martin Schwenke [Mon, 7 Sep 2009 05:29:34 +0000 (15:29 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
10ebeb215e7260186dab8f4f2403c48db9df9a00)
Ronnie Sahlberg [Thu, 3 Sep 2009 18:09:30 +0000 (04:09 +1000)]
lower the loglevel for the info messages that a public ip is not hosted locally for takeip/releaseip
(This used to be ctdb commit
f76132b0d555e52ee0a379ec2c156350b37b0280)
Ronnie Sahlberg [Thu, 3 Sep 2009 17:05:37 +0000 (03:05 +1000)]
new version 1.0.89
(This used to be ctdb commit
46823aa7c673bc18a1424500b3f01da9c2dd6333)
Ronnie Sahlberg [Thu, 3 Sep 2009 16:59:24 +0000 (02:59 +1000)]
make it possible to have ctdb manage (start/stop/monitor) winbind without having samba
(This used to be ctdb commit
77574b7d7fe11c8e73957a80845481f3b2a64219)
Ronnie Sahlberg [Thu, 3 Sep 2009 16:00:14 +0000 (02:00 +1000)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
(This used to be ctdb commit
b869bb0e32d32422a5ba6b235864acba07f2b412)
Ronnie Sahlberg [Thu, 3 Sep 2009 16:20:39 +0000 (02:20 +1000)]
new prototype banning code
(This used to be ctdb commit
0c4c2240267af183d54ffd4c0aacda208f6eff6a)
Ronnie Sahlberg [Tue, 1 Sep 2009 18:39:17 +0000 (04:39 +1000)]
overwrite the state file, don't append to it.
don't log errors if trying to delete a nonexistent state file
this eliminates some annoying log entries in the ctdb log
(This used to be ctdb commit
7a95257a5ec19f232f661bc7f797051bf08ab776)
Ronnie Sahlberg [Tue, 1 Sep 2009 17:12:27 +0000 (03:12 +1000)]
redirect stderr to /dev/null since the rule might not exist when we try to unconditionally delete it
(This used to be ctdb commit
e1d709f32196e19d4041ee2958e143791762e08f)
Michael Adam [Thu, 27 Aug 2009 20:09:42 +0000 (22:09 +0200)]
set broadcast addresses in the takeip event.
Michael
(This used to be ctdb commit
e26d9d32e68e7db1cf4f96c47c0126e9e0b213be)
Ronnie Sahlberg [Thu, 27 Aug 2009 19:19:44 +0000 (05:19 +1000)]
remove a check for the reclock file we don't need
(This used to be ctdb commit
54c047c48902a15e5d2925bfa86e012a11188796)
Martin Schwenke [Thu, 27 Aug 2009 02:35:52 +0000 (12:35 +1000)]
Test suite: fix minor typo in complex/32_cifs_tickle.sh
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
cd65d8acb97aa9f83ff0d0585bf09caef2d2f3eb)
Martin Schwenke [Thu, 27 Aug 2009 02:33:43 +0000 (12:33 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
9bceef2b13fe9560ca02a266ce5a1fbbcef3af22)
Martin Schwenke [Thu, 16 Jul 2009 04:04:06 +0000 (14:04 +1000)]
Test suite: Fix debug code for unexpectedly unhealthy cluster.
The debug code should run "ctdb status" on a cluster node, not on the
test client.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
448cd8db1305c1e6dfab323f92eac4a576596e4e)
Ronnie Sahlberg [Tue, 18 Aug 2009 22:25:50 +0000 (08:25 +1000)]
skip any persistent databases ending in .bak
(This used to be ctdb commit
85590e9dfaab0db16ce8103e509fd4d51aef4ad5)
Martin Schwenke [Mon, 17 Aug 2009 03:08:42 +0000 (13:08 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
8ed4fa0eb09238952044645b72234185e498a40c)
Ronnie Sahlberg [Mon, 17 Aug 2009 01:04:40 +0000 (11:04 +1000)]
new version 1.0.88
(This used to be ctdb commit
fbfa1c72875dda4d1636d8e72c67ba09b10455df)
Ronnie Sahlberg [Mon, 17 Aug 2009 00:56:12 +0000 (10:56 +1000)]
reduce the loglevel for the message that we switch to a different recmaster while waiting for ipreallocate to finish
(This used to be ctdb commit
e5b25e1386294b1f800c32fb01c69c3c3ce85c26)
Ronnie Sahlberg [Mon, 17 Aug 2009 00:54:45 +0000 (10:54 +1000)]
if no timeout at all is specified to the ctdb tool, neither using -T nor by setting CTDB_TIMEOUT, then use 120 seconds as a default timeout before the ctdb command will exit with an error.
(This used to be ctdb commit
d8d21884736a9610d48cf532e1c6778e511fb7a8)
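A minimal sketch of the fallback described in the commit above. The commit only states that 120 seconds applies when neither -T nor CTDB_TIMEOUT is given; the precedence of -T over the environment variable, and the function and parameter names, are assumptions of this sketch.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 120  # default named in the commit message


def resolve_timeout(cli_timeout=None, env=None):
    """Return the timeout in seconds for one ctdb tool invocation.

    `cli_timeout` models the value passed with -T (None if absent).
    Letting -T win over CTDB_TIMEOUT is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    if cli_timeout is not None:
        return int(cli_timeout)
    env_value = env.get("CTDB_TIMEOUT")
    if env_value:
        return int(env_value)
    return DEFAULT_TIMEOUT_SECONDS
```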
Martin Schwenke [Fri, 14 Aug 2009 10:47:38 +0000 (20:47 +1000)]
Test suite: ctdb_persistent.c needs to use transactions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
49de8429d2a996dc055370541a12ea36762abe94)
Ronnie Sahlberg [Thu, 13 Aug 2009 03:02:00 +0000 (13:02 +1000)]
document enable/disablescript
(This used to be ctdb commit
5852a526ce7d0333cd1f9a05353d8920ea99db37)
Ronnie Sahlberg [Thu, 13 Aug 2009 03:04:08 +0000 (13:04 +1000)]
add new controls to make it possible to enable/disable individual eventscripts
update scriptstatus output so it lists disabled scripts
(This used to be ctdb commit
7e799b7523c9699bd65a8a8207f7e03d668b0b81)
Martin Schwenke [Tue, 11 Aug 2009 22:48:03 +0000 (08:48 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
0e9c3e0cf76dd33a24241f02709e56bc330f009a)
Ronnie Sahlberg [Sun, 9 Aug 2009 21:33:52 +0000 (07:33 +1000)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
(This used to be ctdb commit
a42dbdb7b9ccf3ce2aed48aa33f1cd3af2e94fe3)
Michael Adam [Thu, 30 Jul 2009 10:02:27 +0000 (12:02 +0200)]
tests: fix the 52_ctdb_fetch.sh test.
The parser for the output of the ctdb_fetch program
did not match the output that ctdb_fetch generates.
It seemed to rather come from the ctdb_bench test...
This patch adapts the parser to correctly interpret
the output of ctdb_fetch.
Michael
(This used to be ctdb commit
836b95f32724cf37e4f643f20653f78842613692)
Michael Adam [Sat, 11 Jul 2009 22:39:29 +0000 (00:39 +0200)]
client: fix a debug message (misplaced newline).
Michael
(This used to be ctdb commit
c513a31d755003d7af91529790b06ce0d226c90f)
Michael Adam [Wed, 15 Jul 2009 08:03:03 +0000 (10:03 +0200)]
client:ctdb_control_send: remove duplicate setting of the reqid header.
Michael
(This used to be ctdb commit
875778fbbfd6b0f09fd2db76f7348ad6271350a3)
Michael Adam [Tue, 21 Jul 2009 07:50:56 +0000 (09:50 +0200)]
ctdbd: use ctdb_syslog_log() as debug_add function for syslog
Michael
(This used to be ctdb commit
a0ad69197b4771f3d5be23d78d0933d732405f08)
Michael Adam [Tue, 21 Jul 2009 07:48:10 +0000 (09:48 +0200)]
ctdbd: set debug_add hook to be able to use dump_data in the daemon.
Michael
(This used to be ctdb commit
afafab0ac6cac90c3f8614204b5b6df92e446728)
Michael Adam [Tue, 21 Jul 2009 07:47:07 +0000 (09:47 +0200)]
debug: add debug_add and dump_data functions
Michael
(This used to be ctdb commit
64405bdbebb2ddf0ae980e958ede77df79139000)
Rusty Russell [Thu, 30 Jul 2009 02:22:39 +0000 (11:52 +0930)]
tdb: don't alter tdb->flags in tdb_reopen_all()
The flags are user-visible, via tdb_get_flags/add_flags/remove_flags.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
(This used to be ctdb commit
8f48e37c254e0852d4e2dea54b905ce5ef2b925d)
Rusty Russell [Thu, 30 Jul 2009 02:22:08 +0000 (11:52 +0930)]
tdb: Reimplementation of Metze's "lib/tdb: if we know pwrite and pread are thread/fork safe tdb_reopen_all() should be a noop".
This version just wraps the reopen code, so we still re-grab the lock and do
the normal sanity checks.
The reason we do this at all is to avoid global fd limits, see:
http://forums.fedoraforum.org/showthread.php?t=210393
Note also that this whole reopen concept is fundamentally racy: if the parent
goes away before the child calls tdb_reopen_all, the database can be left
without an active lock and another TDB_CLEAR_IF_FIRST opener will clear it.
A fork_with_tdbs() wrapper could use a pipe to solve this, but it's hardly
elegant (what if there are other independent things which have similar needs?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
(This used to be ctdb commit
8d0d432ab7766d9c0f9868fd77e48b9b5cc5d9f9)
Rusty Russell [Thu, 30 Jul 2009 20:10:33 +0000 (13:10 -0700)]
realloc() has that horrible overloaded free semantic when size is 0: current code does a free of the old record in this case, then fails.
(This used to be ctdb commit
8b6a5bba93843cd83b7b386b82949ad88f29884a)
Rusty Russell [Thu, 30 Jul 2009 20:09:33 +0000 (13:09 -0700)]
If the record is at the end of the database, pretending it has length 1 might take us out-of-bounds. Only pretend to be length 1 for the malloc.
(This used to be ctdb commit
6de2823f5f7976d4efa20761e518d6b67753f054)
Rusty Russell [Wed, 29 Jul 2009 05:23:03 +0000 (14:53 +0930)]
Port from SAMBA tdb:
commit
54a51839ea65aa788b18fce8de0ae4f9ba63e4e7
Author: Rusty Russell <rusty@rustcorp.com.au>
Date: Sat Jul 18 15:28:58 2009 +0930
Make tdb transaction lock recursive (samba version)
This patch replaces
6ed27edbcd3ba1893636a8072c8d7a621437daf7 and
1a416ff13ca7786f2e8d24c66addf00883e9cb12, which fixed the bug where traversals
inside transactions would release the transaction lock early.
This solution is more general, and solves the more minor symptom that nested
traversals would also release the transaction lock early. (It was also suggested in
Volker's comment in
6ed27ed).
This patch also applies to ctdb, if the traverse.c part is removed (ctdb's tdb
code never received the previous two fixes).
Tested using the testsuite from ccan (adapted to the samba code). Thanks to
Michael Adam for feedback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
commit
760104188d0d2ed96ec4a70138e6d0bf86d797ed
Author: Rusty Russell <rusty@rustcorp.com.au>
Date: Tue Jul 21 16:23:35 2009 +0930
tdb: fix locking error
54a51839ea65aa788b18fce8de0ae4f9ba63e4e7 "Make tdb transaction lock
recursive (samba version)" was broken: I "cleaned it up" and prevented
it from ever unlocking.
To see the problem:
$ bin/tdbtorture -s 1248142523
tdb_brlock failed (fd=3) at offset 8 rw_type=1 lck_type=14 len=1
tdb_transaction_lock: failed to get transaction lock
tdb_transaction_start failed: Resource deadlock avoided
My testcase relied on the *count* being correct, which it was. Fixing that
now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
ce19658ba13272238058e9b9bc03e62f48b737c0)
Rusty Russell [Wed, 29 Jul 2009 05:21:34 +0000 (14:51 +0930)]
Port from SAMBA tdb:
commit
a6cc04a20089e8fbcce138c271961c37ddcd6c34
Author: Andrew Tridgell <tridge@samba.org>
Date: Mon Jun 1 13:13:07 2009 +1000
overallocate all records by 25%
This greatly reduces the fragmentation of databases where records
tend to grow slowly by a small amount each time. The case where this
is most seen is the ldb index records. Adding this overallocation
reduced the size of the resulting database by more than 20x when
running a test that adds 10k users.
(This used to be ctdb commit
e72974e5cefabc7035399d16633f727f868caa61)
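To illustrate why the 25% slack described above reduces fragmentation, here is a small simulation. It is a sketch under stated assumptions, not the tdb allocator: the function names are hypothetical, and the integer rounding is chosen for the example only.

```python
def overallocated_length(length):
    """Return the record length after adding 25% slack (integer arithmetic)."""
    return length + length // 4


def reallocations_needed(initial, step, growths, overallocate):
    """Count how often a record must move while growing by `step` bytes."""
    capacity = overallocated_length(initial) if overallocate else initial
    size = initial
    moves = 0
    for _ in range(growths):
        size += step
        if size > capacity:
            # The record no longer fits in place: it gets a new slot and
            # the old slot is left behind as a free-list fragment.
            capacity = overallocated_length(size) if overallocate else size
            moves += 1
    return moves
```

With exact-fit allocation every small growth forces a move, whereas with slack most growths fit in place, which matches the slowly growing index records mentioned in the commit message.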
Rusty Russell [Wed, 29 Jul 2009 05:21:12 +0000 (14:51 +0930)]
Port from SAMBA tdb:
commit
a386173fa1c7c5bcc11ea9260d84b6c52c154b3d
Author: Andrew Tridgell <tridge@samba.org>
Date: Mon Jun 1 13:11:39 2009 +1000
auto-repack in transactions that expand the tdb
The idea behind this is to recover from badly fragmented free
lists. Choosing the point where the file expands is fairly arbitrary,
but seems to work well.
(This used to be ctdb commit
233c52bfb087f636ad61e95c12616c02901f4f83)
Rusty Russell [Wed, 29 Jul 2009 06:32:51 +0000 (16:02 +0930)]
Port from SAMBA ctdb:
commit
936d76802f98d04d9743b2ca8eeeaadd4362db51
Author: Andrew Tridgell <tridge@samba.org>
Date: Tue Dec 16 14:38:17 2008 +1100
imported the tdb_repack() code from CTDB
The tdb_repack() function repacks a TDB so that it has a single
freelist entry. The file doesn't shrink, but it does remove all
freelist fragmentation. This code originated in the CTDB vacuuming
code, but will now be used in ldb to cope with fragmentation from
re-indexing
(This used to be ctdb commit
fe3ceb101a5a9c336973c2c6c31406bd8181c2fe)
Rusty Russell [Wed, 29 Jul 2009 05:20:39 +0000 (14:50 +0930)]
Port from SAMBA tdb:
commit
4b4fec65db4e202afa13b2d15867f4d8a54d154e
Author: Andrew Tridgell <tridge@samba.org>
Date: Thu May 28 16:08:28 2009 +1000
make TDB_NOSYNC affect all the fsync/msync calls in transactions
During a transaction commit tdb normally uses fsync/msync calls to
make it crash safe. This can be disabled using the TDB_NOSYNC flag,
but it wasn't disabling all the code paths that caused a fsync/msync.
(This used to be ctdb commit
e03980add02a28609a7a0a0c87ebc85419b98144)
Rusty Russell [Wed, 29 Jul 2009 05:19:57 +0000 (14:49 +0930)]
Port from SAMBA tdb:
commit
a91bcbccf8a2243dac57cacec6fdfc9907580f69
Author: Jim McDonough <jmcd@samba.org>
Date: Thu May 21 16:26:26 2009 -0400
Detect tight loop in tdb_find()
(This used to be ctdb commit
5253a0ba3a34fbf5810f363ecc094203d49e835f)
Rusty Russell [Wed, 29 Jul 2009 05:18:42 +0000 (14:48 +0930)]
Port from SAMBA tdb:
commit
42c0931441ef53a3f977e1334355fa83f05ac184
Author: Tim Prouty <tprouty@samba.org>
Date: Tue Mar 31 16:24:07 2009 -0700
tdb: Remove unused variable
(This used to be ctdb commit
aa22d1875b1997664af983c0baeabe34e40dd253)
Rusty Russell [Wed, 29 Jul 2009 05:17:29 +0000 (14:47 +0930)]
Port from SAMBA tdb:
commit
b90863c0b7b860b006ac49c9396711ff351f777f
Author: Howard Chu <hyc@highlandsun.com>
Date: Tue Mar 31 13:15:54 2009 +1100
Add tdb_transaction_prepare_commit()
Using tdb_transaction_prepare_commit() gives us 2-phase commits. This
allows us to safely commit across multiple tdb databases at once, with
reasonable transaction semantics
Signed-off-by: tridge@samba.org
(This used to be ctdb commit
4c3dac215a088947f645f727343997f5d47e3260)
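The two-phase pattern that the commit above enables can be sketched as follows. `ToyDb` is a hypothetical in-memory stand-in, not the tdb API; only the ordering (prepare every database first, then commit every database) is the point of the example.

```python
class ToyDb:
    """Hypothetical in-memory stand-in for one database (not the tdb API)."""

    def __init__(self, fail_prepare=False):
        self.data = {}
        self.pending = None
        self.fail_prepare = fail_prepare

    def start(self):
        # Work on a private copy until the transaction is committed.
        self.pending = dict(self.data)

    def store(self, key, value):
        self.pending[key] = value

    def prepare_commit(self):
        # Phase 1: do everything that can fail, but publish nothing yet.
        return not self.fail_prepare

    def commit(self):
        # Phase 2: make the prepared changes visible.
        self.data, self.pending = self.pending, None

    def cancel(self):
        self.pending = None


def commit_all(dbs):
    """Commit every database, or none of them if any prepare step fails."""
    if all(db.prepare_commit() for db in dbs):
        for db in dbs:
            db.commit()
        return True
    for db in dbs:
        db.cancel()
    return False
```

Because nothing becomes visible during the prepare phase, a failure in any one database leaves all of them unchanged.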
Ronnie Sahlberg [Mon, 3 Aug 2009 02:51:55 +0000 (12:51 +1000)]
update STOP/CONTINUE to better handle when we stop the last node
(This used to be ctdb commit
9a251078f22aea15b9ca37393e0b5e2740aa21fb)
Martin Schwenke [Fri, 31 Jul 2009 01:04:37 +0000 (11:04 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
abf4540bfb06de56b0a7b5976b5f1b2a24a8743d)
Martin Schwenke [Thu, 30 Jul 2009 04:10:34 +0000 (14:10 +1000)]
Test suite: Retrieve NFS_TICKLE_SHARED_DIRECTORY more defensively.
In complex/31_nfs_tickle.sh we run sed against a file that might not
exist, causing potential garbage from stderr in the output. Check
that the file exists before running sed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
f9b71757f034732647228d4b8a8f00528028b6b0)
Martin Schwenke [Thu, 30 Jul 2009 04:03:44 +0000 (14:03 +1000)]
Test suite: Better diagnostics for recent change to complex/31_nfs_tickle.sh.
Add a -v so we see the output of the command that tries to get the
value of NFS_TICKLE_SHARED_DIRECTORY. That way we can tell if a value
was retrieved OK or if we're using the default.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c53353c6402f378f29200313d82f1f9262d628b1)
Martin Schwenke [Thu, 30 Jul 2009 03:57:40 +0000 (13:57 +1000)]
Test suite: complex/31_nfs_tickle.sh should use NFS_TICKLE_SHARED_DIRECTORY.
Rather than hardcoding the location of the shared tickle directory,
attempt to use the value of NFS_TICKLE_SHARED_DIRECTORY from
/etc/sysconfig/nfs on node 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
878437a909ea44dfc3635f082e34741ee256e505)
Martin Schwenke [Thu, 30 Jul 2009 03:45:06 +0000 (13:45 +1000)]
Test suite: Ask CTDB about CIFS tickles registered for the actual test node.
This failed when node 0 had no public IPs because we would always run
"ctdb gettickles" on node 0. We now ask node 0 for the tickles on the
test node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
8fcc4610de926f44e18ec85fb57ca5f7d3c28bd6)
Martin Schwenke [Thu, 30 Jul 2009 03:20:23 +0000 (13:20 +1000)]
Test suite: Turn off strict host key checking in the SSH failover test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
b7787255391eddef8458f81ff9b75d9116e2afd3)
Ronnie Sahlberg [Thu, 30 Jul 2009 00:55:56 +0000 (10:55 +1000)]
Merge commit 'martins/master'
(This used to be ctdb commit
32a69b0efa078b069802470be6488a4efe32961d)
Martin Schwenke [Thu, 30 Jul 2009 00:47:36 +0000 (10:47 +1000)]
Test suite: fix test file permissions in complex/44_failover_nfs_oneway.sh.
Something, perhaps root_squash, was causing permission denied on the test
file after we copy it over with scp. This sets the initial
permissions to be friendly and adds -p to the scp command to maintain
those friendly permissions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
52f21f5a92eb14df7540a2ae9e212d936e646c06)
Martin Schwenke [Wed, 29 Jul 2009 08:10:05 +0000 (18:10 +1000)]
Test suite: fix the test suite's generic event script.
Add a "stopped" case to log events and stop the event script from
failing with an unknown event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
7f67f7395e2233f0bba2e9662404aad49e13f645)
Martin Schwenke [Wed, 29 Jul 2009 08:01:07 +0000 (18:01 +1000)]
Test suite: Fixes for node state parsing plus new stop/continue tests.
The parsing of "ctdb status -Y" output to determine various node
states was implemented very strictly. Therefore, the parsing broke
due to the addition of the new "stopped" state to the output of "ctdb
status -Y". This relaxes the parsing so that it should work for
versions prior to the introduction of the "stopped" state, as well as
future versions that add new states to the end of the list of bits in
output of "ctdb status -Y".
Similarly the check for cluster unhealthy (in _cluster_is_healthy())
now just checks for a single 1 in any bit in the "ctdb status -Y"
output, rather than checking for a particular number of 0s.
New tests
tests/simple/{41_ctdb_stop.sh,42_ctdb_continue.sh,43_stop_recmaster_yield.sh}
do rudimentary testing of the stop and continue functions.
Remove tests tests/simple/41_ctdb_ban.sh and
tests/simple/42_ctdb_unban.sh. They were both unreliable.
tests/simple/21_ctdb_disablemonitor.sh now schedules a restart, since
one will be required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
67c5bfb5f02c9d45a32d976021ede4fb2174dfe9)
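The relaxed health check described in the commit above can be sketched as follows. The line layout (":<pnn>:<ip>:<flag>:<flag>:...:" with an optional header row) and IPv4 addresses are assumptions of this sketch, and the function names are hypothetical; the test suite's real helper is a shell function.

```python
def node_flag_bits(line):
    """Return the flag columns from one ':'-separated status line.

    Assumed layout: ":<pnn>:<ip>:<flag>:<flag>:...:". Any number of
    trailing flag columns is accepted, so newly added states still parse.
    """
    fields = line.strip().strip(":").split(":")
    return fields[2:]


def cluster_is_healthy(status_output):
    """True if no node line has a '1' in any flag column."""
    for line in status_output.splitlines():
        line = line.strip()
        if not line or not line[1:2].isdigit():
            continue  # skip blank lines and the header row
        if "1" in node_flag_bits(line):
            return False
    return True
```

Checking for any single 1 rather than a fixed count of 0s is what keeps the test working across versions with different numbers of flag columns.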
Ronnie Sahlberg [Wed, 29 Jul 2009 03:31:12 +0000 (13:31 +1000)]
change the defaults for repacking to repack once every 120 seconds and let it work for 30 seconds before timing out.
(This used to be ctdb commit
2aa5d18bb42dca4ef9cb049b4fa9d7bc999ce4ad)
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 20:09:28 +0000 (23:09 +0300)]
repack limit tunable
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
(This used to be ctdb commit
a2768b0732f2ab2e3fafda55587bd2e99eedf0fa)
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 14:49:41 +0000 (17:49 +0300)]
remove repack from eventscript
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
(This used to be ctdb commit
dd334caa98882fc59765b7c84eca8e86de785487)
Wolfgang Mueller-Friedt [Tue, 28 Jul 2009 14:45:31 +0000 (17:45 +0300)]
added event repacking
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
(This used to be ctdb commit
78466364f22d6a183710338f138b8c808c6b7753)
Ronnie Sahlberg [Thu, 23 Jul 2009 06:03:39 +0000 (16:03 +1000)]
vacuum event framework
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
(This used to be ctdb commit
30cdad97706a9e9bb210120699aa939f6b16e8ca)
Ronnie Sahlberg [Wed, 29 Jul 2009 03:25:43 +0000 (13:25 +1000)]
initial part of new vacuuming patch.
create some new fields for ctdb_db and tunables
(This used to be ctdb commit
3a8e7d36cc42aedf4b7665364224140dcbfb3efa)
Ronnie Sahlberg [Wed, 29 Jul 2009 01:18:02 +0000 (11:18 +1000)]
From Michael Adam:
Update the transaction test tool to the new api for transactions
(This used to be ctdb commit
4d9a53f142deba6ab578af2fc35bfa99c29c3a99)
Michael Adam [Mon, 20 Jul 2009 14:34:56 +0000 (16:34 +0200)]
client: refuse to do record_store() on a persistent tdb.
Only allow stores wrapped in transactions on persistent dbs.
Michael
(This used to be ctdb commit
9dea71cf72ef79a9aadf8ee7cf1a1899527459ff)
Michael Adam [Mon, 20 Jul 2009 14:33:53 +0000 (16:33 +0200)]
ctdbd: refuse PERSISTENT_STORE if transaction is running.
Michael
(This used to be ctdb commit
c07d6d90f7afd19213ad44624c3e2b9c85f4eea8)
Michael Adam [Tue, 21 Jul 2009 09:30:38 +0000 (11:30 +0200)]
Fix persistent transaction commit race condition.
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
timeout being exceeded waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And due to the lack of global state
indicating that a transaction is in progress in ctdbd, other nodes
may succeed in starting transactions on the db in this gap and,
even worse, work on top of the possibly already pushed changes.
So the data diverges across the nodes.
This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client so that a client keeps track of _which_ tdb it has a
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start another transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.
This approach is deadlock-free and still allows the recovery process
to be started in the retry-gap between cancel and replay.
Also note, that this solution does not require any change in the
client side.
This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!
Michael
(This used to be ctdb commit
f88103516e5ad723062fb95fcb07a128f1069d69)
Michael Adam [Thu, 16 Jul 2009 20:00:10 +0000 (22:00 +0200)]
client: set dmaster in ctdb_transaction_store() also when updating an existing record
Michael
(This used to be ctdb commit
e9194a130327d6b05a8ab90bd976475b0e93b06d)
Martin Schwenke [Wed, 29 Jul 2009 00:08:56 +0000 (10:08 +1000)]
Merge commit 'origin/master'
(This used to be ctdb commit
d7ff60a74595dcb4ae41f5a8193de5b898d61227)
Ronnie Sahlberg [Tue, 28 Jul 2009 23:58:40 +0000 (09:58 +1000)]
When processing the stop node control reply in the client code we should
also check the returned status code in case the _stop() command failed
due to the eventscripts failing.
If this happens, make "ctdb stop" log an error to the console and try
the operation again.
(This used to be ctdb commit
20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
Martin Schwenke [Tue, 28 Jul 2009 06:00:11 +0000 (16:00 +1000)]
onnode: update tests for healthy and connected to cope with new stopped bit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
bfc926c866e361ab28330747544b268ba130bf30)
Ronnie Sahlberg [Tue, 28 Jul 2009 03:54:08 +0000 (13:54 +1000)]
document the two new commands setlmasterrole and setrecmasterrole
(This used to be ctdb commit
1d7d7dd515e7ef62cacf2a712a2f4c4d62a38fa5)
Ronnie Sahlberg [Tue, 28 Jul 2009 03:45:13 +0000 (13:45 +1000)]
add two commands : setlmasterrole and setrecmasterrole to enable/disable these capabilities at runtime
(This used to be ctdb commit
51aaed0e9e42e901451292e8dd545297ab725a62)
Ronnie Sahlberg [Tue, 28 Jul 2009 00:02:39 +0000 (10:02 +1000)]
Document the natgw flag and how this changes the output of "ctdb
getcapabilities"
(This used to be ctdb commit
9b395986962909a5b0548eaea7e45215df72a08e)
Ronnie Sahlberg [Tue, 28 Jul 2009 00:00:33 +0000 (10:00 +1000)]
update the natgw eventscript to set the NATGW capability when this feature is used
This does not modify any behaviour of the daemon itself other than showing this flag as ON in the ctdb getcapabilities output
(This used to be ctdb commit
fb337c151bd16ad5ad0c99431224451979d8c651)
Ronnie Sahlberg [Mon, 27 Jul 2009 23:58:11 +0000 (09:58 +1000)]
add a command "setnatgwstate {on|off}" that can be used to indicate if this node is using natgw functionality or not.
(This used to be ctdb commit
89a9bb29a60a6fb1fba55987e6cf0a4baa695e50)
Ronnie Sahlberg [Mon, 27 Jul 2009 23:27:00 +0000 (09:27 +1000)]
describe how to activate NATGW without restarting the nodes on a running
cluster
(This used to be ctdb commit
b6c8011024ce4574f945d5a470075c6779b34a43)
Ronnie Sahlberg [Fri, 17 Jul 2009 03:01:11 +0000 (13:01 +1000)]
new version 1.0.87
(This used to be ctdb commit
d187eb8507f35a650ff3ffc50fa49110eebca0bd)
Ronnie Sahlberg [Fri, 17 Jul 2009 02:45:08 +0000 (12:45 +1000)]
Merge commit 'martins/master'
(This used to be ctdb commit
febf3d6d3f2bdf187c042f560aefc54b8ac72454)
Ronnie Sahlberg [Fri, 17 Jul 2009 02:30:05 +0000 (12:30 +1000)]
document the new stopped event
(This used to be ctdb commit
70603d9a79c80379bf65d9d703c399a65c109c52)
Ronnie Sahlberg [Fri, 17 Jul 2009 02:26:16 +0000 (12:26 +1000)]
create a new event : stopped.
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...
Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.
(This used to be ctdb commit
65e9309564611bf937ded3c74a79abff895d7c59)
Ronnie Sahlberg [Fri, 17 Jul 2009 01:37:03 +0000 (11:37 +1000)]
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed.
(This used to be ctdb commit
ca4982c40d81db528fe915d5ecc01fcf7df0b522)
Ronnie Sahlberg [Thu, 16 Jul 2009 23:45:05 +0000 (09:45 +1000)]
update the eventscript to ensure that stopped nodes can not become the natgw master
also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not.
(This used to be ctdb commit
7f273ee769d671d8c8be87c9187302fb77e814f3)
Ronnie Sahlberg [Thu, 16 Jul 2009 23:36:22 +0000 (09:36 +1000)]
if all nodes are STOPPED, pick one of the STOPPED nodes as natgw master
(This used to be ctdb commit
8bbd96cfbbe98f3fc19e432797cbf4478f753a0b)
Ronnie Sahlberg [Thu, 16 Jul 2009 23:29:58 +0000 (09:29 +1000)]
Do not allow STOPPED or DELETED nodes to become the NATGW master
(This used to be ctdb commit
4505ea15408ad40dd8deb4041fd75a65a0ad9336)
Martin Schwenke [Thu, 16 Jul 2009 04:04:06 +0000 (14:04 +1000)]
Test suite: Fix debug code for unexpectedly unhealthy cluster.
The debug code should run "ctdb status" on a cluster node, not on the
test client.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
34e6f8a04b12f8879eb42d417f9741502ccccf0f)
Ronnie Sahlberg [Thu, 9 Jul 2009 04:44:03 +0000 (14:44 +1000)]
stopped nodes can not win a recmaster election
stopped nodes must yield the recmaster role
(This used to be ctdb commit
b75ac1185481060ab71bd743e1e48d333d716eba)
Ronnie Sahlberg [Thu, 9 Jul 2009 04:34:12 +0000 (14:34 +1000)]
change the infolevel when logging stop/continue commands
(This used to be ctdb commit
1e007c833098b03dd81797c081da1ae1b10c971c)
Ronnie Sahlberg [Thu, 9 Jul 2009 04:19:32 +0000 (14:19 +1000)]
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases get frozen and the node enters recovery mode
(This used to be ctdb commit
99f239f8b96c8c0a06ac8ca8b8083be96265865a)
Ronnie Sahlberg [Thu, 9 Jul 2009 03:07:15 +0000 (13:07 +1000)]
document the new commands ctdb stop/continue
(This used to be ctdb commit
d6ddea4167ccdad05e88378ee3f22b6125969562)
Ronnie Sahlberg [Thu, 9 Jul 2009 03:20:14 +0000 (13:20 +1000)]
don't let other nodes modify the STOPPED flag for the local process when pushing out flags changes
(This used to be ctdb commit
501a2747d839ca291b70c761098549cf6d47a158)
Ronnie Sahlberg [Thu, 9 Jul 2009 02:22:46 +0000 (12:22 +1000)]
add two new controls, STOP_NODE and CONTINUE_NODE
that are used to stop/continue a node instead of using modflags messages
(This used to be ctdb commit
54b4a02053a0f98f8c424e7f658890254023d39a)
Ronnie Sahlberg [Thu, 9 Jul 2009 01:57:20 +0000 (11:57 +1000)]
make it possible to start the daemon in STOPPED mode
(This used to be ctdb commit
866aa995dc029db6e510060e9e95a8ca149094ac)
Ronnie Sahlberg [Thu, 9 Jul 2009 01:43:37 +0000 (11:43 +1000)]
remove the header printed for the machinereadable output for natgwlist
(This used to be ctdb commit
049271c83a09afb8d6c3e5212cf9ca782956b0c6)
Ronnie Sahlberg [Thu, 9 Jul 2009 01:38:18 +0000 (11:38 +1000)]
Add a new node flag : STOPPED
This node flag means the node is DISABLED and that all its public ip addresses
are failed over, but also that it has been removed from the VNNmap.
A STOPPED node should be in recovery mode active until restarted using the continue command.
Adding two new commands "ctdb stop" and "ctdb continue"
(This used to be ctdb commit
d47dab1026deba0554f21282a59bd172209ea066)