ctdb.git
Michael Adam [Thu, 30 Dec 2010 16:44:51 +0000 (17:44 +0100)]
server: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().

This is supposed to contain the logic for deleting records that are safe
to delete and for scheduling records for deletion. It will be called in
server context for non-persistent databases instead of the standard
ctdb_ltdb_store() function.
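In outline, such a store variant decides between deleting and storing roughly like the sketch below; the flag names and values are illustrative stand-ins, not the real ctdb constants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the real ctdb record flags. */
#define REC_FLAG_MIGRATED_WITH_DATA 0x01u
#define REC_FLAG_VACUUM_MIGRATED    0x02u

/* A record is safe to delete in the server-side store path when it is
 * empty, was migrated here by the vacuuming fast path, and has never
 * been migrated with data. */
static bool record_safe_to_delete(size_t data_len, uint32_t flags)
{
	if (data_len != 0) {
		return false;	/* non-empty records are always stored */
	}
	if (flags & REC_FLAG_MIGRATED_WITH_DATA) {
		return false;	/* the record has carried data at some point */
	}
	return (flags & REC_FLAG_VACUUM_MIGRATED) != 0;
}
```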

Michael Adam [Tue, 28 Dec 2010 12:14:23 +0000 (13:14 +0100)]
daemon: fill ctdb->ctdbd_pid early

Michael Adam [Tue, 21 Dec 2010 14:29:46 +0000 (15:29 +0100)]
test: send SCHEDULE_FOR_DELETION control from randrec test.

Michael Adam [Tue, 21 Dec 2010 14:29:23 +0000 (15:29 +0100)]
client: add accessor function ctdb_header_from_record_handle().

Michael Adam [Tue, 28 Dec 2010 12:13:34 +0000 (13:13 +0100)]
vacuum: add ctdb_local_schedule_for_deletion()

Michael Adam [Tue, 21 Dec 2010 13:25:48 +0000 (14:25 +0100)]
server: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.

Michael Adam [Tue, 8 Mar 2011 23:57:55 +0000 (00:57 +0100)]
control: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION

Michael Adam [Tue, 8 Mar 2011 23:56:25 +0000 (00:56 +0100)]
control: add macro CHECK_CONTROL_MIN_DATA_SIZE.

This is for the control dispatcher to check whether the input data has
a required minimum size.

Michael Adam [Thu, 23 Dec 2010 10:54:09 +0000 (11:54 +0100)]
vacuum: lower level of hash collision debug message to INFO

Michael Adam [Wed, 22 Dec 2010 23:27:27 +0000 (00:27 +0100)]
vacuum: add statistics output to the fast and full traverse runs.

Michael Adam [Tue, 21 Dec 2010 13:19:00 +0000 (14:19 +0100)]
vacuum: refactor insert_delete_record_data_into_tree() out of add_record_to_delete_tree()

for reuse in filling the delete_queue.

Michael Adam [Mon, 20 Dec 2010 20:43:41 +0000 (21:43 +0100)]
vacuum: change all Vacuum*Interval tunables to default to 10

So, by default we have a fastpath vacuuming every 10 seconds and
full blown db-traverse vacuuming once every 10 minutes.

Michael Adam [Mon, 20 Dec 2010 20:30:39 +0000 (21:30 +0100)]
vacuum: disable full db-traverse vacuuming runs when VacuumFastPathCount == 0

Michael Adam [Mon, 20 Dec 2010 17:03:38 +0000 (18:03 +0100)]
vacuum: Only run full vacuuming (db traverse) every VacuumFastPathCount times.

Michael Adam [Mon, 20 Dec 2010 16:54:04 +0000 (17:54 +0100)]
vacuum: reset the fast path count in the event handler if it exceeds the limit.

Michael Adam [Mon, 20 Dec 2010 16:49:29 +0000 (17:49 +0100)]
vacuum: bump the number of fast-path runs in the vacuum child destructor

Michael Adam [Mon, 20 Dec 2010 16:44:02 +0000 (17:44 +0100)]
vacuum: add a fast_path_count to the vacuum_handle.

Michael Adam [Mon, 20 Dec 2010 16:42:25 +0000 (17:42 +0100)]
Add a tunable VacuumFastPathCount.

This controls how many fast-path vacuuming runs will have to be done
before a full vacuuming run, i.e. one with a db traversal, is triggered.
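Together with the surrounding commits (a counter in the vacuum handle, a reset when the limit is exceeded, full runs disabled when the tunable is 0), the bookkeeping amounts to something like this hypothetical sketch; the struct and function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the VacuumFastPathCount bookkeeping: every vacuuming event
 * is a fast-path run, and only every tunable-th run becomes a full one
 * with a db traversal. */
struct vacuum_sched {
	uint32_t fast_path_count_tunable;	/* VacuumFastPathCount */
	uint32_t fast_path_count;		/* runs since the last full one */
};

static bool vacuum_run_is_full(struct vacuum_sched *s)
{
	if (s->fast_path_count_tunable == 0) {
		return false;	/* tunable of 0 disables full runs */
	}
	s->fast_path_count++;
	if (s->fast_path_count >= s->fast_path_count_tunable) {
		s->fast_path_count = 0;	/* reset after triggering a full run */
		return true;
	}
	return false;
}
```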

Michael Adam [Mon, 20 Dec 2010 16:25:35 +0000 (17:25 +0100)]
vacuum: traverse the delete_queue before traversing the database.

Michael Adam [Mon, 20 Dec 2010 16:24:32 +0000 (17:24 +0100)]
vacuum: add delete_queue_traverse() for traversal of the delete_queue.

Michael Adam [Tue, 21 Dec 2010 10:22:50 +0000 (11:22 +0100)]
vacuum: reduce indentation in add_record_to_delete_tree()

This simplifies the logical structure a bit by using early returns.

Michael Adam [Mon, 20 Dec 2010 16:11:27 +0000 (17:11 +0100)]
vacuum: refactor new add_record_to_delete_tree() out of vacuum_traverse().

This will be reused by the traversal of the delete_queue list.

Michael Adam [Mon, 20 Dec 2010 15:41:13 +0000 (16:41 +0100)]
vacuum: skip adding records to list of records to send to lmaster on lmaster

This list is skipped afterwards when the lists are processed.

Michael Adam [Mon, 20 Dec 2010 15:31:27 +0000 (16:31 +0100)]
vacuum: refactor new add_record_to_vacuum_fetch_list() out of vacuum_traverse().

This is the function that fills the list of records to send to each lmaster
with the VACUUM_FETCH message.

This function will be reused in the traverse function for the delete_queue.

Michael Adam [Mon, 20 Dec 2010 09:55:53 +0000 (10:55 +0100)]
server: rename ctdb_repack_db() to ctdb_vacuum_and_repack_db()

Michael Adam [Fri, 17 Dec 2010 01:22:02 +0000 (02:22 +0100)]
When wiping a database, clear the delete_queue.

Michael Adam [Fri, 17 Dec 2010 00:53:25 +0000 (01:53 +0100)]
vacuum: clear the fast-path vacuuming delete_queue after creating the vacuuming child.

Maybe we should keep a copy for the case that the vacuuming fails?

Michael Adam [Fri, 17 Dec 2010 00:38:09 +0000 (01:38 +0100)]
When attaching to a non-persistent DB, initialize the delete_queue.

Michael Adam [Wed, 22 Dec 2010 13:50:53 +0000 (14:50 +0100)]
Add a delete_queue to the ctdb database context struct.

This list will be filled by the client using a new
delete control. The list will then be used to implement
a fast-path vacuuming that will traverse this list instead
of traversing the database.
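The fill-and-traverse idea behind the delete_queue can be sketched like this; the structure and function names are hypothetical, not the real ctdb ones:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-database delete_queue: a list of keys that clients
 * have scheduled for deletion via the new control. */
struct delete_entry {
	struct delete_entry *next;
	char *key;
};

struct delete_queue {
	struct delete_entry *head;
	int count;
};

/* Called for the new delete control: remember the key for later. */
static int schedule_for_deletion(struct delete_queue *q, const char *key)
{
	struct delete_entry *e = malloc(sizeof(*e));
	if (e == NULL) {
		return -1;
	}
	e->key = malloc(strlen(key) + 1);
	if (e->key == NULL) {
		free(e);
		return -1;
	}
	strcpy(e->key, key);
	e->next = q->head;
	q->head = e;
	q->count++;
	return 0;
}

/* Fast-path vacuuming: visit only the queued keys (the real code hands
 * each key to the deletion logic), then leave the queue empty. */
static int delete_queue_flush(struct delete_queue *q)
{
	int visited = 0;
	struct delete_entry *e = q->head;
	while (e != NULL) {
		struct delete_entry *next = e->next;
		free(e->key);
		free(e);
		e = next;
		visited++;
	}
	q->head = NULL;
	q->count = 0;
	return visited;
}
```

The point of the design is that a fast-path run touches only the handful of queued keys instead of every record in the tdb.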

Michael Adam [Fri, 10 Dec 2010 13:11:38 +0000 (14:11 +0100)]
call: becoming dmaster in VACUUM_MIGRATION, set the VACUUM_MIGRATED record flag

This temporary flag is used by the local record storage function to
decide whether, as part of the fast-path vacuuming process, to delete
an empty record which has never been migrated with data, or to store
the record.

Michael Adam [Fri, 10 Dec 2010 13:07:21 +0000 (14:07 +0100)]
call: hand the submitted record_flags to local record storage function.

Michael Adam [Fri, 10 Dec 2010 13:02:33 +0000 (14:02 +0100)]
call: transfer the record flags in the ctdb call packets.

This way, the MIGRATED_WITH_DATA information can be transported
along with the records. This is important for vacuuming to function
properly.

The record flags are appended to the data section of the ctdb_req_dmaster
and ctdb_reply_dmaster structs.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
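The append trick can be illustrated with a minimal pack/unpack pair; this is only a sketch of the idea, not the exact wire format of ctdb_req_dmaster / ctdb_reply_dmaster:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 32-bit record flags ride at the end of the packet's data
 * section, so the struct layout itself does not change. */
static size_t pack_data_with_flags(uint8_t *buf, const uint8_t *data,
				   size_t len, uint32_t flags)
{
	memcpy(buf, data, len);
	memcpy(buf + len, &flags, sizeof(flags));	/* flags after data */
	return len + sizeof(flags);
}

static uint32_t unpack_flags(const uint8_t *buf, size_t total,
			     size_t *data_len)
{
	uint32_t flags;

	*data_len = total - sizeof(flags);
	memcpy(&flags, buf + *data_len, sizeof(flags));
	return flags;
}
```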

Michael Adam [Fri, 10 Dec 2010 12:59:37 +0000 (13:59 +0100)]
server: in the VACUUM_FETCH handler, add the VACUUM_MIGRAION to the call flags

This way, the records coming in via this handler can be treated appropriately.
Namely, they can be deleted instead of being stored when they meet the fast-path
vacuuming criteria (empty, never migrated with data...)

Michael Adam [Fri, 10 Dec 2010 12:57:01 +0000 (13:57 +0100)]
add a new record flag CTDB_REC_FLAG_VACUUM_MIGRATED.

This is to be used internally. The purpose is to flag a record
as having been migrated by a VACUUM_MIGRATION, which is triggered by
a VACUUM_FETCH message as part of the vacuuming. The local store
routine will base its decision whether to delete or to store
the record upon (among other things) the value of this flag.

This flag should never be stored in the local database copies.
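Keeping the flag out of the local copies amounts to masking it off in the store path; a sketch with an assumed flag value (the real CTDB_REC_FLAG_VACUUM_MIGRATED constant may differ):

```c
#include <stdint.h>

/* Illustrative value, not the real ctdb constant. */
#define REC_FLAG_VACUUM_MIGRATED 0x02u

/* Strip the migration-only flag so it never reaches the local
 * database copy; all other flags are stored unchanged. */
static uint32_t flags_for_local_store(uint32_t flags)
{
	return flags & ~REC_FLAG_VACUUM_MIGRATED;
}
```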

Michael Adam [Fri, 10 Dec 2010 13:22:55 +0000 (14:22 +0100)]
call: Move definition of call flags down to the definition of the flags field.

Michael Adam [Fri, 10 Dec 2010 13:24:40 +0000 (14:24 +0100)]
call: add new call flag CTDB_CALL_FLAG_VACUUM_MIGRATION

This is to be used when the CTDB_SRVID_VACUUM_FETCH message
triggers the migration of deleted records to the lmaster.
The lmaster can then delete records that have not been
migrated with data instead of storing them.

Michael Adam [Fri, 3 Dec 2010 14:24:06 +0000 (15:24 +0100)]
recoverd: in a recovery, set the MIGRATED_WITH_DATA flag on all records

Those records that are kept after a recovery are non-empty and
stored identically on all nodes. So this is as if they had been
migrated with data.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

Michael Adam [Fri, 3 Dec 2010 14:21:51 +0000 (15:21 +0100)]
server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag

Michael Adam [Thu, 3 Feb 2011 11:15:41 +0000 (12:15 +0100)]
vacuum: check lmaster against num_nodes instead of vnn_map->size

When the lmaster is bigger than the biggest recorded node number,
the traverse exits with an error.

Michael Adam [Thu, 3 Feb 2011 16:47:36 +0000 (17:47 +0100)]
vacuum: reduce indentation of the loop sending VACUUM_FETCH controls

This slightly improves the code structure in that loop.

Michael Adam [Thu, 3 Feb 2011 11:26:45 +0000 (12:26 +0100)]
vacuum: correctly send TRY_DELETE_RECORDS ctrl to all active nodes

Originally, the control was sent to all nodes in the vnn_map, but
there was something still missing here:
When a node cannot become lmaster (via CTDB_CAPABILITY_LMASTER=no),
it will not be part of the vnn_map. So such a node would
be active but never receive the TRY_DELETE_RECORDS control from a
vacuuming run.

This is fixed in this change by correctly building the list of
active nodes first in the same way that the recovery process does it.
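Building the target list from the node flags rather than from the vnn_map can be sketched as follows; the flag value and helper name are illustrative (ctdb's real inactive mask covers stopped, banned and disconnected nodes):

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag, not the real ctdb node-flag constant. */
#define NODE_FLAGS_INACTIVE 0x1u

/* Build the target list from all configured nodes that are active,
 * rather than from the vnn_map, which misses nodes that cannot
 * become lmaster. */
static size_t build_active_node_list(const uint32_t *node_flags,
				     size_t num_nodes,
				     uint32_t *active, size_t max)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < num_nodes && n < max; i++) {
		if (node_flags[i] & NODE_FLAGS_INACTIVE) {
			continue;	/* skip stopped/banned nodes */
		}
		active[n++] = (uint32_t)i;
	}
	return n;
}
```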

Michael Adam [Thu, 3 Feb 2011 11:18:58 +0000 (12:18 +0100)]
vacuum: in ctdb_vacuum_db, fix the length of the array of vacuum fetch lists

This patch fixes segfaults in the vacuum child when at least one
node has been stopped or removed from the cluster:

The size of the vnn_map is only the number of active nodes
(that can be lmaster). But the node numbers that are referenced
by the vnn_map spread over all configured nodes.

Since the array of vacuum fetch lists is indexed by the
key's lmaster's node number later on, the array needs to
be of size num_nodes instead of vnn_map->size.
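The sizing argument can be made concrete: with node 1 stopped, a 4-node cluster has a vnn_map of size 3 whose entries still reference node numbers up to 3, so an array of vnn_map->size entries overflows while one of num_nodes entries does not. A small illustrative check (the helper is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Highest node number referenced by a vnn_map. */
static uint32_t max_node_in_vnn_map(const uint32_t *map, size_t size)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < size; i++) {
		if (map[i] > max) {
			max = map[i];
		}
	}
	return max;
}
```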

Michael Adam [Mon, 20 Dec 2010 15:26:50 +0000 (16:26 +0100)]
Fix typos in a comment in vacuum_traverse.

Michael Adam [Tue, 21 Dec 2010 16:18:03 +0000 (17:18 +0100)]
tests: fix segfault in store test when connection to ctdbd failed.

Michael Adam [Tue, 21 Dec 2010 16:15:41 +0000 (17:15 +0100)]
tests: fix segfault in fetch_one test when connection to ctdbd fails

Michael Adam [Tue, 21 Dec 2010 16:14:33 +0000 (17:14 +0100)]
tests: fix segfault in fetch test when connection to ctdb failed.

Michael Adam [Tue, 21 Dec 2010 16:11:26 +0000 (17:11 +0100)]
tests: fix segfault in randrec test when connection to daemon fails.

Michael Adam [Fri, 3 Dec 2010 14:39:44 +0000 (15:39 +0100)]
gitignore: add tags file

Michael Adam [Fri, 3 Dec 2010 14:39:26 +0000 (15:39 +0100)]
gitignore: add vi swap files

Ronnie Sahlberg [Thu, 3 Mar 2011 19:55:24 +0000 (06:55 +1100)]
Restart recovery daemon if it looks like it hung.
Don't shut down ctdbd completely; that only makes the problem worse.

Ronnie Sahlberg [Tue, 1 Mar 2011 01:09:42 +0000 (12:09 +1100)]
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main daemon too.

While it does not address the reason for the recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust.

Ronnie Sahlberg [Thu, 24 Feb 2011 23:46:16 +0000 (10:46 +1100)]
new version 1.2.23

Ronnie Sahlberg [Thu, 24 Feb 2011 23:33:12 +0000 (10:33 +1100)]
ATTACH_DB: simplify the code slightly and change the semantics to only
refuse a db attach during recovery IF we can associate the request with a
genuine real client, instead of deciding this based on whether client_id is zero or not.

This will suppress/avoid messages like these:
DB Attach to database %s refused. Can not match clientid...

Michael Adam [Mon, 21 Feb 2011 04:55:16 +0000 (15:55 +1100)]
New version 1.2.22.

Michael Adam [Wed, 23 Feb 2011 16:39:57 +0000 (17:39 +0100)]
recover: finish pending trans3 commits when a recovery is finished.

When the end_recovery control is received, pending trans3 commits are
finished. During the recovery, all the actions like persistent_callback
and persistent_store_timeout had been disabled to let the recovery do
its job. After the recovery is completed, the replies are sent to the
waiting clients.

Michael Adam [Wed, 23 Feb 2011 16:38:40 +0000 (17:38 +0100)]
persistent: add ctdb_persistent_finish_trans3_commits().

This function walks all databases and checks for running trans3 commits.
It sends replies to all of them (with error code) and ends them.
To be called when a recovery finishes.
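The walk over the databases can be sketched like this; the structures are hypothetical stand-ins for ctdb_db and its persistent_state pointer, and the error code is illustrative:

```c
#include <stddef.h>

/* Stand-in for a ctdb_db with a pending trans3 commit. */
struct db_ctx {
	struct db_ctx *next;
	int pending_trans3;	/* non-zero: a commit is in progress */
	int reply_status;	/* last status sent to the waiting client */
};

/* At recovery end: for each database with a running trans3 commit,
 * send the (error) reply and end the commit. Returns the number of
 * commits finished. */
static int finish_trans3_commits(struct db_ctx *dbs, int err)
{
	int finished = 0;
	struct db_ctx *d;

	for (d = dbs; d != NULL; d = d->next) {
		if (!d->pending_trans3) {
			continue;
		}
		d->reply_status = err;	/* reply with the error code */
		d->pending_trans3 = 0;	/* end the commit */
		finished++;
	}
	return finished;
}
```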

Michael Adam [Wed, 23 Feb 2011 16:37:42 +0000 (17:37 +0100)]
daemon: correctly end a running trans3_commit if the client disconnects.

Michael Adam [Wed, 23 Feb 2011 16:35:27 +0000 (17:35 +0100)]
persistent: add a client context to the persistent_stat and track the db_id

The db_id is tracked in the client context as an indication that a
transaction commit is in progress. This is cleared in the persistent_state
talloc destructor.

This is in order to properly treat running trans3_commits if the client
disconnects.

Michael Adam [Tue, 22 Feb 2011 23:03:07 +0000 (00:03 +0100)]
persistent: reject trans3_control when a commit is already active.

This should actually never happen.

Michael Adam [Tue, 22 Feb 2011 23:01:13 +0000 (00:01 +0100)]
persistent: allocate the persistent state in the ctdb_db struct in trans3_commit

Make sure that ctdb_db->persistent_state is correctly NULL-ed when
the state is freed. This way, we can use ctdb_db->persistent_state
as an indication for whether a transaction commit is currently
running.

Michael Adam [Tue, 22 Feb 2011 23:23:18 +0000 (00:23 +0100)]
persistent: add a ctdb_db context to the ctdb_persistent_state struct.

Michael Adam [Tue, 22 Feb 2011 23:00:04 +0000 (00:00 +0100)]
persistent: add a ctdb_persistent_state member to the ctdb_db context.

To be used for tracking running transaction commits through recoveries.

Michael Adam [Tue, 22 Feb 2011 21:49:52 +0000 (22:49 +0100)]
persistent_callback: print "no error message given" instead of "(null)"

Michael Adam [Tue, 22 Feb 2011 21:47:30 +0000 (22:47 +0100)]
persistent: reduce indentation for the finishing moves in ctdb_persistent_callback

Michael Adam [Tue, 22 Feb 2011 21:44:16 +0000 (22:44 +0100)]
persistent: if a node failed to update_record, trigger a recovery

and stop processing of the update_record replies in order to let
the recovery finish the trans3_commit control.

Michael Adam [Tue, 22 Feb 2011 21:24:50 +0000 (22:24 +0100)]
persistent_store_timeout: do not really time out the trans3_commit control in recovery

If a recovery was started, then all further processing (and timing out)
of the update_record controls sent by the trans3_commit control is disabled.
The recovery should trigger sending the reply for the update record control
when finished.

Michael Adam [Tue, 22 Feb 2011 21:24:50 +0000 (22:24 +0100)]
persistent_callback: ignore the update_record return code of a remote node in recovery

If a recovery was started, then all further processing of the update_record
controls sent by the trans3_commit control is disabled. The recovery should
trigger sending the reply for the update record control when finished.

Ronnie Sahlberg [Wed, 23 Feb 2011 04:46:36 +0000 (15:46 +1100)]
Deferred attach: at early startup, defer any db attach calls until we are out of recovery.

Ronnie Sahlberg [Mon, 21 Feb 2011 04:55:16 +0000 (15:55 +1100)]
New version 1.2.21

Ronnie Sahlberg [Fri, 18 Feb 2011 00:21:19 +0000 (11:21 +1100)]
ctdb_req_dmaster from non-master

If we find a situation where we get a stray packet with the wrong
dmaster, don't suicide with ctdb_fatal() since this is too disruptive.
Just drop the stray packet and force a recovery to make sure all is good again.

CQ S1022004

Ronnie Sahlberg [Thu, 17 Feb 2011 20:39:14 +0000 (07:39 +1100)]
Revert "Dont allow client processes to attach to databases while we are still in recovery mode."

This reverts commit faf3b1542fd27b3ad32ac7b362ef39d8cb0b05ff.

git pull ... 1.2-splitbrain
does not do what I think it does.
Revert patch and pull it into the right branch instead.

Ronnie Sahlberg [Thu, 17 Feb 2011 02:14:41 +0000 (13:14 +1100)]
Dont allow client processes to attach to databases while we are still in recovery mode.

The exception is the local recovery daemon which needs to be able to attach (==create) any missing databases during recovery. This process requires the use of the attach control.

Ronnie Sahlberg [Tue, 8 Feb 2011 06:01:33 +0000 (17:01 +1100)]
New version 1.2.20

Ronnie Sahlberg [Wed, 2 Feb 2011 04:00:53 +0000 (15:00 +1100)]
We default to non-deterministic ip now, where ips are "sticky" and don't change
too much.
This means we can simplify the way we add ips significantly and stop
trying to move them.

We also check if the node already hosts the ip, in which case we used to return an error. Now we just print an error string but return 0 (ok).
This makes it easier to script, and works around broken scripts.

CQ1021034

Ronnie Sahlberg [Mon, 31 Jan 2011 06:48:22 +0000 (17:48 +1100)]
New version 1.2.19

Ronnie Sahlberg [Mon, 31 Jan 2011 06:40:26 +0000 (17:40 +1100)]
If the node is stopped, put a log entry in /var/log/* to indicate this is why we never become ready

Ronnie Sahlberg [Mon, 24 Jan 2011 00:42:50 +0000 (11:42 +1100)]
LockWait congestion.

Add a dlist to track all active lockwait child processes.
Every time a new lockwait handle is created, check if there is already an
active lockwait process for this database/key and, if so,
send the new request straight to the overflow queue.

This means we will only have one active lockwait child process for a certain key,
even if there are thousands of fetch-lock requests for this key.

When the lockwait processing finishes for the original request, the processing in d_overflow() will automagically process all remaining keys as well.

Add back a --nosetsched argument to make it easier to run under gdb
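The per-key admission check can be sketched as follows; the names are hypothetical, and the cap constant echoes the 200-child limit from the companion overflow-queue commit:

```c
#include <string.h>

/* Cap taken from the companion overflow-queue commit. */
#define MAX_ACTIVE_LOCKWAIT 200

/* Keys with an active lockwait child (simplified flat table; the real
 * code tracks child processes on a dlist). */
struct lockwait_table {
	const char *active_keys[MAX_ACTIVE_LOCKWAIT];
	int n;
};

/* Returns 1 to fork a lockwait child, 0 to queue the request behind
 * an existing child (the overflow queue). */
static int lockwait_admit(struct lockwait_table *t, const char *key)
{
	int i;

	for (i = 0; i < t->n; i++) {
		if (strcmp(t->active_keys[i], key) == 0) {
			return 0;	/* a child is already waiting on this key */
		}
	}
	if (t->n >= MAX_ACTIVE_LOCKWAIT) {
		return 0;		/* too many children: overflow queue */
	}
	t->active_keys[t->n++] = key;
	return 1;
}
```

When the original child's lock is granted, the overflow queue is searched for other requests on the same key, so they complete without ever forking.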

Ronnie Sahlberg [Sun, 23 Jan 2011 22:43:45 +0000 (09:43 +1100)]
Compile fix

Rusty Russell [Fri, 21 Jan 2011 10:47:02 +0000 (21:17 +1030)]
ctdb_lockwait: create overflow queue.

Once we have more than 200 children waiting on a particular db, don't create
any more.  Just put them on an overflow queue, and when a child gets a lock
search that queue to see if others were after the same lock (they probably
were).

Ronnie Sahlberg [Sun, 23 Jan 2011 20:39:33 +0000 (07:39 +1100)]
Add a new test tool that fetch locks a record and then blocks until it receives
user input to unlock the record again.

Ronnie Sahlberg [Thu, 20 Jan 2011 23:56:56 +0000 (10:56 +1100)]
60.nfs
Don't update the statd settings that often.
When we have very many nodes and very many ips, this would generate
a lot of unnecessary load on the system

Ronnie Sahlberg [Tue, 18 Jan 2011 21:00:36 +0000 (08:00 +1100)]
TDB: Fix for a deadlock with transaction lock and lockall/lockallmark
causing ctdbd hangs

Ronnie Sahlberg [Tue, 18 Jan 2011 02:33:24 +0000 (13:33 +1100)]
ctdb: hold transaction locks during freeze, mark during recover.

Make the ctdb parent "mark" the transaction lock once the child process
has frozen/locked the entire database.
This stops the ctdb daemon from using a blocking fcntl() lock on the tdb
during the read traverse in recovery.

CQ 1021388

Rusty Russell [Tue, 18 Jan 2011 00:17:11 +0000 (10:47 +1030)]
tdb: expose transaction lock infrastructure for ctdb

tdb_traverse_read() grabs the transaction lock.  This can cause ctdbd
(which uses it) to block when it should not; expose mark and normal
variants of this lock, so ctdbd's child (the recovery daemon) can
acquire it and the ctdbd parent can mark it was held.

Ronnie Sahlberg [Mon, 17 Jan 2011 01:05:43 +0000 (12:05 +1100)]
New version 1.2.17

Ronnie Sahlberg [Mon, 17 Jan 2011 01:00:18 +0000 (12:00 +1100)]
Change Christina's previous patch to only perform the check/logging
if we are the main ctdb daemon.
Other daemons/child processes are not guaranteed to get events on a regular basis,
so those should not be checked.

Christian Ambach [Fri, 14 Jan 2011 12:55:28 +0000 (13:55 +0100)]
improve timing issue detections

the original "Time jumped" messages are too coarse to interpret
exactly what was going wrong inside of CTDB.

This patch removes the original logs and adds two other logs that
differentiate between the time it took to work on an event and
the time it took to get the next event.
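The two measurements can be sketched with a pair of helpers; the threshold and names are illustrative, not the ones the patch uses:

```c
#include <time.h>

/* Seconds between two clock samples, e.g. taken before and after the
 * event handler ran (work time), or between two consecutive events
 * (time to the next event). */
static double elapsed_secs(struct timespec start, struct timespec end)
{
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* A gap beyond the threshold would produce one of the two new,
 * differentiated log messages instead of a generic "Time jumped". */
static int gap_suspicious(double secs, double threshold)
{
	return secs > threshold;
}
```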

Ronnie Sahlberg [Fri, 14 Jan 2011 06:35:31 +0000 (17:35 +1100)]
LIBCTDB: add support for traverse

Ronnie Sahlberg [Thu, 13 Jan 2011 22:46:04 +0000 (09:46 +1100)]
We cannot always rely on the recovery daemon pinging us in a timely manner,
so we need a "ticker" in the main ctdbd daemon too, to ensure we get at least one event to process every second.

This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow".

Ronnie Sahlberg [Thu, 13 Jan 2011 05:17:43 +0000 (16:17 +1100)]
ADDIP failure

Found during automatic regression testing.
We do not allow the takeip/releaseip events to be executed during a recovery.

All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to
trigger to perform the ip assignments required.

If these commands collide with a recovery, they could fail since we do
not allow takeip/releaseip events to trigger during the recovery.
While it is easy to just try running the command again, this is suboptimal for script use.

Change these commands to retry these operations a few times until they either succeed or we give up.
This makes the commands much easier to use in scripts.

Ronnie Sahlberg [Wed, 12 Jan 2011 22:35:37 +0000 (09:35 +1100)]
IPALLOCATION: If the node is held pinned down in "init" state
by external services failing to start, or blocking CTDBD from finishing the startup phase,
we can encounter a situation where we have not yet fully initialized, but a
remote recovery master tries to release a certain ip clusterwide.

In this situation the node that is pinned down in init/startup phase
would fail to perform the release of the ip address since we are not yet fully operational and not yet host any valid interfaces.

In this situation, we just need to remain unhealthy; there is no need to
also ban the node.

Remove the autobanning for this condition and just let the node remain in
unhealthy mode.
Banning is overkill in this situation when the system is broken and just
draws attention to ctdbd instead of the root cause.

Martin Schwenke [Tue, 11 Jan 2011 06:13:57 +0000 (17:13 +1100)]
Eventscripts: lower the fail/restart limits for nfsd.

We were potentially leaving a node unable to serve requests for too
long.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 11 Jan 2011 06:13:06 +0000 (17:13 +1100)]
Eventscripts: use "startstop_nfs restart" to reconfigure NFS.

This was defaulting to just "service nfs restart", which doesn't have
the workarounds we need.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 11 Jan 2011 06:12:03 +0000 (17:12 +1100)]
Eventscripts: only autostart during a monitor event.

Otherwise we might short-circuit events that are run only once and
actually need to do something.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 11 Jan 2011 06:10:55 +0000 (17:10 +1100)]
Eventscripts: print a message when reconfiguring a service.

Otherwise there can be strange error messages from services
stopping/starting, without any context.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 11 Jan 2011 06:06:48 +0000 (17:06 +1100)]
Eventscripts: work around NFS restart failure under load.

"service nfs restart" can fail.  To stop nfsd it sends a SIGINT and
nfsd might take a while to process it if the system is loaded.
Starting nfsd may then fail because resources are still in use.

This does some /proc magic to tell nfsd to do no more processing.  It
then runs service stop, kills nfsd with SIGKILL, and then runs service
start.  This is much less likely to fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Tue, 11 Jan 2011 05:17:06 +0000 (16:17 +1100)]
TYPO

Ronnie Sahlberg [Tue, 11 Jan 2011 05:15:41 +0000 (16:15 +1100)]
STATD is 100027, not 1000247

Ronnie Sahlberg [Mon, 10 Jan 2011 20:37:17 +0000 (07:37 +1100)]
LIBCTDB uninitialized inqueue element

From Michael Anderson:
initialize the inqueue element of the ctdb structure to NULL,
else it might be used uninitialized and cause a segv.

Ronnie Sahlberg [Mon, 10 Jan 2011 05:51:56 +0000 (16:51 +1100)]
recoverd: avoid triggering a full recovery if just some ip allocation
has failed.
We don't need to rebuild the databases in this situation; we just
need to try again to sort out the ip address allocations.