obnox/ctdb.git
11 years agoRevert "server: Ensure remaining children call ctdb_set_child_info" obnox-1.2.40-scripts
Michael Adam [Thu, 18 Apr 2013 23:49:28 +0000 (01:49 +0200)]
Revert "server: Ensure remaining children call ctdb_set_child_info"

This reverts commit 5a06f52307f5f35aee1688d3a5df858f4f580486.

11 years agoTODO - replace more calls to fork() by ctdb_fork()
Michael Adam [Thu, 18 Apr 2013 23:49:05 +0000 (01:49 +0200)]
TODO - replace more calls to fork() by ctdb_fork()

11 years agoTODO - ctdb:recover: use ctdb_fork() instead of fork() in ctdb_control_set_recmode()
Michael Adam [Thu, 18 Apr 2013 23:20:49 +0000 (01:20 +0200)]
TODO - ctdb:recover: use ctdb_fork() instead of fork() in ctdb_control_set_recmode()

11 years agoscripts: Crash cleanup script should pass a tag to logger
Martin Schwenke [Tue, 16 Apr 2013 06:10:04 +0000 (16:10 +1000)]
scripts: Crash cleanup script should pass a tag to logger

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running
Martin Schwenke [Mon, 15 Apr 2013 05:42:55 +0000 (15:42 +1000)]
scripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running

"ctdb ping" (or "ctdb status") can time out.  How many times should we
try?

Instead, depend on the initscript to implement something sane.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit 87da2a59da389cefbf6b9f930b9f7f4eb4cfad07)

Conflicts:
config/ctdb-crash-cleanup.sh

11 years agoinitscript: Use a PID file to implement the "status" option
Martin Schwenke [Mon, 15 Apr 2013 05:18:12 +0000 (15:18 +1000)]
initscript: Use a PID file to implement the "status" option

Using "ctdb ping" and "ctdb status" is fraught with danger.  These
commands can time out when ctdbd is running, leading callers to believe
that ctdbd is not running.  Timeouts could be increased but we would
still have to handle potential timeouts.

Everything else in the world implements the "status" option by
checking if the relevant process is running.  This change makes CTDB
do the same thing and uses standard distro functions.

This change is backward compatible in the sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method.  Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.

This script does not support changing the value of CTDB_VALGRIND
between calls.  If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting.  CTDB_VALGRIND is a debug
variable, so this is acceptable.

This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit c90b621f9e854fc446f0294ca6d6c1ae10179656)

Conflicts:
config/ctdb.init
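
The PID-file approach to "status" described above can be illustrated in C. This is only a rough sketch of the idea: the initscript itself relies on the distro's shell functions, and the helper name pidfile_process_running() is hypothetical, not part of CTDB.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not a CTDB function): returns 1 if the PID stored
 * in `path` names a live process, 0 if it does not, and -1 if the file is
 * missing or does not contain a valid PID. */
int pidfile_process_running(const char *path)
{
	FILE *f = fopen(path, "r");
	long pid;
	int n;

	if (f == NULL) {
		return -1;
	}
	n = fscanf(f, "%ld", &pid);
	fclose(f);
	if (n != 1 || pid <= 0) {
		return -1;
	}

	/* Signal 0 delivers nothing; it only performs the existence and
	 * permission checks.  EPERM still means the process exists. */
	if (kill((pid_t)pid, 0) == 0 || errno == EPERM) {
		return 1;
	}
	return 0;
}
```

Unlike "ctdb ping", a check of this kind never talks to the daemon, so it cannot time out while the daemon is busy.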

11 years agoctdbd: Add --pidfile option
Martin Schwenke [Mon, 15 Apr 2013 03:32:57 +0000 (13:32 +1000)]
ctdbd: Add --pidfile option

Default is not to create a pid file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 39f9c550fde323eb9bfe049c0168a1264888e777)

Conflicts:
server/ctdb_daemon.c

11 years agoserver: Ensure remaining children call ctdb_set_child_info
Martin Schwenke [Tue, 16 Apr 2013 05:13:11 +0000 (15:13 +1000)]
server: Ensure remaining children call ctdb_set_child_info

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoutil: ctdb_fork() should call ctdb_set_child_info()
Martin Schwenke [Tue, 16 Apr 2013 05:03:54 +0000 (15:03 +1000)]
util: ctdb_fork() should call ctdb_set_child_info()

For now we pass NULL as the child name.  Later we'll give ctdb_fork()
and friends an extra argument and pass that through.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agoutil: New functions ctdb_set_child_info() and ctdb_is_child_process()
Martin Schwenke [Tue, 16 Apr 2013 05:01:29 +0000 (15:01 +1000)]
util: New functions ctdb_set_child_info() and ctdb_is_child_process()

Must be called by all child processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agoRevert "util: New functions ctdb_set_child_info() and ctdb_is_child_process()"
Michael Adam [Thu, 18 Apr 2013 15:41:24 +0000 (17:41 +0200)]
Revert "util: New functions ctdb_set_child_info() and ctdb_is_child_process()"

This reverts commit 5873ab482d5439731f2ebc8cc9412f91977455e5.

11 years agoutil: New functions ctdb_set_child_info() and ctdb_is_child_process()
Martin Schwenke [Tue, 16 Apr 2013 05:01:29 +0000 (15:01 +1000)]
util: New functions ctdb_set_child_info() and ctdb_is_child_process()

Must be called by all child processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.61
Amitay Isaacs [Fri, 5 Apr 2013 05:26:24 +0000 (16:26 +1100)]
New version 1.2.61

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agolockwait: Pass CTDB daemon PID on command line
Amitay Isaacs [Fri, 5 Apr 2013 04:31:26 +0000 (15:31 +1100)]
lockwait: Pass CTDB daemon PID on command line

In the lockwait helper process we cannot rely on getppid() to find the PID
of the CTDB daemon, as the daemon can go away before the helper executes.
In that case, the helper process will hang around forever.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
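
The reason getppid() is unreliable here is that once the parent has exited, the call returns the PID of whichever process adopted the child (typically init), which never goes away. A minimal sketch of the alternative, assuming the daemon PID arrives as a decimal string argument; parse_pid_arg() and daemon_alive() are hypothetical names, not the helper's actual functions:

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a PID passed as a command-line argument.
 * Returns -1 if the argument is not a positive decimal number. */
pid_t parse_pid_arg(const char *arg)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0' || v <= 0) {
		return -1;
	}
	return (pid_t)v;
}

/* Returns 1 while a process with the given PID still exists, else 0.
 * EPERM means the process exists but we may not signal it. */
int daemon_alive(pid_t daemon_pid)
{
	return kill(daemon_pid, 0) == 0 || errno == EPERM;
}
```

A helper built this way can poll daemon_alive() with the PID it was given and exit once the daemon is gone, regardless of when the daemon died relative to the helper's start.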
11 years agorecoverd/takeover: Use IP->node mapping info from nodes hosting that IP
Amitay Isaacs [Fri, 5 Apr 2013 02:34:06 +0000 (13:34 +1100)]
recoverd/takeover: Use IP->node mapping info from nodes hosting that IP

When collating IP information for IP layout, only trust the nodes that are
hosting an IP, to have correct information about that IP.  Ignore what all the
other nodes think.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 1c7adbccc69ac276d2b957ad16c3802fdb8868ca)

11 years agostatd-callout: Make sure statd callout script always runs as root
Amitay Isaacs [Wed, 3 Apr 2013 03:44:08 +0000 (14:44 +1100)]
statd-callout: Make sure statd callout script always runs as root

In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands from talking to the daemon, since "rpcuser" cannot
access the CTDB socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(cherry picked from commit fe8c4880b371492a38554868d4ca10918c54e412)

Conflicts:
packaging/RPM/ctdb.spec.in

11 years agoclient: Set the socket non-blocking only after connect succeeds
Amitay Isaacs [Mon, 18 Mar 2013 02:45:08 +0000 (13:45 +1100)]
client: Set the socket non-blocking only after connect succeeds

If the socket is set non-blocking before connect, then we should catch
EAGAIN errors and retry. Instead of adding an arbitrary number of retries,
it is better to wait for connect to succeed and then set the socket to
non-blocking.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 524ec206e6a5e8b11723f4d8d1251ed5d84063b0)
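
The ordering described above (blocking connect first, O_NONBLOCK afterwards) can be sketched as follows. This is an illustration under assumptions, not the client code itself: the function names are hypothetical and a Unix domain stream socket is assumed.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Switch an already-connected descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1) {
		return -1;
	}
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Connect to a Unix domain socket in blocking mode and only then make the
 * descriptor non-blocking.  Returns the descriptor, or -1 on failure. */
int connect_then_nonblock(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		return -1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd == -1) {
		return -1;
	}
	/* connect() blocks until it succeeds or fails, so no retry loop
	 * for EAGAIN is needed here. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    set_nonblocking(fd) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```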

11 years agocommon/messaging: Use the jenkins hash in ctdb_message
Volker Lendecke [Wed, 3 Apr 2013 12:59:21 +0000 (14:59 +0200)]
common/messaging: Use the jenkins hash in ctdb_message

This gives a better hash distribution.
(cherry picked from commit f7f8bde2376f8180a0dca6d7b8d7d2a4a12f4bd8)
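
For readers unfamiliar with the Jenkins family of hashes, here is a small sketch. The commit message does not say which Jenkins variant is used; the one-at-a-time function is shown purely because it is short, and bucket_for_id() is a hypothetical helper illustrating how a 64-bit identifier might be mapped to a bucket.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bob Jenkins' one-at-a-time hash over an arbitrary byte string. */
uint32_t jenkins_one_at_a_time(const unsigned char *key, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		hash += key[i];
		hash += hash << 10;
		hash ^= hash >> 6;
	}
	/* Final avalanche so that every input bit affects every output bit. */
	hash += hash << 3;
	hash ^= hash >> 11;
	hash += hash << 15;
	return hash;
}

/* Hypothetical helper: hash the raw bytes of a 64-bit id into one of
 * `nbuckets` buckets.  The result depends on host byte order. */
size_t bucket_for_id(uint64_t id, size_t nbuckets)
{
	unsigned char bytes[sizeof(id)];

	memcpy(bytes, &id, sizeof(id));
	return jenkins_one_at_a_time(bytes, sizeof(bytes)) % nbuckets;
}
```

As a sanity check, hashing the single byte "a" with this function gives 0xca2e9442.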

11 years agocommon/messaging: use tdb_parse_record in message_list_db_fetch
Volker Lendecke [Fri, 5 Apr 2013 02:11:31 +0000 (13:11 +1100)]
common/messaging: use tdb_parse_record in message_list_db_fetch

This avoids malloc/free in a hot code path.
(cherry picked from commit c137531fae8f7f6392746ce1b9ac6f219775fc29)

11 years agocommon/messaging: Abstract db related operations inside db functions
Amitay Isaacs [Wed, 3 Apr 2013 04:08:14 +0000 (15:08 +1100)]
common/messaging: Abstract db related operations inside db functions

This simplifies the use of message indexdb API and abstracts tdb related code
inside the API.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit bf7296ce9b98563bcb8426cd035dbeab6d884f59)

11 years agocommon/messaging: Don't forget to free the result returned by tdb_fetch()
Amitay Isaacs [Tue, 2 Apr 2013 05:57:51 +0000 (16:57 +1100)]
common/messaging: Don't forget to free the result returned by tdb_fetch()

This fixes a memory leak in the messaging code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 20be1f991dd75c2333c9ec9db226432a819f57ba)

11 years agocommon/messaging: Free message list header if all message handlers are freed
Amitay Isaacs [Tue, 2 Apr 2013 01:08:39 +0000 (12:08 +1100)]
common/messaging: Free message list header if all message handlers are freed

This makes sure that even if the srvids are not deregistered, the header
structure is freed when the last message handler has been freed as a result of
client going away.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 4e1ec7412866f2d31c41de1bec0fbf788c03051b)

11 years agoNew version 1.2.60
Amitay Isaacs [Mon, 25 Mar 2013 07:05:07 +0000 (18:05 +1100)]
New version 1.2.60

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agolockwait: Allow for zero length key requests
Amitay Isaacs [Thu, 14 Mar 2013 04:44:44 +0000 (15:44 +1100)]
lockwait: Allow for zero length key requests

Samba sends zero length key requests for the notify database. To support older
Samba behaviour for now, allow zero length key requests. A zero length key is
encoded as the string "NULL".

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
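
A sketch of how a helper might decode such a key argument. It assumes that non-empty keys are passed as hex strings (suggested by the hex_decode_talloc() commit in this log, but not stated outright), uses malloc() rather than talloc to stay self-contained, and decode_key_arg() is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode a key argument.  The literal string "NULL" means a zero-length
 * key; anything else must be a non-empty, even-length hex string.
 * On success returns 0 and stores a malloc'd buffer (NULL for the empty
 * key) and its length; on malformed input returns -1. */
int decode_key_arg(const char *arg, unsigned char **out, size_t *out_len)
{
	size_t n, i;
	unsigned char *buf;

	*out = NULL;
	*out_len = 0;
	if (strcmp(arg, "NULL") == 0) {
		return 0;
	}

	n = strlen(arg);
	if (n == 0 || n % 2 != 0) {
		return -1;
	}
	buf = malloc(n / 2);
	if (buf == NULL) {
		return -1;
	}
	for (i = 0; i < n / 2; i++) {
		int hi = hex_nibble(arg[2 * i]);
		int lo = hex_nibble(arg[2 * i + 1]);

		if (hi < 0 || lo < 0) {
			free(buf);
			return -1;
		}
		buf[i] = (unsigned char)((hi << 4) | lo);
	}
	*out = buf;
	*out_len = n / 2;
	return 0;
}
```

A dedicated token is needed because an empty key would otherwise be an empty command-line argument, which is easy to lose or misparse.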
11 years agolockwait: Pass all locking information on commandline to lockwait helper
Amitay Isaacs [Wed, 13 Mar 2013 06:05:00 +0000 (17:05 +1100)]
lockwait: Pass all locking information on commandline to lockwait helper

Simplify lockwait code by getting rid of the communication between ctdbd
and ctdb lockwait helper child by passing all the locking information
on command line.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agolockwait: Check result of lockwait child
Volker Lendecke [Tue, 12 Mar 2013 12:35:17 +0000 (13:35 +0100)]
lockwait: Check result of lockwait child

11 years agolockwait: fix a comment typo
Michael Adam [Wed, 13 Mar 2013 08:12:50 +0000 (09:12 +0100)]
lockwait: fix a comment typo

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agoutil: Add hex_decode_talloc() to decode hex string into a binary blob
Amitay Isaacs [Wed, 13 Mar 2013 11:57:44 +0000 (22:57 +1100)]
util: Add hex_decode_talloc() to decode hex string into a binary blob

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 307416afda707b687f5e89e8438e45c154a4c806)

11 years agologging: Do not ignore stdout/stderr from the exec'd children
Amitay Isaacs [Wed, 13 Mar 2013 00:46:18 +0000 (11:46 +1100)]
logging: Do not ignore stdout/stderr from the exec'd children

To log debugging information from child processes that are started
with vfork and exec, do not set close_on_exec on STDOUT and STDERR for
that process.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 08c53ee609b80f87450a7a1d7dd24fbcdf5ab7bc)

11 years agoNew Version 1.2.59
Amitay Isaacs [Wed, 6 Mar 2013 06:48:44 +0000 (17:48 +1100)]
New Version 1.2.59

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Exec lockwait helper for locking a record
Amitay Isaacs [Mon, 18 Feb 2013 07:05:28 +0000 (18:05 +1100)]
ctdbd: Exec lockwait helper for locking a record

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Create a standalone helper for record locking
Amitay Isaacs [Mon, 18 Feb 2013 07:04:07 +0000 (18:04 +1100)]
ctdbd: Create a standalone helper for record locking

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotevent: optimize adding new timer events
Stefan Metzmacher [Fri, 22 Feb 2013 11:45:39 +0000 (12:45 +0100)]
tevent: optimize adding new timer events

There are two cases:

1. Adding a timer with a zero timestamp.
   Such events were used before we had immediate events.
   It's likely that there are a lot of these events
   and we need to add new ones in FIFO order.

2. Adding a timer with a real timestamp.
   As these timestamps typically get higher :-)
   it's better to traverse the existing list from
   the tail.

This is not completely optimal, but it should be better
than before.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
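
The two cases can be sketched with a sorted doubly linked list. This is a simplified illustration, not the tevent code: the struct and function names are made up, timestamps are plain non-negative integers, and removal (which would have to keep last_zero up to date) is left out.

```c
#include <stddef.h>

struct timer {
	long when;                   /* 0 means "run as soon as possible" */
	struct timer *prev, *next;
};

struct timer_list {
	struct timer *head, *tail;
	struct timer *last_zero;     /* last zero-timestamp entry, or NULL */
};

/* Link `t` directly after `pos`; pos == NULL inserts at the head. */
static void insert_after(struct timer_list *l, struct timer *pos,
			 struct timer *t)
{
	t->prev = pos;
	t->next = pos ? pos->next : l->head;
	if (t->next) {
		t->next->prev = t;
	} else {
		l->tail = t;
	}
	if (pos) {
		pos->next = t;
	} else {
		l->head = t;
	}
}

void timer_add(struct timer_list *l, struct timer *t)
{
	struct timer *pos;

	if (t->when == 0) {
		/* Case 1: O(1) append behind the other zero timers (FIFO). */
		insert_after(l, l->last_zero, t);
		l->last_zero = t;
		return;
	}

	/* Case 2: new timestamps are usually the largest, so walking back
	 * from the tail normally stops after very few steps. */
	for (pos = l->tail; pos != NULL; pos = pos->prev) {
		if (pos->when <= t->when) {
			break;
		}
	}
	insert_after(l, pos, t);
}
```

Timers with equal timestamps end up in insertion order, because the backward walk stops at the first entry that is not later than the new one.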
11 years agocommon/io: For scheduling immediate events use tevent_schedule_immediate
Amitay Isaacs [Fri, 22 Feb 2013 01:59:39 +0000 (12:59 +1100)]
common/io: For scheduling immediate events use tevent_schedule_immediate

tevent_schedule_immediate() is much more efficient at handling events that need
to be processed immediately rather than creating timed events with
timeval_zero().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: 11734be353a1e246163eda631d35dfe55d1d6fb1

11 years agoctdbd: Add an index db for message list for faster searches
Amitay Isaacs [Thu, 21 Feb 2013 02:16:15 +0000 (13:16 +1100)]
ctdbd: Add an index db for message list for faster searches

When CTDB is busy with lots of smbd processes, CTDB was spending too much
time in daemon_check_srvids(), which searches for a list of srvids among the
registered message handlers.  Using a hash-based index is significantly
faster than searching a linked list.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: 3e09f25d419635f6dd679b48fa65370f7860be7d

11 years agotools/ctdb: delip no longer fails if IP can not be moved
Martin Schwenke [Wed, 27 Feb 2013 05:01:55 +0000 (16:01 +1100)]
tools/ctdb: delip no longer fails if IP can not be moved

Moving the IP is an optimisation so should not cause failure.

Refactor and simplify the retry-move-IP into new function
try_moveip().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: 5402f85dde045576cbaf64e01c68e28ed52204e8

11 years agorecoverd: Do not send "ipreallocated" event to stopped nodes
Martin Schwenke [Mon, 18 Feb 2013 05:32:14 +0000 (16:32 +1100)]
recoverd: Do not send "ipreallocated" event to stopped nodes

Stopped nodes will reject "ipreallocated" because they are in
recovery, so they will eventually be banned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: c270381ee81903ff459a8b23fd57c997d038cf14

11 years agoclient: New generic node listing function list_of_nodes()
Martin Schwenke [Tue, 19 Feb 2013 03:29:06 +0000 (14:29 +1100)]
client: New generic node listing function list_of_nodes()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: a73bb56991b8c07ed0e9517ffcf0dc264be30487

11 years agoctdbd: Remove the variable declaration shadowing earlier declaration
Amitay Isaacs [Fri, 22 Feb 2013 01:28:56 +0000 (12:28 +1100)]
ctdbd: Remove the variable declaration shadowing earlier declaration

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Use the correct local variable to check status
Amitay Isaacs [Fri, 22 Feb 2013 01:28:25 +0000 (12:28 +1100)]
ctdbd: Use the correct local variable to check status

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Fix a struct initializer
Volker Lendecke [Wed, 20 Feb 2013 09:46:47 +0000 (10:46 +0100)]
ctdbd: Fix a struct initializer

11 years agoNew Version 1.2.58
Amitay Isaacs [Tue, 19 Feb 2013 07:09:05 +0000 (18:09 +1100)]
New Version 1.2.58

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoRECOVER: When we pull databases during recovery, we used to reallocate the databuffer...
Ronnie Sahlberg [Fri, 25 May 2012 02:27:59 +0000 (12:27 +1000)]
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region.

Change this to instead preallocate, by default, 10 MB chunks for the data buffer.
This significantly reduces the number of reallocate-and-move operations that may be required.

Create a tunable to override/change how much preallocation should be used.

Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c

Cherry-pick-from: 1f262deaad0818f159f9c68330f7fec121679023

Also, make sure the preallocation size is 10MB and not 100MB.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
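
The chunked preallocation idea can be sketched like this. It is a generic illustration using realloc(); the struct and function names are hypothetical, the chunk size stands in for the tunable mentioned above, and size overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct databuf {
	unsigned char *data;
	size_t len;        /* bytes in use */
	size_t allocated;  /* bytes reserved */
	size_t chunk;      /* growth step, e.g. 10 * 1024 * 1024 */
};

/* Append one record, growing the buffer in whole chunks so that most
 * appends need no reallocation at all.  Returns 0 on success, -1 if
 * memory could not be allocated. */
int databuf_append(struct databuf *b, const void *rec, size_t rec_len)
{
	assert(b->chunk > 0);

	if (rec_len > b->allocated - b->len) {
		size_t extra = b->len + rec_len - b->allocated;
		size_t chunks = (extra + b->chunk - 1) / b->chunk;
		size_t new_size = b->allocated + chunks * b->chunk;
		unsigned char *p = realloc(b->data, new_size);

		if (p == NULL) {
			return -1;
		}
		b->data = p;
		b->allocated = new_size;
	}
	memcpy(b->data + b->len, rec, rec_len);
	b->len += rec_len;
	return 0;
}
```

With a 10 MB chunk, pulling a database of many small records triggers one reallocation per 10 MB of data instead of one per record.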
11 years agoLogging: Free the ringbuffer in child processes created with ctdb_fork()
Martin Schwenke [Tue, 5 Feb 2013 01:09:36 +0000 (12:09 +1100)]
Logging: Free the ringbuffer in child processes created with ctdb_fork()

At the moment the log ringbuffer is duplicated in every child process.
Although it is copy-on-write we want to see if it is contributing to
out-of-memory situations when there are a lot of children.

The ringbuffer isn't accessible from any of the children anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Conflicts:
common/ctdb_fork.c

Cherry-pick-from: a82d3ec12f0fda16d6bfa8442a07595de897c10e

11 years agoLogging: New function ctdb_log_ringbuffer_free()
Martin Schwenke [Tue, 5 Feb 2013 01:08:11 +0000 (12:08 +1100)]
Logging: New function ctdb_log_ringbuffer_free()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Cherry-pick-from: a4f622e85168f59417c11705f1734e0352e1d44a

11 years agoeventscripts: Remove calls to "smbstatus -np" for samba cleanup
Amitay Isaacs [Mon, 11 Feb 2013 00:25:49 +0000 (11:25 +1100)]
eventscripts: Remove calls to "smbstatus -np" for samba cleanup

This is an artifact from older versions of Samba. In the newer versions of
Samba, "smbstatus -np" command does not do anything useful, but causes a
traverse in CTDB which is expensive and causes CPU utilization to shoot up.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Conflicts:
config/events.d/50.samba

Cherry-pick-from: 053b89c6dbce47001505524606889334559d2ec4

11 years agocommon/io: Rewrite socket handling code to read all available data
Amitay Isaacs [Thu, 17 Jan 2013 23:42:14 +0000 (10:42 +1100)]
common/io: Rewrite socket handling code to read all available data

This improves the processing of packets considerably.  It has been
observed that there can be as many as 10 packets in the socket buffer, so
reading a single packet from the socket at a time is far from optimal.
This change reads all the bytes from the socket buffer and then parses
them to extract multiple packets.  If there are multiple packets, a timed
event is set up to process the next packet.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: d788bc8f7212b7dc1587ae592242dc8c876f4053

Conflicts:
common/ctdb_io.c
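
Extracting several packets from one read can be sketched as below. The sketch assumes that each packet starts with a 32-bit length field in host byte order that counts the whole packet including the field itself; that framing, and all names used, are assumptions for illustration. It also dispatches packets in a simple loop, whereas the commit describes scheduling an event for each further packet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*packet_cb)(const unsigned char *pkt, uint32_t len, void *arg);

/* Example callback: counts the packets it is handed. */
void count_packet(const unsigned char *pkt, uint32_t len, void *arg)
{
	(void)pkt;
	(void)len;
	(*(int *)arg)++;
}

/* Hand every complete packet in buf[0..avail) to `cb`.
 * Returns the number of bytes consumed; any remainder is a partial packet
 * that must be kept until more data arrives.  Returns (size_t)-1 if a
 * length field is smaller than the field itself. */
size_t drain_packets(const unsigned char *buf, size_t avail,
		     packet_cb cb, void *arg)
{
	size_t off = 0;

	while (avail - off >= sizeof(uint32_t)) {
		uint32_t len;

		/* memcpy avoids unaligned access into the byte buffer. */
		memcpy(&len, buf + off, sizeof(len));
		if (len < sizeof(uint32_t)) {
			return (size_t)-1;
		}
		if (len > avail - off) {
			break;  /* partial packet: wait for more data */
		}
		cb(buf + off, len, arg);
		off += len;
	}
	return off;
}
```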

11 years agodaemon: Make sure all the traverse children are terminated if traverse times out
Amitay Isaacs [Tue, 22 Jan 2013 02:27:20 +0000 (13:27 +1100)]
daemon: Make sure all the traverse children are terminated if traverse times out

When traverse times out, callback function is called with key and data set to
tdb_null.  This is also the way to signal end of traverse.  So if the traverse
times out, callback function treats it as traverse ended and frees state without
calling the destructor.

Keep track if the traverse timed out, so callback function can take appropriate
action for traverse timeout and traverse end.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: 35da9a7c2a0f5e54e61588c3c3455f06ebc66822

11 years agoNew version 1.2.57
Amitay Isaacs [Wed, 6 Feb 2013 03:33:51 +0000 (14:33 +1100)]
New version 1.2.57

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoinitscript: export CTDB_DEBUG_LOCKS
Martin Schwenke [Tue, 5 Feb 2013 05:40:39 +0000 (16:40 +1100)]
initscript: export CTDB_DEBUG_LOCKS

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoinitscript: export CTDB_EXTERNAL_TRACE
Martin Schwenke [Tue, 5 Feb 2013 02:16:46 +0000 (13:16 +1100)]
initscript: export CTDB_EXTERNAL_TRACE

This means it can be set like any other configuration option in the
configuration file, without needing to export it there.

Cherry-pick-from: a0ef73e197dc9147f7718e0813fe803ff0b3d54d
Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Backport use of external script to debug hung eventscript
Martin Schwenke [Thu, 17 May 2012 00:17:51 +0000 (10:17 +1000)]
ctdbd: Backport use of external script to debug hung eventscript

This is a cherry-pick from 6e68797af67bee36f2bad045f94806e7e98f27e9,
combined with several recent fixes:

  8507303b525d20c74e8ec4e7c4f5f275945cd3b6
    scripts: debug-hung-script.sh doesn't need functions/loadconfig
  501461cc3e132d4adee9e91b5d4513a26bae2846
    ctdbd: Remove debug_hung_script_ctx
  0581f9a84e58764d194f4e04064c2c5b393c348b
    ctdbd: Remove command-line option --debug-hung-script
  3400b2ed34b6eb9496eb55f1aab6f89d2952060d
    ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable
  9b0d56b16775aa16f33bdfdf831256e085fa3339
    ctdbd: Don't use a fixed length buffer for the hung script command

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agodoc: Rebuild all documentation
Amitay Isaacs [Tue, 5 Feb 2013 01:59:53 +0000 (12:59 +1100)]
doc: Rebuild all documentation

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoDOC document the FetchCollapse tunable
Ronnie Sahlberg [Tue, 20 Mar 2012 00:38:20 +0000 (11:38 +1100)]
DOC document the FetchCollapse tunable

Cherry-pick-from: c37aa6f3738693653f64c2fa015ace061da38b5a

11 years agoFETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including...
Ronnie Sahlberg [Tue, 20 Mar 2012 00:31:59 +0000 (11:31 +1100)]
FETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records

Conflicts:
server/ctdb_tunables.c

Cherry-pick-from: eafd7bbaaa5931546a96c8beae3cf9a39a49c925

11 years agoRecord Fetch Collapse: Collapse multiple fetch request into one single request.
Ronnie Sahlberg [Mon, 7 Nov 2011 19:55:46 +0000 (06:55 +1100)]
Record Fetch Collapse: Collapse multiple fetch request into one single request.

When multiple clients fetch the same record concurrently, send only a single
fetch across the network and defer all other fetches locally.
This improves performance for hot records and reduces CPU load on ctdb.

Conflicts:
server/ctdb_ltdb_server.c

Cherry-pick-from: 82d6946ad8b3348e8b9d3d971f24925ade02d1be

11 years agoscripts: Fix the variable name for sed expressions
Amitay Isaacs [Wed, 9 Jan 2013 00:03:18 +0000 (11:03 +1100)]
scripts: Fix the variable name for sed expressions

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoNew version 1.2.56
Amitay Isaacs [Wed, 2 Jan 2013 05:17:58 +0000 (16:17 +1100)]
New version 1.2.56

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodaemon: Change the default recovery method for persistent databases
Amitay Isaacs [Wed, 2 Jan 2013 04:49:39 +0000 (15:49 +1100)]
daemon: Change the default recovery method for persistent databases

Use sequence numbers to do the recovery for persistent databases instead
of RSNs.  This fixes the problem of registry corruption during recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoDOC: describe the RecoverPDBBySeqNum tunable
Ronnie Sahlberg [Tue, 29 Nov 2011 21:59:03 +0000 (08:59 +1100)]
DOC: describe the RecoverPDBBySeqNum tunable

Conflicts:
doc/ctdbd.1
doc/ctdbd.1.html

Cherry-pick-from: 86d956170d4806065f1470fc44710c085c57f17a

11 years agoRecover Persistent database DB by DB and not record by record
Ronnie Sahlberg [Mon, 28 Nov 2011 02:56:30 +0000 (13:56 +1100)]
Recover Persistent database DB by DB and not record by record

Add a new tunable, RecoverPDBBySeqNum, that changes how persistent databases are recovered.

When set to 1, persistent databases will be recovered as a whole from the node which
has the highest "__db_sequence_number__" record.
This record is managed by Samba for those databases where we do persistent writes and have
inter-record relations.
For these databases we do not want the usual "blend records from all nodes based
on individual record RSN" but instead a mode where we pick one instance of the persistent database.

If no node is found with a "__db_sequence_number__" record at all, we fall back to the original "recover records independently based on record RSN".
Some persistent databases do not contain record interrelations and as such do not
contain this special record at all.

Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c

Cherry-pick-from: 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc
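
The selection rule described above can be sketched as follows. This is an illustration only: the struct and function names are hypothetical, and how the sequence numbers are collected from the nodes is not shown.

```c
#include <stddef.h>
#include <stdint.h>

struct node_seqnum {
	uint32_t pnn;       /* node number */
	int has_seqnum;     /* node has a "__db_sequence_number__" record */
	uint64_t seqnum;    /* value of that record, if present */
};

/* Returns the index of the node whose copy of the database should be taken
 * as a whole, or -1 if no node has a sequence number record, in which case
 * the caller falls back to per-record recovery based on RSN. */
int pick_recovery_source(const struct node_seqnum *nodes, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!nodes[i].has_seqnum) {
			continue;
		}
		if (best == -1 || nodes[i].seqnum > nodes[best].seqnum) {
			best = (int)i;
		}
	}
	return best;
}
```

Taking one node's copy in full keeps related records consistent with each other, which blending records by individual RSN cannot guarantee.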

11 years agoLibCTDB: add get persistent db seqnum control
Ronnie Sahlberg [Mon, 28 Nov 2011 05:30:46 +0000 (16:30 +1100)]
LibCTDB: add get persistent db seqnum control

Conflicts:
tools/ctdb.c

Cherry-pick-from: 6e96a62494bbb2c7b0682ebf0c2115dd2f44f7af

11 years agoDB Seqnum: must provide a ctdb_ltdb_header when calling ctdb_ltdb_fetch()
Ronnie Sahlberg [Sun, 27 Nov 2011 23:41:17 +0000 (10:41 +1100)]
DB Seqnum: must provide a ctdb_ltdb_header when calling ctdb_ltdb_fetch()

Cherry-pick-from: 1fea9ef55a6a9d201ad1b49583451ac3e6b1c66d

11 years agoscripts: Add helper script to log locking information using /proc/locks
Amitay Isaacs [Wed, 5 Dec 2012 00:38:42 +0000 (11:38 +1100)]
scripts: Add helper script to log locking information using /proc/locks

This finds any processes locking tdb databases used by CTDB and logs
stack trace for each process.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodaemon: Run an external script if freeze locks were not obtained during recovery
Amitay Isaacs [Wed, 5 Dec 2012 00:37:26 +0000 (11:37 +1100)]
daemon: Run an external script if freeze locks were not obtained during recovery

If the freeze child is already created in ctdb_start_freeze(), then it indicates
that the child process has not yet obtained the locks.  This may be because
another process has locked the databases and has not yet released the locks.

In this case, invoke a helper script defined by the environment variable
CTDB_DEBUG_LOCKS to log information about locks.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoNew version 1.2.55
Amitay Isaacs [Tue, 27 Nov 2012 04:50:54 +0000 (15:50 +1100)]
New version 1.2.55

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoRevert "when creating/adding a public ip, set the initial interface to be the first...
Amitay Isaacs [Thu, 22 Nov 2012 03:37:45 +0000 (14:37 +1100)]
Revert "when creating/adding a public ip, set the initial interface to be the first interface specified"

This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f.

When an IP is added to a node on a new interface for the first time,
vnn->iface gets set to the first interface defined for that IP.  This
actually causes a problem in ctdb_vnn_assign_iface().  Since vnn->iface
is set, it takes an early exit without updating vnn->pnn.  This results
in the IP being hosted on the node while CTDB still thinks it's unassigned.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoEventscripts: 10.interface should list configured interfaces
Martin Schwenke [Fri, 16 Nov 2012 09:21:15 +0000 (20:21 +1100)]
Eventscripts: 10.interface should list configured interfaces

The current code lists available interfaces.  If IPs are configured in
some other way than the public addresses file (e.g. ctdb addip) and their
interfaces default to being marked down then, since down interfaces are
not available, these interfaces can never be marked up.

The configured interfaces should be listed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Cherry-pick-from: d8f010355b715e49709836e057a5d0f110919897

Conflicts:
config/events.d/10.interface

11 years agoctdbd: Make the link status of new interfaces more flexible
Martin Schwenke [Fri, 16 Nov 2012 08:43:14 +0000 (19:43 +1100)]
ctdbd: Make the link status of new interfaces more flexible

Neither up nor down is a good default value for the link status of a
new interface.  Up means that IPs can be assigned to interfaces before
the true state is known and they can move away quickly if the interface
is actually down.  Down means that IPs can't be assigned to an interface
for a variable amount of time - until a monitor cycle occurs - and this
can result in imbalanced IPs.

This is a neat compromise.  Before the startup event completes, IPs
can't be assigned to interfaces because all interfaces begin in a down
state.  As soon as the startup event completes, IPs can be allocated
to any interface that has been marked up by the eventscript.  Later,
during normal operation, newly added IPs can be assigned to new
interfaces immediately.  The IPs will still move away if an interface
is noticed to be down in the next monitor cycle, but that is the
exception rather than the rule.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Cherry-pick-from: 9275a69a414482f1053ae14528d5972575b9214e

11 years agotools/ctdb: Do not use function return value as pnn
Amitay Isaacs [Tue, 6 Nov 2012 06:06:54 +0000 (17:06 +1100)]
tools/ctdb: Do not use function return value as pnn

This fixes the wrong code where same variable 'ret' is used to track the pnn
and the return value of a function call.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: 718233c445cd6627ab3962b6565c2655f1f8efd0

11 years agorecoverd: Track the nodes that fail takeover run and set culprit count
Amitay Isaacs [Tue, 23 Oct 2012 05:23:12 +0000 (16:23 +1100)]
recoverd: Track the nodes that fail takeover run and set culprit count

If any of the nodes fail takeover run (either due to timeout or failure
to complete within takeover_timeout interval) from main loop, recovery
master will give up trying takeover run with following message:

  "Unable to setup public takeover addresses. Try again later"

As a side effect, monitoring is disabled on all the nodes. Before
ctdb_takeover_run() is called from the main loop, monitoring gets disabled via
the startrecovery event. Since ctdb_takeover_run() fails, it never runs the
recovered event and monitoring does not get re-enabled.

In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-pick-from: a5c6bb1fffb8dc3960af113957a1fd080cc7c245

Conflicts:
include/ctdb_private.h
server/ctdb_takeover.c

11 years agodaemon: Do not ignore timed out monitor events
Amitay Isaacs [Tue, 23 Oct 2012 04:06:33 +0000 (15:06 +1100)]
daemon: Do not ignore timed out monitor events

If an eventscript times out for the monitor event, it is considered successful
and the remaining eventscripts are not run. This can make a node prematurely
healthy, causing healthy nodes to fail over IPs to this node even though it
will not be able to host those IPs. The result is loss of access and, in the
case of a NAT-GW configuration, loss of a default route.

Copy-code-from: 6e68797af67bee36f2bad045f94806e7e98f27e9

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoNew Version 1.2.54
Amitay Isaacs [Tue, 30 Oct 2012 01:39:00 +0000 (12:39 +1100)]
New Version 1.2.54

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoscripts: Remove duplicate code from init script to set tunables
Amitay Isaacs [Mon, 3 Sep 2012 02:39:36 +0000 (12:39 +1000)]
scripts: Remove duplicate code from init script to set tunables

The tunable variables defined in the CTDB configuration file are currently
set up from the init script as well as in the "setup" event in the 00.ctdb
eventscript.  Remove the duplicated code and set tunable
variables only from the setup event.  During the "setup" event, ctdb tool
commands can time out if the CTDB daemon is not ready.  To guard
against this, wait until the "ctdb ping" command succeeds before
executing any other ctdb tool commands.
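
A minimal sketch of such a wait loop follows. The function name
wait_until_ready, the retry count and the delay are assumptions made for
illustration and are not taken from 00.ctdb.

```shell
#!/bin/sh
# Hypothetical helper: wait_until_ready COMMAND TRIES DELAY
# Runs COMMAND until it succeeds, at most TRIES times, sleeping DELAY
# seconds after each failure.  Returns 0 on success, 1 if it never succeeds.
wait_until_ready() {
    cmd="$1"
    tries="$2"
    delay="$3"
    count=0
    while [ "$count" -lt "$tries" ]; do
        # $cmd is deliberately unquoted so "ctdb ping" splits into words.
        if $cmd >/dev/null 2>&1; then
            return 0
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 1
}
```

An eventscript could then guard its tool calls with something like
wait_until_ready "ctdb ping" 30 1 || exit 1.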

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Cherry-picked-from: 632c1b9c1cc2e242376358ce49fd2022b3f27aa2

Conflicts:
config/events.d/00.ctdb

11 years agodaemon: Protect against double free of callback state while shutting down
Amitay Isaacs [Mon, 29 Oct 2012 03:56:10 +0000 (14:56 +1100)]
daemon: Protect against double free of callback state while shutting down

When CTDB is shut down and monitoring has been stopped, monitor_context
is freed along with all the callback states hanging off it.  This includes
the callback state for current_monitor, if the current monitor event has
not yet finished.  As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it has already been freed
and is a dangling reference.

So before executing the callback function and freeing the callback state,
check that ctdb->monitor->monitor_context is not NULL.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoNew version 1.2.53
Amitay Isaacs [Fri, 26 Oct 2012 05:19:35 +0000 (16:19 +1100)]
New version 1.2.53

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoInitscript - add backup of corrupt non-persistent databases
Martin Schwenke [Wed, 28 Mar 2012 03:50:36 +0000 (14:50 +1100)]
Initscript - add backup of corrupt non-persistent databases

Corrupt non-persistent databases never get analysed because ctdbd
zeroes them at startup.

Modify the initscript so that corrupt non-persistent databases are
moved aside to a backup.  If the number of backups for a particular
database exceeds $CTDB_MAX_CORRUPT_DB_BACKUPS (default 10) then the
oldest excess backups are garbage collected.

Abstracts from and cleans up the code for checking persistent
databases.

Logging of related messages is done to syslog or a log file as
specified.
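
The move-aside-and-prune behaviour described above might look roughly like
this. It is a sketch, not the code from config/ctdb.init: the function name
backup_corrupt_db, the backup file naming scheme and the argument order are
all assumptions. Only $CTDB_MAX_CORRUPT_DB_BACKUPS and its default of 10
come from the commit message. It also assumes database file names contain no
whitespace or glob characters.

```shell
#!/bin/sh
# Hypothetical helper: backup_corrupt_db DB_FILE BACKUP_DIR [MAX_BACKUPS]
# Moves DB_FILE into BACKUP_DIR under a timestamped name, then deletes the
# oldest backups of that database beyond MAX_BACKUPS.
backup_corrupt_db() {
    db="$1"
    dir="$2"
    max="${3:-${CTDB_MAX_CORRUPT_DB_BACKUPS:-10}}"
    name=$(basename "$db")
    stamp=$(date +%Y%m%d%H%M%S)

    mkdir -p "$dir" || return 1

    # Pick a sequence number that is unused within this second.
    n=0
    while :; do
        backup=$(printf '%s/%s.%s.%03d.corrupt' "$dir" "$name" "$stamp" "$n")
        [ -e "$backup" ] || break
        n=$((n + 1))
    done
    mv "$db" "$backup" || return 1

    # Names sort chronologically, so reverse order lists newest first.
    # Keep the first $max entries and remove the rest.
    ls "$dir/$name."*.corrupt | sort -r | tail -n +"$((max + 1))" |
    while IFS= read -r old; do
        rm -f "$old"
    done
}
```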

Signed-off-by: Martin Schwenke <martin@meltin.net>
Cherry-picked-from: 00cd75595685dae829758abf1a4cb644af7ed50e

Conflicts:
config/ctdb.init

11 years agoNew version 1.2.52
Martin Schwenke [Fri, 5 Oct 2012 02:05:19 +0000 (12:05 +1000)]
New version 1.2.52

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoutil: ctdb_fork() closes all sockets opened by the main daemon
Martin Schwenke [Tue, 2 Oct 2012 01:51:24 +0000 (11:51 +1000)]
util: ctdb_fork() closes all sockets opened by the main daemon

Do some other housekeeping including stopping tevent.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoRevert "logging: Close unix socket /tmp/ctdb.socket in syslogd process"
Martin Schwenke [Tue, 2 Oct 2012 01:54:00 +0000 (11:54 +1000)]
Revert "logging: Close unix socket /tmp/ctdb.socket in syslogd process"

This reverts commit 450bedccbee3f89aba3b33777a4ae8841c456a65.

This will be fixed in ctdb_fork() for all children.  Won't somebody
PLEASE think of the children?!?

11 years agoNew version 1.2.51
Amitay Isaacs [Tue, 2 Oct 2012 02:45:10 +0000 (12:45 +1000)]
New version 1.2.51

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoLogging: Map TEVENT_DEBUG_FATAL to DEBUG_CRIT
Martin Schwenke [Thu, 27 Sep 2012 23:39:12 +0000 (09:39 +1000)]
Logging: Map TEVENT_DEBUG_FATAL to DEBUG_CRIT

This is currently mapped to DEBUG_EMERG.  CTDB really has no business
logging anything at EMERG level since the whole system is not about to
abort or catch fire.  EMERG causes the message to appear on the
console and on every terminal.  That's a bit overzealous!

There would be very few situations where logs are being filtered at
a level below ERROR, so CRIT should certainly suffice.

The trigger for this was curious messages saying "No event for <n>
seconds!" logged in a user's terminal.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: "recovered" event should not fail on NATGW failure
Martin Schwenke [Wed, 26 Sep 2012 04:37:49 +0000 (14:37 +1000)]
Eventscripts: "recovered" event should not fail on NATGW failure

The recovery process has no protection against the "recovered" event
failing, so this can cause a recovery loop.

Instead of failing the "recovered" event, add a "monitor" event and
fail that instead.  In this case the failure semantics are well
defined.

A separate patch should ban nodes if the "recovered" event fails for
an unknown reason.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.50
Amitay Isaacs [Wed, 12 Sep 2012 05:02:30 +0000 (15:02 +1000)]
New version 1.2.50

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agocommon: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
Martin Schwenke [Thu, 6 Sep 2012 10:22:38 +0000 (20:22 +1000)]
common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()

We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace.  If we can reproduce it then this might
help us to debug it.

The idea is that you do something like the following in /etc/sysconfig/ctdb:

  export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"

When we hit this error then we call out to gcore to get a core file so
we can do forensics.  This might block CTDB for a few seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.49
Martin Schwenke [Tue, 21 Aug 2012 04:35:35 +0000 (14:35 +1000)]
New version 1.2.49

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoInitscript: Kill any existing ctdbd processes if the ping succeeds
Martin Schwenke [Tue, 21 Aug 2012 04:28:37 +0000 (14:28 +1000)]
Initscript: Kill any existing ctdbd processes if the ping succeeds

Initialising a new ctdbd will destroy the Unix domain socket so
existing processes will be useless anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agologging: Close unix socket /tmp/ctdb.socket in syslogd process
Amitay Isaacs [Tue, 14 Aug 2012 05:42:12 +0000 (15:42 +1000)]
logging: Close unix socket /tmp/ctdb.socket in syslogd process

Since the unix socket is opened before the syslogd process is forked, the
syslogd process also keeps listening on it.  If the main ctdbd process dies
and has any child processes that are blocked waiting for locks, these child
processes keep connecting to the unix socket and thus syslogd cannot exit.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoNew version 1.2.48
Martin Schwenke [Thu, 9 Aug 2012 09:47:08 +0000 (19:47 +1000)]
New version 1.2.48

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: new functions set_proc() and get_proc().
Martin Schwenke [Tue, 28 Jun 2011 04:54:33 +0000 (14:54 +1000)]
Eventscripts: new functions set_proc() and get_proc().

These provide a thin layer around writing and reading files in /proc.
They can be easily replaced by stubs for unit testing.
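
A thin layer of this kind could be sketched as below. The argument order
(path relative to /proc, then value) and the PROC_ROOT override are
assumptions added here so the sketch can be exercised without touching the
real /proc. They are not claimed to match the functions in the eventscript
library.

```shell
#!/bin/sh
# Hypothetical sketch.  PROC_ROOT is an assumed override for testing and
# defaults to the real /proc.

# set_proc PATH VALUE: write VALUE to PATH under /proc.
set_proc() {
    echo "$2" >"${PROC_ROOT:-/proc}/$1"
}

# get_proc PATH: print the contents of PATH under /proc.
get_proc() {
    cat "${PROC_ROOT:-/proc}/$1"
}
```

A unit test can either point PROC_ROOT at a scratch directory or redefine
both functions as stubs before sourcing the script under test.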

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.47
Amitay Isaacs [Thu, 9 Aug 2012 06:57:15 +0000 (16:57 +1000)]
New version 1.2.47

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoEventscripts: Add "reconfigure" pseudo-event for policy routing
Martin Schwenke [Fri, 3 Aug 2012 00:54:30 +0000 (10:54 +1000)]
Eventscripts: Add "reconfigure" pseudo-event for policy routing

This rebuilds all policy routes and can be used if the configuration
changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.46
Martin Schwenke [Tue, 24 Jul 2012 01:26:32 +0000 (11:26 +1000)]
New version 1.2.46

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Default route on NAT gateway should have a metric of 10
Martin Schwenke [Fri, 20 Jul 2012 06:43:39 +0000 (16:43 +1000)]
Eventscripts: Default route on NAT gateway should have a metric of 10

At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.

NAT gateway is meant to be a last resort, so routes from 11.routing
should override it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoNew version 1.2.45
Martin Schwenke [Thu, 12 Jul 2012 04:03:58 +0000 (14:03 +1000)]
New version 1.2.45

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoWhen we find an ip we shouldn't host, just release it
Ronnie Sahlberg [Wed, 20 Jun 2012 05:10:05 +0000 (15:10 +1000)]
When we find an ip we shouldn't host, just release it

Don't call a full-blown cluster-wide ipreallocation, just release it locally

11 years agoWhen we release an ip, get the interface name from the kernel
Ronnie Sahlberg [Wed, 20 Jun 2012 00:08:11 +0000 (10:08 +1000)]
When we release an ip, get the interface name from the kernel

instead of using the interface that ctdb thinks the ip is hosted on.
The difference is that this now allows us to handle cases where we want to
release an ip but ctdbd does not know which interface the ip is assigned to
(e.g. the user has used 'ip addr add...' and manually assigned an ip to the
wrong interface).
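
For illustration only, asking the kernel for the interface can be
approximated from the shell with iproute2. The commit itself changes the
daemon, so the function names below are hypothetical. The sketch assumes the
one-line output format of "ip -o addr show", where the interface name is the
second field.

```shell
#!/bin/sh
# Hypothetical sketch: extract the interface name from one-line
# "ip -o addr show" output read on stdin.
iface_from_ip_output() {
    awk '{ print $2; exit }'
}

# find_iface_for_ip IP: ask the kernel which interface holds IP.
find_iface_for_ip() {
    ip -o addr show to "$1" | iface_from_ip_output
}
```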

11 years agoAdd new command to find which interface an ip is located on
Ronnie Sahlberg [Wed, 20 Jun 2012 03:32:02 +0000 (13:32 +1000)]
Add new command to find which interface an ip is located on

11 years agoNew version 1.2.44
Ronnie Sahlberg [Fri, 29 Jun 2012 02:31:13 +0000 (12:31 +1000)]
New version 1.2.44

11 years agoeventscripts: 13.per_ip_routing - flock should have a timeout
Martin Schwenke [Thu, 21 Jun 2012 04:18:35 +0000 (14:18 +1000)]
eventscripts: 13.per_ip_routing - flock should have a timeout

... and flock failure should be fatal.
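
A minimal sketch of taking a lock with a timeout and treating failure as
fatal, assuming the util-linux flock command. The wrapper name with_lock,
the use of file descriptor 9 and the error message are illustrative and not
taken from 13.per_ip_routing.

```shell
#!/bin/sh
# Hypothetical helper: with_lock LOCKFILE TIMEOUT COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCKFILE.  If the lock
# cannot be obtained within TIMEOUT seconds, COMMAND is not run and the
# helper returns non-zero.
with_lock() {
    lockfile="$1"
    timeout="$2"
    shift 2
    (
        flock -w "$timeout" 9 || {
            echo "ERROR: unable to lock $lockfile" >&2
            exit 1
        }
        "$@"
    ) 9>"$lockfile"
}
```

A caller that wants the failure to be fatal would follow the call with
"|| exit 1".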

Signed-off-by: Martin Schwenke <martin@meltin.net>