samba.git
Amitay Isaacs [Thu, 4 Jul 2013 02:45:32 +0000 (12:45 +1000)]
packaging: Install docs using %doc directive

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6fe584d05543eebd24abd19bab502dc4da04e921)

Amitay Isaacs [Thu, 4 Jul 2013 01:33:38 +0000 (11:33 +1000)]
packaging: Remove ctdb_transaction from docdir

It's bundled in the ctdb-tests package.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7e53fbf92b6dd5211d918ea0e23126b7dfa50c42)

Martin Schwenke [Sun, 30 Jun 2013 07:23:08 +0000 (17:23 +1000)]
doc: Add a disclaimer for the EnableBans tunable

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 145b1966c1b34f1667a175235e1df2741294391c)

Martin Schwenke [Sun, 30 Jun 2013 07:22:06 +0000 (17:22 +1000)]
doc: Add banning bug fixes to NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b4c06e8ec8b227c1e6c01444038c3b15b5f9e606)

Amitay Isaacs [Tue, 2 Jul 2013 02:40:37 +0000 (12:40 +1000)]
ctdbd: Don't ban self if init or shutdown event fails

There is no point in banning the node if the init or shutdown event times
out, since it's going to quit anyway.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ef1c4e99ca66e7a990bc557f34abb624c315e6ba)

Amitay Isaacs [Thu, 27 Jun 2013 07:46:43 +0000 (17:46 +1000)]
doc: The second half of monitoring is only for recovery master

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit fcd5e1f04c5fe6c98399429b8f0918b8779acba6)

Michael Adam [Wed, 26 Jun 2013 07:23:22 +0000 (09:23 +0200)]
recoverd: when the recmaster is banned, use that information when forcing an election

When we trigger an election because the recmaster considers itself inactive,
update our local nodemap with the recmaster's flags before calling
force_election(). This way, we don't send the inactive node commands
(e.g. freeze) that may fail and then lead to ourselves getting banned.

The theory is that this should help avoid banning loops.

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 932360992b08a5483d90c0590218ba0fd756119e)

Michael Adam [Wed, 26 Jun 2013 05:11:51 +0000 (07:11 +0200)]
recoverd: fix a comment typo

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 741944f118e98f178b860194eecb215180949d18)

Michael Adam [Fri, 21 Jun 2013 15:57:37 +0000 (17:57 +0200)]
recoverd: fix a comment in main_loop

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit ac06c46e4a80c635f6094b5ac6f0bf3e3a02db95)

Michael Adam [Fri, 21 Jun 2013 12:06:22 +0000 (14:06 +0200)]
recoverd: eliminate some trailing spaces from ctdb_election_win()

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit df30c0a05ed908fc2a997c56ff5484736b23b70f)

Martin Schwenke [Fri, 28 Jun 2013 06:31:07 +0000 (16:31 +1000)]
recoverd: Don't continue if the current node gets banned

It cannot continue with recovery or with monitoring the cluster.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a)

Amitay Isaacs [Fri, 28 Jun 2013 04:31:02 +0000 (14:31 +1000)]
recoverd: Refactor code to ban misbehaving nodes

Since we have nodemap information, there is no need to hardcode the
limit of 20.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe)

Amitay Isaacs [Thu, 27 Jun 2013 06:01:16 +0000 (16:01 +1000)]
recoverd: Move code to ban other nodes after we get local node flags

If a node gets banned first, then it should not ban other nodes.

This code was moved up in main_loop to avoid waiting for nodemap
from other nodes (commit 83b0261f2cb453195b86f547d360400103a8b795).

To prevent a banned node from banning other nodes, we need to get nodemap
information from the local node first, since trying to ban other nodes can
fail if we are already banned.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ae1693905036ecdbc4594fde1f12500faae4a554)

Amitay Isaacs [Thu, 27 Jun 2013 05:44:27 +0000 (15:44 +1000)]
recoverd: Delay the initial election if node is started in stopped state

Since there is an early exit if a node is stopped or banned, we can wait until
the node becomes active before starting the initial election.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 593a17678fbd3109e118154b034d43b852659518)

Amitay Isaacs [Thu, 27 Jun 2013 05:33:49 +0000 (15:33 +1000)]
recoverd: Update capabilities only if the current node is active

Since we do an early return if a node is stopped or banned, move the code
that updates capabilities below the early return, just before we check the
capabilities of the current recovery master.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 93bcb6617e1024f810533e12390a572f51703ca0)

Amitay Isaacs [Thu, 27 Jun 2013 05:46:04 +0000 (15:46 +1000)]
recoverd: No need to check if node is recovery master when inactive

If a node is stopped or banned, it will cause an early return from
main_loop, so this check is redundant.  The election will be called by an
active node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 815ddd3341b7e9db39e05a3a3fcd9a1420f053bc)

Amitay Isaacs [Thu, 27 Jun 2013 05:39:15 +0000 (15:39 +1000)]
recoverd: Always do an early exit from main_loop if node is stopped or banned

A stopped or banned node cannot do anything useful.  So do not participate
in any cluster activity and do not cause any unnecessary network traffic.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2396981c4bcf30530aeb7f4395093cc202105b50)

Amitay Isaacs [Fri, 28 Jun 2013 04:10:47 +0000 (14:10 +1000)]
recoverd: Do not set banning credits on a node if current node is inactive

If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 38304f88e0c634e97d4687c25adef975f71537b8)

Amitay Isaacs [Mon, 1 Jul 2013 07:40:36 +0000 (17:40 +1000)]
banning: Do not come out of ban if databases are not frozen

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a60f228f8380f222f838eb619d2ab55f96f11ac2)

Amitay Isaacs [Mon, 24 Jun 2013 04:33:32 +0000 (14:33 +1000)]
banning: No need to check if banned pnn is for local node

If the banned pnn is not the local node, the function returns early,
so there is no need for an additional check.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 297d93cecc3c0655e72ecac38508e113bdbeab9c)

Amitay Isaacs [Fri, 28 Jun 2013 04:04:18 +0000 (14:04 +1000)]
banning: Make ctdb_local_node_got_banned() a void function

When this function is called, we are already committed to banning
and there is no point in failing this function.  If freezing of
databases fails, it will be fixed by the recovery daemon.

(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)

Amitay Isaacs [Fri, 28 Jun 2013 04:02:44 +0000 (14:02 +1000)]
recoverd: Also check if current node is in recovery when it is banned

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6a9dbb8fb0f1f6e8c206189cdc2d33bb371ea2a8)

Amitay Isaacs [Fri, 28 Jun 2013 04:09:35 +0000 (14:09 +1000)]
recoverd: Set node_flags information as soon as we get nodemap

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8d622660a14c929e365d306147b378ea6ab92175)

Amitay Isaacs [Wed, 26 Jun 2013 06:02:23 +0000 (16:02 +1000)]
recoverd: Remove old comment as the code corresponding to that has gone away

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 34af2cdf686d5d77854cbaa7bbcd8f878e9171c7)

Amitay Isaacs [Mon, 24 Jun 2013 04:31:50 +0000 (14:31 +1000)]
banning: Log ban state changes for other nodes at higher debug level

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c6f8407648abb37f2ed781afa5171dad8c9f59e9)

Amitay Isaacs [Mon, 1 Jul 2013 06:28:04 +0000 (16:28 +1000)]
freeze: Make ctdb_start_freeze() a void function

If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)

Amitay Isaacs [Mon, 1 Jul 2013 06:21:00 +0000 (16:21 +1000)]
freeze: If priority is invalid here, it's time to abort

ctdb_start_freeze() is called from ctdb_control_freeze(), which fixes the
priority if it's 0 and returns an error if it's invalid.  Other callers of
ctdb_start_freeze() are internal to CTDB.  So if the priority is invalid in
ctdb_start_freeze(), something is definitely seriously wrong.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 87716e8f504d659515d3dbcf93badbf106873bc8)
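The split described in this commit (the control entry point validates the priority, while the internal function treats a bad value as fatal) can be sketched as follows. This is a minimal illustration, not CTDB's code: check_control_priority() and start_freeze_sketch() are hypothetical names, NUM_DB_PRIORITIES is assumed to be 3, and mapping priority 0 to 1 is an assumed reading of "fixes the priority if it's 0".

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_DB_PRIORITIES 3 /* assumed value, for illustration only */

/*
 * Hypothetical control-side check: normalise priority 0 (assumed to mean
 * priority 1) and reject anything out of range with an error.
 */
static int check_control_priority(unsigned int priority, unsigned int *out)
{
	if (priority == 0) {
		priority = 1;
	}
	if (priority > NUM_DB_PRIORITIES) {
		return -1;
	}
	*out = priority;
	return 0;
}

/*
 * Hypothetical internal entry point: every caller has already validated
 * the priority, so an invalid value here is a bug and we abort.
 */
static void start_freeze_sketch(unsigned int priority)
{
	if (priority == 0 || priority > NUM_DB_PRIORITIES) {
		abort();
	}
	/* ... freeze the databases at this priority ... */
}
```

Returning an error at the control boundary and aborting internally keeps external input from crashing the daemon while still catching internal misuse loudly.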

Amitay Isaacs [Mon, 1 Jul 2013 03:26:33 +0000 (13:26 +1000)]
freeze: Log message from ctdb_start_freeze() and ctdb_control_freeze()

This ensures that whenever databases are frozen, either by sending a
control or by calling ctdb_start_freeze(), the action is logged.
Since ctdb_control_freeze() calls ctdb_start_freeze(), move the logging of
the message into the early return condition taken when databases are
already frozen.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 478e24bceda3fedfba54ccb48faa115df726b819)

Amitay Isaacs [Mon, 24 Jun 2013 04:18:58 +0000 (14:18 +1000)]
recoverd: Print banning message only after verifying pnn

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4be8dff3a4451192f838497b4747273685959bed)

Amitay Isaacs [Wed, 26 Jun 2013 05:22:46 +0000 (15:22 +1000)]
recoverd: When updating flags on nodes, send updated flags and not old flags

This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa,
which replaced the SRVID_SET_NODE_FLAGS message to the recovery daemon
with a control sent to the local daemon, which in turn informs the
recovery daemon.  However, while making this change, the old flags were
sent via CONTROL_MODIFY_FLAGS.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc)

Martin Schwenke [Wed, 26 Jun 2013 04:34:47 +0000 (14:34 +1000)]
tools/ctdb: Add "force" option to "recover" command

At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs.  This option is added without
documentation because it is dangerous until the bugs are fixed!  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4f87925a287f612a6ab3b5da1a387a31c7bea28f)

Amitay Isaacs [Mon, 24 Jun 2013 07:37:15 +0000 (17:37 +1000)]
client: Exit with non-zero status when unix socket is closed

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 733fc909425860f6a02c205c2d8f34a731853922)

Martin Schwenke [Fri, 21 Jun 2013 04:49:20 +0000 (14:49 +1000)]
doc: Fix ctdb ping entry in manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit abeb65ef02d018a7c14d4f8cea71e15c6cf9e357)

Martin Schwenke [Fri, 21 Jun 2013 04:47:20 +0000 (14:47 +1000)]
doc: Fix documentation for NoIPTakeover in ctdbd manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5d0215be5aefe492258a92c7bff2d41960379580)

Martin Schwenke [Fri, 21 Jun 2013 04:33:12 +0000 (14:33 +1000)]
doc: Update notification script section in ctdbd manpage

The example notification script is now much more useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4ba7c73eeab98296c9168e0b0fed1f6bb9f32733)

Martin Schwenke [Fri, 21 Jun 2013 04:32:50 +0000 (14:32 +1000)]
doc: Add nodestatus command to the ctdb manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4369c8e6ead9062ef7855ada375df74262acf925)

Martin Schwenke [Fri, 21 Jun 2013 00:52:05 +0000 (10:52 +1000)]
doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cd6227aa38d3bb4e5043faeffe436004e27b6d06)

Martin Schwenke [Thu, 20 Jun 2013 06:43:10 +0000 (16:43 +1000)]
tests: Integration tests use "ctdb nodestatus" for healthy cluster check

Also check that we're not in recovery mode.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b7aaa28b3a6a2de923417f3d143f8d516447711e)

Martin Schwenke [Thu, 20 Jun 2013 06:42:30 +0000 (16:42 +1000)]
tests: Integration test infrastructure should do only a single recovery

No need for 2 recoveries after a restart.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b953524185632d7f96a76d8f3bbed7ac1d143d40)

Martin Schwenke [Sat, 22 Jun 2013 05:44:28 +0000 (15:44 +1000)]
ctdbd: Fix panic on overlapping shutdowns

The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown.  This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)
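A minimal sketch of making shutdown idempotent, so that a second, overlapping shutdown request is ignored instead of attempting an invalid SHUTDOWN to SHUTDOWN transition. The enum, current_runstate and begin_shutdown() are hypothetical names for illustration; the real daemon has more runstates (e.g. CTDB_RUNSTATE_SETUP) and its own transition checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of runstates. */
enum runstate {
	RUNSTATE_RUNNING,
	RUNSTATE_SHUTDOWN,
};

static enum runstate current_runstate = RUNSTATE_RUNNING;

/*
 * Move to SHUTDOWN at most once.  Returns true if this call started the
 * shutdown, false if a shutdown is already in progress, so an overlapping
 * request becomes a no-op.
 */
static bool begin_shutdown(void)
{
	if (current_runstate == RUNSTATE_SHUTDOWN) {
		return false;
	}
	current_runstate = RUNSTATE_SHUTDOWN;
	return true;
}
```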

Martin Schwenke [Wed, 19 Jun 2013 00:58:14 +0000 (10:58 +1000)]
ctdbd: Refactor shutdown sequence

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b32fd04bfbf33062d45365b37a7247e272a76ceb)

Martin Schwenke [Sun, 16 Jun 2013 11:01:43 +0000 (21:01 +1000)]
eventscripts: "setup" event doesn't need to wait for SETUP runstate

The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)

Martin Schwenke [Tue, 18 Jun 2013 05:07:26 +0000 (15:07 +1000)]
tests/eventscripts: New tests for 00.ctdb "init" event

These test dropping of IPs and TDB checking.

New stubs for date, tdbdump, tdbtool.

Enhance ip stub to handle "ip addr show to ..."

Tweak some infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit aabf0bf41cb8ec344f06b69492fb6c2a27f9e900)

Martin Schwenke [Tue, 18 Jun 2013 05:02:05 +0000 (15:02 +1000)]
eventscripts: 13.per_ip_routing should not try hard to find public_addresses

This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!

The test code has been fixed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)

Martin Schwenke [Tue, 18 Jun 2013 05:05:39 +0000 (15:05 +1000)]
tests/eventscripts: setup_ctdb() should always set $CTDB_PUBLIC_ADDRESSES

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c3e7a6e10d486ba0dbafdf110db540675b2317bc)

Martin Schwenke [Mon, 17 Jun 2013 05:14:53 +0000 (15:14 +1000)]
logging: Notify parent when logging daemon is up

Messages are lost until it is really up because syslogd_is_started is
set too early.  Adding a pipe to do the notification allows the parent
to wait and only set syslogd_is_started when the logging daemon is
actually ready.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f3dd2eec200d6eeada2ea19cd7e76f1edfad6167)
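The pipe-based readiness notification described in this commit can be sketched with plain POSIX calls. This illustrates the general technique under stated assumptions and is not the ctdbd logging code: start_logging_child() is a hypothetical name, and the child exits right after signalling instead of running a logging loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the flag named in the commit message. */
static bool syslogd_is_started = false;

/*
 * Fork a child "logging daemon" and set syslogd_is_started only after
 * the child reports over a pipe that it is ready.
 * Returns the child's pid, or -1 on error.
 */
static pid_t start_logging_child(void)
{
	int fds[2];
	char c = 0;
	pid_t pid;

	if (pipe(fds) != 0) {
		return -1;
	}

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: set up the logging socket here, then report ready. */
		close(fds[0]);
		if (write(fds[1], &c, 1) != 1) {
			_exit(1);
		}
		close(fds[1]);
		_exit(0); /* a real daemon would run its logging loop instead */
	}

	/* Parent: block until the child reports ready (or exits early). */
	close(fds[1]);
	if (read(fds[0], &c, 1) == 1) {
		syslogd_is_started = true;
	}
	close(fds[0]);
	return pid;
}
```

The parent's read() blocks until the child writes, so the flag cannot be set before the child has finished its setup. If the child dies before writing, read() returns 0 (end of file) and the flag stays false.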

Martin Schwenke [Mon, 17 Jun 2013 00:14:24 +0000 (10:14 +1000)]
scripts: Move TDB checking from initscript to "init" event

It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)

Martin Schwenke [Sun, 16 Jun 2013 10:29:33 +0000 (20:29 +1000)]
scripts: Move dropping of all IPs from initscript to "init" event

It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)

Martin Schwenke [Tue, 18 Jun 2013 04:53:17 +0000 (14:53 +1000)]
scripts: drop_ip() should use delete_ip_from_iface()

Otherwise secondary addresses that aren't owned by CTDB could be
dropped.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)

Martin Schwenke [Sun, 16 Jun 2013 10:24:10 +0000 (20:24 +1000)]
scripts: drop_all_public_ips() now prints messages to stdout, not log

Change all callers to maintain current behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)

Martin Schwenke [Sun, 16 Jun 2013 09:49:02 +0000 (19:49 +1000)]
ctdbd: "init" event should run earlier in daemon initialisation

It should run before:

* the transport is started;
* databases are attached; and
* configuration files (e.g. nodes, public_addresses) are processed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0a0c8543f167e11b75a622513367b083e42cbd3f)

Amitay Isaacs [Tue, 18 Jun 2013 04:27:34 +0000 (14:27 +1000)]
tools/ctdb: Do not exit prematurely on control timeout if retrying in a loop

This avoids premature exits from "ctdb stop" and "ctdb continue" due to
intermittent control (e.g. getpnn, getnodemap) timeouts.

This needs a proper fix to distinguish between timeout and failure
conditions and take appropriate action.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c48583fd238496a81ddc46a21892f0b49559036a)

Amitay Isaacs [Thu, 13 Jun 2013 02:55:29 +0000 (12:55 +1000)]
packaging: Update the minimum required library versions

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 5f8547b1531bba4950b3d873a997585c3a16d31e)

Amitay Isaacs [Fri, 7 Jun 2013 01:24:17 +0000 (11:24 +1000)]
build: Enable VERBOSE option to display build command line

make V=1 or make VERBOSE=1 will display build commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 02c63c591cc273122b3a547bb301b92f0e4bd217)

Mathieu Parent [Thu, 6 Jun 2013 19:58:02 +0000 (21:58 +0200)]
build: Fix tdb.h path to enable building with system TDB library

(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)

Mathieu Parent [Thu, 6 Jun 2013 19:43:08 +0000 (21:43 +0200)]
libctdb: Include config.h in libctdb/ctdb.c

Bug-Debian: http://bugs.debian.org/703551

(This used to be ctdb commit 14a79c0f3967c88f8ffc8200d122f6c5ffdb63a8)

Amitay Isaacs [Thu, 6 Jun 2013 06:42:02 +0000 (16:42 +1000)]
ctdbd: Make sure we don't kill init process by mistake

If getpgrp() fails, it returns -1, and that will send a KILL signal to the
init process (PID 1).  This does not happen on RHEL, but does on AIX.

Reported-by: Chris Cowan <cc@us.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)
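A minimal sketch of such a guard, assuming the process group is signalled with kill(-pgrp, sig); the commit does not show the actual call. kill_process_group() and pgrp_is_safe_to_kill() are hypothetical names. The guard also rejects other values for which kill(-pgrp, sig) would not target a real, separate process group, as noted in the code comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical guard.  kill(-pgrp, sig) signals process group pgrp, so:
 *   pgrp == -1 (failed getpgrp()) -> kill(1, sig): signals init (PID 1)
 *   pgrp ==  0                    -> kill(0, sig): signals our own group
 *   pgrp ==  1                    -> kill(-1, sig): signals every process
 *                                    we have permission to signal
 * Only values greater than 1 are treated as safe.
 */
static bool pgrp_is_safe_to_kill(pid_t pgrp)
{
	return pgrp > 1;
}

/* Returns -1 without signalling anything if pgrp is unsafe. */
static int kill_process_group(pid_t pgrp, int sig)
{
	if (!pgrp_is_safe_to_kill(pgrp)) {
		return -1;
	}
	return kill(-pgrp, sig);
}
```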

Martin Schwenke [Thu, 13 Jun 2013 06:32:06 +0000 (16:32 +1000)]
tests/eventscripts: Unit tests for $CTDB_NFS_DUMP_STUCK_THREADS

Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cd4358b01c6c3d413b431f5760029d2b163b9c03)

Martin Schwenke [Thu, 13 Jun 2013 06:30:45 +0000 (16:30 +1000)]
tests/eventscripts: Fix -X tracing in iterate_test()

... and delete a bogus comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3)

Martin Schwenke [Thu, 13 Jun 2013 05:50:44 +0000 (15:50 +1000)]
tests/eventscripts: Add unit tests for $CTDB_MONITOR_NFS_THREAD_COUNT

Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14)

Martin Schwenke [Thu, 13 Jun 2013 01:56:25 +0000 (11:56 +1000)]
eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS

If some nfsd threads are still alive after a shutdown during a restart,
then this variable sets the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)

Martin Schwenke [Thu, 13 Jun 2013 00:17:20 +0000 (10:17 +1000)]
eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT

Consider the following example:

1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
   underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.

Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
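The arithmetic of the scenario in this commit can be checked with a small sketch. Both functions are hypothetical illustrations, not the 60.nfs eventscript; the key assumption, taken from step 4, is that starting NFS only spawns enough threads to bring the total, including the still-alive stuck threads, back up to the configured count.

```c
#include <assert.h>

/*
 * Hypothetical walk-through of the scenario.  Assumption (step 4):
 * starting NFS only spawns enough threads to bring the total, including
 * still-alive "stuck" threads, back up to the configured count.
 */
static int running_after_restart(int configured, int stuck)
{
	int running = configured;

	running -= configured - stuck;   /* step 3: non-stuck threads exit  */
	running += configured - running; /* step 4: start tops up the total */
	running -= stuck;                /* step 5: stuck threads exit      */
	return running;
}

/* Hypothetical monitor check: how many configured threads are missing. */
static int nfsd_threads_missing(int configured, int running)
{
	return running < configured ? configured - running : 0;
}
```

With 256 configured threads and 200 stuck ones, running_after_restart() yields 56, and nfsd_threads_missing(256, 56) reports the 200-thread shortfall that the monitor event is meant to detect.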

Martin Schwenke [Fri, 31 May 2013 04:55:07 +0000 (14:55 +1000)]
recoverd: Log node that causes takeover run to fail

Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL.  Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea)

Martin Schwenke [Fri, 24 May 2013 05:38:54 +0000 (15:38 +1000)]
doc: Add release notes for 2.2

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ac0892d3a57adb0587a37de0f94fa686bed8970f)

Amitay Isaacs [Wed, 29 May 2013 05:14:42 +0000 (15:14 +1000)]
build: Fix extra whitespaces

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 78cff9d54f241fb6a2943e50346f9c2ad9decc78)

Amitay Isaacs [Wed, 29 May 2013 04:12:14 +0000 (14:12 +1000)]
tevent: Sync to tevent 0.9.18 from upstream

(This used to be ctdb commit 82d61f77c01df0fbb42743593937b175ce22a445)

Amitay Isaacs [Wed, 29 May 2013 04:44:03 +0000 (14:44 +1000)]
replace: Sync to latest replace from upstream

The latest commits affecting lib/replace remove the autoconf build from
the Samba tree, so the following commit is used as the sync point.

  commit 9ddfd7d8784e6f546628f48990b69ee2850be52d
  Author: Andrew Bartlett <abartlet@samba.org>
  Date:   Wed May 22 17:23:30 2013 +1000

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 506b27c944b4031e8a325816bd12abddd442a0bb)

Amitay Isaacs [Wed, 29 May 2013 04:05:50 +0000 (14:05 +1000)]
tdb: Sync to tdb 1.2.11 from upstream

(This used to be ctdb commit bb3a32ec055432afc7225c9fd7504fb187694bda)

Amitay Isaacs [Wed, 29 May 2013 03:53:38 +0000 (13:53 +1000)]
talloc: Sync to talloc 2.0.8 from upstream

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3bffca8c17e441364525df115ee2ac16b5969e24)

Amitay Isaacs [Wed, 29 May 2013 02:11:49 +0000 (12:11 +1000)]
ctdbd: Log node state transitions at higher debug level

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668)

Amitay Isaacs [Wed, 29 May 2013 04:17:59 +0000 (14:17 +1000)]
git: Ignore generated ctdb.spec file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca7ba26362eabfbcc329c66919d9c4da79c3b799)

Amitay Isaacs [Wed, 29 May 2013 04:17:00 +0000 (14:17 +1000)]
git: Ignore ctdb_version.h file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 641f539ffc7dd9542e669a3ec20c004f8bbcbf1e)

Amitay Isaacs [Fri, 24 May 2013 05:25:52 +0000 (15:25 +1000)]
build: Use REPLACE_OBJ and CTDB_EXTERNAL_OBJ to simplify build rules

This fixes the build on AIX where libreplace is required to build
ctdb_lock_helper, ctdb_fetch_lock_once, ctdb_fetch_readonly_once.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit fa757b49374e44c2380d4457e9b0eb3582981fac)

Amitay Isaacs [Fri, 24 May 2013 05:14:20 +0000 (15:14 +1000)]
build: Support for building on AIX xlc compiler

xlc does not support -fPIC or -Wno-format-zero-length.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2cf95741fdab2ee5f724950a0b1ef257d6aeade7)

Amitay Isaacs [Fri, 24 May 2013 04:44:45 +0000 (23:44 -0500)]
tests: Do not use err() to support AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1734562a7b3512853b9e0232880c42d50c1c2e4c)

Amitay Isaacs [Fri, 24 May 2013 04:52:09 +0000 (14:52 +1000)]
tests: Include system/time.h to support building on AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0320bb4f8ca8171812ec7f41556aed847c74bfb4)

Amitay Isaacs [Fri, 24 May 2013 04:51:46 +0000 (14:51 +1000)]
libctdb: Do not include sys/time.h to support build on AIX

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2c19fa78ce0b25c3615b23664df32233bdbdea42)

Amitay Isaacs [Fri, 24 May 2013 04:42:23 +0000 (23:42 -0500)]
util: Do not stop build if backtracing is not supported

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit b091f09ea01482823bd850d1d4e2329e0a19c959)

Martin Schwenke [Tue, 28 May 2013 02:01:57 +0000 (12:01 +1000)]
eventscripts: Fix statd-callout update handling

60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run.  This stops the statd-callout updates from ever being called.

Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file.  Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
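The rate-limiting logic described in this commit can be sketched in C, with an in-memory timestamp standing in for the $statd_update_trigger file. nfs_statd_update_due() is a hypothetical name; the real nfs_statd_update() is a shell function in the functions file. The point is that the trigger is refreshed only when an update actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical in-memory stand-in for the $statd_update_trigger timestamp. */
static time_t statd_update_trigger = 0;

/*
 * Return true (and refresh the trigger) only if at least `period` seconds
 * have passed since the last update.  Refreshing the trigger on every
 * call, as happened before this fix, means the period never elapses.
 */
static bool nfs_statd_update_due(time_t now, time_t period)
{
	if (now - statd_update_trigger < period) {
		return false;
	}
	statd_update_trigger = now;
	return true;
}
```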

Martin Schwenke [Tue, 28 May 2013 01:26:17 +0000 (11:26 +1000)]
tests/integration: Improve debug output for unhealthy cluster after restart

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 25a6fd784cde96f3d20a79f70b5589b5c4aca675)

11 years ago tests/scripts: Delete unused $rows and $ww variables from run_tests
Martin Schwenke [Mon, 27 May 2013 05:16:28 +0000 (15:16 +1000)]
tests/scripts: Delete unused $rows and $ww variables from run_tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 80b3cf2c652c6098390cdd0dbb3edc648f7df487)

11 years ago packaging: Create separate package for pcp pmda
Martin Schwenke [Tue, 28 May 2013 04:19:32 +0000 (14:19 +1000)]
packaging: Create separate package for pcp pmda

To build the ctdb-pcp-pmda package, run the packaging/RPM/makerpms.sh
script with the "--with pmda" option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 85e11b9b13b3add88c1b8957be51793cc1db4f2d)

11 years ago build: Separate autoconf macros for pmda
Martin Schwenke [Tue, 28 May 2013 04:16:02 +0000 (14:16 +1000)]
build: Separate autoconf macros for pmda

The PMDA is no longer built by default, even if the headers are
available.  To build it, run "configure --enable-pmda".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 194f7a0dec26d693a5f3e6734b1c82f61f8e4d19)

11 years ago build: Fix install paths for pcp pmda
Martin Schwenke [Tue, 28 May 2013 04:16:25 +0000 (14:16 +1000)]
build: Fix install paths for pcp pmda

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 11af486754bb04899e3dc544157bf70530e66cd1)

11 years ago packaging: makerpms.sh can take multiple arguments for rpmbuild
Martin Schwenke [Mon, 27 May 2013 04:43:03 +0000 (14:43 +1000)]
packaging: makerpms.sh can take multiple arguments for rpmbuild

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f2ef3510407fbad29908195c58e4160d5a81e8a4)

11 years ago eventscripts: Stop NAT gateway's delete_all() from polluting the log
Martin Schwenke [Mon, 27 May 2013 02:56:41 +0000 (12:56 +1000)]
eventscripts: Stop NAT gateway's delete_all() from polluting the log

Every time a node that isn't the NAT gateway master gets reconfigured,
something like this appears in the log:

  ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1

Since this deletion usually fails, it is better to mute the error than
to let it pollute the log.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)

11 years ago recoverd: Backward compatibility for nodes without IPREALLOCATED control
Martin Schwenke [Mon, 27 May 2013 01:29:42 +0000 (11:29 +1000)]
recoverd: Backward compatibility for nodes without IPREALLOCATED control

Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control.  If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes.  The "old" nodes will fail in a fairly
nondescript way (result == -1).

To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated".  Only do this on the failed nodes.
However, do not do this on nodes that timed out (they have probably
implemented the control, so the regular fail_callback should be called
to get those nodes banned) or on stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
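
A minimal C sketch of the per-node decision described above. The enum
and the use_eventscript_fallback() helper are hypothetical names chosen
for illustration; the actual recovery daemon code is structured
differently. Only a plain failure on a node that is not stopped gets the
EVENTSCRIPT fallback; timeouts are left to the regular fail_callback.

```c
#include <stdbool.h>

/* Hypothetical summary of one node's reply to the IPREALLOCATED control. */
enum ipreallocated_result {
	IPREALLOCATED_OK,       /* control succeeded */
	IPREALLOCATED_FAILED,   /* e.g. result == -1 from an "old" node */
	IPREALLOCATED_TIMEDOUT, /* node probably implements the control */
};

/* Returns true if "ipreallocated" should be retried on this node via the
 * EVENTSCRIPT control.  Timed-out nodes are excluded so that the regular
 * fail_callback can get them banned; stopped nodes are excluded because
 * they cannot run the event via the EVENTSCRIPT control. */
static bool use_eventscript_fallback(enum ipreallocated_result res,
				     bool node_stopped)
{
	if (res != IPREALLOCATED_FAILED) {
		return false;
	}
	return !node_stopped;
}
```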

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)

11 years ago scripts: Provide mktemp function for platforms without mktemp command
Martin Schwenke [Sat, 25 May 2013 09:57:24 +0000 (19:57 +1000)]
scripts: Provide mktemp function for platforms without mktemp command

This is needed for AIX and possibly other platforms.

Also provide a cheaper mktemp function, if needed, in the run_tests
script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)

11 years ago tests: Fix integration tests to use real private IPs
Martin Schwenke [Sat, 25 May 2013 09:08:49 +0000 (19:08 +1000)]
tests: Fix integration tests to use real private IPs

192.0.2.x was a typo.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c9e36f596c63c9af7f80d7cb8d7a5c6dcca4860a)

11 years ago pmda: handle new ctdb_statistics format
David Disseldorp [Fri, 24 May 2013 14:11:12 +0000 (16:11 +0200)]
pmda: handle new ctdb_statistics format

The ctdb_statistics structure was recently changed. Update the PMDA to
dereference the new structure member names.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e5a5ab53173d9aa4190ddf68c4ae316d4473eb56)

11 years ago tests/takeover: New test with 900 IPs
Martin Schwenke [Fri, 5 Apr 2013 09:47:47 +0000 (20:47 +1100)]
tests/takeover: New test with 900 IPs

(This used to be ctdb commit 75a620c516e384f042b5d675183b3a1b48fd6115)

11 years ago tests/takeover: Takeover tests can use up to 1024 IPs and check limits
Martin Schwenke [Fri, 5 Apr 2013 09:45:08 +0000 (20:45 +1100)]
tests/takeover: Takeover tests can use up to 1024 IPs and check limits

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4)

11 years ago tests/takeover: LCP2 tests for weird, unbalanced corner-cases
Martin Schwenke [Mon, 8 Apr 2013 04:37:44 +0000 (14:37 +1000)]
tests/takeover: LCP2 tests for weird, unbalanced corner-cases

Two tests that show a bad result and a third test for the fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef35c8889d90220929e48e66eb62da9ea2025ede)

11 years ago tests/takeover: Allow takeover runs with differing IP allocations per node
Martin Schwenke [Mon, 8 Apr 2013 04:37:08 +0000 (14:37 +1000)]
tests/takeover: Allow takeover runs with differing IP allocations per node

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 954ae6f84cb06a8dcbc12456d4752280072be5bf)

11 years ago vacuum: Reduce the priority of a non-critical error
Amitay Isaacs [Fri, 24 May 2013 08:07:39 +0000 (18:07 +1000)]
vacuum: Reduce the priority of a non-critical error

Since the complete database is not locked when the receive_records
control is received, we may not be able to obtain a lock on a chain.
In that case, storing this record will be attempted again.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)

11 years ago ctdbd: fix comment explaining redirection of CTDB_REQ_CALL
Michael Adam [Fri, 17 May 2013 09:05:44 +0000 (11:05 +0200)]
ctdbd: fix comment explaining redirection of CTDB_REQ_CALL

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit b697625b184227dad1be31a41b7a3fd9bd312e29)

11 years ago ctdbd: remove a nonempty blank line
Michael Adam [Fri, 17 May 2013 09:01:31 +0000 (11:01 +0200)]
ctdbd: remove a nonempty blank line

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit d9e24782a90d9ce29c0e6584b75d2b186142174d)

11 years ago ctdbd: update comment describing ctdb_call_send_redirect()
Michael Adam [Fri, 17 May 2013 09:00:32 +0000 (11:00 +0200)]
ctdbd: update comment describing ctdb_call_send_redirect()

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 9a21d417c51fb9cad8f2e87e00ca54d379aef860)

11 years ago tests/takeover: New tests to check runstate handling
Martin Schwenke [Mon, 6 May 2013 10:31:08 +0000 (20:31 +1000)]
tests/takeover: New tests to check runstate handling

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c57430998a3bdedc8a904eb3a9cdfde1421aff50)

11 years ago recoverd: Nodes can only take over IPs if they are in runstate RUNNING
Martin Schwenke [Mon, 6 May 2013 05:36:29 +0000 (15:36 +1000)]
recoverd: Nodes can only take over IPs if they are in runstate RUNNING

Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined.  Both of
these events can (re)start services.

This change prevents IPs from being hosted before the "startup" event has completed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)