git.samba.org - ctdb.git/log

daemon: Fix the usage for lock helper

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 25 17:16:31 CEST 2014 on sn-devel-104

(Imported from commit 0f92de8463b71a2d7e9acdd27454be7859713436)

recoverd: If obtaining recovery lock fails, try again

When ctdb daemon starts up, it considers itself the recovery master
and tries to do first recovery. However, it's possible that there is
already a recovery master and the current node has not yet heard from it.
So do not ban ourselves immediately if ctdb_recovery_lock() fails when
doing first recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 57310f80c9b8146a0978d912f73b0a64fde7697e)

scripts: Fix the regular expression for parsing /proc/locks

The major and minor device numbers are hexadecimal, not decimal.
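
The fix itself is in a shell script's regular expression. The same point can be sketched in C, assuming the device field of a /proc/locks line has the form MAJOR:MINOR:INODE with the device numbers in hexadecimal (as stated above) and the inode in decimal (an assumption). parse_lock_dev() is a hypothetical helper, not part of CTDB.

```c
#include <stdio.h>

/* Parse the "MAJOR:MINOR:INODE" field of a /proc/locks line.  The device
 * numbers are hexadecimal, so "fd:01" means major 253, minor 1; a
 * decimal-only pattern would fail to match such a field. */
static int parse_lock_dev(const char *field, unsigned *major,
			  unsigned *minor, unsigned long *inode)
{
	return sscanf(field, "%x:%x:%lu", major, minor, inode) == 3 ? 0 : -1;
}
```

For example, parsing "fd:01:123456" yields major 253, minor 1 and inode 123456.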

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 25 07:19:59 CEST 2014 on sn-devel-104

(Imported from commit f1e281cd47d9ebd79e09294606b8fa411ec0fbb4)

locking: Reset ttimer before doing an early return

When the timer expires, the timeout handler sets lock_ctx->ttimer to a
newly created timer event. However, when a node is INACTIVE, the
timeout handler returns early with lock_ctx->ttimer still set to the
previous timer event. That timer event is freed when the callback
returns, leaving lock_ctx->ttimer pointing to an already freed timer
event.
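
A minimal sketch of the pattern, using hypothetical stand-in types rather than CTDB's real tevent-based structures: the handler drops its cached pointer before any early return, so the pointer never outlives the event that the event loop frees after the callback.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not CTDB's real types. */
struct timer_event {
	int id;
};

struct lock_context {
	struct timer_event *ttimer;	/* cached pointer to the pending timer */
	int node_active;		/* 0 models an INACTIVE node */
};

static void lock_timeout_handler(struct lock_context *lock_ctx, int next_id)
{
	/* The event that fired is freed by the caller after we return,
	 * so drop our reference before any early return. */
	lock_ctx->ttimer = NULL;

	if (!lock_ctx->node_active) {
		return;
	}

	/* Re-arm: remember the newly created timer event. */
	lock_ctx->ttimer = malloc(sizeof(*lock_ctx->ttimer));
	if (lock_ctx->ttimer != NULL) {
		lock_ctx->ttimer->id = next_id;
	}
}

/* Models the event loop: run the handler, then free the fired event. */
static void fire_timer(struct lock_context *lock_ctx, struct timer_event *te,
		       int next_id)
{
	lock_timeout_handler(lock_ctx, next_id);
	free(te);
}
```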

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit c64369cba2e5a975d87d518737abbf04c9871a26)

doc: Update NEWS

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

scripts: Do not export variables if they are not set

Exporting a variable that is not set may cause getenv() to return an
empty string rather than NULL. Tested on FreeBSD.
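
The commit fixes the scripts so such variables are never exported. On the C side, a defensive reader could treat an empty value the same as an unset one; getenv_nonempty() below is a hypothetical helper sketching that idea, not CTDB code.

```c
#include <stdlib.h>

/* Treat an exported-but-empty variable the same as an unset one. */
static const char *getenv_nonempty(const char *name)
{
	const char *value = getenv(name);

	if (value == NULL || value[0] == '\0') {
		return NULL;
	}
	return value;
}
```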

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 17 09:55:47 CEST 2014 on sn-devel-104

(Imported from commit 22257dd4b6d226ee956ede5a847ce0bcb99333be)

scripts: Fix a typo

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 8509bffdebb7884b765904f8112ff83056511a30)

util: Log an error if there is no way to set scheduler

Although configure should catch this, logging a run-time error is
better than being mystified when ctdbd silently exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fb9c49f2ce0838baa5f94f4ca03d1c92cb58b306)

doc: Add reference to new manpage ctdb-statistics

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Sep 12 11:13:56 CEST 2014 on sn-devel-104

(Imported from commit d744eb03c5236284cf0141c1a2f687263cbd8414)

doc: Add ctdb-statistics manual page

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit efd34bb274a5ed015d7fe9374718671e0d7f9cc6)

daemon: Decrement pending calls statistics when calls are deferred

Deferred calls should not be treated as pending calls since they are
re-processed from the beginning.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit f5f11e1a05d4d75a7662d6c413a14c4cd18f8ed9)

tests: Do not expect real-time priority when running local daemons

Local daemons are started mainly for testing and usually not as root.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 3c1bae12217ead74863a7cdd9b8a338aef80adb1)

daemon: Make sure ctdb runs with real-time priority

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit d410b20601cccd8b67d48c42a6d689cd65e94f61)

locking: Fork lock helper with vfork_with_logging()

Otherwise errors printed by the lock helper get lost.

lock_helper_args() no longer adds the program name to the list of
arguments, since vfork_with_logging() does that. Update the lock
helper to handle the extra log_fd parameter passed by
vfork_with_logging() and send stdout/stderr there.
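
A minimal sketch of the redirection step, assuming the helper receives log_fd as a decimal string argument. The function name and argument handling are hypothetical and do not reproduce the lock helper's actual code.

```c
#include <stdlib.h>
#include <unistd.h>

/* Point stdout and stderr at the logging descriptor so anything the
 * helper prints reaches the parent's log instead of being lost. */
static int redirect_output_to_log(const char *log_fd_arg)
{
	char *end;
	long fd = strtol(log_fd_arg, &end, 10);

	if (*log_fd_arg == '\0' || *end != '\0' || fd < 0) {
		return -1;	/* not a valid descriptor number */
	}
	if (dup2((int)fd, STDOUT_FILENO) == -1 ||
	    dup2((int)fd, STDERR_FILENO) == -1) {
		return -1;
	}
	return 0;
}
```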

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 7ae7a9c46301e4fed870516c448a79bb7a9ac53a)

locking: Add argc parameter to lock_helper_args()

To make this sane, also add an argv parameter and change the return
type to bool. Anticipating a subsequent change, make the type of argv
match what is needed by vfork_with_logging() and cast it when passing
to execv(). This also means changing the type of the name member of
struct db_namelist.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2e17b0ecddffb8590c4e8b9afaf1767ef7e8f89c)

locking: Set real-time priority for lock helpers

To avoid lock helper starvation when userspace robust mutexes are
enabled.

Commit 6f072f85a138f595494dbec137bcf23d1e666acc removed reset_scheduler(),
to avoid resetting scheduler priority. However, that is not sufficient
because of commit 1be8564e553ce044426dbe7b3987edf514832940, which sets
SCHED_RESET_ON_FORK flag. With SCHED_RESET_ON_FORK, all CTDB child
processes will automatically have normal scheduling priority.
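
A minimal sketch of a child process putting itself back on SCHED_FIFO after being forked from a parent that set SCHED_RESET_ON_FORK. The name become_realtime() is hypothetical, and the priority choice (the SCHED_FIFO minimum) is an assumption. The call needs CAP_SYS_NICE, so it fails with EPERM when run unprivileged.

```c
#include <errno.h>
#include <sched.h>

/* Raise the calling process to real-time priority; returns 0 or -errno. */
static int become_realtime(void)
{
	struct sched_param p;

	p.sched_priority = sched_get_priority_min(SCHED_FIFO);
	if (p.sched_priority == -1) {
		return -errno;
	}
	if (sched_setscheduler(0, SCHED_FIFO, &p) == -1) {
		return -errno;	/* typically -EPERM without privileges */
	}
	return 0;
}
```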

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 11 11:31:10 CEST 2014 on sn-devel-104

(Imported from commit 4e5a6b154e1549e959c5de4b58432e33c0d57b55)

daemon: Increment pending calls statistics correctly

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit e6127a9eceb215e421ee56c09032bb1e81c8131e)

call: Drop all deferred requests from older generation

Deferring packets has a nasty interaction with recovery. All deferred
packets must be dropped when recovery happens, since those packets are
tracked as pending requests and will be re-sent with the new generation.
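
A sketch of dropping stale deferred calls, assuming a singly linked list where each entry records the generation its packet was sent with. The structure and function names are hypothetical; CTDB's real deferral queue differs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical deferred-call list entry. */
struct deferred_call {
	uint32_t generation;	/* generation the packet was sent with */
	struct deferred_call *next;
};

/* Free every deferred call from an older generation and return how many
 * were dropped.  The originators still track them as pending requests. */
static unsigned drop_stale_deferred(struct deferred_call **head,
				    uint32_t current_generation)
{
	unsigned dropped = 0;
	struct deferred_call **pp = head;

	while (*pp != NULL) {
		struct deferred_call *dc = *pp;

		if (dc->generation != current_generation) {
			*pp = dc->next;	/* unlink, then free */
			free(dc);
			dropped++;
		} else {
			pp = &dc->next;
		}
	}
	return dropped;
}
```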

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Sep 5 09:30:50 CEST 2014 on sn-devel-104

(Imported from commit 2c57cc9597cb9cfe5ab3a458df74d6b5cda45465)

locking: Do not reset real-time priority for lock helpers

When using TDB robust mutexes, the kernel wakes waiting processes one
by one, in the priority list order. To ensure that ctdb lock helper
processes do not starve, lock helper processes need to run at a higher
priority than smbd.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 6f072f85a138f595494dbec137bcf23d1e666acc)

daemon: Defer all calls when processing dmaster packets

When CTDB receives a DMASTER_REQUEST or DMASTER_REPLY packet, the specified
record needs to be updated as soon as possible to avoid inconsistent
dmaster information between nodes. During this time, queue up all calls
for that record and process them only after dmaster request/reply has
been processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit ef59f2e6bbd502f7cb58ad3a74a6448ccd1ebe59)

daemon: Remove duplicate code with refactored function

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit deb7bb89b3844f209ef73cc5707fcb4673bf08d7)

common: Refactor code to convert TDB_DATA key to aligned uint32 array

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit bd133894672fcf3c79868605466ba7b527af3018)
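
As an illustration of what such a conversion involves, here is a sketch under stated assumptions: TDB_DATA is reduced to a pointer/length pair, and key_to_uint32_array() is a hypothetical name, not the refactored CTDB function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a key of arbitrary length into a zero-padded, naturally aligned
 * uint32_t array.  *nwords receives the number of 32-bit words. */
static uint32_t *key_to_uint32_array(const uint8_t *dptr, size_t dsize,
				     size_t *nwords)
{
	size_t n = (dsize + sizeof(uint32_t) - 1) / sizeof(uint32_t);
	uint32_t *words = calloc(n == 0 ? 1 : n, sizeof(uint32_t));

	if (words == NULL) {
		return NULL;
	}
	if (dsize > 0) {
		memcpy(words, dptr, dsize);	/* padding bytes stay zero */
	}
	*nwords = n;
	return words;
}
```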

include: Remove declaration of non-existent function

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 13d5af48ac514621a6a820ba800771a7fdb4fe75)

locking: Remove unused function ctdb_free_lock_request_context

There is no need for a special function to free a lock request and the
corresponding lock context. Freeing the lock request also frees the
lock context.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 2592ae5a56e813bb7cb68789f93fc281b1822a82)

locking: Talloc lock request from client specified context

This makes sure that when the client context is destroyed, the lock
request goes away. If the lock request is already scheduled, then the
lock child process will be terminated.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 374cbc7b0ff68e04ee4e395935509c7df817b3c0)

locking: Run debug locks script only if the node is active

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit d9e4622a446c9ed60771c508638fb89055320f03)

daemon: Fix some strict-aliasing warnings

Seeing these with -Wall:

  ../server/ctdb_call.c:1117:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     record_flags = *(uint32_t *)&c->data[c->keylen + c->datalen];
     ^

memcpy() seems to be the easiest way to fix these.  The
alternative would be to use unmarshalling functions.
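
A minimal sketch of the memcpy() approach applied to a read like the one in the warning. read_u32() is a hypothetical helper. Copying into a local uint32_t avoids both the strict-aliasing violation and any unaligned access that the pointer cast could cause.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t stored at an arbitrary byte offset, e.g.
 * buf = c->data, off = c->keylen + c->datalen. */
static uint32_t read_u32(const uint8_t *buf, size_t off)
{
	uint32_t v;

	memcpy(&v, buf + off, sizeof(v));
	return v;
}
```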

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6fd3ce53914c5c5aa79b972b42258c722b227b88)

util: Fix warning about ignored result from system()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 2807b185f438c40544d4fd133bc386e411b12d0c)

Use sys_read() and sys_write() to ensure correct signal interaction

... and avoid compiler warnings in some cases.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c1558adeaa980fb4bd6177d36250ec8262e9b9fe)

common: Copy functions sys_read() and sys_write() from source3

We really should extricate these from source3 and into some common
code. However, just copy them for now to help get rid of a lot of
warnings.
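
A sketch of the kind of wrappers being copied, assuming their essential job is to retry when a signal interrupts the call. The names are hypothetical, and the source3 originals may also retry on other errno values.

```c
#include <errno.h>
#include <unistd.h>

/* Retry read() when it is interrupted by a signal (EINTR). */
static ssize_t sys_read_sketch(int fd, void *buf, size_t count)
{
	ssize_t ret;

	do {
		ret = read(fd, buf, count);
	} while (ret == -1 && errno == EINTR);
	return ret;
}

/* Retry write() when it is interrupted by a signal (EINTR). */
static ssize_t sys_write_sketch(int fd, const void *buf, size_t count)
{
	ssize_t ret;

	do {
		ret = write(fd, buf, count);
	} while (ret == -1 && errno == EINTR);
	return ret;
}
```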

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit fcd6ee1eac8627e75f72019027513cc46429a3a9)

tools: Be more helpful when CTDB CLI tool is run on unconfigured node

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 72fa984423b77eaddb16b63e6c3857600e054836)

tools: Factor out new function find_node_xpnn() from control_xpnn()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 26a02a64cda0501d57db686dee61fda0846083ee)

replace: Remove unused item returned by FAILED()

The (return) value of FAILED() is a constant 1.  However, it is never
used, so the compiler complains when run with -Wall:

  lib/replace/test/os2_delete.c: In function ‘cleanup’:
  lib/replace/test/os2_delete.c:39:163: warning: right-hand operand of comma expression has no effect [-Wunused-value]
   FAILED("system");

So just remove the ", 1", since it is the part that does nothing and
is never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Aug 20 16:54:31 CEST 2014 on sn-devel-104

(Imported from commit 47e7440be9ab422b3b2544c0b071fb8717a7a915)

readonly: Do not abort if revoke of readonly record fails on a node

Revoking a readonly record involves first marking the record on the
dmaster as RO_REVOKING_READONLY.  Then all the other nodes are sent an
update_record control to get rid of RO_DELEGATION.  Once that succeeds,
the record is marked RO_REVOKING_COMPLETE.

Currently, revoking of readonly delegations on the nodes is tried only
once.  If a node goes into recovery, the update_record control can fail
and the revoke code will abort ctdb.  Since database recovery would
revoke all readonly delegations anyway, there is no reason to abort.
Simply undo the start of the revoke process by resetting the
RO_REVOKING_READONLY flag.
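
A sketch of the new outcome handling on the dmaster's record flags. The flag values are hypothetical placeholders (CTDB defines its own), and finish_revoke() is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define RO_REVOKING_READONLY  0x02u
#define RO_REVOKING_COMPLETE  0x04u

/* On failure, undo the start of the revoke instead of aborting, since
 * database recovery will revoke all delegations anyway. */
static uint32_t finish_revoke(uint32_t flags, bool remote_update_ok)
{
	if (!remote_update_ok) {
		return flags & ~RO_REVOKING_READONLY;
	}
	return flags | RO_REVOKING_COMPLETE;
}
```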

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Aug 13 11:24:09 CEST 2014 on sn-devel-104

(Imported from commit c6d0e8dadcff55ea21973f4f7a89f241180d17e8)

readonly: Add an early return to simplify code

This patch makes the subsequent logic change smaller and easier to
understand.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit f96f395d853e0181d9ee031c3e3f1d31f5cff35c)

doc: Fix default database directories in ctdbd.1

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Aug 13 07:06:42 CEST 2014 on sn-devel-104

(Imported from commit b8e9f6b015811d7fb162634f85721b5d27ab503b)

locking: Simplify ctdb_find_lock_context()

I like early returns that avoid else branches :-)

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Aug 6 14:44:31 CEST 2014 on sn-devel-104

(Imported from commit e185ff22caf430f680f8bad1edf14bc98dd7c64e)

locking: TALLOC_FREE copes with NULL

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9f596c17c7d255213df6201d4d489df1580faef4)

locking: Add per database queues for pending and active lock requests

This avoids traversing a single pending queue which is quite expensive
when there are lots of pending lock requests. This seems to happen
quite a lot on a loaded cluster for notify_index.tdb.

Per-database queues also avoid traversing the pending queue for a
database that already has the maximum number of active lock requests.
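
A sketch of per-database bookkeeping with a FIFO pending list and an active count. All names are hypothetical, not CTDB's structures. When the database is already at its limit, scheduling returns immediately without walking any queue.

```c
#include <stddef.h>

/* Hypothetical per-database lock bookkeeping. */
struct lock_request {
	struct lock_request *next;
};

struct db_lock_queues {
	struct lock_request *pending;	/* FIFO: head is the oldest */
	struct lock_request *active;
	unsigned num_active;
};

/* Move the oldest pending request to the active list unless the
 * database already has max_active lock processes running. */
static struct lock_request *schedule_next(struct db_lock_queues *db,
					  unsigned max_active)
{
	struct lock_request *req;

	if (db->num_active >= max_active || db->pending == NULL) {
		return NULL;	/* nothing to traverse at the limit */
	}
	req = db->pending;
	db->pending = req->next;
	req->next = db->active;
	db->active = req;
	db->num_active++;
	return req;
}
```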

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Aug 4 20:23:45 CEST 2014 on sn-devel-104

(Imported from commit 88f6a6c188b8e43f710c50a9c1f88af660772e3d)

locking: Update a comment

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit f73adff737c8fd3ab797de35bf1463359ce801cd)

locking: Simplify check for locks on record or database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit a890e760bbcb2d0f384aff285d1282de2a42d313)

locking: Decrement pending statistics when lock is scheduled

and not when the lock is obtained.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit aa1ff305f9bdd97675ceb4ce2b18f4cd623b8a38)

locking: Update ctdb statistics for all lock types

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit dce68a21416dd3dc016ed6a7c884b1314ffca121)

locking: Add DB lock requests to head of the pending queue

This allows DB locks to be scheduled quickly without having to scan
through the pending lock requests.
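
A sketch of the queueing rule on a hypothetical doubly linked pending queue: record locks are appended at the tail, while whole-database locks go to the head, where a scheduler scanning from the head finds them first.

```c
#include <stddef.h>

/* Hypothetical pending-queue entry and queue. */
struct pending_lock {
	int is_db_lock;
	struct pending_lock *prev, *next;
};

struct pending_queue {
	struct pending_lock *head, *tail;
};

static void enqueue_lock(struct pending_queue *q, struct pending_lock *l)
{
	l->prev = l->next = NULL;

	if (q->head == NULL) {
		q->head = q->tail = l;
	} else if (l->is_db_lock) {
		/* DB lock: insert at the head. */
		l->next = q->head;
		q->head->prev = l;
		q->head = l;
	} else {
		/* Record lock: append at the tail (FIFO). */
		l->prev = q->tail;
		q->tail->next = l;
		q->tail = l;
	}
}
```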

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 7189437be447d33038eb26bca055b1025cebacd3)

locking: Remove unused variable lock_num_pending

The number of pending locks displayed in ctdb statistics is stored in
the ctdb_statistics structure, not in ctdb_context.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 3aa96c3a3eb87fc6a1ad94c983e363b402b48ff5)

locking: Increase number of lock processes per database to 200

This was the original limit in the older versions of CTDB.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 3ff8ec02830b5fd2f88e33748a2bfd9f066a1285)

locking: Add new tunable LockProcessesPerDB

This allows the maximum number of active lock processes per database
to be changed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 59d45ea307fd460953a3b4924dfa60f5ab6dea4a)

locking: Allocate lock request soon after allocating lock context

This avoids extra work in case lock request allocation fails.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit e0d54594519de67b6f0d0ec003bc0327f70f026b)

locking: Remove unused function find_lock_context()

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 97a5c579574fb5702a743e07b896c9a0ec0acc4f)

locking: Schedule the next possible lock based on per-db limit

This prevents searching through active lock requests for every pending
lock request to check if the pending lock request can be scheduled or not.
The locks are scheduled in strict first-in-first-out order.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit c9664b4b17660c03ed96072a9f5392dbb0800f2c)

locking: Remove multiple lock requests per lock context (part 2)

Store only a single request instead of storing a queue in lock context.
Lock request structure does not need to be a linked list any more.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 19b3810b61e4856dc5ab6b8a5a785b836172f01e)

locking: Remove multiple lock requests per lock context (part 1)

This was a bad idea and caused out of order scheduling of lock requests.

The logic to append lock requests to an existing lock context is
already commented out. Remove the commented-out code. There is also no
need to check whether lock_ctx is NULL, since a new one is always created.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit a89f3508796d6be8efe45ccc1f9ffee7e4d3f4f3)

locking: Remove unused structure members

block_child was used to keep track of a process which was created to
debug why a lock process had blocked. That logic was replaced by
executing an external debug script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit b93d9c062229252a78a4497fba402ec968be9713)

locking: Fix the lock_type_str corresponding to LOCK_ALLDB

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 8aa6c039ae8f2bfd99515999e1ce647f0e4028d7)

eventscripts: Remove special case for virtio_net

The current check is incorrect in 2 ways:

* Commit be71a84565e9e7532a77c175732b764d1f42c1cd contained a thinko
that stops virtio_net interfaces from simply being marked up

* virtio_net interfaces can actually be down

virtio_net has supported ethtool since Linux 2.6.29, so just remove
the special case. This means that testing CTDB on very old virtual
machines is not supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 31 13:08:47 CEST 2014 on sn-devel-104

(Imported from commit bc59e508d381e6ec2a47eed1e0bc8fc3025904a2)

eventscripts: Remove unused argument to natgw_ensure_master()

This was used to limit damage in the "recovered" event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 29 10:03:16 CEST 2014 on sn-devel-104

(Imported from commit 7c2c6748e323fb0e54fbc2d1b773608904458e94)

tests: Add another LCP2 takeover test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit d011e70215717a0ec94f6ca30d44b0302e4533ef)

eventscripts: Remove NAT gateway "monitor" event

This event was introduced to handle misconfiguration.  For example,
where all nodes were configured as NAT gateway slaves.

However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node.  The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.

Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking.  Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit cb94eba157679574c05d85f05828195e4099f2ba)

vacuum: stop vacuuming when the first delete_list traverse fails.

This indirect caller of delete_marshall_traverse was missed in
fa4a81c86b6073b2563b090aa657d8e8b63c1276, which lets failure of the
second traverse fail the vacuum run.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 9d6f187b5811faed6e9b6c4bc61e42175c0c0ae2)

vacuum: Use existing function ctdb_marshall_finish

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 23 09:44:00 CEST 2014 on sn-devel-104

(Imported from commit f87b7f664f813957ee55a6f35abb208eb0f3dcad)

vacuum: Use ctdb_marshall_add to add a record to marshall buffer

This avoids duplicate code and extra talloc in ctdb_marshall_record.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 6edc4f23e9094860ad5cc6b93ce66169dd99047a)

util: Refactor record marshalling routines to avoid extra talloc

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 42ba7a0a400c970dd534e92d2effa3ed385f8d6d)

util: Refactor ctdb_marshall_record

Create new routines ctdb_marshall_record_size and ctdb_marshall_record_copy

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 64ea6e30ef601d91ea16f6a9c5b7a6b9395c0152)

util: Fix nonempty line endings

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 5eac2302819b26b8eaf4f6c0a333e4af2b368679)

vacuum: If talloc_realloc fails, terminate traverse

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit fa4a81c86b6073b2563b090aa657d8e8b63c1276)

vacuum: Fix talloc hierarchy in delete_marshall_traverse

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
(Imported from commit 9a4a9ccda397e20b0a894541f4f1a6d24e09bf19)

common: Fix verbose_memory_names

If we have already partly written a packet, "data" and thus "pkt->data"
does not point to the start of the packet anymore. Assign "hdr" while
it still points at the start of the header.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 22 06:09:50 CEST 2014 on sn-devel-104

(Imported from commit 478ef9493f131c4d94bada708f790db3254f0a59)

common: Avoid a talloc in ctdb_queue_send

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 70c79f514024551128acc2d3ba879ef1407ed130)

recoverd: Gently abort recovery when election is underway

Sometimes the recovery daemon fails to get the recovery lock on one
node so that node is banned.  This seems to always happen during an
election.  The recovery is triggered because other nodes are found to
have recovery mode enabled.  They have recovery mode enabled because
an election has been forced.

The recovery daemon's main_loop() only does an initial check for an
election.  After that, a node can force an election and, in the
process, set itself to be the current winner.  In this situation,
verify_recmode() will always return MONITOR_RECOVERY_NEEDED so
do_recovery() is called.  If the previous recovery master hasn't
admitted defeat and released the recovery lock, then do_recovery()
will rightly fail.  However, it would be better if it failed a little
more gracefully, since this case is not that unusual.

Instead of trying to take the recovery lock, return early with an
error if there is an election in progress.  Note that the race is
still there but it is now much narrower.

There are probably more subtle ways of avoiding this issue, including
something like this in main_loop():

  -	if (pnn != rec->recmaster) {
  +	if (pnn != rec->recmaster || rec->election_timeout) {
   		return;
   	}

However, this check is done earlier so it leaves the race window open
a little wider.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 21 06:57:07 CEST 2014 on sn-devel-104

(Imported from commit 705e4174c988eea5c5b3a834710f9f920369c8ee)

ltdb: Use tdb_null instead of zeroing TDB_DATA variable

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Jul 14 16:01:31 CEST 2014 on sn-devel-104

(Imported from commit 208b2d88c4efacee79fe4d856ee8256c680cad5c)

daemon: Support per-node robust mutex feature

To enable TDB mutex support, set tunable TDBMutexEnabled=1.

When databases are attached for the first time, attach flags must include
TDB_MUTEX_LOCKING and TDBMutexEnabled must be set to enable mutex support.

However, when CTDB attaches databases internally for recovery, it will
enable mutex support if TDBMutexEnabled is set.
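
The rule described above can be sketched as a small decision function. The flag value is a placeholder (the real TDB_MUTEX_LOCKING comes from tdb.h), and want_mutex_locking() is a hypothetical name, not CTDB's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for TDB_MUTEX_LOCKING; use the value from tdb.h. */
#define SKETCH_TDB_MUTEX_LOCKING 0x1000u

static bool want_mutex_locking(uint32_t attach_flags, bool tdb_mutex_enabled,
			       bool internal_recovery_attach)
{
	if (!tdb_mutex_enabled) {
		return false;	/* TDBMutexEnabled=0: never use mutexes */
	}
	if (internal_recovery_attach) {
		return true;	/* recovery attach: the tunable alone decides */
	}
	/* First attach: the caller must also request mutex locking. */
	return (attach_flags & SKETCH_TDB_MUTEX_LOCKING) != 0;
}
```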

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 9 06:45:17 CEST 2014 on sn-devel-104

(Imported from commit 55fbe364b93000c7766e95e16fa35cc6a80c697b)

daemon: Enable robust mutexes only if TDB_MUTEX_LOCKING is defined

Runtime check for robust mutexes is performed just before opening local tdb.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 2e7b0870ec1014f8320032b86dc54f0a6fd55776)

daemon: Allow flag TDB_MUTEX_LOCKING to pass into db_attach

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 1627171792567fc55290330feaaef9d9efc66c48)

daemon: Simplify code a bit

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 91be76dbe93a2be763a93163bec8c17d35057944)

daemon: Use false instead of 0 for boolean arguments

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
(Imported from commit 1ed330f7cbd753b6c29246d522c5ddca5160d8bb)

tests: Add a test for ctdb restoredb

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Jul 7 16:06:39 CEST 2014 on sn-devel-104

(Imported from commit eccce073d084eceb4bfb5c25001b5873e2c0f2b2)

tests: Check that ctdb wipedb cleans the database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 9c8c8a7b0bfd4c1cafa3deaa012049b7f0851617)

daemon: Do not thaw databases if recovery is active

This prevents ctdb tool from thawing databases prematurely in
thaw/wipedb/restoredb commands if recovery is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 2855173dac5386bff655d1bb94c1848591b963e1)

recoverd: Set recovery mode before freezing databases

Setting recovery mode to active is the only correct way to inform the
recovery daemon to run database recovery. Merely freezing databases
without setting recovery mode should not trigger database recovery, as
this mechanism is used in the ctdb tool to implement wipedb/restoredb commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 28a1b75886fb4aea65e23bfd00b9f4c98780fdfd)

tools: There is no need for forcing a recovery

This effectively reverts commit 442953c540424ad0c64f4264b5ee27c45a3130e8.
The correct way of telling the recovery daemon to trigger a database
recovery is by setting recovery mode to active. There is no need to
freeze databases, as the recovery master will do that across the
cluster anyway.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit 72c6500ee440779819b9adb768a7022cc251f07e)

Revert "It was possible for ->recovery_mode to get out of sync with the new three db priorities in such a way that"

This reverts commit 6578a97bd94fc14d5b6df85b84e50447f7bdb2e3.

This condition cannot happen since, when recovery is triggered, all the
databases get frozen and thawed in order of priority. The only other
place where databases get frozen is the implementation of the ctdb
wipedb/restoredb commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(Imported from commit e5cd81da77ef58992b7eb9ff7d972b499b946bb7)

common: Use SCHED_RESET_ON_FORK when setting SCHED_FIFO

This makes the scheduler reset code a no-op.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 7 13:28:25 CEST 2014 on sn-devel-104

(Imported from commit 1be8564e553ce044426dbe7b3987edf514832940)

recoverd: Don't say "Election timed out"

That makes people think there's a problem (and report bugs) so say
something a bit less scary instead...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a283b9e43a602b9c72065336edbe8ad7c2499117)

recoverd: Log a message when releasing the recovery lock

It is a non-trivial event and will make it easier to debug recovery
lock issues.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 8bdb9b85cc02f589a3b219de07f3c2ef7510d937)

scripts: Support NFS on RHEL7 with systemd

Need to be able to recognise a RHEL system. Still use "system" to
start and stop service, since that still works and yields the smallest
change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 61b1fdec2fdb19be9b9cd39bc5298917e914cc04)

recoverd: No need to set ctdbd_pid again

This is unnecessary since ctdbd_pid is set very early in the code before
creating any other processes including recovery daemon.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Sat Jul 5 09:20:27 CEST 2014 on sn-devel-104

(Imported from commit 331fb7fc64c0a4f64c28001a1644a2a6a923be75)

daemon: Remove ctdbd_pid global variable

This duplicates ctdb->ctdbd_pid.

Thanks to Sumit Bose <sbose@redhat.com> for the suggestion.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 1677dd499c571081a8ddaf560eb3b033156e1c67)

daemon: Check PID in ctdb_remove_pidfile(), not unreliable flag

If something unexpectedly uses fork() then an exiting child will
remove the PID file while the main daemon is still running.  The real
test is whether the current process has the PID of the main CTDB
daemon, which is the process that calls setsid().

This could be done using getpgrp() instead.  At the moment the
eventscript handler harmlessly calls setpgid() - harmless because the
atexit() handlers are cleared upon exec().  However, it is possible
that process groups will be used more in future so it is probably
better to rely on the session ID.
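A minimal sketch of the check, with hypothetical function names and no
error logging. The main daemon calls setsid(), so it is the only process
for which getpid() equals getsid(0); a forked child inherits the session
ID and any atexit() handler, but not the PID, so the handler does nothing
there:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write our PID to `pidfile`. */
static bool create_pidfile(const char *pidfile)
{
	char buf[32];
	int len;
	int fd;
	bool ok;

	assert(pidfile != NULL);
	len = snprintf(buf, sizeof(buf), "%d\n", (int)getpid());
	fd = open(pidfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1) {
		return false;
	}
	ok = (write(fd, buf, (size_t)len) == (ssize_t)len);
	if (close(fd) != 0) {
		ok = false;
	}
	return ok;
}

/*
 * Hypothetical helper: remove `pidfile` only in the main daemon, i.e. the
 * process that called setsid() and therefore has PID == session ID.  A
 * forked child shares the session ID but has its own PID, so it leaves
 * the file alone even if it runs the same atexit() handler.
 */
static bool remove_pidfile_if_main(const char *pidfile)
{
	assert(pidfile != NULL);
	if (getsid(0) != getpid()) {
		return false;
	}
	return unlink(pidfile) == 0;
}
```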

Thanks to Sumit Bose <sbose@redhat.com> for the idea.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit e454e5ac9c8ed77409d9fa4463b2b29985e67e10)

daemon: Exit if setting the session ID fails

Currently ctdbd_wrapper depends on the session ID. Very soon PID file
removal will too. :-)
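A sketch of that startup step, assuming a hypothetical helper name.
setsid() fails with EPERM when the caller is already a process group
leader; rather than carry on without its own session, the daemon logs
the error and exits:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a new session or exit.  ctdbd_wrapper (and,
 * soon, PID file removal) rely on the session ID, so continuing without
 * one is not an option.
 */
static void become_session_leader_or_exit(void)
{
	if (setsid() == -1) {
		fprintf(stderr, "Failed to start a new session: %s\n",
			strerror(errno));
		exit(1);
	}
	/* The session ID of a session leader is its own PID. */
	assert(getsid(0) == getpid());
}
```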

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c7b3be97d96ee5a17bb88dceec42c57e9bf69c5d)

tests: Fix racy test for debugging hung scripts

Debugging can still be running when a monitor event times out and
scriptstatus output changes.

When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
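Waiting for the log file is safe because rename() replaces the target
atomically: the file is either absent or complete, never half-written.
The debugging script itself is shell; the following C sketch of the same
write-then-rename pattern uses a hypothetical function name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `contents` to "<path>.tmp", then rename() it
 * over `path`.  Returns 0 on success, -1 on error (the temporary file is
 * removed on failure).
 */
static int write_log_atomically(const char *path, const char *contents)
{
	char tmp[4096];
	size_t len;
	FILE *f;
	int n;

	assert(path != NULL && contents != NULL);

	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp)) {
		return -1;
	}

	f = fopen(tmp, "w");
	if (f == NULL) {
		return -1;
	}
	len = strlen(contents);
	if (fwrite(contents, 1, len, f) != len) {
		fclose(f);
		remove(tmp);
		return -1;
	}
	if (fclose(f) != 0) {
		remove(tmp);
		return -1;
	}

	/* Atomic replacement: a waiter never sees a partial log. */
	if (rename(tmp, path) != 0) {
		remove(tmp);
		return -1;
	}
	return 0;
}
```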

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104

(Imported from commit a7c55007659ab768293f15c5f5fc00c5d9e5c814)

scripts: Always print footer when debugging hung script

There shouldn't be an early exit for the "init" event. Just make the
"ctdb scriptstatus" call conditional.

While here, move the comment about only running a single instance to
be near locking code. The comment is more useful there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b0c191e5de15e54646b02925e37458d6a56db015)

eventscripts: Ensure $GANRECDIR points to configured subdirectory

Check that the $GANRECDIR symlink points to the location specified by
$CTDB_GANESHA_REC_SUBDIR and replace it if incorrect.  This handles
reconfiguration and filesystem changes.

While touching this code:

* Create the $GANRECDIR link as a separate step if it doesn't exist.
  This means there is only 1 place where the link is created.

* Change some variable names to the style used for local function
  variables.

* Remove some "ln failed" error messages.  ln failures will be logged
  anyway.

* Add -v to various mkdir/rm/ln commands so that these actions are
  logged when they actually do something.
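The check-and-replace logic, sketched in C rather than the eventscript's
shell. The function name is hypothetical. Like ln -v, it only reports
when it actually changes something:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: make `linkpath` a symlink to `target`.  Creates
 * the link if it is missing and replaces it if it points somewhere else
 * (e.g. after a configuration change).  Returns 0 on success, -1 on error.
 */
static int ensure_symlink(const char *target, const char *linkpath)
{
	char cur[4096];
	ssize_t n;

	assert(target != NULL && linkpath != NULL);

	n = readlink(linkpath, cur, sizeof(cur) - 1);
	if (n >= 0) {
		cur[n] = '\0';
		if (strcmp(cur, target) == 0) {
			return 0; /* already correct, nothing to log */
		}
		/* Stale link: remove it so it is recreated below. */
		if (unlink(linkpath) != 0) {
			return -1;
		}
		printf("removed stale link '%s' -> '%s'\n", linkpath, cur);
	} else if (errno != ENOENT) {
		return -1; /* e.g. EINVAL: exists but is not a symlink */
	}

	/* The only place where the link is created. */
	if (symlink(target, linkpath) != 0) {
		return -1;
	}
	printf("'%s' -> '%s'\n", linkpath, target);
	return 0;
}
```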

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 20 05:40:16 CEST 2014 on sn-devel-104

(Imported from commit aac607d7271eb50e776423329f2446a1e33a2641)

daemon: Debugging for tickle updates

This was useful for debugging the race fixed by commit
4f79fa6c7c843502fcdaa2dead534ea3719b9f69. It might be useful again.

Also fix a nearby comment typo.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 20 02:07:48 CEST 2014 on sn-devel-104

(Imported from commit 6f43896e1258c4cf43401cbfeba24a50de3c3140)

tests: Try harder to avoid failures due to repeated recoveries

About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery.  This was to avoid unexpected
recoveries causing tests to fail.  However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.

Instead, have cluster_is_healthy() warn if the cluster is in recovery.

Also:

* Rename wait_until_healthy() to wait_until_ready() because it waits
  until both healthy and out of recovery.

* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
  add a loop to wait (for 2 seconds at a time) if the cluster is back
  in recovery.  The logic here is that the re-recovery timeout has
  been set to 1 second, so sleeping for just 1 second might race
  against the next recovery.

* Use reverse logic in node_has_status() so that it works for "all".

* Tweak wait_until() so that it can handle timeouts with a
  recheck-interval specified.
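A C sketch of a polling helper that takes both a timeout and a recheck
interval. The name mirrors the test suite's wait_until(), which is a
shell function; this signature and the file_exists() predicate are
hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical predicate: true once the file named by `arg` exists. */
static bool file_exists(void *arg)
{
	return access((const char *)arg, F_OK) == 0;
}

/*
 * Call check(arg) every `interval` seconds until it succeeds or `timeout`
 * seconds have elapsed.  Returns true if the check succeeded in time.
 */
static bool wait_until(unsigned int timeout, unsigned int interval,
		       bool (*check)(void *), void *arg)
{
	time_t deadline;

	assert(check != NULL);
	deadline = time(NULL) + (time_t)timeout;
	for (;;) {
		if (check(arg)) {
			return true;
		}
		if (time(NULL) >= deadline) {
			fprintf(stderr, "Timed out after %u seconds\n", timeout);
			return false;
		}
		/* Recheck every `interval` seconds rather than busy-looping. */
		sleep(interval);
	}
}
```

Keeping the interval separate from the timeout lets a caller recheck
every couple of seconds while still bounding the total wait.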

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 6a552f1a12ebe43f946bbbee2a3846b5a640ae4f)

vacuum: always run freelist_size again

and not only if repack_limit != 0. This partially reverts
commit 48f2d1158820bfb063ba0a0bbfb6f496a8e7522.

With the new tdb code this defragments the
free list by merging adjacent records.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 5334881afab42eae77bb2015ec21cbfe1df87807)

vacuum: add missing return to ctdb_vacuum_traverse_db() error path.

This got lost in commit 19948702992c94553e1a611540ad398de9f9d8b9
("ctdb-vacuum: make ctdb_vacuum_traverse_db() void.")

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 026d79cb009beba6987da6a6dd5fd98609140136)

vacuum: remove now unused talloc ctx argument from ctdb_vacuum_db()

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit b8658b395921a5400c9f794a07748f5ad18991f8)

vacuum: move init of vdata into init_vdata function

This is a small code cleanup.
vdata is only used in ctdb_vacuum_db(), not in
ctdb_vacuum_and_repack_db(), where it is currently initialized.

This patch moves creation and all previously scattered
initialization of vacuum_data into ctdb_vacuum_init_vacuum_data(),
which is called from ctdb_vacuum_db().

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit c3cb8c277a02a8a68c11ef8d341c8116172e989b)

vacuum: remove vacuum limit from vdata - not used

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit 3cf018935e057c1748ab44491135c632c023de9f)

vacuum: remove a superfluous comment.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(Imported from commit a99035a4c52f68a4a4f1862c74c1c71273a47d5b)