Stefan Metzmacher [Mon, 18 Jan 2010 12:05:54 +0000 (13:05 +0100)]
config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
This is very useful for testing. I use a script like this:
cat ~/bin/ethtool
#!/bin/sh
IFACE=$1
case "$IFACE" in
Neth2)
;;
Neth3)
;;
Neth4)
;;
Neth5)
;;
*)
exec /usr/sbin/ethtool "$@"
;;
esac
ip link set down "$IFACE"
exec /usr/sbin/ethtool "$@"
metze
Stefan Metzmacher [Tue, 19 Jan 2010 07:42:48 +0000 (08:42 +0100)]
server: reload the public addresses before doing a takeover run
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:04:32 +0000 (15:04 +0100)]
server: ban ourself if the ctdb and kernel knowledge of a public ip differs
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:38:01 +0000 (15:38 +0100)]
server: give an error if we're getting a takeover_ip event with a wrong pnn
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:08:15 +0000 (15:08 +0100)]
server: return an error if we get a takeover ip event and we cannot serve the ip
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:12:46 +0000 (15:12 +0100)]
server: print node number as signed integer on release ip event
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:22:16 +0000 (15:22 +0100)]
server: debug redundant takeover ip events with level INFO
metze
Stefan Metzmacher [Mon, 18 Jan 2010 14:04:32 +0000 (15:04 +0100)]
server: be less verbose on redundant release_ip events
metze
Stefan Metzmacher [Sat, 16 Jan 2010 14:01:17 +0000 (15:01 +0100)]
server: add a ctdb_do_updateip()
metze
Stefan Metzmacher [Sat, 16 Jan 2010 12:30:58 +0000 (13:30 +0100)]
server: split out a ctdb_do_takeover_ip() function
metze
Stefan Metzmacher [Sat, 16 Jan 2010 12:20:45 +0000 (13:20 +0100)]
server: split out a ctdb_announce_vnn_iface() function
metze
Stefan Metzmacher [Mon, 21 Dec 2009 07:45:19 +0000 (08:45 +0100)]
events: add updateip event to 13.per_ip_routing
metze
Stefan Metzmacher [Mon, 21 Dec 2009 07:40:50 +0000 (08:40 +0100)]
events: 10.interface handle updateip event
metze
Stefan Metzmacher [Mon, 21 Dec 2009 07:33:55 +0000 (08:33 +0100)]
server: add updateip event
metze
Stefan Metzmacher [Mon, 21 Dec 2009 13:02:03 +0000 (14:02 +0100)]
config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.
metze
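As a minimal sketch, the option described above could be set in the sysconfig file like this. The file is sourced as shell; only the option name and the "yes" value come from the commit message.

```shell
# Sketch of a ctdb sysconfig entry.
# With "yes", the node does not become unhealthy as long as
# at least one interface is still available.
CTDB_PARTIALLY_ONLINE_INTERFACES="yes"
```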
Stefan Metzmacher [Mon, 21 Dec 2009 18:18:10 +0000 (19:18 +0100)]
server: start with disabled interfaces and let the event scripts enable the interfaces explicitly
This makes sure that we don't get public addresses assigned during the
initial recovery and remove them again in the startup event.
metze
Stefan Metzmacher [Tue, 22 Dec 2009 14:25:30 +0000 (15:25 +0100)]
config: 10.interfaces call monitor_interfaces on startup
metze
Stefan Metzmacher [Tue, 22 Dec 2009 14:25:30 +0000 (15:25 +0100)]
config: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring
metze
Stefan Metzmacher [Mon, 14 Dec 2009 10:59:45 +0000 (11:59 +0100)]
events: split out a monitor_interfaces function in 10.interface
metze
Stefan Metzmacher [Tue, 22 Dec 2009 14:21:08 +0000 (15:21 +0100)]
server: monitor interfaces in verify_ip_allocation()
metze
Stefan Metzmacher [Tue, 22 Dec 2009 14:21:08 +0000 (15:21 +0100)]
server: only trigger one takeover run in verify_ip_allocation()
metze
Stefan Metzmacher [Mon, 21 Dec 2009 12:30:45 +0000 (13:30 +0100)]
tools/ctdb: add PartiallyOnline state for "ctdb status" and "ctdb status -Y"
This is based on the GET_IFACES control against each node.
metze
Stefan Metzmacher [Sat, 16 Jan 2010 09:36:35 +0000 (10:36 +0100)]
tools/ctdb: display interfaces in "ctdb ip" and "ctdb ip -Y" outputs
metze
Stefan Metzmacher [Sat, 16 Jan 2010 09:35:41 +0000 (10:35 +0100)]
tests: add an all_ips_on_node() helper function that wraps ctdb ip -Y
metze
Stefan Metzmacher [Fri, 15 Jan 2010 09:53:14 +0000 (10:53 +0100)]
tests/simple/11_ctdb_ip.sh: be more strict in checking ctdb ip -Y output
metze
Stefan Metzmacher [Thu, 17 Dec 2009 10:23:59 +0000 (11:23 +0100)]
tools/ctdb: add "ctdb ipinfo <ip>"
metze
Stefan Metzmacher [Wed, 16 Dec 2009 16:02:23 +0000 (17:02 +0100)]
tools/ctdb: add "ctdb setifacelink <iface> <status>"
metze
Stefan Metzmacher [Wed, 16 Dec 2009 15:50:23 +0000 (16:50 +0100)]
tools/ctdb: add "ctdb ifaces"
metze
Stefan Metzmacher [Thu, 17 Dec 2009 09:30:36 +0000 (10:30 +0100)]
server: implement ctdb_control_set_iface_link()
This only marks the interface status and doesn't
generate any directly triggered action.
The action is taken later by the recovery process
in verify_ip_allocation.
metze
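A rough sketch of how a monitoring script might report link state through the "ctdb setifacelink <iface> <status>" command from this series. The helper name report_link and the status words "up" and "down" are assumptions, and the command is only printed so the sketch runs without a ctdb daemon.

```shell
#!/bin/sh
# Hypothetical helper: build the "ctdb setifacelink" call for one interface.
# The status words "up" and "down" are assumed, not taken from the log.
report_link() {
    iface=$1
    carrier=$2    # 1 = link detected, anything else = no link
    if [ "$carrier" = "1" ]; then
        status=up
    else
        status=down
    fi
    # Printed instead of executed; a real script would run the command.
    echo "ctdb setifacelink $iface $status"
}

report_link eth0 1
report_link eth1 0
```

Because the control only marks the interface status, nothing moves at this point; any address change happens later in verify_ip_allocation, as the commit message states.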
Stefan Metzmacher [Wed, 16 Dec 2009 10:14:44 +0000 (11:14 +0100)]
server: implement ctdb_control_get_ifaces()
metze
Stefan Metzmacher [Wed, 16 Dec 2009 10:20:28 +0000 (11:20 +0100)]
server: implement ctdb_control_get_public_ip_info()
metze
Stefan Metzmacher [Wed, 16 Dec 2009 15:18:36 +0000 (16:18 +0100)]
client: implement ctdb_ctrl_set_iface_link()
metze
Stefan Metzmacher [Wed, 16 Dec 2009 14:30:07 +0000 (15:30 +0100)]
client: implement ctdb_ctrl_get_ifaces()
metze
Stefan Metzmacher [Wed, 16 Dec 2009 15:23:08 +0000 (16:23 +0100)]
client: implement ctdb_ctrl_get_public_ip_info()
metze
Stefan Metzmacher [Wed, 16 Dec 2009 13:40:21 +0000 (14:40 +0100)]
controls: add stubs for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE
metze
Stefan Metzmacher [Wed, 16 Dec 2009 15:09:40 +0000 (16:09 +0100)]
server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run
We now ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows" but doesn't serve, and a node only gets a TAKE_IP event
for "available" interfaces.
metze
Stefan Metzmacher [Wed, 16 Dec 2009 15:08:45 +0000 (16:08 +0100)]
server: implement CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE behavior
metze
Stefan Metzmacher [Wed, 16 Dec 2009 14:50:06 +0000 (15:50 +0100)]
client: add CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE ctdb_ctrl_get_public_ips_flags()
metze
Stefan Metzmacher [Mon, 21 Dec 2009 11:10:18 +0000 (12:10 +0100)]
reserve upper bits in ctdb_control->flags for opcode specific flags
metze
Stefan Metzmacher [Wed, 16 Dec 2009 09:39:40 +0000 (10:39 +0100)]
server: keep the interface information in a list of ctdb_iface structures
metze
Stefan Metzmacher [Wed, 16 Dec 2009 08:48:21 +0000 (09:48 +0100)]
server: we don't need to copy strings we pass as talloc_asprintf() arguments
metze
Stefan Metzmacher [Mon, 21 Dec 2009 07:39:21 +0000 (08:39 +0100)]
events: 10.interfaces allow multiple interfaces per public address
metze
Stefan Metzmacher [Mon, 14 Dec 2009 17:52:06 +0000 (18:52 +0100)]
server: allow multiple interfaces comma separated in public_addresses
metze
Stefan Metzmacher [Wed, 16 Dec 2009 07:54:02 +0000 (08:54 +0100)]
server: add a ctdb_vnn_iface_string() helper function to access vnn->iface
metze
Stefan Metzmacher [Mon, 14 Dec 2009 18:33:35 +0000 (19:33 +0100)]
server: add a ctdb_set_single_public_ip() helper function
metze
Stefan Metzmacher [Sat, 19 Dec 2009 17:26:01 +0000 (18:26 +0100)]
config: add 13.per_ip_routing event script
With this script it's possible to generate routing tables
per public ip address.
metze
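The idea of one routing table per public address can be illustrated with iproute2 policy routing. This is a sketch only: the function name, table number and gateway are made up, nothing here is taken from 13.per_ip_routing itself, and the commands are printed rather than run.

```shell
#!/bin/sh
# Hypothetical sketch: print iproute2 commands that give one public ip
# its own routing table.
per_ip_routing_cmds() {
    ip=$1
    iface=$2
    gateway=$3
    table=$4
    # Packets sourced from the public ip consult the dedicated table...
    echo "ip rule add from $ip lookup $table"
    # ...which holds a default route through the given interface.
    echo "ip route add default via $gateway dev $iface table $table"
}

per_ip_routing_cmds 10.0.0.1 eth0 10.0.0.254 1001
```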
Stefan Metzmacher [Fri, 11 Dec 2009 18:56:36 +0000 (19:56 +0100)]
config: add some ipv4 helper shell functions
Many thanks to Michael Adam <obnox@samba.org>
for the basic work.
metze
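To give a flavour of what IPv4 helpers in shell can look like, here is a small sketch. The function names are illustrative and are not claimed to match the ones this commit adds; octets with leading zeros are not handled.

```shell
#!/bin/sh
# Hypothetical helpers for IPv4 arithmetic in plain sh.

# Convert a dotted-quad address to a 32-bit integer.
ipv4_to_int() {
    echo "$1" | {
        IFS=. read -r a b c d
        echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
    }
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ipv4() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the network address for an address and a prefix length.
ipv4_network() {
    ip_int=$(ipv4_to_int "$1")
    bits=$2
    mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
    int_to_ipv4 $(( ip_int & mask ))
}

ipv4_network 10.1.2.3 24    # prints 10.1.2.0
```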
Stefan Metzmacher [Wed, 20 Jan 2010 10:10:48 +0000 (11:10 +0100)]
config: add interface_modify.sh and call it under flock to make modification on interfaces atomic
When two releaseip events run in parallel it's possible that the 2nd script
re-adds a secondary ip that was removed by the 1st script.
metze
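The race described above is a read-modify-write problem, and serializing on a lock file is one way to avoid it. The sketch below assumes flock(1) from util-linux is installed; the lock file path, log file and function name are hypothetical, and the real interface change is replaced by two log lines.

```shell
#!/bin/sh
# Hypothetical sketch of serializing interface changes with flock(1).
LOCKFILE=${TMPDIR:-/tmp}/interface_modify.lock.$$
LOG=${TMPDIR:-/tmp}/interface_modify.log.$$

modify_interface() {
    # The subshell holds an exclusive lock on fd 9 until it exits, so
    # two concurrent callers cannot interleave their modifications.
    (
        flock -x 9
        echo "begin $1" >>"$LOG"
        # A real script would add or remove addresses here.
        echo "end $1" >>"$LOG"
    ) 9>"$LOCKFILE"
}

# Two "releaseip" handlers started in parallel.
modify_interface releaseip-a &
modify_interface releaseip-b &
wait
cat "$LOG"
rm -f "$LOCKFILE" "$LOG"
```

Whichever caller wins the lock, each "begin" line in the log is immediately followed by its matching "end" line.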
Stefan Metzmacher [Fri, 18 Dec 2009 10:08:22 +0000 (11:08 +0100)]
events/10.interfaces: move some parts to helper functions
metze
Stefan Metzmacher [Fri, 18 Dec 2009 08:43:20 +0000 (09:43 +0100)]
config/functions: add tickle_tcp_connections()
metze
Stefan Metzmacher [Tue, 19 Jan 2010 09:07:14 +0000 (10:07 +0100)]
server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.
metze
Stefan Metzmacher [Thu, 7 Jan 2010 08:21:56 +0000 (09:21 +0100)]
server: setup fault handler to get the built-in backtrace support
The panic action feature will be added later.
metze
Stefan Metzmacher [Tue, 12 Jan 2010 11:17:00 +0000 (12:17 +0100)]
lib/util: add pre and post panic action hooks
metze
Stefan Metzmacher [Fri, 18 Dec 2009 11:32:38 +0000 (12:32 +0100)]
lib/util: import fault/backtrace handling from samba.
metze
Stefan Metzmacher [Fri, 18 Dec 2009 11:14:28 +0000 (12:14 +0100)]
configure: don't overwrite AC_CHECK_FUNC_EXT and AC_CHECK_LIB_EXT
This currently has no effect on the generated configure and config.h.in files.
metze
Stefan Metzmacher [Sat, 19 Dec 2009 10:40:06 +0000 (11:40 +0100)]
move DEBUG* macros to one place
metze
Stefan Metzmacher [Mon, 21 Dec 2009 12:34:21 +0000 (13:34 +0100)]
tools/ctdb: display INACTIVE status in "ctdb status" and "ctdb status -Y"
metze
Stefan Metzmacher [Tue, 19 Jan 2010 07:38:53 +0000 (08:38 +0100)]
server: add missing goto again after do_recovery()
metze
Stefan Metzmacher [Mon, 18 Jan 2010 12:19:29 +0000 (13:19 +0100)]
lib/events: finish "Run only one event for each epoll_wait/select call"
This finishes commit a78b8ea7168e5fdb2d62379ad3112008b2748576.
The logic was missing in events_standard (the one that's used by default).
metze
Ronnie Sahlberg [Tue, 19 Jan 2010 23:35:02 +0000 (10:35 +1100)]
source the nfs sysconfig file from the 61.nfstickles script
Ronnie Sahlberg [Fri, 15 Jan 2010 05:01:51 +0000 (16:01 +1100)]
document the in-memory ringbuffer for logging and the commands
used to set it up and manage it.
Ronnie Sahlberg [Fri, 15 Jan 2010 04:38:56 +0000 (15:38 +1100)]
Make the size of the in-memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.
Add an entry in the sysconfig file to set this persistently.
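A sketch of how the persistent setting might be wired up. The --log-ringbuf-size option is from the commit message; the variable names CTDB_LOG_RINGBUF_SIZE and ctdb_options and the value 500000 are assumptions chosen for illustration.

```shell
# Sketch only: CTDB_LOG_RINGBUF_SIZE is an assumed variable name and
# 500000 is an arbitrary number of entries.
CTDB_LOG_RINGBUF_SIZE=500000

# A start-up script could then turn the setting into a daemon option.
# ctdb_options is a hypothetical variable used only in this sketch.
ctdb_options="--log-ringbuf-size=$CTDB_LOG_RINGBUF_SIZE"
echo "$ctdb_options"
```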
Ronnie Sahlberg [Tue, 12 Jan 2010 20:12:08 +0000 (07:12 +1100)]
new version 1.0.113
Ronnie Sahlberg [Tue, 12 Jan 2010 20:01:40 +0000 (07:01 +1100)]
Merge commit 'metze/master-for-ronnie'
Stefan Metzmacher [Thu, 7 Jan 2010 12:29:09 +0000 (13:29 +0100)]
server: call event_add_fd at the end of ctdb_set_child_logging()
metze
Stefan Metzmacher [Thu, 7 Jan 2010 12:47:46 +0000 (13:47 +0100)]
ctdb_logging: simplify ctdb_fork_with_logging a lot and reduce the syscall usage
metze
Martin Schwenke [Tue, 12 Jan 2010 10:07:45 +0000 (21:07 +1100)]
New version 1.0.112.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 12 Jan 2010 10:02:44 +0000 (21:02 +1100)]
Revert "Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state."
This reverts commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6.
wbinfo --ping-dc is proving too unreliable.
Martin Schwenke [Tue, 12 Jan 2010 10:02:11 +0000 (21:02 +1100)]
Revert "events/50.samba: only use wbinfo --ping-dc if available"
This reverts commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b.
wbinfo --ping-dc is proving too unreliable.
Martin Schwenke [Thu, 7 Jan 2010 01:46:26 +0000 (12:46 +1100)]
Merge commit 'origin/master'
Ronnie Sahlberg [Fri, 18 Dec 2009 04:16:04 +0000 (15:16 +1100)]
New version 1.0.111
Rusty Russell [Fri, 18 Dec 2009 03:43:09 +0000 (14:13 +1030)]
eventscript: fix bug when script is aborted
Another corner case when we terminate running monitor scripts to run
something else: logging can flush the output and we write to a NULL
pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 18 Dec 2009 03:24:40 +0000 (13:54 +1030)]
eventscript: remove cb_status, fix uninitialized bug when monitoring aborted
(Reapplied with merge after accidental revert)
Previously we updated cb_status as each script finished. Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.
In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized. In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't. So we use the last status (normally,
this will be the just-saved current status).
In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Martin Schwenke [Fri, 18 Dec 2009 03:44:25 +0000 (14:44 +1100)]
Merge commit 'origin/master'
Martin Schwenke [Fri, 18 Dec 2009 03:43:45 +0000 (14:43 +1100)]
Test suite: Add an optimisation in the getvar test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 18 Dec 2009 03:42:58 +0000 (14:42 +1100)]
Test suite: allow setting of timeout triggers for all events, not just monitor.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Fri, 18 Dec 2009 01:32:58 +0000 (12:32 +1100)]
Version 1.0.110
Rusty Russell [Fri, 18 Dec 2009 01:24:24 +0000 (11:54 +1030)]
eventscript: fix cleanup path when setting up script list
We shouldn't set ctdb->current_monitor until we set destructor: that's
what cleans it up.
Also, free state->scripts on no-scripts exit path: it's not a child of
state because we need it in the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Stefan Metzmacher [Thu, 17 Dec 2009 12:04:27 +0000 (13:04 +0100)]
server: add set_close_on_exec() on more fds
metze
Stefan Metzmacher [Thu, 17 Dec 2009 12:03:42 +0000 (13:03 +0100)]
server: fix fd leaks in the new logging code
metze
Ronnie Sahlberg [Thu, 17 Dec 2009 04:49:01 +0000 (15:49 +1100)]
version 1.0.109
Rusty Russell [Thu, 17 Dec 2009 04:08:15 +0000 (14:38 +1030)]
eventscript: remove cb_status, fix uninitialized bug when monitoring aborted
Previously we updated cb_status as each script finished. Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.
In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized. In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't. So we use the last status (normally,
this will be the just-saved current status).
In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ronnie Sahlberg [Wed, 16 Dec 2009 21:18:04 +0000 (08:18 +1100)]
fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'
Conflicts:
server/ctdb_vacuum.c
Rusty Russell [Wed, 16 Dec 2009 10:27:20 +0000 (20:57 +1030)]
ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays. But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.
This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB. We warn, but don't exit, if it fails.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Wed, 16 Dec 2009 08:56:22 +0000 (19:26 +1030)]
Remove RT priority, use niceness.
1) It's buggy. Code needs to be carefully written (ie. no busy
loops) to handle running with it, and we fork and run scripts.[1]
2) It makes debugging harder. If ctdbd loops (as has happened recently)
it can be extremely hard to get in and see what's happening. We've already
seen the valgrind hacks.
3) We have seen recent scheduler problems. Perhaps they are unrelated,
but removing this very unusual setup is unlikely to hurt.
4) It doesn't make anything faster. Under all but the most perverse of
circumstances, 99% of the cpu gives the same performance as 100%, and
we will always preempt normal processes anyway.
[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each
script" by removing the switch_from_server_to_client() which restored it, but
even that was only for monitor scripts. Others were run with RT priority.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Wed, 16 Dec 2009 10:29:15 +0000 (20:59 +1030)]
Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ronnie Sahlberg [Wed, 16 Dec 2009 07:34:40 +0000 (18:34 +1100)]
fix conflict in merge from metze
Merge commit 'metze/master-tdb-check'
Conflicts:
server/ctdb_vacuum.c
Stefan Metzmacher [Fri, 20 Nov 2009 20:17:59 +0000 (21:17 +0100)]
ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
metze
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Stefan Metzmacher [Mon, 7 Dec 2009 12:02:59 +0000 (13:02 +0100)]
doc: regenerate manpages
metze
Stefan Metzmacher [Tue, 8 Dec 2009 11:28:38 +0000 (12:28 +0100)]
doc: fix docbook warnings for ctdb.1 and onnode.1 manpages
metze
Stefan Metzmacher [Wed, 9 Dec 2009 10:29:52 +0000 (11:29 +0100)]
doc/ctdb.1: update example "ctdb listvars" output
metze
Stefan Metzmacher [Tue, 8 Dec 2009 11:44:13 +0000 (12:44 +0100)]
doc/ctdb.1: make clear the database is specified by name for "ctdb backupdb"
metze
Stefan Metzmacher [Tue, 8 Dec 2009 11:43:33 +0000 (12:43 +0100)]
doc/ctdb.1: document "ctdb getdbstatus <dbname>"
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:19:20 +0000 (10:19 +0100)]
doc/ctdb.1: add "See also" for ctdb getdbmap
metze
Stefan Metzmacher [Tue, 8 Dec 2009 11:08:27 +0000 (12:08 +0100)]
doc/ctdb.1: document "ctdb dumpdbbackup <file>"
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:18:39 +0000 (10:18 +0100)]
doc/ctdb.1: document -Y output for ctdb getdbmap
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:10:05 +0000 (10:10 +0100)]
doc/ctdb.1: document UNHEALTHY for "ctdb getdbmap"
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:00:52 +0000 (10:00 +0100)]
doc/ctdb.1: document "ctdb wipedb"
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:53:31 +0000 (10:53 +0100)]
config: add CTDB_MAX_PERSISTENT_CHECK_ERRORS option
metze
Stefan Metzmacher [Mon, 7 Dec 2009 09:46:10 +0000 (10:46 +0100)]
config: try to use tdbtool <tdb> check instead of tdbdump for persistent db checks
metze