Ronnie Sahlberg [Fri, 24 Apr 2009 08:23:48 +0000 (18:23 +1000)]
add a tunable RecoveryDropAllIPs that controls how long a node that is stuck in recovery waits before it releases all of its public addresses.
This now defaults to 60 seconds.
This is useful if a split brain occurs due to network partitioning, since it makes sure that the "other half" of the cluster, which does not contain the recovery master, eventually releases all IPs and thus avoids duplicate public IP addresses.
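A minimal C sketch of the decision this tunable drives, assuming the daemon records when the node entered recovery. The function should_drop_all_ips() and the RECOVERY_DROP_ALL_IPS_DEFAULT macro are hypothetical; only the 60 second default comes from the commit message.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical macro mirroring the default from the commit message. */
#define RECOVERY_DROP_ALL_IPS_DEFAULT 60

/*
 * Returns true once the node has been in recovery for at least
 * drop_after_secs seconds, i.e. when it should release every public
 * address it holds. recovery_started is when the node entered recovery.
 */
bool should_drop_all_ips(time_t now, time_t recovery_started,
                         unsigned drop_after_secs)
{
    if (now < recovery_started) {
        return false; /* clock went backwards: do not act on it */
    }
    return difftime(now, recovery_started) >= (double)drop_after_secs;
}
```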
Ronnie Sahlberg [Fri, 24 Apr 2009 08:09:51 +0000 (18:09 +1000)]
increase the log level of the message we print when we automatically release all IPs after being in recovery for too long
Ronnie Sahlberg [Fri, 24 Apr 2009 04:41:21 +0000 (14:41 +1000)]
tweak some timeouts so that we still trigger banning even if the control hangs or times out
Ronnie Sahlberg [Fri, 24 Apr 2009 03:58:32 +0000 (13:58 +1000)]
If we cannot pull a database from a node during recovery, mark this node as a "culprit" so that it eventually becomes banned.
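A hedged C sketch of "mark as culprit until banned", assuming a simple per-node counter with a ban threshold. struct culprit_tracker, mark_culprit(), MAX_NODES and CULPRIT_BAN_THRESHOLD are hypothetical illustration names and values; the daemon's real bookkeeping may weigh or expire culprit marks differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 16             /* hypothetical cluster size limit */
#define CULPRIT_BAN_THRESHOLD 3  /* hypothetical marks needed for a ban */

struct culprit_tracker {
    uint32_t count[MAX_NODES];
};

/*
 * Record that node pnn misbehaved (e.g. its database could not be pulled
 * during recovery). Returns true when the node has accumulated enough
 * marks that the caller should ban it.
 */
bool mark_culprit(struct culprit_tracker *t, uint32_t pnn)
{
    if (pnn >= MAX_NODES) {
        return false; /* unknown node: ignore */
    }
    t->count[pnn]++;
    return t->count[pnn] >= CULPRIT_BAN_THRESHOLD;
}
```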
Andrew Tridgell [Thu, 23 Apr 2009 01:35:42 +0000 (11:35 +1000)]
change shutdown level for ctdb to be 01
We want ctdb to shut down first, as it manages many other
services. With the old level of 32 the NFS service would shut down
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shut down a few seconds later, which caused a lot of error
messages in the other nodes' logs
Andrew Tridgell [Thu, 23 Apr 2009 01:00:16 +0000 (11:00 +1000)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Wed, 8 Apr 2009 02:56:52 +0000 (12:56 +1000)]
new version 1.0.79
Ronnie Sahlberg [Wed, 8 Apr 2009 02:49:28 +0000 (12:49 +1000)]
create a function "remote_ip" which can be used from scripts to remove a single ip from an interface.
use this function from the natgw eventscript
Ronnie Sahlberg [Wed, 8 Apr 2009 00:45:00 +0000 (10:45 +1000)]
set libdir to ../lib64 on x86-64 platforms
Ronnie Sahlberg [Tue, 7 Apr 2009 23:34:20 +0000 (09:34 +1000)]
install ctdb.pc from the RPM
Ronnie Sahlberg [Tue, 7 Apr 2009 23:21:11 +0000 (09:21 +1000)]
From Mathieu Parent <math.parent@gmail.com>
Install the pkgconfig file
Mathieu Parent [Tue, 7 Apr 2009 23:14:20 +0000 (09:14 +1000)]
Ronnie Sahlberg [Tue, 7 Apr 2009 22:48:55 +0000 (08:48 +1000)]
install /etc/ctdb/notify.sh as executable.
this addresses bug 6250
Andrew Tridgell [Tue, 7 Apr 2009 07:07:41 +0000 (17:07 +1000)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Mon, 6 Apr 2009 04:03:09 +0000 (14:03 +1000)]
we only need to switch into client mode from the eventscript child if we are running the monitor event
Ronnie Sahlberg [Mon, 6 Apr 2009 04:00:41 +0000 (14:00 +1000)]
increase the listen queue. Now that the eventscripts may become clients and connect back to the server, we get many more concurrent connection attempts (takeip/releaseip are performed in parallel)
Ronnie Sahlberg [Mon, 6 Apr 2009 03:16:36 +0000 (13:16 +1000)]
use _exit() and not exit() when we terminate a failed eventscript child process
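Why _exit() matters in a forked child: exit() runs atexit() handlers and flushes stdio buffers inherited from the parent, so output buffered before fork() could be written twice. A small C sketch, where run_failing_child() is a hypothetical helper standing in for a failed eventscript child:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that "fails" and terminates with _exit(). Unlike exit(),
 * _exit() skips atexit() handlers and does not flush inherited stdio
 * buffers, so the parent's pending output and cleanup code are not run a
 * second time by the child. Returns the child's exit status, or -1 on error.
 */
int run_failing_child(void)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* child: pretend the eventscript failed */
        _exit(1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```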
Ronnie Sahlberg [Mon, 6 Apr 2009 02:00:22 +0000 (12:00 +1000)]
We don't need to verify the nodemap on remote nodes that are banned
Ronnie Sahlberg [Thu, 2 Apr 2009 03:50:43 +0000 (14:50 +1100)]
if we can't pull the remote nodemap off a node, we should mark it as a culprit so it eventually becomes banned.
Ronnie Sahlberg [Wed, 1 Apr 2009 06:21:38 +0000 (17:21 +1100)]
Change the (dodgy) seqnumfrequency variable to have millisecond resolution instead of second resolution.
Rename the variable to SeqnumInterval because
1. it is an interval and not a 1/interval unit
2. this way we catch people who still use the old variable, so they can update the sysconfig file instead of having the semantics of the variable change silently
This is a really dodgy variable.
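Because SeqnumInterval is now in milliseconds, a consumer has to split it into seconds and microseconds before arming a timer. A minimal C sketch, assuming a struct timeval based timer; ms_to_timeval() is a hypothetical helper name:

```c
#include <sys/time.h>

/* Convert an interval in milliseconds (e.g. a SeqnumInterval value) into
 * a struct timeval suitable for scheduling a timer. */
struct timeval ms_to_timeval(unsigned int ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;  /* remaining ms as microseconds */
    return tv;
}
```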
Ronnie Sahlberg [Wed, 1 Apr 2009 06:13:48 +0000 (17:13 +1100)]
remove a prototype for a function no longer used
Ronnie Sahlberg [Tue, 31 Mar 2009 09:04:45 +0000 (20:04 +1100)]
new release 1.0.78
Ronnie Sahlberg [Tue, 31 Mar 2009 09:00:00 +0000 (20:00 +1100)]
we should also install the 11.natgw eventscript if we want to be able to use it
Ronnie Sahlberg [Tue, 31 Mar 2009 03:38:52 +0000 (14:38 +1100)]
install a default /etc/ctdb/notify.sh script as an example of how to use
snmptrap/email to notify that a node has changed health status
Ronnie Sahlberg [Tue, 31 Mar 2009 03:23:31 +0000 (14:23 +1100)]
add a mechanism where the ctdb daemon runs a user-controlled script when the node status changes to/from UNHEALTHY state.
This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.
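A C sketch of the trigger condition, assuming the health state is a bit in a node flags word. SKETCH_NODE_FLAGS_UNHEALTHY, health_changed() and health_event_name() are hypothetical, and the "healthy"/"unhealthy" strings passed to the script are an assumption, not the daemon's documented interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch; the daemon's real flag name and
 * value may differ. */
#define SKETCH_NODE_FLAGS_UNHEALTHY 0x00000002u

/* True when the UNHEALTHY bit differs between the old and new flags,
 * i.e. when the user-configured notification script should run. */
bool health_changed(uint32_t old_flags, uint32_t new_flags)
{
    return ((old_flags ^ new_flags) & SKETCH_NODE_FLAGS_UNHEALTHY) != 0;
}

/* Hypothetical argument handed to the script for the new state. */
const char *health_event_name(uint32_t new_flags)
{
    return (new_flags & SKETCH_NODE_FLAGS_UNHEALTHY) ? "unhealthy" : "healthy";
}
```

Changes to other flag bits (for example a ban) do not trigger the script in this sketch, because only the UNHEALTHY bit is compared.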
Ronnie Sahlberg [Tue, 31 Mar 2009 00:42:10 +0000 (11:42 +1100)]
new version 1.0.77
Ronnie Sahlberg [Tue, 31 Mar 2009 00:33:28 +0000 (11:33 +1100)]
we must also try to set the routes when we release an IP. During the release in 10.interfaces there can be a window where the kernel removes all addresses (before we manually add them back in 10.interfaces), and during that window the kernel may also delete all routes, since no gateways are reachable through this interface anymore.
Ronnie Sahlberg [Wed, 25 Mar 2009 03:52:08 +0000 (14:52 +1100)]
new version 1.0.76
Ronnie Sahlberg [Wed, 25 Mar 2009 03:46:05 +0000 (14:46 +1100)]
change the ctdb command table to allow us to describe commands which can be run independently of the ctdb daemon.
create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running
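One way a command table can mark daemon-independent commands, sketched in C. The struct layout, the without_daemon field and dispatch() are hypothetical; only the command name xpnn comes from the commit message, and both command bodies are placeholders:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*command_fn)(void);

/* Hypothetical table entry: without_daemon marks commands that can run
 * while the local daemon is down. */
struct command_entry {
    const char *name;
    command_fn fn;
    bool without_daemon;
};

static int cmd_status(void) { return 0; } /* placeholder: needs the daemon */
static int cmd_xpnn(void)   { return 0; } /* placeholder: works offline */

static const struct command_entry commands[] = {
    { "status", cmd_status, false },
    { "xpnn",   cmd_xpnn,   true  },
};

/* Returns -1 for an unknown command, -2 if the command needs the daemon
 * but the daemon is not running, otherwise the command's own result. */
int dispatch(const char *name, bool daemon_running)
{
    for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
        if (strcmp(commands[i].name, name) != 0) {
            continue;
        }
        if (!commands[i].without_daemon && !daemon_running) {
            return -2;
        }
        return commands[i].fn();
    }
    return -1;
}
```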
Ronnie Sahlberg [Wed, 25 Mar 2009 02:46:41 +0000 (13:46 +1100)]
update the documentation for NATGW to reflect that you can now use
multiple natgw groups in one cluster
Ronnie Sahlberg [Wed, 25 Mar 2009 02:37:57 +0000 (13:37 +1100)]
update how the NATGW configuration works.
allow the cluster to be partitioned into multiple disjoint natgw subsets
Ronnie Sahlberg [Tue, 24 Mar 2009 08:02:00 +0000 (19:02 +1100)]
web: fix typo
Conflicts:
web/index.html
Ronnie Sahlberg [Tue, 24 Mar 2009 07:59:27 +0000 (18:59 +1100)]
update the documentation with all the new commands we support in the
ctdb tool
Ronnie Sahlberg [Tue, 24 Mar 2009 07:23:56 +0000 (18:23 +1100)]
fix the html so that my name and obnox's name are shown
Ronnie Sahlberg [Tue, 24 Mar 2009 06:49:55 +0000 (17:49 +1100)]
Merge branch 'obnox'
Ronnie Sahlberg [Tue, 24 Mar 2009 03:08:57 +0000 (14:08 +1100)]
new version 1.0.75
Ronnie Sahlberg [Tue, 24 Mar 2009 03:05:31 +0000 (14:05 +1100)]
create a variant of kill_tcp_connections that only kills off the local side of a connection
Ronnie Sahlberg [Tue, 24 Mar 2009 02:51:32 +0000 (13:51 +1100)]
set --single-public-ip when lvs is used
Ronnie Sahlberg [Tue, 24 Mar 2009 02:45:11 +0000 (13:45 +1100)]
we need to set the port properly in the parse_ip helper
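A minimal C sketch of parsing an IPv4 or IPv6 literal and storing the port in the field that matches the detected family. sketch_sock_addr and sketch_parse_ip() are hypothetical names and do not reproduce the real parse_ip() signature:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address holder able to store either family. */
typedef union {
    struct sockaddr sa;
    struct sockaddr_in ip;
    struct sockaddr_in6 ip6;
} sketch_sock_addr;

/* Parse s as IPv4, then IPv6, and store port (host byte order) in the
 * port field of whichever family matched. Returns false if neither did. */
bool sketch_parse_ip(const char *s, unsigned short port, sketch_sock_addr *out)
{
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, s, &out->ip.sin_addr) == 1) {
        out->ip.sin_family = AF_INET;
        out->ip.sin_port = htons(port);
        return true;
    }
    memset(out, 0, sizeof(*out)); /* start clean for the IPv6 attempt */
    if (inet_pton(AF_INET6, s, &out->ip6.sin6_addr) == 1) {
        out->ip6.sin6_family = AF_INET6;
        out->ip6.sin6_port = htons(port);
        return true;
    }
    return false;
}
```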
Ronnie Sahlberg [Mon, 23 Mar 2009 10:44:35 +0000 (21:44 +1100)]
add Michael Adam as one of the ctdb developers on the main ctdb webpage
Michael Adam [Mon, 23 Mar 2009 09:07:44 +0000 (10:07 +0100)]
Merge commit 'ctdb-ronnie/master'
root [Mon, 23 Mar 2009 08:07:45 +0000 (19:07 +1100)]
add a new command "ctdb scriptstatus"
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timed out or returned an error, we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which can be used both by
the recovery daemon and by the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts and for the ctdb tool to extract this information from the running daemon.
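A hedged C sketch of how one per-script record could be rendered in the format of the example output above. struct script_status and format_script_status() are hypothetical; the daemon's actual control payload and formatting code may differ:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical per-script record the eventscript child could report. */
struct script_status {
    const char *name;
    int status;       /* 0 = OK, anything else = ERROR */
    double duration;  /* seconds */
    time_t start;     /* when the script was started */
};

/* Format one line such as
 * "50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009".
 * Returns the snprintf() result. */
int format_script_status(char *buf, size_t len, const struct script_status *s)
{
    char when[64] = "?";
    const struct tm *tm = localtime(&s->start);
    if (tm != NULL) {
        strftime(when, sizeof(when), "%a %b %e %H:%M:%S %Y", tm);
    }
    return snprintf(buf, len, "%s Status:%s Duration:%.3f %s",
                    s->name, s->status == 0 ? "OK" : "ERROR",
                    s->duration, when);
}
```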
root [Mon, 23 Mar 2009 01:37:30 +0000 (12:37 +1100)]
create a helper function that converts a ctdb instance in daemon mode to become
a ctdb client instance.
use this from the recovery daemon child process to switch to client mode
and connect back to the main daemon
Ronnie Sahlberg [Wed, 18 Mar 2009 23:43:57 +0000 (10:43 +1100)]
The wbinfo --sequence command has been deprecated in favor of the new
--online-status command
Ronnie Sahlberg [Wed, 18 Mar 2009 23:17:44 +0000 (10:17 +1100)]
update the natgw eventscript and documentation
root [Wed, 18 Mar 2009 08:19:49 +0000 (19:19 +1100)]
redo how natgw is done: just use a default route with a high metric instead of fancy policy routing
Ronnie Sahlberg [Tue, 17 Mar 2009 23:05:00 +0000 (10:05 +1100)]
add documentation for the NAT-GW feature
root [Tue, 17 Mar 2009 22:33:58 +0000 (09:33 +1100)]
change the NATGW_ example in sysconfig to make it more realistic
root [Mon, 16 Mar 2009 20:35:53 +0000 (07:35 +1100)]
NAT-GW updates. Describe the functionality in the sysconfig file
Ronnie Sahlberg [Sun, 15 Mar 2009 22:27:56 +0000 (09:27 +1100)]
new version 1.0.74
Ronnie Sahlberg [Sun, 15 Mar 2009 22:21:24 +0000 (09:21 +1100)]
From C Cowan, AIX needs to set sockaddr.sa_len to a consistent value for
the address type used or the connect() call will fail.
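A C sketch of filling a struct sockaddr_in portably. HAVE_SOCKADDR_SIN_LEN is a hypothetical configure-time macro; on platforms whose socket addresses carry a length field (such as AIX and the BSDs), sin_len is set to the size of the structure, and elsewhere the field does not exist and the assignment is compiled out:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Prepare an IPv4 address for connect(). addr_be and port_be are already
 * in network byte order. Returns the length to pass to connect(). */
socklen_t prepare_sockaddr_in(struct sockaddr_in *sin, in_addr_t addr_be,
                              in_port_t port_be)
{
    memset(sin, 0, sizeof(*sin));
#ifdef HAVE_SOCKADDR_SIN_LEN
    sin->sin_len = sizeof(*sin); /* must match the address type */
#endif
    sin->sin_family = AF_INET;
    sin->sin_port = port_be;
    sin->sin_addr.s_addr = addr_be;
    return sizeof(*sin);
}
```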
root [Thu, 12 Mar 2009 01:33:19 +0000 (12:33 +1100)]
make sure we can collect proper mmfs data
Michael Adam [Mon, 9 Mar 2009 23:21:04 +0000 (00:21 +0100)]
ctdb.sysconfig: add CTDB_MANAGES_HTTPD comment section
Michael
Michael Adam [Sun, 8 Mar 2009 23:20:30 +0000 (00:20 +0100)]
events.d/50.samba: allow CTDB_SERVICE_{SMB,NMB,WINBIND} to be overridden from sysconfig
Michael
Michael Adam [Sun, 8 Mar 2009 23:08:26 +0000 (00:08 +0100)]
ctdb.sysconfig: add CTDB_INIT_STYLE with explanation
Michael
Andrew Tridgell [Fri, 6 Mar 2009 00:26:20 +0000 (11:26 +1100)]
Merge commit 'ronnie/master'
Michael Adam [Wed, 4 Mar 2009 20:26:25 +0000 (21:26 +0100)]
Merge commit 'ctdb-ronnie/master'
Ronnie Sahlberg [Tue, 3 Mar 2009 20:25:26 +0000 (07:25 +1100)]
new version 1.0.73
root [Tue, 3 Mar 2009 20:21:55 +0000 (07:21 +1100)]
Add a variable CTDB_NFS_SKIP_SHARE_CHECK to sysconfig that can disable the check that all shares are accessible.
This check can take a very long time if there are very many shares, and in that case it is better implemented as a separate cron job than in a ctdb eventscript
Michael Adam [Sat, 28 Feb 2009 02:09:13 +0000 (03:09 +0100)]
Merge commit 'ctdb-ronnie/master'
Michael Adam [Sat, 28 Feb 2009 02:08:31 +0000 (03:08 +0100)]
move common code of system_linux.c and system_aix.c into new system_common.c
Michael
Ronnie Sahlberg [Tue, 24 Feb 2009 22:13:16 +0000 (09:13 +1100)]
From Sumit Bose <sbose@redhat.com>
Fix to the makefile dependencies for smnotify so that make -j works
root [Thu, 19 Feb 2009 23:58:34 +0000 (10:58 +1100)]
make it possible to disable checking all samba shares.
this is a time-consuming process and might not be feasible to perform if there are many thousands of shares
Michael Adam [Thu, 19 Feb 2009 22:51:23 +0000 (23:51 +0100)]
Merge commit 'ctdb-ronnie/master'
Ronnie Sahlberg [Wed, 18 Feb 2009 02:22:26 +0000 (13:22 +1100)]
new version 1.0.72
Ronnie Sahlberg [Wed, 18 Feb 2009 02:10:03 +0000 (13:10 +1100)]
Merge branch 'martins'
Michael Adam [Mon, 9 Feb 2009 23:28:08 +0000 (00:28 +0100)]
Merge commit 'ctdb-ronnie/master'
Andrew Tridgell [Sun, 8 Feb 2009 23:53:47 +0000 (10:53 +1100)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Fri, 6 Feb 2009 21:10:34 +0000 (08:10 +1100)]
add a licence file
root [Thu, 5 Feb 2009 03:44:46 +0000 (14:44 +1100)]
use netstat to check first and only fall back to netcat if netstat is unavailable
Mathieu PARENT [Tue, 3 Feb 2009 23:50:46 +0000 (00:50 +0100)]
correct ctdbd(1) manpage warning
Signed-off-by: Michael Adam <obnox@samba.org>
Mathieu PARENT [Tue, 3 Feb 2009 23:48:56 +0000 (00:48 +0100)]
smnotify: fix popt.h include to allow use of system lib
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Tue, 3 Feb 2009 23:42:33 +0000 (00:42 +0100)]
events 41.httpd: support suse and ubuntu/debian systems for managing apache
The httpd service on suse and ubuntu/debian systems is usually
called "apache2" nowadays.
Note: There are older installs with Apache 1.3 out there, in which case
the service is called "apache". An extra check for these installs could
be useful as a sequel to this patch...
Michael
Michael Adam [Tue, 3 Feb 2009 23:28:16 +0000 (00:28 +0100)]
build: print default in help for --with-logdir
Michael
Michael Adam [Tue, 3 Feb 2009 23:22:01 +0000 (00:22 +0100)]
make: add a "showlayout" target for diagnostics
Michael
Mathieu PARENT [Tue, 3 Feb 2009 23:15:57 +0000 (00:15 +0100)]
build: Make log-directory configurable independently of VARDIR
This adds a new configure option "--with-logdir".
logdir defaults to "${localstatedir}/log" .
It is important to have logdir configurable for debian systems,
where localstatedir is set to "/var/lib" and not "/var".
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Tue, 3 Feb 2009 23:01:15 +0000 (00:01 +0100)]
events.d/41.httpd: fix a typo in the fix of the comment typo
This is embarrassing...
Michael
Ronnie Sahlberg [Mon, 2 Feb 2009 03:04:40 +0000 (14:04 +1100)]
new version 1.0.71
Michael Adam [Fri, 30 Jan 2009 17:30:25 +0000 (18:30 +0100)]
add a simple test script to test ctdb_check_tcp_ports
Michael
Michael Adam [Fri, 30 Jan 2009 17:14:41 +0000 (18:14 +0100)]
ctdb_check_tcp_ports: correctly detect listeners on ipv6 :::<port> w/out netcat
The netstat test only grepped for the ipv4 wildcard address.
Now the ipv6 wildcard listener is correctly detected as well.
Michael
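The check itself lives in a shell function that greps netstat output; the matching rule can be expressed in C as follows. is_wildcard_listener() is a hypothetical helper, and it only considers the two exact wildcard forms named in the commit message, not listeners bound to specific addresses:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a netstat "Local Address" value is a wildcard listener on port:
 * either the IPv4 form "0.0.0.0:<port>" or the IPv6 form ":::<port>". */
bool is_wildcard_listener(const char *local_addr, unsigned port)
{
    char v4[32], v6[32];
    snprintf(v4, sizeof(v4), "0.0.0.0:%u", port);
    snprintf(v6, sizeof(v6), ":::%u", port);
    return strcmp(local_addr, v4) == 0 || strcmp(local_addr, v6) == 0;
}
```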
Michael Adam [Fri, 30 Jan 2009 15:41:37 +0000 (16:41 +0100)]
ctdb_check_tcp_ports: fail the check if neither netstat nor netcat/nc is found
Michael
Michael Adam [Fri, 30 Jan 2009 15:10:05 +0000 (16:10 +0100)]
ctdb_check_tcp_ports: cope with multiple locations of netcat or nc
This fixes tcp port monitor events on systems where netcat or nc
is not found in /usr/bin/ (Debian, for instance).
The patch also separates the process of finding the binaries and
calling them, moving the detection outside of the loop over the
ports list.
Michael
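A C sketch of the "detect once, then loop" structure: find the first usable binary among several candidate paths before iterating over the ports, and let the caller fail the check when nothing is found. find_executable() and any candidate paths passed to it are hypothetical:

```c
#include <stddef.h>
#include <unistd.h>

/* Return the first path in the NULL-terminated candidates array that
 * exists and is executable, or NULL if none is. A NULL result lets the
 * caller fail the port check instead of silently passing it. */
const char *find_executable(const char *const candidates[])
{
    for (size_t i = 0; candidates[i] != NULL; i++) {
        if (access(candidates[i], X_OK) == 0) {
            return candidates[i];
        }
    }
    return NULL;
}
```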
Michael Adam [Thu, 29 Jan 2009 12:22:02 +0000 (13:22 +0100)]
remove include <netinet/in.h> from public ctdb.h
This is not portable.
The ctdb build includes the necessary headers from includes.h.
And users of ctdb should cope with including the necessary
prerequisite headers themselves.
Michael
Michael Adam [Thu, 29 Jan 2009 10:46:04 +0000 (11:46 +0100)]
packaging: add a maketarball script
The script extracts the version number from the spec file.
It takes an extra argument that can be appended to the
version in the tarball name and directory prefix.
Michael
Michael Adam [Wed, 28 Jan 2009 16:40:24 +0000 (17:40 +0100)]
Fix the build on AIX: sys/socket.h needs to be included before ctdb.h
(for struct sockaddr to be defined)
Thanks to William Jojo <w.jojo@hvcc.edu> for reporting.
Michael
Michael Adam [Thu, 29 Jan 2009 09:22:02 +0000 (10:22 +0100)]
autoconf: Make sure the result of the mkdir_has_mode test gets cached.
This fixes the autoconf 2.63 warning
"suspicious cache-id, must contain _cv_ to be cached".
Thanks to William Jojo <w.jojo@hvcc.edu> for reporting.
Michael
Michael Adam [Tue, 27 Jan 2009 16:17:58 +0000 (17:17 +0100)]
events.d/41.httpd: fix a comment typo
Michael
Michael Adam [Mon, 19 Jan 2009 14:33:24 +0000 (15:33 +0100)]
Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael
Signed-off-by: Michael Adam <obnox@samba.org>
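Link-local IPv6 addresses are only meaningful together with an interface, which struct sockaddr_in6 carries in sin6_scope_id. A minimal C sketch, assuming the interface name is known; set_ipv6_scope_id() is a hypothetical helper:

```c
#include <net/if.h>
#include <netinet/in.h>

/* For a link-local IPv6 address, set sin6_scope_id to the index of the
 * interface the address lives on; without it the kernel cannot tell which
 * link is meant. Returns 0 on success, -1 if the interface is unknown. */
int set_ipv6_scope_id(struct sockaddr_in6 *sin6, const char *iface)
{
    if (!IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr)) {
        return 0; /* non-link-local addresses need no scope */
    }
    unsigned int idx = if_nametoindex(iface);
    if (idx == 0) {
        return -1;
    }
    sin6->sin6_scope_id = idx;
    return 0;
}
```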
Michael Adam [Mon, 19 Jan 2009 13:14:07 +0000 (14:14 +0100)]
ctdb_util: use the parse_ip() function - avoid code duplication
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 18:08:37 +0000 (19:08 +0100)]
ctdb_sys_have_ip: fix ipv6 support for aix, too.
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Stefan Metzmacher [Mon, 19 Jan 2009 12:24:09 +0000 (13:24 +0100)]
ctdb_sys_have_ip: don't overwrite input data (setting port to 0)
metze
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 11:02:18 +0000 (12:02 +0100)]
Fix verification of IP allocation with ipv6 addresses on Linux.
Set sin_port or sin6_port to 0, depending on sa_family.
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
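A C sketch combining the two fixes above: clear the port in a copy of the address, using sin_port or sin6_port depending on sa_family, so the caller's input is not overwritten. sketch_addr and copy_without_port() are hypothetical names:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address holder able to store either family. */
typedef union {
    struct sockaddr sa;
    struct sockaddr_in ip;
    struct sockaddr_in6 ip6;
} sketch_addr;

/* Copy in to out and clear the port field that matches sa_family.
 * The caller's address is left untouched. Returns -1 for an
 * unsupported family. */
int copy_without_port(const sketch_addr *in, sketch_addr *out)
{
    memcpy(out, in, sizeof(*out));
    switch (in->sa.sa_family) {
    case AF_INET:
        out->ip.sin_port = 0;
        return 0;
    case AF_INET6:
        out->ip6.sin6_port = 0;
        return 0;
    default:
        return -1;
    }
}
```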
Michael Adam [Mon, 19 Jan 2009 20:22:58 +0000 (21:22 +0100)]
events 50.samba: fix control of nmbd without separate nmb service script.
protect all potentially empty $CTDB_SERVICE_* script names
Michael
Michael Adam [Mon, 19 Jan 2009 13:46:30 +0000 (14:46 +0100)]
packaging(RPM): detect and use ccache if available
Michael
Michael Adam [Mon, 19 Jan 2009 08:42:48 +0000 (09:42 +0100)]
Makefile: remove extra "/" in paths
Michael
Michael Adam [Sat, 17 Jan 2009 15:18:02 +0000 (16:18 +0100)]
makerpms: fix detection of support for --rsyncable flag in gzip.
Michael
Michael Adam [Fri, 16 Jan 2009 13:01:37 +0000 (14:01 +0100)]
ctdb.init: fix typo
Michael
Michael Adam [Fri, 16 Jan 2009 12:33:13 +0000 (13:33 +0100)]
events 50.samba: also support suse and ubuntu/debian systems
for managing samba and winbind
This uses CTDB_INIT_STYLE as exported by ctdb.init.
suse systems usually have separate init scripts:
smb for smbd and nmb for nmbd. On ubuntu/debian the
start script for smbd and nmbd is called samba instead
of smb (as on redhat).
Michael
Michael Adam [Fri, 16 Jan 2009 12:31:02 +0000 (13:31 +0100)]
functions: make (nice_)service a noop for empty service name
Michael
Michael Adam [Fri, 16 Jan 2009 12:28:19 +0000 (13:28 +0100)]
ctdb.init: use detect_init_style() in the init script
and export CTDB_INIT_STYLE, so that event scripts
as called by ctdbd can use it.
Michael