sahlberg/ctdb.git
16 years agoneed to specify tree to git archive
Andrew Tridgell [Fri, 9 May 2008 23:35:13 +0000 (09:35 +1000)]
need to specify tree to git archive

16 years agouse git archive to create tarball
Andrew Tridgell [Fri, 9 May 2008 23:24:51 +0000 (09:24 +1000)]
use git archive to create tarball

16 years agoupdate to new version
Ronnie Sahlberg [Fri, 9 May 2008 03:47:38 +0000 (13:47 +1000)]
update to new version

16 years agofix a bug where the public ip addresses of the cluster would not be redistributed...
Ronnie Sahlberg [Fri, 9 May 2008 03:41:31 +0000 (13:41 +1000)]
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed.

Remove a bogus check inside the recovery daemon that redistributed public addresses only if the local node had/served public addresses.
This was a valid optimization long ago, when we enforced that all nodes must use the same public addresses file, but it is invalid today, when we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all.

16 years agofixed realloc bug
Andrew Tridgell [Thu, 8 May 2008 09:59:24 +0000 (19:59 +1000)]
fixed realloc bug

Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t
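The class of bug described above can be sketched in plain C. This is a minimal illustration using the standard realloc() rather than talloc, and grow_u32_array is a hypothetical helper name, not a function from ctdb. The point is that the size passed to the allocator must be the element count multiplied by the element size; talloc's typed macros (for example talloc_realloc, which takes a type and a count) do that multiplication for the caller.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grow an array of uint32_t to hold new_count elements.
 * The buggy form would be realloc(arr, new_count), which sizes the buffer
 * in bytes and so leaves room for only a quarter of the elements. */
uint32_t *grow_u32_array(uint32_t *arr, size_t new_count)
{
    /* Guard against overflow in the size computation. */
    if (new_count > SIZE_MAX / sizeof(uint32_t)) {
        return NULL;
    }
    /* Size is count * element size, not just count. */
    return realloc(arr, new_count * sizeof(uint32_t));
}
```

A caller would keep the old pointer until the call succeeds, since a NULL return leaves the original buffer allocated.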

16 years agofix merge corruption
Ronnie Sahlberg [Thu, 8 May 2008 09:52:27 +0000 (19:52 +1000)]
fix merge corruption

16 years agoMerge git://git.samba.org/tridge/ctdb
Ronnie Sahlberg [Thu, 8 May 2008 07:49:48 +0000 (17:49 +1000)]
Merge git://git.samba.org/tridge/ctdb

16 years agoMerge commit 'sofs1/tridge'
Andrew Tridgell [Thu, 8 May 2008 07:15:41 +0000 (17:15 +1000)]
Merge commit 'sofs1/tridge'

16 years agolisten_fd is auto-closed
root [Thu, 8 May 2008 07:14:00 +0000 (17:14 +1000)]
listen_fd is auto-closed

Closing it here just causes an epoll error, and may close an fd that is in use by
another structure. This caused an infinite recovery loop

16 years agoMerge branch 'master' of git://git.samba.org/sahlberg/ctdb
Andrew Tridgell [Thu, 8 May 2008 06:58:34 +0000 (16:58 +1000)]
Merge branch 'master' of git://git.samba.org/sahlberg/ctdb

16 years agoMerge commit 'ronnie-ctdb/master' into tridge tridge/ronnie
root [Thu, 8 May 2008 06:46:23 +0000 (16:46 +1000)]
Merge commit 'ronnie-ctdb/master' into tridge

16 years agoFrom Mathias Dietz
Ronnie Sahlberg [Wed, 7 May 2008 20:52:53 +0000 (06:52 +1000)]
From Mathias Dietz

Make the 60.nfs eventscript more forgiving when using non-us/english
characters in sharenames

16 years agoupdate to version .35
Ronnie Sahlberg [Wed, 7 May 2008 01:31:37 +0000 (11:31 +1000)]
update to version .35

16 years agoExpand the client async framework so that it can take a callback function.
Ronnie Sahlberg [Tue, 6 May 2008 05:42:59 +0000 (15:42 +1000)]
Expand the client async framework so that it can take a callback function.
This allows us to use the async framework also for controls that return
outdata.

Add a "capabilities" field to the ctdb_node structure. This field is
only initialized and kept valid inside the recovery daemon context and not
inside the main ctdb daemon.

Change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable.

When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes.
When building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap,
unless no connected node has the LMASTER capability, in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list).
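The vnnmap rule described above can be sketched as follows. This is an illustration only: struct node_info, CAP_LMASTER and build_vnnmap are hypothetical names and values chosen for the example, not the actual ctdb definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_LMASTER 0x2u  /* hypothetical flag value, for illustration only */

/* Hypothetical per-node view held by the recovery daemon. */
struct node_info {
    uint32_t pnn;          /* node number */
    int connected;         /* non-zero if the node is connected */
    uint32_t capabilities; /* bitmask read from the node */
};

/* Fill map[] with the pnn of every connected node that has the LMASTER
 * capability. If there is none, fall back to the local node so the map is
 * never empty. map must have room for at least max(num_nodes, 1) entries.
 * Returns the number of entries written. */
size_t build_vnnmap(const struct node_info *nodes, size_t num_nodes,
                    uint32_t local_pnn, uint32_t *map)
{
    size_t n = 0;
    for (size_t i = 0; i < num_nodes; i++) {
        if (nodes[i].connected && (nodes[i].capabilities & CAP_LMASTER)) {
            map[n++] = nodes[i].pnn;
        }
    }
    if (n == 0) {
        /* No lmaster-capable node is connected: the recmaster steps in. */
        map[n++] = local_pnn;
    }
    return n;
}
```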

16 years agomake sure we lose all elections for recmaster role if we do not have the recmaster...
Ronnie Sahlberg [Tue, 6 May 2008 03:56:56 +0000 (13:56 +1000)]
make sure we lose all elections for recmaster role if we do not have the recmaster capability.

(unless no other node with this capability is available at all)

16 years agoclose and reopen the reclock pnn file at regular intervals.
Ronnie Sahlberg [Tue, 6 May 2008 03:27:17 +0000 (13:27 +1000)]
close and reopen the reclock pnn file at regular intervals.

handle failure to get/hold the reclock pnn file better and just
treat it as a transient backend filesystem error and try again later
instead of shutting down the recovery daemon

when we have lost the pnn file and we are recmaster,
release the recmaster role so that someone else can become recmaster instead

16 years agoMonitor that the recovery daemon is still running from the main ctdb daemon
Ronnie Sahlberg [Tue, 6 May 2008 01:19:17 +0000 (11:19 +1000)]
Monitor that the recovery daemon is still running from the main ctdb daemon
and if it has terminated, then we shut down the main daemon as well

16 years agoAdd ability to disable recmaster and lmaster roles through sysconfig file and
Ronnie Sahlberg [Tue, 6 May 2008 00:41:22 +0000 (10:41 +1000)]
Add ability to disable recmaster and lmaster roles through sysconfig file and
command line arguments

16 years agoAdd a capabilities field to the ctdb structure
Ronnie Sahlberg [Tue, 6 May 2008 00:02:27 +0000 (10:02 +1000)]
Add a capabilities field to the ctdb structure

Define two capabilities :
can be recmaster
can be lmaster
Default both capabilities to YES

Update the ctdb tool to read capabilities off a node

16 years agoUse DEBUG_ERR and not DEBUG_WARNING when we get a connection
Ronnie Sahlberg [Mon, 5 May 2008 21:57:43 +0000 (07:57 +1000)]
Use DEBUG_ERR and not DEBUG_WARNING when we get a connection
attempt from a non-ctdb host

16 years agoupdate version to .34
Ronnie Sahlberg [Thu, 24 Apr 2008 12:06:04 +0000 (22:06 +1000)]
update version to .34

16 years agowhen deleting a public ip from a node that is currently hosting this ip, try to move...
Ronnie Sahlberg [Thu, 24 Apr 2008 11:51:08 +0000 (21:51 +1000)]
when deleting a public ip from a node that is currently hosting this ip, try to move the ip address to a different node first

16 years agomake 'ctdb catdb' produce output that resembles the output of tdbdump
Ronnie Sahlberg [Wed, 23 Apr 2008 11:49:52 +0000 (21:49 +1000)]
make 'ctdb catdb' produce output that resembles the output of tdbdump

16 years agowhen adding a new public ip address to a running node using the 'ctdb addip' command,
Ronnie Sahlberg [Wed, 23 Apr 2008 11:05:36 +0000 (21:05 +1000)]
when adding a new public ip address to a running node using the 'ctdb addip' command,
If no other node is hosting this public ip at the moment, then assign it immediately to the current node.

16 years agoRevert "- accept an optional set of tdb_flags from clients on open a database,"
Ronnie Sahlberg [Thu, 10 Apr 2008 04:45:45 +0000 (14:45 +1000)]
Revert "- accept an optional set of tdb_flags from clients on open a database,"

This reverts commit 49330f97c78ca0669615297ac3d8498651831214.

16 years agoRevert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
Ronnie Sahlberg [Thu, 10 Apr 2008 04:57:41 +0000 (14:57 +1000)]
Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""

This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603.

16 years agoRevert "Revert "Revert "- accept an optional set of tdb_flags from clients on open...
Ronnie Sahlberg [Thu, 10 Apr 2008 05:59:51 +0000 (15:59 +1000)]
Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""

remove the transaction stuff and push so that the git tree will work

This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b.

16 years agomake ctdb eventscript accept the -n all argument to run the event script on all conne...
Ronnie Sahlberg [Tue, 15 Apr 2008 08:24:48 +0000 (18:24 +1000)]
make ctdb eventscript accept the -n all argument to run the event script on all connected nodes

16 years agowhen a node disagrees with us re who is recmaster
Ronnie Sahlberg [Mon, 21 Apr 2008 14:56:27 +0000 (00:56 +1000)]
when a node disagrees with us re who is recmaster
make it mark that node as a culprit so it eventually gets banned

16 years agoadd support for -n all in "ctdb -n all ip"
Ronnie Sahlberg [Tue, 22 Apr 2008 14:55:57 +0000 (00:55 +1000)]
add support for -n all in "ctdb -n all ip"
this collects all public addresses from all nodes and presents the public ips
for the entire cluster

16 years agofixed permissions on configure.rpm
Andrew Tridgell [Tue, 22 Apr 2008 14:48:25 +0000 (16:48 +0200)]
fixed permissions on configure.rpm

16 years ago- accept an optional set of tdb_flags from clients on open a database,
Andrew Tridgell [Thu, 10 Apr 2008 05:25:48 +0000 (15:25 +1000)]
- accept an optional set of tdb_flags from clients on open a database,
  thus allowing the client to pass through the TDB_NOSYNC flag

- ensure that tdb_store() operations on persistent databases that don't
  have TDB_NOSYNC set happen inside a transaction wrapper, thus making
  them crash safe

16 years agofix compiler warning during a fatal error failing to lock down the socket
Ronnie Sahlberg [Wed, 9 Apr 2008 23:56:49 +0000 (09:56 +1000)]
fix compiler warning during a fatal error failing to lock down the socket

16 years agoshell scripts need extra spaces sometimes
Ronnie Sahlberg [Wed, 9 Apr 2008 21:01:22 +0000 (07:01 +1000)]
shell scripts need extra spaces sometimes

16 years agoupdate to version .33
Ronnie Sahlberg [Wed, 9 Apr 2008 20:55:31 +0000 (06:55 +1000)]
update to version .33

16 years agoFrom Chris Cowan
Ronnie Sahlberg [Wed, 9 Apr 2008 20:51:53 +0000 (06:51 +1000)]
From Chris Cowan
secure the domain socket and set permissions properly

16 years agoadd possibility to provide site local modifications to the event system
Ronnie Sahlberg [Wed, 9 Apr 2008 20:50:12 +0000 (06:50 +1000)]
add possibility to provide site local modifications to the event system
through a /etc/ctdb/rc.local script that is sourced by /etc/ctdb/functions

16 years agoadd a ctdb command to print the ctdb version
Ronnie Sahlberg [Thu, 3 Apr 2008 06:07:00 +0000 (17:07 +1100)]
add a ctdb command to print the ctdb version

16 years agowe allocated one byte too little in the blob we need to send as the control to the...
Ronnie Sahlberg [Thu, 3 Apr 2008 05:35:23 +0000 (16:35 +1100)]
we allocated one byte too little in the blob we need to send as the control to the server.

16 years agoFrom Chris Cowan
Ronnie Sahlberg [Wed, 2 Apr 2008 23:58:51 +0000 (10:58 +1100)]
From Chris Cowan

Add support in AIX to track the PID of a client that connects to the unix domain socket

16 years agobump version to .32
Ronnie Sahlberg [Wed, 2 Apr 2008 01:09:27 +0000 (12:09 +1100)]
bump version to .32

16 years agoadd a mechanism to force a node to run the eventscripts with arbitrary arguments
Ronnie Sahlberg [Wed, 2 Apr 2008 00:13:30 +0000 (11:13 +1100)]
add a mechanism to force a node to run the eventscripts with arbitrary arguments

ctdb eventscript "command argument argument ..."

16 years agodecorate the memdump output with a nice field for ctdb_client structures to show...
Ronnie Sahlberg [Tue, 1 Apr 2008 06:17:21 +0000 (17:17 +1100)]
decorate the memdump output with a nice field for ctdb_client structures to show the pid of the client that attached

16 years agoadd improvements to tracking memory usage in ctdbd and the recovery daemon
Ronnie Sahlberg [Tue, 1 Apr 2008 04:34:54 +0000 (15:34 +1100)]
add improvements to tracking memory usage in ctdbd and the recovery daemon

and a ctdb command to pull the talloc memory map from a recovery daemon
ctdb rddumpmemory

16 years agofrom tridge: decorate dumpmemory output so that packets that are queued show up with...
Ronnie Sahlberg [Tue, 1 Apr 2008 00:31:42 +0000 (11:31 +1100)]
from tridge: decorate dumpmemory output so that packets that are queued show up with a little more information to make memory leak debugging easier

16 years agoreturn 0 if iscsi is disabled
Ronnie Sahlberg [Mon, 31 Mar 2008 01:58:20 +0000 (12:58 +1100)]
return 0 if iscsi is disabled

16 years agomake sure the iface string is null-terminated in the addip control packet
Ronnie Sahlberg [Mon, 31 Mar 2008 01:49:39 +0000 (12:49 +1100)]
make sure the iface string is null-terminated in the addip control packet

16 years agoupdate the iscsi support under RHEL5 to allow one iscsi target to be defined for...
Ronnie Sahlberg [Mon, 31 Mar 2008 00:00:08 +0000 (11:00 +1100)]
update the iscsi support under RHEL5 to allow one iscsi target to be defined for each public address in the cluster.

update the documentation for iscsi

16 years agoAdd two new controls to add/delete public ip address from a node at runtime.
Ronnie Sahlberg [Wed, 26 Mar 2008 22:23:27 +0000 (09:23 +1100)]
Add two new controls to add/delete public ip address from a node at runtime.

The controls only modify the runtime setting of which public addresses a node
can serve and do not modify /etc/ctdb/public_addresses.
To make the change permanent you also need to edit /etc/ctdb/public_addresses
manually.

After ip addresses have been added/deleted you need to invoke a recovery
for the ip addresses to be redistributed.

16 years agofix a memory leak
Ronnie Sahlberg [Tue, 25 Mar 2008 00:11:13 +0000 (11:11 +1100)]
fix a memory leak

allocate the memory to the 'call' context and not off the 'ctdb' context

16 years agoupdate to version 1.0.31
Ronnie Sahlberg [Mon, 24 Mar 2008 22:43:47 +0000 (09:43 +1100)]
update to version 1.0.31

16 years agoFrom M Dietz,
Ronnie Sahlberg [Mon, 24 Mar 2008 21:27:38 +0000 (08:27 +1100)]
From M Dietz,
Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago

16 years agoin ctdb_call_local() we can not talloc_steal() the returned data and hang it off...
Ronnie Sahlberg [Wed, 19 Mar 2008 02:54:17 +0000 (13:54 +1100)]
in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb.
This can cause a memory leak if the call is terminated before we have managed to respond to the client.
(and the call is talloc_free()d but the data is still hanging off ctdb)

instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak.

In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc().

This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state), so
we must change all creations of a ctdb_call to explicitly create it through talloc()

16 years agodont steal reply_data.dptr to ctdb if there is no data, since then we would leak
Ronnie Sahlberg [Wed, 19 Mar 2008 01:08:29 +0000 (12:08 +1100)]
dont steal reply_data.dptr to ctdb if there is no data, since then we would leak
memory

16 years agochange the log level for the message when someone connects to a non-public ip
Ronnie Sahlberg [Wed, 12 Mar 2008 20:54:55 +0000 (07:54 +1100)]
change the log level for the message when someone connects to a non-public ip

16 years agoRedo the vacuuming process to make it scalable.
Ronnie Sahlberg [Wed, 12 Mar 2008 20:53:29 +0000 (07:53 +1100)]
Redo the vacuuming process to make it scalable.

Vacuuming used to delete one record at a time on all nodes. That was
m*n behaviour and would require a huge storm of ctdb->ctdb controls, and just wouldn't scale at all.

The new vacuuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted.
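The scaling difference can be made concrete with a small piece of arithmetic. This is a sketch of the message counts only, under the assumption that the old scheme sent one control per record to each remote node and the new scheme sends one control per remote node; the function names are hypothetical and no real ctdb control is modelled.

```c
#include <stddef.h>

/* Old scheme: one control per record per remote node (m*n). */
size_t controls_per_record_scheme(size_t num_records, size_t num_remote_nodes)
{
    return num_records * num_remote_nodes;
}

/* New scheme: one control per remote node, carrying the whole list of
 * records to delete. Nothing is sent when there is nothing to delete. */
size_t controls_batched_scheme(size_t num_records, size_t num_remote_nodes)
{
    return num_records == 0 ? 0 : num_remote_nodes;
}
```

With 1000 records to delete and 3 remote nodes, the first function gives 3000 controls and the second gives 3.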

16 years agoupdate to version 1.0.30
Ronnie Sahlberg [Tue, 4 Mar 2008 02:40:29 +0000 (13:40 +1100)]
update to version 1.0.30

16 years agoUpdate ctdb uptime to provide machinereadable output
Ronnie Sahlberg [Tue, 4 Mar 2008 02:29:48 +0000 (13:29 +1100)]
Update ctdb uptime to provide machinereadable output

16 years agoprovide machine-readable -Y output for 'ctdb getdebug'
Ronnie Sahlberg [Tue, 4 Mar 2008 02:23:06 +0000 (13:23 +1100)]
provide machine-readable -Y output for 'ctdb getdebug'

16 years agomake 'ctdb ip' provide machine-readable output using '-Y'
Ronnie Sahlberg [Tue, 4 Mar 2008 02:18:27 +0000 (13:18 +1100)]
make 'ctdb ip' provide machine-readable output using '-Y'

16 years agodocument some public tunables
Ronnie Sahlberg [Tue, 4 Mar 2008 02:06:46 +0000 (13:06 +1100)]
document some public tunables

16 years agodocument some new ctdb command
Ronnie Sahlberg [Tue, 4 Mar 2008 01:37:24 +0000 (12:37 +1100)]
document some new ctdb command

16 years agoA new command to 'ctdb'
Ronnie Sahlberg [Tue, 4 Mar 2008 01:20:23 +0000 (12:20 +1100)]
A new command to 'ctdb'

ctdb moveip <IPADDRESS> <NODE>

which can be used to manually fail an ip address over to a specific node.

This can only be used if DeterministicIPs is disabled and NoIPFailback is enabled.

16 years agoadd a new tunable 'NoIPFailback'
Ronnie Sahlberg [Mon, 3 Mar 2008 01:52:16 +0000 (12:52 +1100)]
add a new tunable 'NoIPFailback'
when this tunable is set, ip addresses will only be failed over when a node
fails. And only those ip addresses held by the failed node will be reallocated
in the cluster.

When a node becomes active again, this will not lead to any failback of ip addresses.

This can reduce the number of "ip address movements" in the cluster since we don't automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes.

This tunable can NOT be active at the same time as DeterministicIPs.
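The behaviour described for NoIPFailback can be sketched roughly as below. This is an assumption-laden illustration, not the ctdb allocation code: struct public_ip, the healthy[] array and both function names are hypothetical, and the "least loaded healthy node" choice is only one plausible way to pick a takeover target.

```c
#include <stddef.h>

/* Hypothetical record for one public address. */
struct public_ip {
    int pnn;  /* index of the node hosting this address, or -1 if unassigned */
};

/* Pick the healthy node that currently hosts the fewest addresses,
 * or -1 if no node is healthy. */
static int least_loaded_healthy_node(const struct public_ip *ips, size_t num_ips,
                                     const int *healthy, size_t num_nodes)
{
    int best = -1;
    size_t best_count = 0;
    for (size_t n = 0; n < num_nodes; n++) {
        if (!healthy[n]) {
            continue;
        }
        size_t count = 0;
        for (size_t i = 0; i < num_ips; i++) {
            if (ips[i].pnn == (int)n) {
                count++;
            }
        }
        if (best == -1 || count < best_count) {
            best = (int)n;
            best_count = count;
        }
    }
    return best;
}

/* Move only the addresses whose current owner is not healthy. Addresses on
 * healthy nodes are never touched, so a node that comes back gets nothing
 * failed back to it. Returns the number of addresses moved. */
size_t reallocate_without_failback(struct public_ip *ips, size_t num_ips,
                                   const int *healthy, size_t num_nodes)
{
    size_t moved = 0;
    for (size_t i = 0; i < num_ips; i++) {
        int owner = ips[i].pnn;
        if (owner >= 0 && (size_t)owner < num_nodes && healthy[owner]) {
            continue;  /* healthy owner keeps its address */
        }
        int target = least_loaded_healthy_node(ips, num_ips, healthy, num_nodes);
        if (target < 0) {
            ips[i].pnn = -1;  /* nobody can host it right now */
            continue;
        }
        ips[i].pnn = target;
        moved++;
    }
    return moved;
}
```

Running this again after the failed node returns moves nothing, which is the unbalanced outcome the commit message warns about.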

16 years agowhen we reallocate the ip addresses for nodes, we must make sure that
Ronnie Sahlberg [Sun, 2 Mar 2008 23:53:23 +0000 (10:53 +1100)]
when we reallocate the ip addresses for nodes, we must make sure that
a node that has been allocated to serve an ip actually CAN serve that ip
(if we use differing public_addresses files on each node)

16 years agoadd a num_connected field to the rec structure that holds the number
Ronnie Sahlberg [Sun, 2 Mar 2008 23:24:17 +0000 (10:24 +1100)]
add a num_connected field to the rec structure that holds the number
of connected nodes

num_active only contains the number of active nodes and would thus not count
banned nodes

16 years agoadd a new tunable : reclockpingperiod
Ronnie Sahlberg [Sun, 2 Mar 2008 22:19:30 +0000 (09:19 +1100)]
add a new tunable : reclockpingperiod

once every such interval :
* the recovery master on each node will update the "connected" count in the
  reclock count file (ctdb getreclock)
* if the node thinks it is a recovery master but it detects another node
  that is DISCONNECTED but which still holds a lock to the reclock count file
  this may mean that we have a split cluster.
  if that other node, which is DISCONNECTED but still holds the lock on the reclock
  pnn count file, is MORE connected than the local node,
  yield the recmaster role and let the other half of the cluster take over

this adds a second, last chance mechanism to detect split clusters.
IF the cluster is split but GPFS is not yet split, this mechanism makes
the largest half of the cluster become the active half.
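The yield decision described above can be sketched as a pure function. The struct and function names here are hypothetical, and the sketch assumes the recovery daemon has already gathered, for each peer, whether it is seen as connected, whether its byte in the count file is locked, and what "connected" count it last published.

```c
#include <stddef.h>

/* Hypothetical snapshot of what the local recovery daemon knows about a peer. */
struct peer_view {
    int connected;            /* 1 if the local node sees this peer as CONNECTED */
    int holds_reclock_byte;   /* 1 if the peer still holds its lock in the count file */
    unsigned connected_count; /* the "connected" count the peer last published */
};

/* Return 1 if the local node, believing itself recmaster, should yield:
 * some peer is DISCONNECTED from us, still holds its lock, and reports
 * being connected to more nodes than we are. Return 0 otherwise. */
int should_yield_recmaster(const struct peer_view *peers, size_t num_peers,
                           unsigned local_connected_count)
{
    for (size_t i = 0; i < num_peers; i++) {
        if (!peers[i].connected &&
            peers[i].holds_reclock_byte &&
            peers[i].connected_count > local_connected_count) {
            return 1;
        }
    }
    return 0;
}
```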

16 years agochange recmaster from being a local variable in monitor_cluster() to be a member...
Ronnie Sahlberg [Sun, 2 Mar 2008 20:53:46 +0000 (07:53 +1100)]
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure

16 years agoupdate the reclock pnn count for how many nodes are connected to the current node...
Ronnie Sahlberg [Fri, 29 Feb 2008 02:14:47 +0000 (13:14 +1100)]
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds

16 years agostore the num_active variable (number of connected/active nodes) inside the rec
Ronnie Sahlberg [Fri, 29 Feb 2008 01:55:20 +0000 (12:55 +1100)]
store the num_active variable (number of connected/active nodes) inside the rec
structure and avoid passing this as an extra parameter to do_recovery()

16 years agoadd a new file <reclock>.pnn where each recovery daemon can lock that byte at offset...
Ronnie Sahlberg [Fri, 29 Feb 2008 01:37:42 +0000 (12:37 +1100)]
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate.
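Locking a single byte at offset == pnn can be done with POSIX fcntl() record locks. The following is a minimal sketch of that technique under the assumption that plain fcntl locks are used; lock_pnn_byte and pnn_byte_is_locked are hypothetical helper names, not ctdb functions.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on the single byte at offset == pnn.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int lock_pnn_byte(int fd, uint32_t pnn)
{
    struct flock lock = {0};
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = (off_t)pnn;
    lock.l_len = 1;
    return fcntl(fd, F_SETLK, &lock);
}

/* Ask whether another process holds a conflicting lock on byte pnn.
 * F_GETLK only reports locks held by other processes, never our own.
 * Returns 1 if locked by someone else, 0 if not, -1 on error. */
int pnn_byte_is_locked(int fd, uint32_t pnn)
{
    struct flock lock = {0};
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = (off_t)pnn;
    lock.l_len = 1;
    if (fcntl(fd, F_GETLK, &lock) == -1) {
        return -1;
    }
    return lock.l_type != F_UNLCK;
}
```

Because the kernel drops these locks when the holding process exits or closes the file, a byte that is still locked suggests the daemon for that pnn is alive, independently of what the CONNECTED flag says.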

16 years agoadd a control to get the name of the reclock file from the daemon
Ronnie Sahlberg [Thu, 28 Feb 2008 23:03:39 +0000 (10:03 +1100)]
add a control to get the name of the reclock file from the daemon

16 years agoadd a new tunable DisableWhenUnhealthy which when set will cause a node to automatica...
Ronnie Sahlberg [Thu, 21 Feb 2008 23:33:09 +0000 (10:33 +1100)]
add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY.

Use with caution.

16 years agodocument the --start-as-disabled argument
Ronnie Sahlberg [Thu, 21 Feb 2008 23:01:15 +0000 (10:01 +1100)]
document the --start-as-disabled argument

16 years agoAdd debug output to indicate why a node starts up in DISABLED state
Ronnie Sahlberg [Thu, 21 Feb 2008 22:52:57 +0000 (09:52 +1100)]
Add debug output to indicate why a node starts up in DISABLED state

16 years agoAdd a new parameter to /etc/sysconfig/ctdb
Ronnie Sahlberg [Thu, 21 Feb 2008 22:42:52 +0000 (09:42 +1100)]
Add a new parameter to /etc/sysconfig/ctdb
CTDB_START_AS_DISABLED="yes"

and command line argument
--start-as-disabled

When set, this makes the ctdb node always start in DISABLED mode, so it will not host any public ip addresses.
The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses.

Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster.

16 years agomonitor the amount of free memory and if this threshold is crossed, monitoring will...
Ronnie Sahlberg [Thu, 21 Feb 2008 02:29:28 +0000 (13:29 +1100)]
monitor the amount of free memory and if this threshold is crossed, monitoring will log an OOM condition in the ctdb log and shut down ctdb on the node.

by default ctdb does not monitor for OOM.
to enable this you need to uncomment the CTDB_MONITOR_FREE_MEMORY line in /etc/sysconfig/ctdb and specify the amount of free memory, in MByte, that will trigger OOM and cause ctdb to shut down the node

16 years agoupdate version to 1.0.29
Ronnie Sahlberg [Wed, 20 Feb 2008 21:37:29 +0000 (08:37 +1100)]
update version to 1.0.29

16 years agomake the ctdb reloadnodes reload the nodes file on all nodes and restart the transport
Ronnie Sahlberg [Wed, 20 Feb 2008 21:25:01 +0000 (08:25 +1100)]
make the ctdb reloadnodes reload the nodes file on all nodes and restart the transport

16 years agoto make it easier/less disruptive to add nodes to a running cluster
Ronnie Sahlberg [Tue, 19 Feb 2008 03:44:48 +0000 (14:44 +1100)]
to make it easier/less disruptive to add nodes to a running cluster

add a new control that causes the node to drop the current nodes list
and reread it from the nodes file.
During this operation, the node will also drop the tcp layer and restart it.

When we drop the tcp layer, by talloc_free()ing the ctcp structure
add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer

add two new commands for the ctdb tool:
one to list all nodes in the nodes file, and a second to trigger a node to drop the transport and reinitialize it with the new nodes file

16 years agothe ctdb structure must make its own copy of the ->address field and not just
Ronnie Sahlberg [Tue, 19 Feb 2008 03:35:15 +0000 (14:35 +1100)]
the ctdb structure must make its own copy of the ->address field and not just
copy the content of the nodes structure.

this ctdb_address structure contains a pointer which is talloced hanging off the structure itself.
If we copy the content of this structure as we did in assigning to ctdb->address from nodes[i]
then if we talloc_free() the node structure we end up with a wild pointer in ctdb->address
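The difference between copying the structure's content and making an owned copy can be sketched in plain C. This illustration uses malloc/free instead of talloc, and struct address is a hypothetical stand-in for the real ctdb_address; the field names are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ctdb_address: it owns a heap-allocated string. */
struct address {
    char *address;
    int port;
};

/* Deep copy: dst gets its own copy of the string, so freeing src later
 * cannot leave dst->address dangling. A plain "*dst = *src" would copy
 * only the pointer, which is the bug described above.
 * Returns 0 on success, -1 if allocation fails. */
int address_deep_copy(struct address *dst, const struct address *src)
{
    char *copy = NULL;
    if (src->address != NULL) {
        size_t len = strlen(src->address) + 1;
        copy = malloc(len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, src->address, len);
    }
    dst->address = copy;
    dst->port = src->port;
    return 0;
}
```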

16 years agoread the current debuglevel in each loop in the recovery daemon so that we
Ronnie Sahlberg [Mon, 18 Feb 2008 08:38:04 +0000 (19:38 +1100)]
read the current debuglevel in each loop in the recovery daemon so that we
pick up when it changes in the parent daemon

16 years agofrom Mathieu PARENT <math.parent@gmail.com>
Ronnie Sahlberg [Tue, 12 Feb 2008 21:20:20 +0000 (08:20 +1100)]
from Mathieu PARENT <math.parent@gmail.com>

Simulate "nice service" on systems that do not have "service"

16 years agoFrom Mathieu PARENT <math.parent@gmail.com>
Ronnie Sahlberg [Tue, 12 Feb 2008 21:17:53 +0000 (08:17 +1100)]
From Mathieu PARENT <math.parent@gmail.com>

Set the correct permissions for events.d/README

16 years agoadd helpers to stop/start nfs lockmanager on different platforms
Ronnie Sahlberg [Sun, 10 Feb 2008 22:52:09 +0000 (09:52 +1100)]
add helpers to stop/start nfs lockmanager on different platforms

16 years agocreate a startstop_nfs function that can start/stop the nfs service of different...
Ronnie Sahlberg [Sun, 10 Feb 2008 22:35:37 +0000 (09:35 +1100)]
create a startstop_nfs function that can start/stop the nfs service of different platforms

16 years agoupdate to revision 28 tridge/ctdb-1.0.28 tridge/test
Ronnie Sahlberg [Fri, 8 Feb 2008 04:12:06 +0000 (15:12 +1100)]
update to revision 28

16 years agocarefully step around the recovery area when doing a tdb_wipe_all. This prevents
Andrew Tridgell [Fri, 8 Feb 2008 03:10:54 +0000 (14:10 +1100)]
carefully step around the recovery area when doing a tdb_wipe_all. This prevents
problems with wipe_all on databases that may need crash recovery

16 years agodon't ship the .git directory in the srpm
Andrew Tridgell [Fri, 8 Feb 2008 02:22:47 +0000 (13:22 +1100)]
don't ship the .git directory in the srpm

16 years agoMerge git://git.samba.org/tridge/ctdb
Ronnie Sahlberg [Thu, 7 Feb 2008 21:21:03 +0000 (08:21 +1100)]
Merge git://git.samba.org/tridge/ctdb

16 years agofixed a problem with tdb growing after each recovery
Andrew Tridgell [Thu, 7 Feb 2008 12:01:06 +0000 (23:01 +1100)]
fixed a problem with tdb growing after each recovery

16 years agodont use absolute pathnames for the netstat tool
Ronnie Sahlberg [Thu, 7 Feb 2008 04:41:48 +0000 (15:41 +1100)]
dont use absolute pathnames for the netstat tool
it can be either in /bin or /usr/bin

16 years agodont use an absolute pathname for the touch command
Ronnie Sahlberg [Thu, 7 Feb 2008 04:38:59 +0000 (15:38 +1100)]
dont use an absolute pathname for the touch command

16 years agodont use an absolute pathname for the iptables tool
Ronnie Sahlberg [Thu, 7 Feb 2008 04:36:26 +0000 (15:36 +1100)]
dont use an absolute pathname for the iptables tool