NEWS

Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* The ctdb command's default control timeout has been changed from 3s
  to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly no longer removes
  an existing Unix domain socket unconditionally.

* ctdbd once again successfully kills client processes when releasing
  public IPs.  It was checking for them as tracked child processes,
  not finding them, and so was not killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as invocations of ctdb in eventscripts) use the correct
  socket.

* The Jenkins hash is now always used when creating volatile
  databases.  Previously, there were a few places where TDBs were
  attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs that led to header
  corruption for empty records.  The resulting inconsistent headers on
  two nodes caused requests for such a record to bounce between the
  nodes indefinitely, logging "High hopcount" messages and degrading
  performance.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs.  ctdbd now uses fixed-size queue buffers to allow
  fair scheduling across multiple FDs.
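
The fairness problem in the last item can be illustrated with a small simulation.  This is a hypothetical Python sketch, not ctdbd's actual C queue code: each FD has a backlog of pending bytes, and each turn reads at most a fixed-size chunk (CHUNK_SIZE, an assumed value) from one FD before moving on to the next, so a single busy FD cannot starve the others.

```python
from collections import deque

# Assumed per-read buffer size, for illustration only.
CHUNK_SIZE = 4


def drain_greedy(backlogs):
    """Old behaviour (sketch): read each FD until it is empty.

    backlogs maps an FD name to its number of pending bytes.
    Returns the (fd, nbytes) reads in the order they happen.
    """
    return [(fd, pending) for fd, pending in backlogs.items() if pending > 0]


def drain_fair(backlogs, chunk=CHUNK_SIZE):
    """New behaviour (sketch): read at most `chunk` bytes per FD per turn."""
    pending = dict(backlogs)
    ready = deque(fd for fd, n in pending.items() if n > 0)
    order = []
    while ready:
        fd = ready.popleft()
        n = min(chunk, pending[fd])
        pending[fd] -= n
        order.append((fd, n))
        if pending[fd] > 0:
            ready.append(fd)  # more data left: go to the back of the queue
    return order


if __name__ == "__main__":
    backlogs = {"busy": 10, "quiet": 2}
    print(drain_greedy(backlogs))  # [('busy', 10), ('quiet', 2)]
    print(drain_fair(backlogs))    # [('busy', 4), ('quiet', 2), ('busy', 4), ('busy', 2)]
```

With the greedy strategy the quiet FD waits until all 10 bytes on the busy FD have been read; with fixed-size chunks it is serviced after the first 4.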

Important internal changes
--------------------------

* A node that fails to take or release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause the node to be banned.

* ctdb killtcp has been changed to read connections from stdin, and
  10.interface now uses this feature to reduce the time taken to kill
  connections.

* Improvements to hot record statistics in "ctdb dbstatistics".

* The recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking whether any flags are inconsistent
  and forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of first trying to repair a node by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Lower-level TDB messages are now logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry on failure.
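
The change to lock sub-processes amounts to keeping at most one lock helper per key at a time and queueing further requests for that key behind it.  The Python sketch below models only this idea, not how ctdbd schedules lock helpers in detail; LockScheduler, request() and complete() are hypothetical names, and a counter stands in for actually forking a helper process.

```python
from collections import defaultdict, deque


class LockScheduler:
    """Hypothetical model: at most one lock helper per key at a time."""

    def __init__(self):
        self.waiters = defaultdict(deque)  # key -> queued request ids
        self.helpers_spawned = 0           # stands in for forking a helper

    def request(self, key, req_id):
        """Queue a lock request; start a helper only if the key is idle."""
        if key not in self.waiters:
            self.helpers_spawned += 1
        self.waiters[key].append(req_id)

    def complete(self, key):
        """The helper for `key` finished; return the request it served.

        If more requests are queued for the key, one new helper is
        started for the next of them.
        """
        queue = self.waiters.get(key)
        if not queue:
            raise KeyError(f"no lock request pending for {key!r}")
        served = queue.popleft()
        if queue:
            self.helpers_spawned += 1
        else:
            del self.waiters[key]
        return served

    def active_helpers(self):
        """Number of keys that currently have a helper running."""
        return len(self.waiters)


if __name__ == "__main__":
    sched = LockScheduler()
    for i in range(5):
        sched.request("k1", i)
    sched.request("k2", 99)
    print(sched.active_helpers(), sched.helpers_spawned)  # 2 2
```

Five requests for "k1" and one for "k2" start only two helpers at first; the queued "k1" requests are handled one after another as each helper completes.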


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* The DeadlockTimeout tunable has been removed.  To enable debugging of
  locking issues, set

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use the following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* The initscript has been simplified, with most CTDB-specific
  functionality split out to ctdbd_wrapper, which is used to start and
  stop ctdbd.

* Added systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
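
The lock bucket boundaries listed in this section can be expressed as a small lookup.  The following Python sketch assigns a lock wait time to one of the 11 buckets; the function names and the use of seconds as the unit are assumptions for illustration, not CTDB's own (C) implementation.

```python
import bisect

# Exclusive upper bounds, in seconds, of the first ten buckets from the
# 2.3 notes; anything >= 64s falls into the eleventh bucket.
BUCKET_LIMITS = [0.001, 0.01, 0.1, 1, 2, 4, 8, 16, 32, 64]
BUCKET_LABELS = ["< 1ms", "< 10ms", "< 100ms", "< 1s", "< 2s", "< 4s",
                 "< 8s", "< 16s", "< 32s", "< 64s", ">= 64s"]


def lock_bucket(seconds):
    """Return the index of the bucket for a lock wait of `seconds`."""
    # bisect_right sends a value equal to a limit to the next bucket,
    # matching the strict "<" in each label.
    return bisect.bisect_right(BUCKET_LIMITS, seconds)


def histogram(samples):
    """Count lock wait times (in seconds) per bucket."""
    counts = [0] * len(BUCKET_LABELS)
    for s in samples:
        counts[lock_bucket(s)] += 1
    return counts


if __name__ == "__main__":
    for label, count in zip(BUCKET_LABELS, histogram([0.0005, 0.05, 1.5, 70])):
        print(f"{label:>8}: {count}")
```

Using bisect keeps the lookup logarithmic in the number of buckets and keeps the boundaries in one table, so changing the bucket layout only means editing BUCKET_LIMITS and BUCKET_LABELS.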

Important bug fixes
-------------------

* The ctdb tool no longer exits from a retry loop if a control times
  out (e.g. under high load).  This simple fix stops the retry loop
  from exiting on any error.

* When updating flags on all nodes, the correct updated flags are now
  used.  This should avoid incorrect flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* The "ctdb dbstatistics" command now correctly outputs database
  statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (when called with incorrect
  arguments) and test binaries (when called while ctdbd is not
  running).
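
The retry-loop fix in the first item can be sketched as follows.  This is hypothetical Python rather than the ctdb tool's C code: send_control stands in for any control call, ControlTimeout is an illustrative exception, and the retry limit is an assumed parameter.  The point is that a failed attempt, including a timeout, leads to another attempt instead of leaving the loop.

```python
class ControlTimeout(Exception):
    """Hypothetical error for a control that did not complete in time."""


def retry_control(send_control, max_attempts=5):
    """Call send_control() until it succeeds or max_attempts is used up.

    Returns (result, attempts_used).  Any error, including a timeout, is
    retried instead of ending the loop; once every attempt has failed,
    a RuntimeError is raised with the last error as its cause.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send_control(), attempt
        except Exception as err:  # e.g. ControlTimeout under high load
            last_error = err
    raise RuntimeError(f"control failed after {max_attempts} attempts") from last_error
```

A control that times out twice and then succeeds returns its result on the third attempt rather than aborting the command.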

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop
  participating in any cluster activity.

* Improved cluster-wide database traverse by sending the records
  directly from the traverse child process to the requesting node.

* TDB checking and dropping of all IPs have moved from the initscript
  to the "init" event in 00.ctdb.

* To avoid "rogue IPs", the release IP callback now fails if the
  released IP is still present on an interface.
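
The check in the last item can be sketched as a verification step after the release.  In this hypothetical Python version the addresses still configured on the interface are passed in as a list of address/prefix strings; how they are obtained (for example from "ip addr show") is left out, and release_ip_callback is an illustrative name, not CTDB's API.

```python
import ipaddress


def release_ip_callback(released_ip, addresses_on_iface):
    """Return True if the release succeeded, False if the IP is still present.

    addresses_on_iface holds entries like "10.0.0.5/24"; the prefix
    length is ignored when comparing against released_ip.
    """
    target = ipaddress.ip_address(released_ip)
    for entry in addresses_on_iface:
        if ipaddress.ip_interface(entry).ip == target:
            # Still configured: report failure instead of leaving a rogue IP.
            return False
    return True
```

Comparing parsed address objects rather than raw strings means that differing prefix lengths or IPv6 spellings do not hide a leftover address.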


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by the initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds.  This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fixed a severe recovery bug that can lead to data corruption for SMB
  clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed.  This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.
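
The first fix in this list follows a common pattern: connect to the Unix domain socket in blocking mode, so that connect() waits for the connection instead of returning EAGAIN when it cannot complete immediately, and only then switch the socket to non-blocking mode for the event loop.  A minimal Python sketch; connect_unix_then_nonblocking is a hypothetical name, and C code would typically set O_NONBLOCK with fcntl() instead.

```python
import socket


def connect_unix_then_nonblocking(path):
    """Connect to a Unix domain socket, then make it non-blocking.

    The connect() call happens while the socket is still blocking, so it
    waits for the connection rather than returning EAGAIN; the socket
    only becomes non-blocking once the connection is established.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)      # blocking connect
    except OSError:
        sock.close()            # don't leak the descriptor on failure
        raise
    sock.setblocking(False)     # non-blocking from here on
    return sock
```

Any error raised by connect() here is a real failure to reach the daemon, not a transient "try again" result that the caller might forget to retry.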

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation.  They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code.  One improvement is to use a
  standalone locking helper executable; this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.
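
The last item replaces a scan over all registered handlers with a lookup keyed by message ID.  This hypothetical Python sketch contrasts the two data structures; CTDB registers message handlers per server ID (srvid), but the class and method names here are illustrative only.

```python
from collections import defaultdict


class LinearRegistry:
    """Before (sketch): every dispatch scans the whole handler list."""

    def __init__(self):
        self.handlers = []  # list of (srvid, handler) pairs

    def register(self, srvid, handler):
        self.handlers.append((srvid, handler))

    def dispatch(self, srvid, msg):
        return [h(msg) for sid, h in self.handlers if sid == srvid]


class IndexedRegistry:
    """After (sketch): handlers are grouped by srvid for direct lookup."""

    def __init__(self):
        self.handlers = defaultdict(list)  # srvid -> [handler, ...]

    def register(self, srvid, handler):
        self.handlers[srvid].append(handler)

    def dispatch(self, srvid, msg):
        # .get() avoids creating empty entries for unknown srvids.
        return [h(msg) for h in self.handlers.get(srvid, [])]
```

Both registries return the same results in the same order; the indexed version's dispatch cost depends only on the handlers registered for that srvid, not on the total number of handlers.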