NEWS

Changes in CTDB 2.5.2
=====================

User-visible changes
--------------------

* Much improved manpages from CTDB 2.5 are now installed and packaged.

Important bug fixes
-------------------

* "ctdb reloadips" now waits for replies to addip/delip controls
  before returning.

Important internal changes
--------------------------

* The event scripts are now executed using vfork(2) and a helper
  binary instead of fork(2), providing a performance improvement.

* "ctdb reloadips" now works if some nodes are inactive.  This
  means that public IP addresses can be reconfigured even if nodes
  are stopped.


Changes in CTDB 2.5.1
=====================

Important bug fixes
-------------------

* The locking code now correctly implements a per-database active
  locks limit.  Whole database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script locks against itself.  If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.
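
The "locks against itself" behaviour described for debug_locks.sh above
can be sketched with flock(1).  This is only an illustration of the
technique, not the script's actual code: the lock file path and the
run_once function name are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: LOCK_FILE and run_once are hypothetical names.
LOCK_FILE="${TMPDIR:-/tmp}/debug_locks_example.lock"

run_once ()
{
    (
        # Try to take an exclusive lock on fd 9 without blocking.
        # If another invocation already holds it, exit immediately.
        flock -n 9 || exit 1

        # The lock is held for as long as fd 9 stays open, i.e. until
        # the command below (which inherits fd 9) has finished.
        "$@"
    ) 9>"$LOCK_FILE"
}
```

Calling "run_once some_command" while an earlier call is still running
returns a non-zero status straight away instead of waiting for the lock.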

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster.  This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.

* The transaction code for persistent databases now retries until it
  is able to take the transaction lock.  This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.


Changes in CTDB 2.5
===================

User-visible changes
--------------------

* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

    /var/lib/ctdb

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these have moved.

* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported.  Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed.  Setting them had no effect but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.

* Much improved manual pages.  Added new manpages ctdb(7),
  ctdbd.conf(5), ctdb-tunables(7).  Still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb.  It also means that we can say: see
  ctdbd.conf(5) for more details.  :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE.  See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands.  See ctdb(1) for
  details.

* The ability to pass a comma-separated string to ctdb(1) tool
  commands via the -n option is now documented and works for most
  commands.  See ctdb(1) for details.

* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation.  See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban.  However, this was
  not implemented and caused an "unban" instead.  To avoid confusion,
  0 is now an invalid ban duration.  To administratively "ban" a node
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
  of /var/run/ctdb.
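
For the database location change listed above, keeping the files in
/var/ctdb means setting all three variables together.  The fragment
below is a sketch under assumptions, not a recommended configuration:
the "persistent" and "state" subdirectory names are assumed here and
should be checked against the actual layout on your nodes.

```shell
# Sketch for /etc/ctdb/ctdbd.conf.
# Assumption: the persistent and state databases live in the
# "persistent" and "state" subdirectories of the old location.
CTDB_DBDIR=/var/ctdb
CTDB_DBDIR_PERSISTENT=/var/ctdb/persistent
CTDB_DBDIR_STATE=/var/ctdb/state
```

If the databases have been moved to /var/lib/ctdb, none of these
variables needs to be set.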

Important bug fixes
-------------------

* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers).  This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work
  reliably.

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive.  This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* Test suite now starts ctdbd with the --sloppy-start option to speed
  up startup.  However, this should not be done in production.


Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly no longer removes
  an existing Unix socket unconditionally.

* ctdbd once again successfully kills client processes on releasing
  public IPs.  It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* Always use Jenkins hash when creating volatile databases.  There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
  which led to header corruption for empty records.  This resulted
  in inconsistent headers on two nodes, so a request for such a
  record would keep bouncing between nodes indefinitely and logging
  "High hopcount" messages.  This also caused performance
  degradation.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs.  Fixed-size queue buffers are now used to allow
  fair scheduling across multiple FDs.

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause a node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* Recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Make sure lower level tdb messages are logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed DeadlockTimeout tunable.  To enable debugging of locking
  issues, set:

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use the following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* Initscript is now simplified with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
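
The lock bucket boundaries listed above can be read as a simple
threshold table.  The helper below is a sketch that maps a latency to
its bucket label.  The lock_bucket name and the choice of whole
microseconds as input are assumptions for illustration.  It does not
describe how ctdbd itself measures or records lock latency.

```shell
#!/bin/sh
# Sketch only: lock_bucket is a hypothetical helper, not part of CTDB.
# Input: lock latency in whole microseconds.  Output: bucket label.
lock_bucket ()
{
    us="$1"
    if   [ "$us" -lt 1000 ];     then echo "< 1ms"
    elif [ "$us" -lt 10000 ];    then echo "< 10ms"
    elif [ "$us" -lt 100000 ];   then echo "< 100ms"
    elif [ "$us" -lt 1000000 ];  then echo "< 1s"
    elif [ "$us" -lt 2000000 ];  then echo "< 2s"
    elif [ "$us" -lt 4000000 ];  then echo "< 4s"
    elif [ "$us" -lt 8000000 ];  then echo "< 8s"
    elif [ "$us" -lt 16000000 ]; then echo "< 16s"
    elif [ "$us" -lt 32000000 ]; then echo "< 32s"
    elif [ "$us" -lt 64000000 ]; then echo "< 64s"
    else                              echo ">= 64s"
    fi
}
```

For example, "lock_bucket 1500000" (1.5 seconds) prints "< 2s".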

Important bug fixes
-------------------

* The ctdb tool should not exit from a retry loop if a control times
  out (e.g. under high load).  This simple fix stops the tool exiting
  from the retry loop on any error.

* When updating flags on all nodes, use the correct updated flags.  This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with incorrect
  argument) and test binaries (called when ctdbd not running).

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop
  participating in any cluster activity.

* Improved cluster-wide database traverse by sending the records
  directly from the traverse child process to the requesting node.

* TDB checking and dropping of all IPs moved from initscript to "init"
  event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by the initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds.  This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fix a severe recovery bug that can lead to data corruption for SMB clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed.  This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation.  They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code.  One improvement is to use a
  standalone locking helper executable - this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.