NEWS

Changes in CTDB 2.5.4
=====================

User-visible changes
--------------------

* New command "ctdb detach" to detach a database.

* Support for TDB robust mutexes.  To enable set TDBMutexEnabled=1.
  The setting is per node.

* New manual page ctdb-statistics.7.
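
As an illustration of the TDBMutexEnabled item above, the fragment
below is a minimal sketch of enabling the tunable at startup.  It
assumes the CTDB_SET_<tunable>=<value> configuration convention
mentioned in the CTDB 2.5 notes.  Because the setting is per node, it
would have to be repeated on every node where robust mutexes are
wanted.

```shell
# CTDB configuration file on one node (sketch).
# Assumption: tunables are set at startup via CTDB_SET_<name>=<value>.
CTDB_SET_TDBMutexEnabled=1
```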

Important bug fixes
-------------------

* Verify policy routing configuration when starting up to make sure that policy
  routing tables do not override default routing tables.

* "ctdb scriptstatus" now correctly lists the number of scripts executed.

* Do not run eventscripts at real-time priority.

* Make sure "ctdb restoredb" and "ctdb wipedb" cannot affect an ongoing
  recovery.

* If a readonly record revocation fails, CTDB does not abort anymore.  It will
  retry the revoke.

* pending_calls statistic now gets updated correctly.

Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Fix the order of setting recovery mode and freezing databases.

* Remove NAT gateway "monitor" event.

* Add per database queue for lock requests.  This improves the lock
  scheduling performance.

* When processing dmaster packets (DMASTER_REQUEST and DMASTER_REPLY) defer all
  call processing for that record.  This avoids the temporary inconsistency in
  dmaster information which causes rapid bouncing of call requests between two
  nodes.

* Correctly capture the output from lock helper processes, so it can be logged.

* Many test improvements and additions.


Changes in CTDB 2.5.3
=====================

User-visible changes
--------------------

* New configuration variable CTDB_NATGW_STATIC_ROUTES allows the NAT
  gateway feature to create static host/network routes instead of
  default routes.  See the documentation.  Use with care.
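
For orientation, a setting of CTDB_NATGW_STATIC_ROUTES might look
like the sketch below.  The networks are made-up examples, and the
value format (a space-separated list of network/prefix entries) is an
assumption here; the documentation remains the authority.

```shell
# Sketch only: hypothetical networks to be routed via the NAT gateway.
# Assumed format: space-separated list of network/prefix entries.
CTDB_NATGW_STATIC_ROUTES="10.1.1.0/24 10.1.2.0/24"
```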

Important bug fixes
-------------------

* ctdbd no longer crashes when tickles are processed after reloading
  the nodes file.

* "ctdb reloadips" works as expected because the DEL_PUBLIC_IP control
  now waits until public IP addresses are released before returning.

Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Record locking now compares records based on their hashes to avoid
  scheduling multiple requests for records on the same hashchain.

* An internal timeout for revoking read-only record delegations has
  been changed from hard-coded 5 seconds to the value of the
  ControlTimeout tunable.  This makes it less likely that ctdbd will
  abort.

* Many test improvements and additions.


Changes in CTDB 2.5.2
=====================

User-visible changes
--------------------

* Much improved manpages from CTDB 2.5 are now installed and packaged.

Important bug fixes
-------------------

* "ctdb reloadips" now waits for replies to addip/delip controls
  before returning.

Important internal changes
--------------------------

* The event scripts are now executed using vfork(2) and a helper
  binary instead of fork(2), providing a performance improvement.

* "ctdb reloadips" now works if some nodes are inactive.  This
  means that public IP addresses can be reconfigured even if nodes
  are stopped.


Changes in CTDB 2.5.1
=====================

Important bug fixes
-------------------

* The locking code now correctly implements a per-database active
  locks limit.  Whole database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script locks against itself.  If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.
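
The "locks against itself" behaviour of debug_locks.sh described
above can be sketched as follows.  This is not the script's actual
code: it swaps in an atomic mkdir(1) as the lock primitive, and the
lock path and function name are hypothetical.

```shell
#!/bin/sh
# Sketch only: lock path and function name are hypothetical and are
# not taken from debug_locks.sh.
lock_dir="${TMPDIR:-/tmp}/debug_locks_sketch.lock"

# run_once COMMAND [ARGS...]
# Run COMMAND unless another invocation already holds the lock.
run_once () {
    # mkdir is atomic: it fails if the directory already exists,
    # that is, if another invocation currently holds the lock.
    if ! mkdir "$lock_dir" 2>/dev/null ; then
        return 1
    fi
    # Do the real work, then release the lock.
    "$@"
    _status=$?
    rmdir "$lock_dir"
    return $_status
}
```

A caller would use something like "run_once some_command || exit 0"
so that a second invocation exits immediately.  A production script
would also need to deal with stale locks left behind by a killed
process, which this sketch does not attempt.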

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster.  This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.

* The transaction code for persistent databases now retries until it
  is able to take the transaction lock.  This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.


Changes in CTDB 2.5
===================

User-visible changes
--------------------

* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

    /var/lib/ctdb

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these have moved.

* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported.  Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed.  Setting them had no effect but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.

* Much improved manual pages.  Added new manpages ctdb(7),
  ctdbd.conf(5), ctdb-tunables(7).  Still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb.  It also means that we can say: see
  ctdbd.conf(5) for more details.  :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE.  See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands.  See ctdb(1) for
  details.

* Ability to pass comma-separated string to ctdb(1) tool commands via
  the -n option is now documented and works for most commands.  See
  ctdb(1) for details.

* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation.  See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban.  However, this was
  not implemented and caused an "unban" instead.  To avoid confusion,
  0 is now an invalid ban duration.  To administratively "ban" a node
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
  of /var/run/ctdb.
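
For the database directory change described earlier in this list,
keeping the database files under /var/ctdb would mean setting all
three variables.  The fragment below is a sketch for
/etc/ctdb/ctdbd.conf; the two subdirectory names are assumed to match
the old defaults and should be checked against the existing layout.

```shell
# Sketch: keep CTDB databases in the pre-2.5 location.
# Assumption: "persistent" and "state" were the old subdirectories.
CTDB_DBDIR=/var/ctdb
CTDB_DBDIR_PERSISTENT=/var/ctdb/persistent
CTDB_DBDIR_STATE=/var/ctdb/state
```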

Important bug fixes
-------------------

* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers).  This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work
  reliably.

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive.  This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* Test suite now starts ctdbd with the --sloppy-start option to speed
  up startup.  However, this should not be done in production.


Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly no longer
  unconditionally removes an existing Unix socket.

* ctdbd once again successfully kills client processes on releasing
  public IPs.  It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* Always use Jenkins hash when creating volatile databases.  There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs that led to header
  corruption for empty records.  This resulted in inconsistent
  headers on two nodes, so a request for such a record would bounce
  between the nodes indefinitely, logging "High hopcount" messages
  and degrading performance.  These bugs are now fixed.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs.  Fixed-size queue buffers are now used to allow
  fair scheduling across multiple FDs.

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause a node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to reduce the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* Recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Make sure lower level tdb messages are logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed DeadlockTimeout tunable.  To enable debugging of locking
  issues, set:

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use the following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* Initscript is now simplified with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
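
To make the lock bucket boundaries listed above concrete, the
function below maps a latency to its bucket label.  It is only an
illustration of the thresholds, not CTDB code; the function name and
the choice of whole milliseconds as input are assumptions.

```shell
# lock_bucket MILLISECONDS
# Print the bucket label for a lock latency given in whole
# milliseconds.  Name and interface are illustrative only.
lock_bucket () {
    _ms=$1
    # Upper bounds of the buckets, in milliseconds.
    for _limit in 1 10 100 1000 2000 4000 8000 16000 32000 64000 ; do
        if [ "$_ms" -lt "$_limit" ] ; then
            if [ "$_limit" -lt 1000 ] ; then
                echo "< ${_limit}ms"
            else
                echo "< $((_limit / 1000))s"
            fi
            return 0
        fi
    done
    # Anything at or above the last bound lands in the final bucket.
    echo ">= 64s"
}
```

For example, "lock_bucket 5" prints "< 10ms" and "lock_bucket 1000"
prints "< 2s", since each bucket excludes its upper bound.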

Important bug fixes
-------------------

* The ctdb tool no longer exits from a retry loop if a control times
  out (e.g. under high load).  This simple fix stops an exit from the
  retry loop on any error.

* When updating flags on all nodes, use the correct updated flags.  This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with incorrect
  argument) and test binaries (called when ctdbd not running).

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop
  participating in any cluster activity.

* Improve cluster-wide database traverse by sending the records
  directly from the traverse child process to the requesting node.

* TDB checking and dropping of all IPs moved from initscript to "init"
  event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.
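
As a rough idea of what a handler dropped into /etc/ctdb/notify.d/
could look like, see the sketch below.  It assumes that each handler
is an executable script that receives the event name as its first
argument; the file name, function name and the event names shown are
examples, not a definitive list.

```shell
#!/bin/sh
# Hypothetical handler, e.g. /etc/ctdb/notify.d/50.log-state.
# Assumption: the event name is passed as the first argument.

# describe_event EVENT
# Print a short human-readable message for EVENT.
describe_event () {
    case "$1" in
    unhealthy) echo "node became unhealthy" ;;
    healthy)   echo "node became healthy" ;;
    *)         echo "unhandled event: $1" ;;
    esac
}

# Only act when called with an event name.
if [ $# -gt 0 ] ; then
    describe_event "$1"
fi
```

In practice the message would more likely be passed to a logging or
alerting command than printed to standard output.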
 428
 429 Important bug fixes
 430 -------------------
 431
 432 * The Unix domain socket is now set to non-blocking after the
 433   connection succeeds.  This avoids connections failing with EAGAIN
 434   and not being retried.
 435
 436 * Fetching from the log ringbuffer now succeeds if the buffer is full.
 437
 438 * Fix a severe recovery bug that can lead to data corruption for SMB clients.
 439
 440 * The statd-callout script now runs as root via sudo.
 441
 442 * "ctdb delip" no longer fails if it is unable to move the IP.
 443
 444 * A race in the ctdb tool's ipreallocate code was fixed.  This fixes
 445   potential bugs in the "disable", "enable", "stop", "continue",
 446   "ban", "unban", "ipreallocate" and "sync" commands.
 447
 448 * The monitor cancellation code could sometimes hang indefinitely.
 449   This could cause "ctdb stop" and "ctdb shutdown" to fail.
 450
 451 Important internal changes
 452 --------------------------
 453
 454 * The socket I/O handling has been optimised to improve performance.
 455
 456 * IPs will not be assigned to nodes during CTDB initialisation.  They
 457   will only be assigned to nodes that are in the "running" runstate.
 458
 459 * Improved database locking code.  One improvement is to use a
 460   standalone locking helper executable - the avoids creating many
 461   forked copies of ctdbd and potentially running a node out of memory.
 462
 463 * New control CTDB_CONTROL_IPREALLOCATED is now used to generate
 464   "ipreallocated" events.
 465
 466 * Message handlers are now indexed, providing a significant
 467   performance improvement.