NEWS

Changes in CTDB 2.5.5
=====================

User-visible changes
--------------------

* Dump stack traces for hung RPC processes (mountd, rquotad, statd)

* Add vacuuming latency to database statistics

* Add -X option to ctdb tool that uses '|' as a separator to correctly handle
  IPv6 addresses

* Configuration variable VerifyRecoveryLock is now marked obsolete

* Improved log messages when trying to set obsolete tunables
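
As a rough illustration of why the -X separator matters, the sketch
below splits a made-up record first on ':' and then on '|'.  The record
layout (a node number followed by an address) is an assumption for
demonstration only and is not real ctdb output.

```shell
# Hypothetical record: a node number followed by an IPv6 address.
# This is NOT real ctdb output; it only illustrates the separator problem.
addr="fd00::1"

# With ':' as the separator the address itself is split into pieces.
colon_line="0:${addr}"
colon_fields=$(printf '%s\n' "$colon_line" | awk -F':' '{ print NF }')

# With '|' as the separator the address survives as a single field.
pipe_line="0|${addr}"
pipe_fields=$(printf '%s\n' "$pipe_line" | awk -F'|' '{ print NF }')
pipe_addr=$(printf '%s\n' "$pipe_line" | awk -F'|' '{ print $2 }')

echo "colon fields: $colon_fields"
echo "pipe fields: $pipe_fields, address: $pipe_addr"
```

Splitting on ':' yields more fields than the record contains, because
the colons inside the address are treated as separators as well.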

Important bug fixes
-------------------

* Fix handling of IPv6 addresses

* Use correct tdb flags when using robust mutex feature

* Fix regression in client socket handling code

* Fix regression in statd callout and make it more scalable

* Change default log level to NOTICE, so that messages from the ctdb tool
  are displayed correctly

Important internal changes
--------------------------

* Vacuuming performance improvements
   - stagger multiple vacuuming child processes
   - process all vacuum fetch requests in a loop

* Improve handling of recovery lock

* Many test improvements and additions.

* Avoid logging every 10 seconds for locks that could not be obtained to prevent
  flooding logs


Changes in CTDB 2.5.4
=====================

User-visible changes
--------------------

* New command "ctdb detach" to detach a database.

* Support for TDB robust mutexes.  To enable, set TDBMutexEnabled=1.
  The setting is per node.

* New manual page ctdb-statistics.7.
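
A minimal sketch of enabling robust mutexes from the configuration
file, assuming the tunable is set through the CTDB_SET_X=Y mechanism
described for CTDB 2.5.  Because the setting is per node, the line
would need to appear in the configuration of every node that should
use mutexes.

```shell
# Assumption: tunables are set at startup via CTDB_SET_<TunableName>=<value>.
# Enable TDB robust mutexes on this node only.
CTDB_SET_TDBMutexEnabled=1
```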

Important bug fixes
-------------------

* Verify policy routing configuration when starting up to make sure that policy
  routing tables do not override default routing tables.

* "ctdb scriptstatus" should correctly list the number of scripts executed.

* Do not run eventscripts at real-time priority.

* Make sure "ctdb restoredb" and "ctdb wipedb" cannot affect an ongoing
  recovery.

* If a readonly record revocation fails, CTDB no longer aborts.  It will
  retry the revoke.

* pending_calls statistic now gets updated correctly.

Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Fix the order of setting recovery mode and freezing databases.

* Remove NAT gateway "monitor" event.

* Add per database queue for lock requests.  This improves the lock
  scheduling performance.

* When processing dmaster packets (DMASTER_REQUEST and DMASTER_REPLY) defer all
  call processing for that record.  This avoids the temporary inconsistency in
  dmaster information which causes rapid bouncing of call requests between two
  nodes.

* Correctly capture the output from lock helper processes, so it can be logged.

* Many test improvements and additions.


Changes in CTDB 2.5.3
=====================

User-visible changes
--------------------

* New configuration variable CTDB_NATGW_STATIC_ROUTES allows NAT
  gateway feature to create static host/network routes instead of
  default routes.  See the documentation.  Use with care.

Important bug fixes
-------------------

* ctdbd no longer crashes when tickles are processed after reloading
  the nodes file.

* "ctdb reloadips" works as expected because the DEL_PUBLIC_IP control
  now waits until public IP addresses are released before returning.

Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Record locking now compares records based on their hashes to avoid
  scheduling multiple requests for records on the same hashchain.

* An internal timeout for revoking read-only record delegations has
  been changed from hard-coded 5 seconds to the value of the
  ControlTimeout tunable.  This makes it less likely that ctdbd will
  abort.

* Many test improvements and additions.


Changes in CTDB 2.5.2
=====================

User-visible changes
--------------------

* Much improved manpages from CTDB 2.5 are now installed and packaged.

Important bug fixes
-------------------

* "ctdb reloadips" now waits for replies to addip/delip controls
  before returning.

Important internal changes
--------------------------

* The event scripts are now executed using vfork(2) and a helper
  binary instead of fork(2), providing a performance improvement.

* "ctdb reloadips" now works if some nodes are inactive.  This
  means that public IP addresses can be reconfigured even if nodes
  are stopped.


Changes in CTDB 2.5.1
=====================

Important bug fixes
-------------------

* The locking code now correctly implements a per-database active
  locks limit.  Whole database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script locks against itself.  If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster.  This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.

* The transaction code for persistent databases now retries until it
  is able to take the transaction lock.  This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.


Changes in CTDB 2.5
===================

User-visible changes
--------------------

* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

    /var/lib/ctdb

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these have moved.

* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported.  Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed.  Setting them had no effect but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.

* Much improved manual pages.  Added new manpages ctdb(7),
  ctdbd.conf(5), ctdb-tunables(7).  Still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb.  It also means that we can say: see
  ctdbd.conf(5) for more details.  :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE.  See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands.  See ctdb(1) for
  details.

* Ability to pass comma-separated string to ctdb(1) tool commands via
  the -n option is now documented and works for most commands.  See
  ctdb(1) for details.

* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation.  See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban.  However, this was
  not implemented and caused an "unban" instead.  To avoid confusion,
  0 is now an invalid ban duration.  To administratively "ban" a node
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
  of /var/run/ctdb.
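
For the database location change described above, a configuration
that keeps the files under /var/ctdb might look like the sketch below.
The "persistent" and "state" subdirectory names are assumptions; check
where the existing files actually live before copying this.

```shell
# Keep the TDB files in the old location instead of /var/lib/ctdb.
CTDB_DBDIR=/var/ctdb
# Assumed subdirectories; adjust these to match the existing layout.
CTDB_DBDIR_PERSISTENT=/var/ctdb/persistent
CTDB_DBDIR_STATE=/var/ctdb/state
```

All three variables are set together because, as noted above, all
three defaults have moved.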

Important bug fixes
-------------------

* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers).  This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work
  reliably.

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive.  This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* Test suite now starts ctdbd with the --sloppy-start option to speed
  up startup.  However, this should not be done in production.


Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly should not remove
  an existing Unix socket unconditionally.

* ctdbd once again successfully kills client processes on releasing
  public IPs.  It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* Always use Jenkins hash when creating volatile databases.  There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
  which led to header corruption for empty records.  This resulted
  in inconsistent headers on two nodes, so a request for such a
  record kept bouncing between nodes indefinitely and logged "High
  hopcount" messages.  This also caused performance degradation.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs.  Use fixed size queue buffers to allow fair
  scheduling across multiple FDs.

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause a node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* Recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Make sure lower level tdb messages are logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed DeadlockTimeout tunable.  To debug locking issues, set:

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use the following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* Initscript is now simplified with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
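
To make the lock bucket boundaries listed above concrete, the sketch
below maps a latency in whole milliseconds to a bucket label.  The
lock_bucket function is a hypothetical helper, not part of CTDB, and
it assumes each latency is counted only in the first bucket whose
upper bound it falls under.

```shell
# Map a lock latency in whole milliseconds to one of the bucket labels.
# lock_bucket is a hypothetical helper, not part of CTDB.
lock_bucket() {
    ms=$1
    if   [ "$ms" -lt 1 ];     then echo "< 1ms"
    elif [ "$ms" -lt 10 ];    then echo "< 10ms"
    elif [ "$ms" -lt 100 ];   then echo "< 100ms"
    elif [ "$ms" -lt 1000 ];  then echo "< 1s"
    elif [ "$ms" -lt 2000 ];  then echo "< 2s"
    elif [ "$ms" -lt 4000 ];  then echo "< 4s"
    elif [ "$ms" -lt 8000 ];  then echo "< 8s"
    elif [ "$ms" -lt 16000 ]; then echo "< 16s"
    elif [ "$ms" -lt 32000 ]; then echo "< 32s"
    elif [ "$ms" -lt 64000 ]; then echo "< 64s"
    else                           echo ">= 64s"
    fi
}

lock_bucket 250     # falls in the "< 1s" bucket
lock_bucket 70000   # falls in the ">= 64s" bucket
```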

Important bug fixes
-------------------

* ctdb tool should not exit from a retry loop if a control times out
  (e.g. under high load).  This simple fix will stop an exit from the
  retry loop on any error.

* When updating flags on all nodes, use the correct updated flags.  This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with incorrect
  argument) and test binaries (called when ctdbd not running).

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop participating in
  any cluster activity.

* Improve cluster-wide database traverse by sending the records directly from
  the traverse child process to the requesting node.

* TDB checking and dropping of all IPs moved from initscript to "init"
  event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by initscript

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds.  This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fix a severe recovery bug that can lead to data corruption for SMB clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed.  This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation.  They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code.  One improvement is to use a
  standalone locking helper executable - this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.