* Dump stack traces for hung RPC processes (mountd, rquotad, statd)

* Add vacuuming latency to database statistics

* Add -X option to ctdb tool that uses '|' as a separator to correctly handle
  IPv6 addresses in machine-readable output

* Configuration variable VerifyRecoveryLock is now marked obsolete

* Improved log messages when trying to set obsolete tunables

* Fix handling of IPv6 addresses

* Use correct tdb flags when using robust mutex feature

* Fix regression in client socket handling code

* Fix regression in statd callout and make it more scalable

* Change default log level to NOTICE, so messages from the ctdb tool are
  displayed correctly
Important internal changes
--------------------------

* Vacuuming performance improvements:

  - stagger multiple vacuuming child processes
  - process all vacuum fetch requests in a loop

* Improve handling of recovery lock

* Many test improvements and additions.

* Avoid logging every 10 seconds for locks that could not be obtained to
  prevent
* New command "ctdb detach" to detach a database.

* Support for TDB robust mutexes. To enable, set TDBMutexEnabled=1.
  The setting is per node.
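As a hedged sketch (using the CTDB_SET_<TunableName> configuration form described later in these notes, and the ctdb setvar command), enabling robust mutexes on a node might look like this:

```shell
# Persistently, in the node's CTDB configuration file
# (e.g. /etc/sysconfig/ctdb; the path varies by distribution):
CTDB_SET_TDBMutexEnabled=1

# Or at runtime on the local node; note that TDB open flags are applied
# at attach time, so this is assumed to affect databases attached
# afterwards:
ctdb setvar TDBMutexEnabled 1
```

Because the setting is per node, it needs to be applied on every node where robust mutexes are wanted.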
* New manual page ctdb-statistics.7.

* Verify policy routing configuration when starting up to make sure that policy
  routing tables do not override default routing tables.

* "ctdb scriptstatus" now correctly lists the number of scripts executed.

* Do not run eventscripts at real-time priority.

* Make sure "ctdb restoredb" and "ctdb wipedb" cannot affect an ongoing
  recovery.

* If a readonly record revocation fails, CTDB does not abort anymore. It will

* The pending_calls statistic now gets updated correctly.
Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Fix the order of setting recovery mode and freezing databases.

* Remove NAT gateway "monitor" event.

* Add a per-database queue for lock requests. This improves lock
  scheduling performance.

* When processing dmaster packets (DMASTER_REQUEST and DMASTER_REPLY), defer all
  call processing for that record. This avoids the temporary inconsistency in
  dmaster information which causes rapid bouncing of call requests between two
  nodes.

* Correctly capture the output from lock helper processes, so it can be logged.

* Many test improvements and additions.
Changes in CTDB 2.5.3
=====================

* New configuration variable CTDB_NATGW_STATIC_ROUTES allows the NAT
  gateway feature to create static host/network routes instead of
  default routes. See the documentation. Use with care.

* ctdbd no longer crashes when tickles are processed after reloading

* "ctdb reloadips" works as expected because the DEL_PUBLIC_IP control
  now waits until public IP addresses are released before returning.
Important internal changes
--------------------------

* Vacuuming performance has been improved.

* Record locking now compares records based on their hashes to avoid
  scheduling multiple requests for records on the same hash chain.

* An internal timeout for revoking read-only record delegations has
  been changed from a hard-coded 5 seconds to the value of the
  ControlTimeout tunable. This makes it less likely that ctdbd will

* Many test improvements and additions.
Changes in CTDB 2.5.2
=====================

* Much improved manpages from CTDB 2.5 are now installed and packaged.

* "ctdb reloadips" now waits for replies to addip/delip controls

Important internal changes
--------------------------

* The event scripts are now executed using vfork(2) and a helper
  binary instead of fork(2), providing a performance improvement.

* "ctdb reloadips" now works if some nodes are inactive. This
  means that public IP addresses can be reconfigured even if nodes
Changes in CTDB 2.5.1
=====================

* The locking code now correctly implements a per-database active
  locks limit. Whole-database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script now locks against itself. If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster. This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.
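As a rough illustration of the new command (the database name below is hypothetical, and the exact input format should be checked against ctdb(1)), a set of updates applied in a single transaction might look like:

```shell
# Apply two updates to a persistent database in one transaction.
# "mydb.tdb" is a hypothetical database name; the input is assumed to
# be one key/value pair per line, per the ctdb(1) description.
ctdb ptrans mydb.tdb <<EOF
"client1" "value1"
"client2" "value2"
EOF
```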
* The transaction code for persistent databases now retries until it
  is able to take the transaction lock. This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.
* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these defaults have changed.
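A minimal migration sketch, assuming the old default of /var/ctdb, the new default of /var/lib/ctdb, and an initscript-based installation (paths and service commands vary by distribution):

```shell
# Stop CTDB on this node before touching its databases.
service ctdb stop            # or: systemctl stop ctdb

# Copy volatile, persistent and state files, preserving layout
# and permissions.
mkdir -p /var/lib/ctdb
cp -a /var/ctdb/. /var/lib/ctdb/

# Remove any CTDB_DBDIR* settings from the configuration so the new
# defaults take effect, then restart.
service ctdb start
```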
* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported. Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed. Setting them had no effect, but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.
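For illustration, a configuration fragment using the CTDB_SET_X=Y form (ControlTimeout is an existing tunable; the file path varies by distribution):

```shell
# /etc/sysconfig/ctdb (path varies by distribution)

# Tunables are set with CTDB_SET_<TunableName>=<value>, for example:
CTDB_SET_ControlTimeout=60

# The removed Vacuum*Interval tunables must no longer appear here;
# a line like the following now prevents CTDB from starting:
#CTDB_SET_VacuumDefaultInterval=10
```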
* Much improved manual pages. Added new manpages ctdb(7),
  ctdbd.conf(5), ctdb-tunables(7). Still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb. It also means that we can say: see
  ctdbd.conf(5) for more details. :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE. See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands. See ctdb(1) for
  details.

* The ability to pass a comma-separated list of nodes to ctdb(1) tool
  commands via the -n option is now documented and works for most
  commands. See ctdb(1) for details.
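A brief usage sketch of the -n option (the node numbers are examples):

```shell
# Run a command against nodes 0, 2 and 3 instead of only the
# local node.
ctdb -n 0,2,3 uptime
```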
* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation. See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban. However, this was
  not implemented and caused an "unban" instead. To avoid confusion,
  0 is now an invalid ban duration. To administratively "ban" a node,
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers). This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive. This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* The test suite now starts ctdbd with the --sloppy-start option to speed
  up startup. However, this should not be done in production.
* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed the ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

* Starting the CTDB daemon by running ctdbd directly no longer removes
  an existing Unix socket unconditionally.

* ctdbd once again successfully kills client processes when releasing
  public IPs. It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.
* Always use the Jenkins hash when creating volatile databases. There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
  which led to header corruption for empty records. This resulted
  in inconsistent headers on two nodes, so a request for such a record
  would bounce between nodes indefinitely, logging "High hopcount"
  messages. This also caused performance degradation.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush. ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs. Use fixed-size queue buffers to allow fair
  scheduling across multiple FDs.
Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit. This makes a brief failure less likely to
  cause a node to be banned.

* ctdb killtcp has been changed to read connections from stdin, and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* The recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and

* ctdbd no longer creates multiple lock sub-processes for the same
  key. This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS. Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Make sure lower-level tdb messages are logged correctly.

* The CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.
* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed the DeadlockTimeout tunable. To enable debugging of locking
  issues set

    CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh
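As an illustrative configuration fragment only (the values below are placeholders; consult ctdb.sysconfig for the accepted values and exact semantics of the two NFS variables):

```shell
# Fragment of the CTDB sysconfig file. The NFS values are placeholder
# assumptions - see ctdb.sysconfig for what each variable accepts.
CTDB_MONITOR_NFS_THREAD_COUNT=yes
CTDB_NFS_DUMP_STUCK_THREADS=120

# Lock debugging, now that the DeadlockTimeout tunable is gone:
CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh
```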
* In overall statistics and database statistics, lock buckets have been
  updated to use the following latencies:

    < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* The initscript is now simplified, with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
* The ctdb tool no longer exits its retry loop if a control times out
  (e.g. under high load); previously any error caused an exit from the
  retry loop.

* When updating flags on all nodes, use the correct updated flags. This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node

* The ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed the 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with an incorrect
  argument) and test binaries (called when ctdbd is not running).
Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop participating in any

* Improve cluster-wide database traverse by sending the records directly from
  the traverse child process to the requesting node.

* TDB checking and dropping of all IPs moved from the initscript to the "init"

* To avoid "rogue IPs", the release IP callback now fails if the
  released IP is still present on an interface.

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped. Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by the initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.
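A notification handler is just an executable script. A minimal hedged example (the file name is hypothetical, and it assumes the event name - e.g. "healthy" or "unhealthy" - is passed as the first argument, which should be confirmed against the shipped notify setup):

```shell
#!/bin/sh
# Hypothetical /etc/ctdb/notify.d/99-log-example
# Assumes the node-state event name arrives as $1.
event="$1"

case "$event" in
    unhealthy)
        logger -t ctdb-notify "node became unhealthy"
        ;;
    healthy)
        logger -t ctdb-notify "node became healthy"
        ;;
esac
```

The script must be executable; handlers are run in directory order, so the numeric prefix controls ordering.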
* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.
* The Unix domain socket is now set to non-blocking after the
  connection succeeds. This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fix a severe recovery bug that can lead to data corruption for SMB clients.

* The statd-callout script now runs as root, via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed. This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation. They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code. One improvement is to use a
  standalone locking helper executable - this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* The new control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.