martins/samba.git
4 months ago Merge branch 'ctdb-statd-callout-sm-notify' into ctdb-merge ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:45:05 +0000 (12:45 +1100)]
Merge branch 'ctdb-statd-callout-sm-notify' into ctdb-merge

4 months ago Merge branch 'ctdb-wip' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:51 +0000 (12:44 +1100)]
Merge branch 'ctdb-wip' into ctdb-merge

4 months ago Merge branch 'ctdb-tickles' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:38 +0000 (12:44 +1100)]
Merge branch 'ctdb-tickles' into ctdb-merge

4 months ago Merge branch 'ctdb-host-monitoring' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:05 +0000 (12:44 +1100)]
Merge branch 'ctdb-host-monitoring' into ctdb-merge

4 months ago Merge branch 'ctdb-killtcp' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:05 +0000 (12:44 +1100)]
Merge branch 'ctdb-killtcp' into ctdb-merge

4 months ago Merge branch 'ctdb-ganesha' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:05 +0000 (12:44 +1100)]
Merge branch 'ctdb-ganesha' into ctdb-merge

4 months ago Merge branch 'ctdb-tunnel' into ctdb-merge
Martin Schwenke [Fri, 12 Jan 2024 01:44:05 +0000 (12:44 +1100)]
Merge branch 'ctdb-tunnel' into ctdb-merge

4 months ago ctdb-tests: Update statd-callout tests to handle both modes
Martin Schwenke [Fri, 30 Jun 2023 11:50:10 +0000 (21:50 +1000)]
ctdb-tests: Update statd-callout tests to handle both modes

Add support for shared_dir mode.  Add test hooks to statd-callout
itself.

Currently run with:

  CTDB_NFS_SHARED_STATE_DIR=/clusterfs \
  CTDB_STATD_CALLOUT_SHARED_STORAGE=shared_dir \
    ./tests/run_tests.sh ./tests/UNIT/eventscripts/statd-callout.*

or similar.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Support storing statd-callout state in filesystem
Martin Schwenke [Wed, 28 Jun 2023 04:01:44 +0000 (14:01 +1000)]
ctdb-scripts: Support storing statd-callout state in filesystem

CTDB_STATD_CALLOUT_SHARED_STORAGE is a new configuration variable
indicating where statd-callout should store its NFS client locking
data.  See the update to ctdb-script.options(5) for details.

This adds back functionality that was removed in commit
12cc82623150ca4a83482f1b7165401cbdecd3de.  The commit message doesn't
say why this was changed but it was most likely due to a cluster
filesystem hanging at inopportune times.  Hence, this is re-added as a
non-default option.

Note that this could create the files for sm-notify in add-client.
However, this would require an alternate implementation of
send_notifies() (or a change to the implementation for persistent_db
too).  It seems better to leave add-client lightweight and do the work
in notify, since add-client is a more frequent operation.

In test mode, the shared storage location is prefixed by a shared
directory location within the test environment.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-tests: Default PNN is 0
Martin Schwenke [Fri, 30 Jun 2023 02:24:30 +0000 (12:24 +1000)]
ctdb-tests: Default PNN is 0

This is called in a couple of places without an argument, so give it a
default.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Move state directory creation to "startup" action
Martin Schwenke [Thu, 29 Jun 2023 03:25:03 +0000 (13:25 +1000)]
ctdb-scripts: Move state directory creation to "startup" action

Now that there is a startup action, directory creation can be
unconditionally done there.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Move ctdb.tdb attach to statd-callout
Martin Schwenke [Thu, 29 Jun 2023 03:11:46 +0000 (13:11 +1000)]
ctdb-scripts: Move ctdb.tdb attach to statd-callout

All of the other uses of ctdb.tdb are in statd-callout.  This will
also allow an alternate database name to be configured.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Factor out some statd-callout functions
Martin Schwenke [Tue, 27 Jun 2023 03:37:56 +0000 (13:37 +1000)]
ctdb-scripts: Factor out some statd-callout functions

This captures all of the ctdb.tdb implementation specific details in
functions.  Alternate implementations can now be added.

While factoring, remove the need to cd to the queue directory by
explicitly setting and using $statd_callout_queue_dir.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Use nfs-utils' sm-notify instead of CTDB's smnotify
Martin Schwenke [Fri, 3 Mar 2017 04:44:08 +0000 (15:44 +1100)]
ctdb-scripts: Use nfs-utils' sm-notify instead of CTDB's smnotify

CTDB's smnotify does not support IPv6 and is difficult to maintain.

So, create directories of files and pass them to nfs-utils' sm-notify.

There is an implied change here, because nfs-utils' sm-notify stopped
sending IP addresses as mon_name back in 2010:

  http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commitdiff;h=900df0e7c0b9006d72d8459b30dc2cd69ce495a5

This will change advice given in the wiki to use a hostname for the
cluster with round-robin DNS, since this is what is best supported.

Another behavioural change is that sm-notify only sends "up"
notifications with an odd state.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-failover: Add ctdb_smnotify_helper
Martin Schwenke [Wed, 10 May 2023 02:21:07 +0000 (12:21 +1000)]
ctdb-failover: Add ctdb_smnotify_helper

There are endian-specific outputs in the files used by NFS utils'
sm-notify, which will subsequently be used, so create a tiny helper
instead of depending on complicated shell commands or calling out to
Python.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: No longer run statd-callout under sudo
Martin Schwenke [Mon, 19 Jun 2023 00:39:29 +0000 (10:39 +1000)]
ctdb-scripts: No longer run statd-callout under sudo

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Use find_statd_sm_dir() in one more place
Martin Schwenke [Wed, 2 Aug 2023 03:37:03 +0000 (13:37 +1000)]
ctdb-scripts: Use find_statd_sm_dir() in one more place

Take advantage of new function find_statd_sm_dir() when clearing
the local statd state directory so it uses the correct directory when
running on a non-RH distro.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Set ownership of statd-callout state directory
Martin Schwenke [Mon, 19 Jun 2023 02:17:44 +0000 (12:17 +1000)]
ctdb-scripts: Set ownership of statd-callout state directory

For add-client and del-client, statd-callout is called by rpc.statd,
which runs as rpcuser, statd or some other non-root system user.  Find
the local statd state directory and use it as a reference to set the
ownership of statd-callout's state directory so add-client and
del-client can write to it.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
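
The idea in the commit above, using the local statd state directory as a reference for ownership, might be sketched as follows. This is only an illustration: the paths are temporary stand-ins, and `chown --reference` and `stat -c` are GNU coreutils features that the real script may or may not use.

```shell
#!/bin/sh
# Sketch: all paths are temporary stand-ins, not CTDB's real locations.
tmpdir=$(mktemp -d)
statd_sm_dir="${tmpdir}/sm"                 # stands in for statd's state directory
callout_state_dir="${tmpdir}/statd-callout" # stands in for statd-callout's state directory
mkdir -p "$statd_sm_dir" "$callout_state_dir"

# Copy owner and group from the reference directory (GNU chown option)
chown --reference="$statd_sm_dir" "$callout_state_dir"

# Record ownership of both directories for comparison (GNU stat format)
ref_owner=$(stat -c '%u:%g' "$statd_sm_dir")
new_owner=$(stat -c '%u:%g' "$callout_state_dir")
echo "reference: ${ref_owner}, state directory: ${new_owner}"

rm -rf "$tmpdir"
```

In this demonstration both directories already belong to the calling user, so the ownership change is a no-op; when run as root against a directory owned by rpcuser or statd, the same call would hand the state directory to that user.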
4 months ago ctdb-scripts: Avoid connecting to ctdbd in add-client/del-client
Martin Schwenke [Sun, 18 Jun 2023 23:52:21 +0000 (09:52 +1000)]
ctdb-scripts: Avoid connecting to ctdbd in add-client/del-client

rpc.statd runs statd-callout as a non-root user, which is currently
hacked around using some sudo logic that fails to work in some
contexts (e.g. in a container).

Remove the local ctdb_get_my_public_ip_addresses() so that the new
caching version is now used in add-client and del-client.  This avoids
connecting to ctdbd when called from rpc.statd.

Use ctdb_get_my_public_ip_addresses() in other places where it makes
sense.

Connections to ctdbd are still made in the "notify" action, but this
is always run as root.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Add caching function for public IPs
Martin Schwenke [Thu, 15 Jun 2023 06:21:19 +0000 (16:21 +1000)]
ctdb-scripts: Add caching function for public IPs

This is way more complicated than I would like but, as per the
comment, this is due to complexities in the way public IPs work.  The
main consumer will be statd-callout, which will then be able to run as
a non-root user.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Avoid ShellCheck warning SC2162
Martin Schwenke [Sun, 18 Jun 2023 23:43:33 +0000 (09:43 +1000)]
ctdb-scripts: Avoid ShellCheck warning SC2162

  SC2162 read without -r will mangle backslashes.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
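
The behaviour that SC2162 warns about can be seen in a small sketch. The input string is made-up example data, and the first `read` deliberately omits `-r` to show the problem.

```shell
#!/bin/sh
# Input line containing backslashes (hypothetical example data)
input='C:\temp\new'

# Without -r, read treats each backslash as an escape character and drops it
plain=$(printf '%s\n' "$input" | {
	read line
	printf '%s' "$line"
})

# With -r, the line is read verbatim
raw=$(printf '%s\n' "$input" | {
	read -r line
	printf '%s' "$line"
})

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```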
4 months ago ctdb-scripts: Improve update code
Martin Schwenke [Wed, 2 Aug 2023 03:23:58 +0000 (13:23 +1000)]
ctdb-scripts: Improve update code

Drop the complexity associated with using awk to escape dots in IPv4
addresses to protect them from sed, and generate a grep -F filter
instead.  Use temporary files for the grep filter and a file list
constructed by find to avoid command-line length limits.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
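
A minimal sketch of the approach described in the commit above: a fixed-string filter file for `grep -F` plus a file list built by `find`. The directory layout and record contents are hypothetical and are not taken from the real script.

```shell
#!/bin/sh
# Sketch: directory layout and record contents are hypothetical.
tmpdir=$(mktemp -d)
state_dir="${tmpdir}/state"
mkdir -p "$state_dir"
echo "10.0.0.1 client-a" >"${state_dir}/rec1"
echo "10.0.0.2 client-b" >"${state_dir}/rec2"
# An unescaped regex "10.0.0.1" would also match this line; grep -F will not
echo "10x0y0z1 client-c" >"${state_dir}/rec3"

# One fixed string per line, so no escaping of dots is needed
filter="${tmpdir}/filter"
printf '%s\n' "10.0.0.1" >"$filter"

# Keep the file list in a file rather than on a command line
file_list="${tmpdir}/files"
find "$state_dir" -type f >"$file_list"

# Collect the names of files containing any of the filter strings
matches=""
while read -r f; do
	if grep -q -F -f "$filter" "$f"; then
		matches="${matches}${matches:+ }$(basename "$f")"
	fi
done <"$file_list"
echo "matching files: ${matches}"

rm -rf "$tmpdir"
```

Note that fixed-string matching is still substring matching, so a filter entry such as 10.0.0.1 would also match 10.0.0.10; how the real code guards against that is not shown here.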
4 months ago ctdb-scripts: Factor out handling of hosted public IPs
Martin Schwenke [Thu, 29 Jun 2023 00:12:44 +0000 (10:12 +1000)]
ctdb-scripts: Factor out handling of hosted public IPs

This is done differently in different places, so make it consistent.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Improve documentation
Martin Schwenke [Tue, 13 Jun 2023 00:39:37 +0000 (10:39 +1000)]
ctdb-scripts: Improve documentation

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"
Martin Schwenke [Fri, 16 Jun 2023 01:09:02 +0000 (11:09 +1000)]
ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"

Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Improve documentation comment
Martin Schwenke [Tue, 18 Jul 2023 21:22:08 +0000 (07:22 +1000)]
ctdb-scripts: Improve documentation comment

There is some confusion around how the GlusterFS support should be
used.  If nobody chimes in with hints then that support may be removed
in future.  The general scheme should work on all filesystems.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Construct state subdirectory for IP address
Martin Schwenke [Wed, 19 Jul 2023 00:27:45 +0000 (10:27 +1000)]
ctdb-scripts: Construct state subdirectory for IP address

Using "grace_period 5:<IP>" without a subdirectory for <IP> causes an
error.

Construct a recovery subdirectory for the <IP> being taken.  Find
client IDs (CIDs) matching <IP> (according to NFS-Ganesha's
recovery_fs.c implementation, these will be empty directories) and
effectively copy them to the subdirectory (using mkdir, since they are
directories).  After running grace_period, remove all entries in the
subdirectory.

This ensures that NFS-Ganesha only adds CIDs to the in-memory CID list
for clients connecting to <IP> that will reclaim locks on the takeover
node.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
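
The copy-by-mkdir scheme described in the commit above might look roughly like this. The recovery directory location and the CID directory names are invented for illustration and do not reflect NFS-Ganesha's real naming; the grace_period step is only indicated by a comment.

```shell
#!/bin/sh
# Sketch: the directory layout and CID names below are invented for illustration.
tmpdir=$(mktemp -d)
recovery_dir="${tmpdir}/recovery"
ip="10.0.0.31"

# Two empty CID directories, only one of which matches the IP being taken
mkdir -p "${recovery_dir}/cid-a-${ip}" "${recovery_dir}/cid-b-10.0.0.32"

# Construct the per-IP subdirectory and "copy" matching CIDs with mkdir
ip_dir="${recovery_dir}/${ip}"
mkdir -p "$ip_dir"
for d in "$recovery_dir"/*"$ip"*; do
	# Skip non-directories and the per-IP subdirectory itself
	if [ -d "$d" ] && [ "$d" != "$ip_dir" ]; then
		mkdir -p "${ip_dir}/$(basename "$d")"
	fi
done
copied=$(ls "$ip_dir")

# grace_period would be run at this point, then the entries are removed
rm -rf "${ip_dir:?}"/*
remaining=$(ls -A "$ip_dir")

echo "copied: ${copied}"
rm -rf "$tmpdir"
```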
4 months ago ctdb-scripts: Switch takeip handling into takeip-pre
Martin Schwenke [Mon, 24 Jul 2023 23:09:46 +0000 (09:09 +1000)]
ctdb-scripts: Switch takeip handling into takeip-pre

The NFS-Ganesha lock manager needs to be in grace before any attempts
to reclaim locks.  This needs to happen before the IP is on the
interface via 10.interface, which allows clients to reconnect.  So,
use takeip-pre for this, since it is run from 06.nfs.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Support Lustre
Martin Schwenke [Tue, 18 Jul 2023 21:22:54 +0000 (07:22 +1000)]
ctdb-scripts: Support Lustre

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Change NFS-Ganesha PID file location
Martin Schwenke [Thu, 6 Jul 2023 10:28:30 +0000 (20:28 +1000)]
ctdb-scripts: Change NFS-Ganesha PID file location

This is the current default.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop reference to historical commit
Martin Schwenke [Thu, 6 Jul 2023 03:37:55 +0000 (13:37 +1000)]
ctdb-scripts: Drop reference to historical commit

No novel contributions from that commit remain.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Fix usage message
Martin Schwenke [Thu, 6 Jul 2023 03:37:03 +0000 (13:37 +1000)]
ctdb-scripts: Fix usage message

An IP address is passed to these actions.

Reported-by: Arnab Tah <atah@ddn.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop GPFS special cases
Martin Schwenke [Thu, 6 Jul 2023 03:24:22 +0000 (13:24 +1000)]
ctdb-scripts: Drop GPFS special cases

Use the simple GPFS code by default for non-GlusterFS filesystems.

Drop variable GANRECDIR, which is only used once for an obvious
purpose and isn't a configuration variable.

This makes it possible to add support for a new filesystem type by
adding it to the initial case statement.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop get_cluster_fs_state()
Martin Schwenke [Thu, 6 Jul 2023 03:16:02 +0000 (13:16 +1000)]
ctdb-scripts: Drop get_cluster_fs_state()

It is unclear why NFS monitoring should succeed if the filesystem
containing the shared NFS state directory is not active.  Nothing else
in the current monitor code depends on the shared directory, and any
other operation that uses the filesystem may still behave badly.

When commit 28cbe527d47822f870e8252495ab2a1c8fddd12f introduced this
function, there was more monitoring logic.  However, that logic didn't
seem to use the shared directory and it has since been removed anyway.

Additionally, checking the state of the filesystem here seems like a
layering violation, where failure in a lower layer is ignored at its
own level, so it then needs to be ignored here in a higher layer.  It
should be checked in a previous event script, though it is unclear
what should be done if the filesystem is failing over.  It doesn't
seem sane to mark all nodes unhealthy there, but that isn't an NFS
problem.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Use CTDB_NFS_SHARED_STATE_DIR in nfs-ganesha-callout
Martin Schwenke [Wed, 5 Jul 2023 22:20:37 +0000 (08:20 +1000)]
ctdb-scripts: Use CTDB_NFS_SHARED_STATE_DIR in nfs-ganesha-callout

Rename CTDB_NFS_STATE_MNT to CTDB_NFS_SHARED_STATE_DIR.  It doesn't
have to be a mount but can be any directory in a cluster filesystem.
There are plans to also use CTDB_NFS_SHARED_STATE_DIR for
statd-callout, so the name might as well be better.

CTDB_NFS_SHARED_STATE_DIR is now mandatory when GPFS is used.  This is
much saner than choosing the first GPFS filesystem.

Drop CTDB_NFS_STATE_FS_TYPE.  The filesystem type is now determined
from CTDB_NFS_SHARED_STATE_DIR and it is now checked against supported
filesystems.  This will catch the case when the filesystem for the
specified directory has not been mounted and the filesystem for the
mountpoint (e.g. ext4) is not a supported filesystem for shared state.

A side-effect is that the filesystem containing
CTDB_NFS_SHARED_STATE_DIR must be mounted when nfs-ganesha-callout is
first run.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
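
Checking a filesystem type against a supported list, as the commit above describes, lends itself to a case statement. The sketch below is hypothetical: the function name and the exact type strings are assumptions, and how the type is obtained from CTDB_NFS_SHARED_STATE_DIR is left out.

```shell
#!/bin/sh
# Hypothetical helper: the set of supported type names is an assumption.
shared_state_fs_supported()
{
	case "$1" in
	gpfs | glusterfs | lustre)
		return 0
		;;
	*)
		return 1
		;;
	esac
}

# An unmounted shared directory would report the type of the underlying
# mountpoint (e.g. ext4) and so be rejected
for fs_type in lustre ext4; do
	if shared_state_fs_supported "$fs_type"; then
		echo "${fs_type}: supported for shared state"
	else
		echo "${fs_type}: not supported for shared state"
	fi
done
```

With this shape, supporting another filesystem only means adding its name to the first pattern list.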
4 months ago ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"
Martin Schwenke [Thu, 6 Jul 2023 03:13:06 +0000 (13:13 +1000)]
ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"

Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago utils: Add error checking to log level parsing
Martin Schwenke [Tue, 9 Jan 2024 22:28:45 +0000 (09:28 +1100)]
utils: Add error checking to log level parsing

There is an additional use of atoi() in the backend setup that is not
handled by this change.  It is in a void function, where error
handling is more difficult.  However, the goal here is to detect
errors in the parsing of the "log level" parameter, not "logging".

Don't attempt to fix testparm (i.e.
https://bugzilla.samba.org/show_bug.cgi?id=11301).  That is a
different problem, requiring a lot of detangling.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago WIP ctdb-scripts: Use nftables for IP blocking
Martin Schwenke [Fri, 10 Nov 2023 00:17:13 +0000 (11:17 +1100)]
WIP ctdb-scripts: Use nftables for IP blocking

TODO:

* Consider calling ip_block_init() if an add element fails.
* 11.natgw
* 70.iscsi
* Drop iptables_wrapper

Don't bother cleaning up the sets on shutdown or similar.  The whole
point of sets is that it is easy to delete elements from them.  Rules
are difficult to delete because they can only be referenced by handle.
nft is better than iptables for defining an initial setup, but it is
harder for scripting.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-daemon: Add "max log size" configuration parameter
Martin Schwenke [Tue, 31 May 2022 03:36:39 +0000 (13:36 +1000)]
ctdb-daemon: Add "max log size" configuration parameter

Passing the limit via an environment variable is questionable.
However, the dependencies (logging-conf depends on logging) make other
ways difficult.  It would be possible to extend logging_init(), but
not sure it is right for a feature that only applies to file
logging...

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago docs: Add some design notes
Martin Schwenke [Fri, 23 Feb 2018 09:16:18 +0000 (20:16 +1100)]
docs: Add some design notes

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-tests: Add some debug on an important failure
Martin Schwenke [Wed, 4 Jan 2017 03:25:14 +0000 (14:25 +1100)]
ctdb-tests: Add some debug on an important failure

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-util: Add D_TRACE
Martin Schwenke [Wed, 4 Jan 2017 03:14:01 +0000 (14:14 +1100)]
ctdb-util: Add D_TRACE

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-scripts: Remove superseded compatibility code
Martin Schwenke [Mon, 23 Oct 2023 03:23:45 +0000 (14:23 +1100)]
ctdb-scripts: Remove superseded compatibility code

Since commit 224e99804efef960ef4ce2ff2f4f6dced1e74146, square brackets
have been parsed by daemon and tool code, so drop the compatibility
code from here.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Track connections for all ports for public IPs
Martin Schwenke [Mon, 23 Oct 2023 03:17:36 +0000 (14:17 +1100)]
ctdb-scripts: Track connections for all ports for public IPs

Currently TCP ports like NFS lock manager are not tracked.  It is
easier to track all connections than to add a configuration system to
try to track specified ports, so do that.

Note that this is not exactly all connections.  smbd connections are
excluded because ctdbd tracks them directly.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Move connection tracking to 10.interface
Martin Schwenke [Mon, 23 Oct 2023 03:05:21 +0000 (14:05 +1100)]
ctdb-scripts: Move connection tracking to 10.interface

This should really be done for all connections to public IP
addresses.  This is the first step.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-tests: Extend ss stub to handle excluded ports
Martin Schwenke [Mon, 6 Nov 2023 03:12:47 +0000 (14:12 +1100)]
ctdb-tests: Extend ss stub to handle excluded ports

This involves input like:

  (sport != :139 && sport != :445)

Add appropriate error checking to ensure that for sport lists:

* == is always used with ||
* != is always used with &&

Add similar checking for src.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
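
The operator pairing checked by the stub in the commit above can be illustrated with a small validator. The pairing makes sense because "sport == A && sport == B" can never be true for two different ports, while "sport != A || sport != B" is always true. The function name is hypothetical, and the sketch only handles a single sport list, not a full ss filter.

```shell
#!/bin/sh
# Hypothetical validator for a single sport list; not the real stub code.
check_sport_list()
{
	case "$1" in
	*"=="*"&&"* | *"&&"*"=="*)
		# == combined with && can never match two different ports
		return 1
		;;
	*"!="*"||"* | *"||"*"!="*)
		# != combined with || matches every port
		return 1
		;;
	esac
	return 0
}

for expr in \
	'(sport != :139 && sport != :445)' \
	'(sport == :139 || sport == :445)' \
	'(sport == :139 && sport == :445)'; do
	if check_sport_list "$expr"; then
		echo "accepted: ${expr}"
	else
		echo "rejected: ${expr}"
	fi
done
```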
4 months ago ctdb-tests: Ensure square brackets are handled around IP addresses
Martin Schwenke [Fri, 27 Oct 2023 00:06:23 +0000 (11:06 +1100)]
ctdb-tests: Ensure square brackets are handled around IP addresses

It isn't unreasonable for unit test cases to use square brackets in
their input.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Add host monitoring ctdb-host-monitoring
Martin Schwenke [Wed, 1 Mar 2023 22:29:31 +0000 (09:29 +1100)]
ctdb-scripts: Add host monitoring

It can be difficult to diagnose certain event script timeouts.  So,
at the risk of re-inventing Nagios, this provides some limited host
monitoring to ensure that nodes are able to reach required
infrastructure.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-tests: Enhance timeout stub
Martin Schwenke [Fri, 17 Mar 2023 00:05:19 +0000 (11:05 +1100)]
ctdb-tests: Enhance timeout stub

Allow it to trigger on a particular command instead of
unconditionally.  Wrap setting and clearing the variable.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
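
A stub that times out only for a chosen command, with wrappers for setting and clearing the trigger, could be sketched as below. All names here (`timeout_stub`, `FAKE_TIMEOUT_COMMAND`, `set_fake_timeout`, `clear_fake_timeout`) are hypothetical and are not taken from the CTDB test suite; exit status 124 is what coreutils `timeout` returns when the time limit expires.

```shell
#!/bin/sh
# Hypothetical stub: all names are invented for this sketch.
FAKE_TIMEOUT_COMMAND=""

# Wrap setting and clearing the trigger variable
set_fake_timeout() { FAKE_TIMEOUT_COMMAND="$1"; }
clear_fake_timeout() { FAKE_TIMEOUT_COMMAND=""; }

timeout_stub()
{
	shift # discard the duration argument
	if [ -n "${FAKE_TIMEOUT_COMMAND:-}" ] &&
		[ "$1" = "$FAKE_TIMEOUT_COMMAND" ]; then
		# Pretend the command ran out of time
		return 124
	fi
	"$@"
}

set_fake_timeout sleep
status=0
timeout_stub 5 sleep 0 || status=$?
echo "sleep with trigger set: exit status ${status}"

clear_fake_timeout
status=0
timeout_stub 5 sleep 0 || status=$?
echo "sleep with trigger cleared: exit status ${status}"
```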
4 months ago ctdb-scripts: Improve some comments
Martin Schwenke [Wed, 1 Mar 2023 01:14:04 +0000 (12:14 +1100)]
ctdb-scripts: Improve some comments

In particular, the word "passing" is ambiguous, since it could refer
to whether a value is passed to the function.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Add configuration variable CTDB_KILLTCP_USE_SS_KILL
Martin Schwenke [Tue, 22 Aug 2023 02:13:44 +0000 (12:13 +1000)]
ctdb-scripts: Add configuration variable CTDB_KILLTCP_USE_SS_KILL

This allows CTDB to be configured to use "ss -K" to reset TCP
connections on "releaseip".  This is only supported when the kernel is
configured with CONFIG_INET_DIAG_DESTROY enabled.

From the documentation:

   ss -K has been supported in ss since iproute 4.5 in March 2016 and
   in the Linux kernel since 4.4 in December 2015.  However, the
   required kernel configuration item CONFIG_INET_DIAG_DESTROY is
   disabled by default.  Although enabled in Debian kernels since
   ~2017 and in Ubuntu since at least 18.04, this has only recently
   been enabled in distributions such as RHEL.  There seems to be no
   way, including running ss -K, to determine if this is supported, so
   use of this feature needs to be configurable.  When available, it
   should be the fastest, most reliable way of killing connections.

For RHEL and derivatives, this was enabled as follows:

* RHEL 8 via https://bugzilla.redhat.com/show_bug.cgi?id=2230213,
  arriving in version kernel-4.18.0-513.5.1.el8_9

* RHEL 9 via https://issues.redhat.com/browse/RHEL-212, arriving in
  kernel-5.14.0-360.el9

Enabling this option results in a small behaviour change because ss -K
always does a 2-way kill (i.e. it also sends a RST to the client).
Only a 1-way kill is done for SMB connections when ctdb_killtcp is
used - the reasons for this are shrouded in history and the 2-way kill
seems to work fine.

For the summary that is logged, when CTDB_KILLTCP_USE_SS_KILL is "yes"
or "try", always log the method used, even the fallback to
ctdb_killtcp.  However, when set to "no", maintain the existing
output.

The decision to use -K rather than --kill is because short options are
trivial to implement in test stubs.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Factor out function kill_tcp_summarise()
Martin Schwenke [Fri, 25 Aug 2023 00:00:57 +0000 (10:00 +1000)]
ctdb-scripts: Factor out function kill_tcp_summarise()

This will be used in a slightly different context in a subsequent
commit.  In that case, the number of killed connections will be passed
instead of the total number of connections, so support this here via
different modes instead of churning later.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-doc: Improve description of 10.interface event script
Martin Schwenke [Tue, 22 Aug 2023 02:12:50 +0000 (12:12 +1000)]
ctdb-doc: Improve description of 10.interface event script

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Improve documentation comment
Martin Schwenke [Tue, 18 Jul 2023 21:22:08 +0000 (07:22 +1000)]
ctdb-scripts: Improve documentation comment

There is some confusion around how the GlusterFS support should be
used.  If nobody chimes in with hints then that support may be removed
in future.  The general scheme should work on all filesystems.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Construct state subdirectory for IP address
Martin Schwenke [Wed, 19 Jul 2023 00:27:45 +0000 (10:27 +1000)]
ctdb-scripts: Construct state subdirectory for IP address

Using "grace_period 5:<IP>" without a subdirectory for <IP> causes an
error.

Construct a recovery subdirectory for the <IP> being taken.  Find
client IDs (CIDs) matching <IP> (according to NFS-Ganesha's
recovery_fs.c implementation, these will be empty directories) and
effectively copy them to the subdirectory (using mkdir, since they are
directories).  After running grace_period, remove all entries in the
subdirectory.

This ensures that NFS-Ganesha only adds CIDs to the in-memory CID list
for clients connecting to <IP> that will reclaim locks on the takeover
node.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Switch takeip handling into takeip-pre
Martin Schwenke [Mon, 24 Jul 2023 23:09:46 +0000 (09:09 +1000)]
ctdb-scripts: Switch takeip handling into takeip-pre

The NFS-Ganesha lock manager needs to be in grace before any attempts
to reclaim locks.  This needs to happen before the IP is on the
interface via 10.interface, which allows clients to reconnect.  So,
use takeip-pre for this, since it is run from 06.nfs.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Support Lustre
Martin Schwenke [Tue, 18 Jul 2023 21:22:54 +0000 (07:22 +1000)]
ctdb-scripts: Support Lustre

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Change NFS-Ganesha PID file location
Martin Schwenke [Thu, 6 Jul 2023 10:28:30 +0000 (20:28 +1000)]
ctdb-scripts: Change NFS-Ganesha PID file location

This is the current default.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop reference to historical commit
Martin Schwenke [Thu, 6 Jul 2023 03:37:55 +0000 (13:37 +1000)]
ctdb-scripts: Drop reference to historical commit

No novel contributions from that commit remain.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Fix usage message
Martin Schwenke [Thu, 6 Jul 2023 03:37:03 +0000 (13:37 +1000)]
ctdb-scripts: Fix usage message

An IP address is passed to these actions.

Reported-by: Arnab Tah <atah@ddn.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop GPFS special cases
Martin Schwenke [Thu, 6 Jul 2023 03:24:22 +0000 (13:24 +1000)]
ctdb-scripts: Drop GPFS special cases

Use the simple GPFS code by default for non-GlusterFS filesystems.

Drop variable GANRECDIR, which is only used once for an obvious
purpose and isn't a configuration variable.

This makes it possible to add support for a new filesystem type by
adding it to the initial case statement.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Drop get_cluster_fs_state()
Martin Schwenke [Thu, 6 Jul 2023 03:16:02 +0000 (13:16 +1000)]
ctdb-scripts: Drop get_cluster_fs_state()

It is unclear why NFS monitoring should succeed if the filesystem
containing the shared NFS state directory is not active.  Nothing else
in the current monitor code depends on the shared directory, and any
other operation that uses the filesystem may still behave badly.

When commit 28cbe527d47822f870e8252495ab2a1c8fddd12f introduced this
function, there was more monitoring logic.  However, that logic didn't
seem to use the shared directory and it has since been removed anyway.

Additionally, checking the state of the filesystem here seems like a
layering violation, where failure in a lower layer is ignored at its
own level, so it then needs to be ignored here in a higher layer.  It
should be checked in a previous event script, though it is unclear
what should be done if the filesystem is failing over.  It doesn't
seem sane to mark all nodes unhealthy there, but that isn't an NFS
problem.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Use CTDB_NFS_SHARED_STATE_DIR in nfs-ganesha-callout
Martin Schwenke [Wed, 5 Jul 2023 22:20:37 +0000 (08:20 +1000)]
ctdb-scripts: Use CTDB_NFS_SHARED_STATE_DIR in nfs-ganesha-callout

Rename CTDB_NFS_STATE_MNT to CTDB_NFS_SHARED_STATE_DIR.  It doesn't
have to be a mount but can be any directory in a cluster filesystem.
There are plans to also use CTDB_NFS_SHARED_STATE_DIR for
statd-callout, so the name might as well be better.

CTDB_NFS_SHARED_STATE_DIR is now mandatory when GPFS is used.  This is
much saner than choosing the first GPFS filesystem.

Drop CTDB_NFS_STATE_FS_TYPE.  The filesystem type is now determined
from CTDB_NFS_SHARED_STATE_DIR and it is now checked against supported
filesystems.  This will catch the case when the filesystem for the
specified directory has not been mounted and the filesystem for the
mountpoint (e.g. ext4) is not a supported filesystem for shared state.

A side-effect is that the filesystem containing
CTDB_NFS_SHARED_STATE_DIR must be mounted when nfs-ganesha-callout is
first run.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"
Martin Schwenke [Thu, 6 Jul 2023 03:13:06 +0000 (13:13 +1000)]
ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"

Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
4 months ago ctdb-tests: Extend tunnel test to cover more cases ctdb-tunnel
Martin Schwenke [Fri, 16 Jul 2021 07:21:07 +0000 (17:21 +1000)]
ctdb-tests: Extend tunnel test to cover more cases

Test heterogeneous tunnels where the tunnel IDs are different at
either end.  Also test tunnels on the same node - the heterogeneous
case is the useful one, while homogeneous is just loopback.

Drop the time for each run to 15s, given the increase in test cases.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-tests: Factor out running the tunnel test
Martin Schwenke [Fri, 16 Jul 2021 03:19:29 +0000 (13:19 +1000)]
ctdb-tests: Factor out running the tunnel test

Do this for both the script and the C code.  This will allow tests
with different options to be run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb: Allow a destination tunnel ID different to that of the sender
Martin Schwenke [Fri, 16 Jul 2021 02:55:32 +0000 (12:55 +1000)]
ctdb: Allow a destination tunnel ID different to that of the sender

This will allow a general transport API to be implemented on top of
tunnels.

Testing will be subsequently enhanced.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-transport: Add new transport protocol
Amitay Isaacs [Fri, 28 Jun 2019 13:34:56 +0000 (23:34 +1000)]
ctdb-transport: Add new transport protocol

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
4 months ago ctdb-failover: Use command-line option for NoIPTakeover ctdb-failover-ipalloc
Martin Schwenke [Wed, 20 Jun 2018 02:09:01 +0000 (12:09 +1000)]
ctdb-failover: Use command-line option for NoIPTakeover

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-failover: Use command-line option to select algorithm
Martin Schwenke [Mon, 18 Jun 2018 11:31:10 +0000 (21:31 +1000)]
ctdb-failover: Use command-line option to select algorithm

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-failover: Convert IP allocation helper option handling to use popt
Martin Schwenke [Mon, 18 Jun 2018 11:32:58 +0000 (21:32 +1000)]
ctdb-failover: Convert IP allocation helper option handling to use popt

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-failover: Merge function ctdb_test_init() into run_ipalloc()
Martin Schwenke [Tue, 19 Jun 2018 20:53:32 +0000 (06:53 +1000)]
ctdb-failover: Merge function ctdb_test_init() into run_ipalloc()

This provides a nice linear code flow.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months ago ctdb-failover: Improve error handling in IP allocation helper
Martin Schwenke [Tue, 19 Jun 2018 20:39:23 +0000 (06:39 +1000)]
ctdb-failover: Improve error handling in IP allocation helper

Rename ctdb_test_ipalloc() to run_ipalloc() while touching the
signature.  Drop function read_ctdb_public_ip_info() because it isn't
adding value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-failover: Improve logging in IP allocation helper
Martin Schwenke [Mon, 18 Jun 2018 09:03:55 +0000 (19:03 +1000)]
ctdb-failover: Improve logging in IP allocation helper

ERROR level is used most frequently, so make it the default.  Use
standard CTDB_DEBUGLEVEL in testcases instead of something more
obscure.  Also initialise logging correctly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-failover: Don't pass node states to IP allocation helper
Martin Schwenke [Wed, 27 Jun 2018 12:26:35 +0000 (22:26 +1000)]
ctdb-failover: Don't pass node states to IP allocation helper

This helper is doing IP allocation and simply needs to know which
nodes each IP address is available on.  It doesn't need to know
node states.  So just pass the number of nodes instead of the node
states.

Update tests accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-failover: Use takeover test helper as IP allocation helper
Martin Schwenke [Wed, 20 Jun 2018 03:39:42 +0000 (13:39 +1000)]
ctdb-failover: Use takeover test helper as IP allocation helper

After a good clean up this will become a helper in the failover
component.  The same unit tests are still run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-tests: Drop checking of runstate in takeover test
Martin Schwenke [Mon, 9 Jul 2018 07:28:48 +0000 (17:28 +1000)]
ctdb-tests: Drop checking of runstate in takeover test

Daemons do not return available IPs for nodes that are not RUNNING.
Indicate the IP addresses as not available on relevant nodes instead
of doing runstate checking.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-tests: Add function to read IPs, use ipalloc_set_public_ip_info
Martin Schwenke [Mon, 9 Jul 2018 06:16:00 +0000 (16:16 +1000)]
ctdb-tests: Add function to read IPs, use ipalloc_set_public_ip_info

The input format contains per-address lists of node/available nodes.
ipalloc_read_known_ips() inverts the structure into per-node arrays of
IP lists.  Then ipalloc_set_public_ips() converts it back to a
per-address merged IP list containing bitmaps of nodes.  This is
needlessly complex.

Instead, add a local function to read the addresses, building merged
IP list and bitmaps directly.  Then use ipalloc_set_public_ip_info()
to set this in the state.

Sorting the input list adds some unwelcome complexity.  This is partly
to avoid churn in the test cases.  However, continuing to have the
input sorted keeps the multi-stage tests sane, since output from one
stage is fed back in as input.

Signed-off-by: Martin Schwenke <martin@meltin.net>
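
The direct approach described in the commit above can be sketched as
follows.  This is an illustration in Python, not CTDB's C code.  The
input line format ("<ip> <comma-separated available nodes>"), the name
read_merged_ips() and the (ip, bitmap, count) tuple layout are all
assumptions made for the example, not the helper's real format or API.

```python
def read_merged_ips(lines, num_nodes):
    """Build a sorted, merged per-address list directly from input lines.

    Hypothetical input format: "10.0.0.1 0,2" means the address is
    available on nodes 0 and 2.  Returns a list of (ip, bitmap, count)
    tuples, where bit n of bitmap is set if node n can host the address
    and count is the number of such nodes.
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ip = fields[0]
        bitmap = merged.get(ip, 0)
        if len(fields) > 1:
            for tok in fields[1].split(","):
                node = int(tok)
                if not 0 <= node < num_nodes:
                    raise ValueError(f"node {node} out of range")
                bitmap |= 1 << node
        merged[ip] = bitmap
    # Sort by address string so output fed back in as input stays stable.
    return [(ip, bm, bin(bm).count("1")) for ip, bm in sorted(merged.items())]
```

No intermediate per-node arrays are built: each line updates the bitmap
for its address, and the list is sorted once at the end.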
4 months agoctdb-tests: Drop ability to read per-node IP states
Martin Schwenke [Mon, 9 Jul 2018 06:50:44 +0000 (16:50 +1000)]
ctdb-tests: Drop ability to read per-node IP states

This is no longer used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-tests: Drop IP allocation test that reads input in "multi" mode
Martin Schwenke [Mon, 9 Jul 2018 06:44:49 +0000 (16:44 +1000)]
ctdb-tests: Drop IP allocation test that reads input in "multi" mode

This tests an inconsistency of IP state between nodes and actually
tests the create_merged_ip_list() function rather than IP allocation.
The bug has been fixed for years and soon there won't be a possibility
of inconsistency between nodes (due to state being stored in a
database), so drop this test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Add function ipalloc_set_public_ip_info()
Martin Schwenke [Mon, 9 Jul 2018 02:33:50 +0000 (12:33 +1000)]
ctdb-ipalloc: Add function ipalloc_set_public_ip_info()

This is an alternative way of setting the public IP list, including
the bitmaps and available counts.  It is a much more natural form for
providing the data when it is read in.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-takeover: Free known and available IPs when finished with them
Martin Schwenke [Mon, 9 Jul 2018 09:23:02 +0000 (19:23 +1000)]
ctdb-takeover: Free known and available IPs when finished with them

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Allow ipalloc() to return an error
Martin Schwenke [Tue, 10 Jul 2018 07:30:45 +0000 (17:30 +1000)]
ctdb-ipalloc: Allow ipalloc() to return an error

Allows the caller to distinguish between an error and an empty output
list (due to an empty input list).

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Don't store known and available public addresses
Martin Schwenke [Mon, 9 Jul 2018 09:19:56 +0000 (19:19 +1000)]
ctdb-ipalloc: Don't store known and available public addresses

They are only used by the setup functions, so pass them to those
functions instead of storing them.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Optimise ipalloc_can_host_ips()
Martin Schwenke [Mon, 9 Jul 2018 02:30:39 +0000 (12:30 +1000)]
ctdb-ipalloc: Optimise ipalloc_can_host_ips()

With the merged IP list in hand, including the number of nodes each IP
address is available on, this becomes trivial.

Signed-off-by: Martin Schwenke <martin@meltin.net>
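
A rough Python sketch of why the check becomes trivial once per-address
availability counts exist.  The (ip, bitmap, count) tuple layout and the
name can_host_ips() are assumptions for illustration, and the real
function's exact semantics may differ.

```python
def can_host_ips(merged_ips):
    """Return True if at least one address is available on some node.

    merged_ips is assumed to be a list of (ip, bitmap, count) tuples,
    where count is the number of nodes the address is available on.
    """
    return any(count > 0 for _ip, _bitmap, count in merged_ips)
```

With the counts precomputed, the check is a single pass over the merged
list and needs no per-node lookups.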
4 months agoctdb-ipalloc: Count the number of nodes each IP address is available on
Martin Schwenke [Mon, 9 Jul 2018 02:25:24 +0000 (12:25 +1000)]
ctdb-ipalloc: Count the number of nodes each IP address is available on

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Call setup functions when setting public IPs
Martin Schwenke [Mon, 9 Jul 2018 02:18:05 +0000 (12:18 +1000)]
ctdb-ipalloc: Call setup functions when setting public IPs

This simplifies the actual IP allocation calculation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Allow ipalloc_set_public_ips() to return an error
Martin Schwenke [Mon, 9 Jul 2018 02:16:12 +0000 (12:16 +1000)]
ctdb-ipalloc: Allow ipalloc_set_public_ips() to return an error

Signed-off-by: Martin Schwenke <martin@meltin.net>
4 months agoctdb-ipalloc: Allow create_merged_ip_list() to return an error
Martin Schwenke [Mon, 9 Jul 2018 01:54:48 +0000 (11:54 +1000)]
ctdb-ipalloc: Allow create_merged_ip_list() to return an error

Right now it is not possible to tell the difference between a memory
allocation failure and no known public IP addresses.  Fix this by
having the function return an error code separate from the resulting
public IP list.

While touching this, fix the variable name(s) in ipalloc() and a
couple of debug messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>
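
The idea of returning an error code separately from the result can be
sketched in Python as below.  This is only an illustration of the
pattern: the signature, the use of errno values and the input shape (one
list of addresses per node) are assumptions, not the actual C interface.

```python
import errno


def create_merged_ip_list(known_ips_per_node):
    """Return (err, ip_list): err is 0 on success, an errno value on failure.

    An empty ip_list with err == 0 means "no known public IPs", which
    the caller can now tell apart from an allocation failure.
    """
    try:
        merged = sorted({ip for ips in known_ips_per_node for ip in ips})
    except MemoryError:
        return errno.ENOMEM, None
    return 0, merged
```

A caller checks the error first and only then looks at the list, so an
empty list is no longer ambiguous.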
4 months agopython:gp: Print a nice message if cepces-submit can't be found
Andreas Schneider [Tue, 9 Jan 2024 07:50:01 +0000 (08:50 +0100)]
python:gp: Print a nice message if cepces-submit can't be found

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15552

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: David Mulder <dmulder@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Wed Jan 10 09:54:34 UTC 2024 on atb-devel-224

4 months agos3:rpc_server: Mark _lsa_CreateTrustedDomainEx as NOT_IMPLMENTED
Andreas Schneider [Mon, 8 Jan 2024 15:15:03 +0000 (16:15 +0100)]
s3:rpc_server: Mark _lsa_CreateTrustedDomainEx as NOT_IMPLMENTED

There is no PDB backend supporting this.

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Tue Jan  9 14:17:40 UTC 2024 on atb-devel-224

4 months agos3:rpc_server: Mark _lsa_CreateTrustedDomain as NOT_IMPLMENTED
Andreas Schneider [Mon, 8 Jan 2024 15:13:52 +0000 (16:13 +0100)]
s3:rpc_server: Mark _lsa_CreateTrustedDomain as NOT_IMPLMENTED

There is no PDB backend supporting this.

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
4 months agodcesrv_reply: just drop responses if the connection is already terminating
Stefan Metzmacher [Fri, 24 Nov 2023 13:42:35 +0000 (14:42 +0100)]
dcesrv_reply: just drop responses if the connection is already terminating

There's no reason to waste resources...

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Tue Jan  9 11:26:55 UTC 2024 on atb-devel-224

4 months agodcesrv_core: add dcesrv_call_state->subreq in order to allow tevent_req_cancel()...
Stefan Metzmacher [Fri, 24 Nov 2023 13:02:02 +0000 (14:02 +0100)]
dcesrv_core: add dcesrv_call_state->subreq in order to allow tevent_req_cancel() on termination

Requests might be cancelled if the connection got disconnected or
we got an ORPHANED or CO_CANCEL PDU.

But this is all opt-in for the backends to choose.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agowitness.idl: add flag(NDR_PAHEX) to some hex based enums
Stefan Metzmacher [Fri, 29 Dec 2023 09:20:02 +0000 (10:20 +0100)]
witness.idl: add flag(NDR_PAHEX) to some hex based enums

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agowitness.idl: make some types public in order to be used elsewhere
Stefan Metzmacher [Fri, 24 Nov 2023 15:38:06 +0000 (16:38 +0100)]
witness.idl: make some types public in order to be used elsewhere

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agowitness.idl: Set cifs as auth service name for the witness interface
Samuel Cabrero [Wed, 21 Oct 2020 16:30:29 +0000 (18:30 +0200)]
witness.idl: Set cifs as auth service name for the witness interface

Windows clients use the 'cifs' service name to bind to the witness interface.

Signed-off-by: Samuel Cabrero <scabrero@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agotdb: fix python/tdbdump.py example
Stefan Metzmacher [Fri, 24 Nov 2023 15:28:38 +0000 (16:28 +0100)]
tdb: fix python/tdbdump.py example

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agoexamples/scripts: add smbXsrvdump
Ralph Boehme [Sun, 28 Jan 2018 14:35:44 +0000 (15:35 +0100)]
examples/scripts: add smbXsrvdump

A simple python tool to dump smbXsrv TDB databases.

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
4 months agosmbXsrv.idl: add python bindings
Stefan Metzmacher [Fri, 24 Nov 2023 15:09:58 +0000 (16:09 +0100)]
smbXsrv.idl: add python bindings

This is useful for some scripting examples and debugging...

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>