messages_dgm: Properly handle receiver re-initialization
author Volker Lendecke <vl@samba.org>
Thu, 7 Feb 2019 15:15:46 +0000 (16:15 +0100)
committer Andrew Bartlett <abartlet@samba.org>
Thu, 14 Feb 2019 01:18:28 +0000 (02:18 +0100)
This only properly covers the small-message nonblocking case. Covering
the large-message and the blocking cases is a much larger effort, assuming
we want to re-send the failed message if parts of the message have gone
through properly. Don't do that for now.

This was found by samba_dnsupdate constantly recreating its irpc handle to
winbindd in the RODC case.

The messaging_dgm code caches connected datagram sockets, keyed by
destination pid, for 1 second. If samba_dnsupdate re-creates its socket
within that window, winbindd still sends on the stale cached socket, so
the IRPC responses from winbindd are never delivered to samba_dnsupdate,
which then hits a timeout.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13786

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
selftest/knownfail.d/local-messaging [deleted file]
source3/lib/messages_dgm.c

diff --git a/selftest/knownfail.d/local-messaging b/selftest/knownfail.d/local-messaging
deleted file mode 100644
index 46cf30c..0000000
--- a/selftest/knownfail.d/local-messaging
+++ /dev/null
@@ -1 +0,0 @@
-^samba3.smbtorture_s3.LOCAL-MESSAGING-READ3
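diff --git a/source3/lib/messages_dgm.c b/source3/lib/messages_dgm.c
index cb0c17d6c247a075ae2878c84ffa2ac6771c8ffb..aaafcc103078dc547c8fdc847aac275c2cf44ef7 100644
--- a/source3/lib/messages_dgm.c
+++ b/source3/lib/messages_dgm.c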
@@ -1419,6 +1419,7 @@ int messaging_dgm_send(pid_t pid,
        struct messaging_dgm_context *ctx = global_dgm_context;
        struct messaging_dgm_out *out;
        int ret;
+       unsigned retries = 0;
 
        if (ctx == NULL) {
                return ENOTCONN;
@@ -1426,6 +1427,7 @@ int messaging_dgm_send(pid_t pid,
 
        messaging_dgm_validate(ctx);
 
+again:
        ret = messaging_dgm_out_get(ctx, pid, &out);
        if (ret != 0) {
                return ret;
@@ -1435,6 +1437,20 @@ int messaging_dgm_send(pid_t pid,
 
        ret = messaging_dgm_out_send_fragmented(ctx->ev, out, iov, iovlen,
                                                fds, num_fds);
+       if (ret == ECONNREFUSED) {
+               /*
+                * We cache outgoing sockets. If the receiver has
+                * closed and re-opened the socket since our last
+                * message, we get connection refused. Retry.
+                */
+
+               TALLOC_FREE(out);
+
+               if (retries < 5) {
+                       retries += 1;
+                       goto again;
+               }
+       }
        return ret;
 }
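
The failure mode and the retry can be reproduced outside Samba with plain AF_UNIX datagram sockets. The following is a minimal sketch, not the messaging_dgm implementation: the helpers receiver_open(), sender_connect(), send_with_retry() and demo_stale_socket_retry(), as well as the /tmp socket path, are hypothetical names made up for this illustration. It assumes Linux behaviour, where send() on a connected datagram socket whose peer has been closed fails with ECONNREFUSED; other platforms may report a different errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a datagram receiver socket at "path". */
static int receiver_open(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd == -1) {
		return -1;
	}
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	unlink(path); /* drop a leftover socket file, if any */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: create a datagram socket connected to "path". */
static int sender_connect(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd == -1) {
		return -1;
	}
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		int err = errno;
		close(fd);
		errno = err;
		return -1;
	}
	return fd;
}

/*
 * Send on the cached socket. On ECONNREFUSED, drop the cached socket,
 * reconnect and retry, at most 5 times (mirroring the patch above).
 * Returns 0 on success or an errno value.
 */
static int send_with_retry(int *cached_fd, const char *path,
			   const void *buf, size_t len, unsigned *retries)
{
	*retries = 0;

	for (;;) {
		int err;

		if (*cached_fd == -1) {
			*cached_fd = sender_connect(path);
			if (*cached_fd == -1) {
				return errno;
			}
		}
		if (send(*cached_fd, buf, len, 0) != -1) {
			return 0;
		}
		err = errno;
		if (err != ECONNREFUSED) {
			return err;
		}
		close(*cached_fd); /* stale: receiver re-created its socket */
		*cached_fd = -1;
		if (*retries >= 5) {
			return err;
		}
		*retries += 1;
	}
}

/* Returns the number of retries needed after a receiver restart, or -1. */
int demo_stale_socket_retry(void)
{
	char path[64];
	char buf[16] = { 0 };
	unsigned retries = 0;
	int sfd = -1, result = -1;
	int rfd;

	snprintf(path, sizeof(path), "/tmp/dgm-demo-%ld", (long)getpid());
	rfd = receiver_open(path);
	if (rfd == -1) {
		return -1;
	}
	/* The first message creates and caches the connected socket. */
	if (send_with_retry(&sfd, path, "one", 3, &retries) == 0) {
		/* Receiver re-initializes: close and re-bind the same path. */
		close(rfd);
		rfd = receiver_open(path);
		if (rfd != -1 &&
		    send_with_retry(&sfd, path, "two", 3, &retries) == 0 &&
		    recv(rfd, buf, sizeof(buf) - 1, 0) == 3 &&
		    memcmp(buf, "two", 3) == 0) {
			result = (int)retries;
		}
	}
	if (sfd != -1) close(sfd);
	if (rfd != -1) close(rfd);
	unlink(path);
	return result;
}
```

Calling demo_stale_socket_retry() from a small main() shows how many reconnects the second send needed. As in the patch, the sketch only handles a message that fits into a single datagram; it does not attempt to resume a partially sent, fragmented message.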