Delay reusing ids to make protocol more robust (idtree)
author Rusty Russell <rusty@rustcorp.com.au>
Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)
committer Rusty Russell <rusty@rustcorp.com.au>
Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)
commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed
tree 81153313cb21557025e749a68d637d29128c1b54
parent 32c04e11ebbcf8239e47016302c6ce802a8b0a6f
Delay reusing ids to make protocol more robust

Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following log messages:

ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.
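
To illustrate the idea of delaying id reuse, here is a minimal sketch. It is
not the code from this commit: the table size, struct, and function names
(MAX_REQS, reqid_table, reqid_new, reqid_find, reqid_remove) are all
hypothetical. It searches for a free id starting just above the last one
handed out, so a freed id is not given out again until the allocator has
wrapped around the whole id space.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQS 8              /* hypothetical, deliberately tiny id space */

struct reqid_table {
    void *slot[MAX_REQS];       /* NULL means the id is free */
    unsigned last;              /* most recently allocated id */
};

/* Allocate an id for `state`, searching upwards from the last id handed
 * out and wrapping around at the top.  A just-freed id is therefore the
 * last candidate to be considered, not the first.
 * Returns -1 if every id is in use. */
static int reqid_new(struct reqid_table *t, void *state)
{
    assert(t != NULL && state != NULL);
    for (unsigned i = 1; i <= MAX_REQS; i++) {
        unsigned id = (t->last + i) % MAX_REQS;
        if (t->slot[id] == NULL) {
            t->slot[id] = state;
            t->last = id;
            return (int)id;
        }
    }
    return -1;
}

/* Look up the state for an id; NULL if the id is unknown or free. */
static void *reqid_find(const struct reqid_table *t, int id)
{
    assert(t != NULL);
    if (id < 0 || id >= MAX_REQS)
        return NULL;
    return t->slot[id];
}

/* Release an id, e.g. when the request completes or times out. */
static void reqid_remove(struct reqid_table *t, int id)
{
    assert(t != NULL);
    if (id >= 0 && id < MAX_REQS)
        t->slot[id] = NULL;
}
```

With this scheme, if a request times out and its id is released, the next
request gets a different id, so a late reply carrying the old id finds no
state and can be dropped instead of being matched to the wrong request.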

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
client/ctdb_client.c
common/ctdb_util.c
include/ctdb_private.h