Delay reusing ids to make protocol more robust
author     Rusty Russell <rusty@rustcorp.com.au>
           Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)
committer  Rusty Russell <rusty@rustcorp.com.au>
           Wed, 30 Jun 2010 05:21:05 +0000 (14:51 +0930)
commit     9b4884e0bad3b23a8cf32ff19dc9bb8b26436e2d
tree       52a06f0058a2c618cf6110172d6dec96ad40fec3
parent     7d4658d3fc09560ccf16b304ffdb5391a2b48f72

Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused its request
id before the node responded.

The result was that we unlocked the wrong record, leading to the
following:

ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
client/ctdb_client.c
common/ctdb_util.c
include/ctdb_private.h