This fixes the following condition: on a cluster with many nodes, only a single
node was running. Any transaction attempted against the persistent databases
hung. From a previous run of the cluster, the "__transaction_lock__" record
still existed in the persistent databases, but the dmaster field in its ctdb
header named a node other than the local one. When the transaction_start code
tried to acquire the lock, ctdb queued the dmaster request to a node that does
not exist, hanging forever.
I thought: wait a second, why has nobody found this yet with non-persistent
databases? Answer: non-persistent databases are opened with CLEAR_IF_FIRST,
which means that all records are deleted locally when ctdb attaches to them.
This wipe does not happen for persistent databases, yet they carry this one
__transaction_lock__ record, which is handled like a record in a
non-persistent database. This patch treats the __transaction_lock__ record of
persistent databases specially: ctdbd deletes it locally when it attaches to
the database.
ctdb_check_db_empty(ctdb_db);
}
+ if (persistent) {
+ TDB_DATA transaction_key;
+ transaction_key.dptr = discard_const(
+ CTDB_TRANSACTION_LOCK_KEY);
+ transaction_key.dsize = strlen(CTDB_TRANSACTION_LOCK_KEY);
+ tdb_delete(ctdb_db->ltdb->tdb, transaction_key);
+ }
+
DLIST_ADD(ctdb->db_list, ctdb_db);
/* setting this can help some high churn databases */