Michael Adam [Wed, 13 Jan 2010 22:53:54 +0000 (23:53 +0100)]
s3:dbwrap_ctdb: exit early when nothing has been written in transaction_commit.
This skips update of the __db_sequence_number__ record when nothing else has
been written. There are transactions that are just openend and then nothing
is written until transaction_commit is called. This is for instance the case
with registry initialization routines: They start a transaction and only
write somthing when the registry has not been initialized yet.
So this change will skip many db_seqnum bumps and TRANS3_COMMIT roundtrips.
Michael Adam [Wed, 13 Jan 2010 22:51:34 +0000 (23:51 +0100)]
s3:dbwrap_ctdb: fix brown paperbag bug in ctdb_transaction_commit.
I carefully prepared the return value only to "return 0;" at the bottom. :-(
This may well have hit us for instance in the nested cancel case
and produced random errors.
Michael Adam [Tue, 5 Jan 2010 23:37:21 +0000 (00:37 +0100)]
s3:dbwrap_ctdb: fix logic error in pull_newest_from_marshall_buffer().
The logic bug was that if a record was found in the marshall buffer,
then always the ctdb header of tha last record in the marshall buffer
was returned, and not the ctdb header of the last occurrence of the
requested record.
This is fixed by introducing an additional temporary variable.
Michael Adam [Fri, 11 Dec 2009 13:07:28 +0000 (14:07 +0100)]
s3:dbwrap_ctdb: maintain a database sequence number that bumps in transactions
For persistent databases, 64bit integer is kept in a special record
__db_sequence_number__. This record is incremented with each completed
transaction.
The retry mechanism for failing TRANS3_COMMIT controls inside the
db_ctdb_transaction_commit() function now relies one a modified
behaviour of ctdbd's treatment of persistent databases in recoveries.
Recently, a special treatment for persistent databases had been
introduced in ctdb (1.0.108) to work around the problems with the
orinal design of persistent transactions.
Now with the rewrite we need to revert to the old behaviour that
ctdb always takes the newest copies of all records.
This change also paves the way for a next step, which will make
recovery use the db seqnum to tell which node has the newest copy
of a persistent db and use that node's copy. This will greatly
reduce the amount of data transferred with each recovery.
Michael Adam [Thu, 3 Dec 2009 16:29:54 +0000 (17:29 +0100)]
s3:dbwrap_ctdb: start rewrite of transactions using the global lock (g_lock)
This simplifies the transaction code a lot:
* transaction_start essentially consists of acquiring a global lock.
* No write operations at all are performed on the local database
until the transaction is committed: Every store operation is just
going into the marshall buffer.
* The commit operation calls a new simplified TRANS3_COMMIT control
in ctdb which rolls out thae changes to all nodes including the
node that is performing the transaction.
Volker Lendecke [Sun, 25 Oct 2009 15:12:12 +0000 (16:12 +0100)]
s3: Implement global locks in a g_lock tdb
This is the basis to implement global locks in ctdb without depending on a
shared file system. The initial goal is to make ctdb persistent transactions
deterministic without too many timeouts.
(cherry picked from commit 4c1c3f2549f32fd069e0e7bf3aec299213f1e85b)
Andrew Tridgell [Sat, 6 Feb 2010 05:08:56 +0000 (21:08 -0800)]
s3-smbd: add a rate limited cleanup of brl, connections and locking db
On unclean shutdown we can end up with stale entries in the brlock,
connections and locking db. Previously we would do the cleanup on
every unclean exit, but that can cause smbd to be completely
unavailable for several minutes when a large number of child smbd
processes exit.
This adds a rate limited cleanup of the databases, with the default
that cleanup happens at most every 20s
(cherry picked from commit dd498d2eecf124a03b6117ddab892a1112f9e9db)
The last 4 patches address bug #7312 (Many disconnecting clients renders
clustered samba unusuable for some time).
Andrew Tridgell [Sat, 6 Feb 2010 04:59:43 +0000 (20:59 -0800)]
s3-brlock: add a minimim retry time for pending blocking locks
When we are waiting on a pending byte range lock, another smbd might
exit uncleanly, and therefore not notify us of the removal of the
lock, and thus not trigger the lock to be retried.
We coped with this up to now by adding a message_send_all() in the
SIGCHLD and cluster reconfigure handlers to send a MSG_SMB_UNLOCK to
all smbd processes. That would generate O(N^2) work when a large
number of clients disconnected at once (such as on a network outage),
which could leave the whole system unusable for a very long time (many
minutes, or even longer).
By adding a minimum re-check time for pending byte range locks we
avoid this problem by ensuring that pending locks are retried at a
more regular interval.
(cherry picked from commit 5b398edbee672392f2cea260ab17445ecca927d7)
Andrew Tridgell [Sat, 6 Feb 2010 03:14:45 +0000 (19:14 -0800)]
s3-events: make the old timed events compatible with tevent
tevent ensures that a timed event is only called once. The old events
code relied on the called handler removing the event itself. If the
handler removed the event after calling a function which invoked the
event loop then the timed event could loop forever.
This change makes the two timed event systems more compatible, by
allowing the handler to free the te if it wants to, but ensuring it is
off the linked list of events before the handler is called, and
ensuring it is freed even if the handler doesn't free it.
(cherry picked from commit 5dbf175c75bd6139f3238f36665000641f7f7f79)
s3:rpc_client: don't mix layers and keep a reference to cli_state in the caller
We should not rely on the backend to have a reference to the cli_state.
This will make it possible for the backend to set its cli_state reference
to NULL, when the transport is dead.
Note that before this change pidl generated code that just dereferenced size_is
and length_is values from unique pointers without checking whether these
pointers were actually NULL.
With this change, pidl now throws a warning like:
warning: Got pointer for `data_size', expected fully derefenced variable
which is not correct, probably because pidl does not evaluate the C expression.
talloc_stack: make sure we never let talloc_tos() return ts->talloc_stack[-1]
In smbd there's a small gab between TALLOC_FREE(frame); before
we call smbd_parent_loop() where we don't have a valid talloc stackframe.
smbd_parent_loop() calls talloc_stackframe() only within the while(1) loop.
As DEBUG(2,("waiting for connections")) uses talloc_tos() to construct
the time header for the debug message we crash on some systems.
Jeff Layton [Tue, 26 Jan 2010 13:36:11 +0000 (08:36 -0500)]
mount.cifs: don't allow it to be run as setuid root program
mount.cifs has been the subject of several "security" fire drills due to
distributions installing it as a setuid root program. This program has
not been properly audited for security and the Samba team highly
recommends that it not be installed as a setuid root program at this
time.
To make that abundantly clear, this patch forcibly disables the ability
for mount.cifs to run as a setuid root program. People are welcome to
trivially patch this out, but they do so at their own peril.
A security audit and redesign of this program is in progress and we hope
that we'll be able to remove this in the near future.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
The last 3 patches address bug #6853 (mount.cifs race that allows user to
replace mountpoint with a symlink).
Jeff Layton [Tue, 26 Jan 2010 13:36:03 +0000 (08:36 -0500)]
mount.cifs: check for invalid characters in device name and mountpoint
It's apparently possible to corrupt the mtab if you pass embedded
newlines to addmntent. Apparently tabs are also a problem with certain
earlier glibc versions. Backslashes are also a minor issue apparently,
but we can't reasonably filter those.
Make sure that neither the devname or mountpoint contain any problematic
characters before allowing the mount to proceed.
Jeff Layton [Tue, 26 Jan 2010 13:35:35 +0000 (08:35 -0500)]
mount.cifs: take extra care that mountpoint isn't changed during mount
It's possible to trick mount.cifs into mounting onto the wrong directory
by replacing the mountpoint with a symlink to a directory. mount.cifs
attempts to check the validity of the mountpoint, but there's still a
possible race between those checks and the mount(2) syscall.
To guard against this, chdir to the mountpoint very early, and only deal
with it as "." from then on out.
Simo Sorce [Mon, 1 Mar 2010 19:50:50 +0000 (14:50 -0500)]
s3:ads fix dn parsing name was always null
While there also use ldap_exploded_dn instead of ldb_dn_validate()
so we can remove a huge dependency that is hanging there only for one very
minor marginal use.
Jeremy Allison [Fri, 19 Feb 2010 22:18:51 +0000 (14:18 -0800)]
First part of fix for bug #7159 - client rpc_transport doesn't cope with bad server data returns.
Ensure that subreq is *always* talloc_free'd in the _done
function, as it has an event timeout attached. If the
read requests look longer than the cli->timeout, then
the timeout fn is called with already freed data.