14 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
16 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
18 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
26 <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
30 ==== Short History ====
33 * 2.0: 1999/01: domain-member, +SWAT
34 * 2.2: 2001/04: NT4-DC
35 * 3.0: 2003/09: AD-member, Samba4 project started
36 * 3.2: 2008/07: GPLv3, experimental clustering
37 * 3.3: 2009/01: clustering
38 * 3.4: 2009/07: merged S3+S4 code
39 * 3.5: 2010/03: experimental SMB 2.0
40 * 3.6: 2011/09: SMB 2.0
41 * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
42 * 4.1: 2013/10: stability
43 * 4.2: soon: AD trusts, performance, scalability, CTDB included
45 ==== Release Stream ====
49 <<<samba-release-stream_exp.png,width=.8\textwidth>>>
55 <<<samba-team-20141011.png,height=.9\textheight>>>
61 <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
65 ==== Samba File Server Topics / Challenges ====
67 # performance: scalable file server
68 #* scale-up: exhaust powerful boxes
69 #* scale-out: flexible all-active clusters
70 #* scale-down: perform well on low-end boxes
71 # interop: multi-protocol access (nfs, afp, ...)
72 # server workloads / SMB features
73 #* tune for: small \# of connections, threaded applications
75 #* SMB3 (clustering, RDMA, ...)
76 # special file systems support (gluster, ceph, gpfs, btrfs, ...)
77 # cloud / openstack?...
78 %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
81 %% ==== Samba File Serving Topics ====
84 %% * Clustering (CTDB)
85 %% * SMB features (SMB3...)
86 %% * Interop (protocols, NFS, AFP, ...)
87 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
90 %%==== Other Samba Topics ====
92 %%* Auth/Domain Member
97 ==== Performance ====[plain]
101 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
104 ==== Performance - low end systems ====
107 <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
109 ** didn't saturate 1G nic (arm), CPU 100\%
110 * reduced memory allocations
111 * instrument SMB 2.1 multi-credit / large MTU
113 ** saturates 1G nic (arm), CPU $<$ 100\%
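
A rough sketch of why multi-credit / large MTU helps: as we understand the SMB2 specification, a request is charged one credit per 64 KiB of payload, so one large read or write can replace many small round trips. The function name is ours, not Samba's.

```python
def credit_charge(request_bytes: int, response_bytes: int) -> int:
    """Credits charged for one SMB 2.1+ request (one credit per 64 KiB)."""
    payload = max(request_bytes, response_bytes)
    # Payloads of up to 64 KiB, including empty ones, cost a single credit.
    return max(1, (payload - 1) // 65536 + 1)


for size in (4096, 65536, 65537, 1024 * 1024):
    print(size, credit_charge(size, 0))
```

With these numbers a 1 MiB write is charged 16 credits in one request, instead of 16 separate 64 KiB requests.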
117 ==== Performance - DB performance ====
121 * used for IPC (smbd processes)
122 * cluster (CTDB): local copies
125 <[block]{hot databases}
126 * @locking.tdb@ (open files)
127 * @brlock.tdb@ (byte range locks)
128 * @notify\_index.tdb@ (for change notify)
131 ==== Performance - DB performance ====
134 * fcntl byte range locks for record locks
135 * contention via single kernel spinlock
139 * alternative to fcntl: pthread robust mutexes
140 * ==> massive speedup
141 * ==> included in TDB 1.3.1, Samba 4.2
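
A minimal sketch of the fcntl approach described above, assuming one lock byte per hash chain. `HASH_SIZE`, `LOCK_BASE` and the helper names are hypothetical and do not reflect the real TDB file layout. Python has no binding for pthread robust mutexes, so only the fcntl side is shown.

```python
import fcntl
import os
import tempfile
import zlib

HASH_SIZE = 128   # hypothetical number of hash chains
LOCK_BASE = 0     # hypothetical file offset of the lock bytes


def chain_of(key: bytes) -> int:
    """Map a record key to its hash chain."""
    return zlib.crc32(key) % HASH_SIZE


def lock_chain(fd: int, chain: int) -> None:
    # Lock exactly one byte: the byte that stands for this hash chain.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, LOCK_BASE + chain, os.SEEK_SET)


def unlock_chain(fd: int, chain: int) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, LOCK_BASE + chain, os.SEEK_SET)


def child_can_lock(path: str, chain: int) -> bool:
    """Fork a child and report whether it can lock `chain` without blocking."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            cfd = os.open(path, os.O_RDWR)
            fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1,
                        LOCK_BASE + chain, os.SEEK_SET)
            status = 0
        except OSError:
            status = 1      # lock is held by another process
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


with tempfile.NamedTemporaryFile() as db:
    fd = db.fileno()
    chain = chain_of(b"some-record-key")
    lock_chain(fd, chain)
    same_chain = child_can_lock(db.name, chain)
    other_chain = child_can_lock(db.name, (chain + 1) % HASH_SIZE)
    unlock_chain(fd, chain)
    after_unlock = child_can_lock(db.name, chain)
    print(same_chain, other_chain, after_unlock)
```

Every lock and unlock here is a system call that goes through the kernel's file-lock bookkeeping, which is where the contention noted above comes from.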
144 ==== Performance - DB performance ====
148 ** single chain, contended (@locking.tdb@)
149 ** gets fragmented (singly linked)
150 * especially a problem in ctdb-cluster: vacuuming
153 <[block]{improvements}
154 * make use of small per-record freelists (dead records)
155 * add automatic defragmentation upon traversal
156 * ==> included in TDB 1.3.1, Samba 4.2
159 ==== Performance - DB performance ====
161 * change notify not scalable
164 <[block]{first improvement}
165 * restructured @notify.tdb@ to
166 ** global @notify\_index.tdb@ and
167 ** local @notify.tdb@
168 ** ==> better but still not good enough for some workloads
172 * replace DB-approach by new scalable, async notify daemon using messaging
173 * occasional false positives do no harm
178 ==== Performance - scaling ====
180 <[block]{parallelism}
181 * samba is multi-process:
182 ** smbd child process $\leftrightarrow$ TCP connection
183 ** event-loop in one process
184 * within a smbd process:
185 ** pthread-pool jobs for potentially blocking syscalls
186 ** ==> parallelism for reads/writes
187 ** default for async I/O since Samba 4.0
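
A minimal sketch of the thread-pool idea, assuming a fixed pool of four workers: potentially blocking `pread` calls are handed to worker threads so that several reads can be in flight at once. smbd itself is written in C and collects completions in its event loop. The sketch simply waits on the futures, and `pread_job` is a name of ours.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pread_job(fd: int, length: int, offset: int) -> bytes:
    # Potentially blocking syscall, executed on a worker thread.
    return os.pread(fd, length, offset)


with tempfile.TemporaryFile() as f:
    f.write(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
    f.flush()
    fd = f.fileno()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submit three independent reads; they may run in parallel.
        futures = [pool.submit(pread_job, fd, 4096, off)
                   for off in (0, 4096, 8192)]
        chunks = [fut.result() for fut in futures]

print([chunk[:1] for chunk in chunks])
```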
190 ==== Performance - scaling ====
193 * classical messaging:
194 ** messages.tdb and signals between processes
195 ** does not scale well
196 * new messaging in Samba 4.2:
197 ** fast and scalable messaging based on unix datagram messages
198 ** ==> WIP: integrate with AD/DC messaging
199 ** ==> features fd-passing for sockets (SMB3 multi-channel)
200 ** ==> TODO: integrate into CTDB inter-node-messaging
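
A minimal sketch of messaging over unix datagram sockets, assuming each process binds one socket named after its id in a shared directory. The directory layout, the ids and the helper names are illustrative, not Samba's actual scheme. Both endpoints live in one process here to keep the example short.

```python
import os
import socket
import tempfile


def open_endpoint(directory: str, name: str) -> socket.socket:
    """Bind a datagram socket that other processes can address by path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(os.path.join(directory, name))
    return sock


def send_message(sender: socket.socket, directory: str,
                 dest: str, payload: bytes) -> None:
    # One sendto() per message: no shared database, no signal needed.
    sender.sendto(payload, os.path.join(directory, dest))


with tempfile.TemporaryDirectory() as msgdir:
    a = open_endpoint(msgdir, "1001")   # hypothetical process ids
    b = open_endpoint(msgdir, "1002")
    send_message(a, msgdir, "1002", b"notify: share reloaded")
    payload, sender_path = b.recvfrom(65536)
    a.close()
    b.close()

print(payload, os.path.basename(sender_path))
```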
204 ==== Interop ====[plain]
209 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
213 ==== Interop-Central ====
215 <[block]{multi-protocol access}
216 * nfs (kernel, ganesha, ...)
219 * SMB2+ unix-extensions
223 ==== File Server Layout/Scope ====
226 <<<samba-layers.jpg,height=.8\textheight>>>
230 ==== Interop - Fruit ====
235 * MacOS 10.9: SMB 2.1 preferred file protocol
236 * @vfs\_fruit@ - new module in Samba 4.2
248 ** SMB2 create context
249 ** speed up directory listings
253 <<<apfel_1280.jpg,width=.9\textwidth>>>
263 ==== SMB features ====[plain]
272 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
277 ==== SMB features in Samba - SMB2 ====
284 * SMB 2.0 (Vista / 2008):
285 ** durable file handles [4.0]
286 * SMB 2.1 (Win7 / 2008R2):
287 ** multi-credit / large mtu [4.0]
288 ** dynamic reauthentication [4.0]
290 ** resilient file handles [WIP-tracer]
293 <<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
298 ==== SMB features in Samba - SMB3 ====
305 * SMB 3.0 (Win8 / 2012):
306 ** new crypto (sign/encrypt) [4.0]
307 ** secure negotiation [4.0]
308 ** durable handles v2 [4.0]
309 ** persistent file handles [WIP-tracer]
310 ** multi-channel [WIP+]
311 ** SMB direct [designed/starting]
312 ** cluster features [designing]
314 ** storage features [WIP]
315 * SMB 3.02 (Win8.1 / 2012R2): [WIP]
316 * SMB 3.1 (Win10 / 2014): [ess.DONE]
319 <<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
329 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
332 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
342 %%% * new crypto (signing, transport encryption)
343 %%% * persistent file handles
345 %%% * RDMA transport (SMB direct)
346 %%% * storage features
349 %%% ** transparent failover (continuous availability)
350 %%% ** all-active (scale-out)
353 %%% ==== SMB3 - Goals ====
358 %%% * fault tolerance / reliability
359 %%% * performance / throughput / scaling
360 %%% * focus on support for server workloads \\ %
361 %%% (as opposed to workstation workloads)
362 %%% * especially support for:
366 %%% ** replace block storage in data center
367 %%% ** block (SCSI) over SMB
370 %%% ==== Requirements for Hyper-V ====
375 %%% * minimum requirements:
377 %%% ** is that really all??? - maybe resilient file handles..
380 %%% * desired features:
381 %%% ** cluster ($\ge 2$ nodes)
382 %%% ** CA / persistent handles
383 %%% ** RDMA / SMB direct
387 %%% ==== SMB Protocol in Samba ====
395 %%% ** experimental incomplete support for SMB 2.0
397 %%% ** official support for SMB 2.0
398 %%% ** missing: durable handles
399 %%% ** default server max proto: SMB 1
401 %%% ** SMB 2.0: complete with durable handles
402 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
403 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
404 %%% ** default server max proto: SMB 3.0
406 %%% ** SMB 3.02: basic
409 %%% ==== ==== [plain]
412 %%% Technical Details...
420 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
425 ==== Multi-Channel - Windows/Protocol ====
427 * find interfaces with interface discovery: \\ %
428 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
429 * bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
430 * windows: only uses connections of the same (and best) quality
431 * windows: binds only to a single node
432 * replay / retry mechanisms, epoch numbers
434 ==== Multi-Channel - Samba ====
436 * samba/smbd: multi-process
437 ** process $\Leftrightarrow$ tcp connection
438 ** ==> transfer new connection to existing smbd
439 ** use fd-passing (sendmsg/recvmsg)
441 * preparation: messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
442 * add fd-passing [DONE,4.2]
443 * transfer connection already in negprot (ClientGUID) [ess.DONE]
444 * implement channel epoch numbers [WIP]
445 * implement interface discovery [WIP]
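
A minimal sketch of the fd-passing step, assuming a unix datagram channel between the two processes: the process that accepted the new connection hands its file descriptor to the process that already owns the session, using `sendmsg`/`recvmsg` with `SCM_RIGHTS`. A socketpair stands in for the client's TCP connection, and the GUID string and helper names are made up for illustration.

```python
import array
import os
import socket


def send_fd(channel: socket.socket, fd: int, note: bytes) -> None:
    """Send `fd` plus a short note as one datagram with SCM_RIGHTS."""
    fds = array.array("i", [fd])
    channel.sendmsg([note], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(channel: socket.socket) -> tuple:
    """Receive one datagram and return (note, passed file descriptor)."""
    fds = array.array("i")
    note, ancdata, _flags, _addr = channel.recvmsg(
        1024, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return note, fds[0]


chan_new, chan_existing = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

pid = os.fork()
if pid == 0:
    # "Existing smbd": take over the connection and answer on it.
    status = 1
    try:
        note, fd = recv_fd(chan_existing)
        conn = socket.socket(fileno=fd)
        conn.sendall(b"served by existing smbd for " + note)
        conn.close()
        status = 0
    finally:
        os._exit(status)

# "New smbd": the connection is created after the fork, so the other
# process can only reach it through the passed descriptor.
chan_existing.close()
client_end, server_end = socket.socketpair()
client_end.settimeout(5)
send_fd(chan_new, server_end.fileno(), b"client-guid-1234")
server_end.close()
reply = client_end.recv(1024)
os.waitpid(pid, 0)
print(reply)
```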
447 ==== Multi-Channel - Samba ====
450 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
460 ==== SMB Direct (RDMA) ====
463 ** requires multi-channel
464 ** start with TCP, bind an RDMA channel
465 ** reads and writes use RDMA write/read
466 ** protocol/metadata via send/receive
468 * wireshark dissector: [DONE]
471 ** prereq: multi-channel / fd-passing
472 ** buffer / transport abstractions [TODO]
473 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
474 ==> central daemon (or kernel module) to serve as RDMA "proxy"
476 ==== SMB Direct (RDMA) - Plan ====
479 <<<smb3-rdma-samba_exp.png,height=.9\textheight>>>
482 %%%==== SMB Direct (RDMA) - Plan ====
485 %%%* smbd-d (rdma proxy daemon)
486 %%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
487 %%%** listens for RDMA connection (as told by main smbd)
489 %%%** listens for TCP connections
490 %%%** connects to smbd-d-socket
491 %%%*** request rdma-interfaces, tell smbd-d on which to listen
492 %%%** "accepts" new smb-direct connections on smbd-d-socket
495 %%%==== SMB Direct (RDMA) - Plan ====
499 %%%** connects via TCP --> smbd forks child smbd (c)
500 %%%** connects via RDMA to smbd-d
502 %%%** creates socket-pair as rdma-proxy-channel
503 %%%** passes one end of socket-pair to main smbd for accept
504 %%%** sends smb direct packages over proxy-channel
506 %%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
508 %%%** continues proxy-communication with smbd-d
511 %%%* For @rdma\_read@ and @rdma\_write@:
512 %%%** c and smbd-d establish shared memory area
516 %%% ==== Persistent Handles ====
521 %%% * like durable file handles with strong guarantees
522 %%% * framework is already there in samba (by support for durable v2)
523 %%% ** ==> easy to satisfy at the protocol level
526 %%% * the difficulty lies in implementing the guarantees
527 %%% ** need to make metadata persistent
528 %%% ** but don't kill performance!
529 %%% ** persistent tdbs !would! kill performance
531 %%% *** need to be sync
532 %%% *** record-level transactions (instead of db-level)
533 %%% *** only replicate to some nodes, not all
537 %%==== Clustering Concepts (Windows) ====
543 %%** (``traditional'') failover cluster (active-passive)
544 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
546 %%*** runs off a cluster (failover) volume
547 %%*** offers the Witness service
550 %%* Scale-Out (SOFS):
551 %%** scale-out cluster (all-active!)
552 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
553 %%** no client caching
554 %%** Windows: runs off a cluster shared volume (implies cluster)
557 %%* Continuous Availability (CA):
558 %%** transparent failover, persistent handles
559 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
560 %%** can be turned on independently on any cluster share (failover or scale-out)
561 %%** ==> changed client retry behaviour!
564 %%% ==== Clustering -- Controlling Flags from Windows ====
569 %%% * a share on a cluster carries
570 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
573 %%% * a share on a cluster carries
574 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
575 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
578 %%% * independently settable on a clustered share:
579 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
580 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
584 %%==== Clustering -- Server Behaviour ====
589 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
590 %%** run witness service (RPC)
591 %%** client can register and get notified about resource changes
594 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
595 %%** do not grant batch oplocks, write leases, handle leases
596 %%** ==> no durable handles unless also CA
599 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
600 %%** offer persistent handles
601 %%** timeout from durable v2 request
605 %%==== Clustering -- Client Behaviour (Win8) ====
611 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
612 %%** clients happily work if witness is not available
615 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
616 %%** clients happily connect if @CLUSTER@ is not set.
617 %%** clients DO request oplocks/leases/durable handles
618 %%** clients are not confused if they get these
621 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
622 %%** clients happily connect if @CLUSTER@ is not set.
623 %%** clients typically request persistent handle with RWH lease
628 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
629 %%%$\Leftrightarrow$ \\ %
630 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
633 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
636 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
639 %%% * Server-Matrix (on/off):
640 %%% ** persistent handle cap
641 %%% ** durable handles
642 %%% ** cluster share cap
648 %%% ** connect to share with explorer
649 %%% ** start copying file (2G)
651 %%% ** wait for the client to pop up an error dialog
656 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
659 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
663 %%% ** 3 consecutive attempt rounds:
664 %%% *** for each of the two IPs: \\ %
666 %%% three tcp syn attempts to IP with 0.5 sec breaks
667 %%% ** ==> some 2.1 seconds for 1 round
668 %%% ** between attempts:
669 %%% ** dns, ping, arp ... 5.8 seconds
670 %%% ** ==> _red_18 seconds_
674 %%% ** retries attempt rounds from above for _red_14 minutes_
684 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
689 %%==== Clustering with Samba/CTDB ====
692 %%* all-active SMB-cluster with Samba and CTDB... \\ %
693 %%+<3->{...since 2007! \smiley }
696 %%* transparent for the client
698 %%*** metadata and messaging engine for Samba in a cluster
699 %%*** plus cluster resource manager (IPs, services...)
700 %%** client only sees one ``big'' SMB server
701 %%** we could not change the client!...
702 %%** works ``well enough''
706 %%** how to integrate SMB3 clustering with Samba/CTDB
707 %%** good: rather orthogonal
708 %%** ctdb-clustering transparent mostly due to management
711 %%==== Witness Service ====
715 %%** monitoring of availability of resources (shares, NICs)
716 %%** server asks client to move to another resource
720 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
721 %%** but clients happily connect w/o witness
724 %%* status in Samba [WIP (Metze, Gregor Beck)]:
725 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
726 %%** wireshark dissector: essentially done
727 %%** client: in @rpcclient@ - done
728 %%** server: dummy PoC / tracer bullet implementation done
729 %%** CTDB: changes / integration needed
737 %%% !@https://wiki.samba.org/index.php/SMB3@!
747 %%% [[[.6\textwidth]]]
749 %%% [[[.3\textwidth]]]
750 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
756 ==== Misc ====[plain]
761 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
766 <[block]{File Systems}
767 * gpfs, gluster, ceph, btrfs...
768 * support through vfs modules
769 * fuse-based: avoid context switches
770 * instrument SMB3 storage features (fsctls)
775 %%<[block]{Under the hood}
776 %%* restructurings, reconciliations
777 %%* ctdb moved into samba tree
778 %%* published libs: talloc, tdb, tevent ...
782 * unprivileged selftest, autobuild
783 * selfcontained testing: wrapper
787 ** resolv wrapper [_red_new_]
788 * externalized as separate projects:
789 ** ==> @http://cwrap.org/@
791 ** ==> Andreas Schneider's talk
795 ==== Forecast: Cloudy ====
797 <[block]{Possible involvement with OpenStack}
798 * SMB storage service for Windows (and other) VMs
799 * SMB3 storage backend for Hyper-V images
800 * also: chances for AD-integration into auth
805 <[block]{especially but not exclusively}
815 ==== Conclusion ====[plain]
819 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
825 * Samba 4.X is quite different from 3.Y
828 <[block]{What's coming?}
829 * Performance: the story continues
830 * Interop: strengthen strengths
831 * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
832 * Some clouds in the sky...
836 ==== Thanks for your attention! ====[plain]
855 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>