README

   1 INTRODUCTION
   2 ============
   3
   4 Autocluster is a set of scripts for building virtual clusters to test
   5 clustered Samba.  It uses Linux's libvirt and KVM virtualisation
   6 engine.
   7
   8 Autocluster is a collection of scripts, templates and configuration
   9 files that allow you to create a cluster of virtual nodes very
  10 quickly.  You can create a cluster from scratch in less than 30
  11 minutes.  Once you have a base image you can then recreate a cluster
  12 or create new virtual clusters in minutes.
  13
  14 The current implementation creates virtual clusters of RHEL5 nodes.
  15
  16
  17 CONTENTS
  18 ========
  19
  20 * INSTALLING AUTOCLUSTER
  21
  22 * HOST MACHINE SETUP
  23
  24 * CREATING A CLUSTER
  25
  26 * BOOTING A CLUSTER
  27
  28 * POST-CREATION SETUP
  29
  30 * CONFIGURATION
  31
  32 * DEVELOPMENT HINTS
  33
  34
  35 INSTALLING AUTOCLUSTER
  36 ======================
  37
  38 Before you start, make sure you have the latest version of
  39 autocluster. To download autocluster do this:
  40
  41   git clone git://git.samba.org/tridge/autocluster.git autocluster
  42
  43 Or to update it, run "git pull" in the autocluster directory.
  44
  45 You probably want to add the directory where autocluster is installed
  46 to your PATH, otherwise things may quickly become tedious.
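
As a minimal sketch, lines like the following could go in a shell
startup file such as ~/.bashrc.  The checkout location
$HOME/autocluster is an assumption; adjust it to wherever you cloned
the repository.

```shell
# Assumed checkout location; change this to match your own clone.
AUTOCLUSTER_DIR="$HOME/autocluster"

# Append the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$AUTOCLUSTER_DIR:"*) ;;
    *) PATH="$PATH:$AUTOCLUSTER_DIR" ;;
esac
export PATH
```

The case statement keeps PATH from growing each time the startup file
is re-read.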
  47
  48
  49 HOST MACHINE SETUP
  50 ==================
  51
  52 This section explains how to set up a host machine to run virtual
  53 clusters generated by autocluster.
  54
  55
  56  1) Install and configure required software.
  57
  58  a) Install kvm, libvirt and expect.
  59
  60     Autocluster creates virtual machines that use libvirt to run under
  61     KVM.  This means that you will need to install both KVM and
  62     libvirt on your host machine.  Expect is used by the "waitfor"
  63     script and should be available for installation from your
  64     distribution.
  65
  66     For various distros:
  67
  68     * RHEL/CentOS
  69
  70       Autocluster should work with the standard RHEL6 qemu-kvm and
  71       libvirt packages.  However, RHEL's KVM doesn't support the SCSI
  72       emulation, so you will need these settings:
  73
  74         SYSTEM_DISK_TYPE=ide
  75         SHARED_DISK_TYPE=virtio
  76         KVM=/usr/libexec/qemu-kvm
  77
  78       For RHEL5/CentOS5, useful packages for both kvm and libvirt used
  79       to be found here:
  80
  81         http://www.lfarkas.org/linux/packages/centos/5/x86_64/
  82
  83       However, since recent versions of RHEL5 ship with KVM, 3rd party
  84       KVM RPMs for RHEL5 are now scarce.
  85
  86       RHEL5.4's KVM also has problems when autocluster uses virtio
  87       shared disks, since multipath doesn't notice virtio disks.  This
  88       is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
  89       be able to use the settings recommended above for RHEL6.
  90
  91       If you're still running RHEL5.4, you have lots of time, you have
  92       lots of disk space and you like complexity then see the sections
  93       below on "iSCSI shared disks" and "Raw IDE system disks".
  94
  95     * Fedora Core
  96
  97       Useful packages ship with Fedora Core 10 (Cambridge) and later.
  98       Some of the above notes on RHEL might apply to Fedora Core's
  99       KVM.
 100
 101     * Ubuntu
 102
 103       Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
 104       In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
 105       package is called "qemu-kvm".  Older versions have a package
 106       called "kvm".
 107
 108     For other distributions you'll have to backport distro sources or
 109     compile from upstream source as described below.
 110
 111     * For KVM see the "Downloads" and "Code" sections at:
 112
 113         http://www.linux-kvm.org/
 114
 115     * For libvirt see:
 116
 117         http://libvirt.org/
 118
 119  b) Install guestfish or qemu-nbd and nbd-client.
 120
 121     Recent Linux distributions, including RHEL6.0, contain guestfish.
 122     Guestfish (see http://libguestfs.org/ - there are binary packages
 123     for several distros here) is a CLI for manipulating KVM/QEMU disk
 124     images.  Autocluster supports guestfish, so if guestfish is
 125     available then you should use it.  It should be more reliable than
 126     NBD.
 127
 128     Guestfish isn't yet the default autocluster method for disk image
 129     manipulation.  To use it put this in your configuration file:
 130
 131       SYSTEM_DISK_ACCESS_METHOD=guestfish
 132
 133     Note that autocluster's guestfish support is new and was written
 134     to work around some bugs in RHEL6.0's version of guestfish... so
 135     might not work well with newer, non-buggy versions.  If so, please
 136     report bugs!
 137
 138     If you can't use guestfish then you'll have to use NBD.  For this
 139     you will need the qemu-nbd and nbd-client programs, which
 140     autocluster uses to loopback-nbd-mount the disk images when
 141     configuring each node.
 142
 143     NBD for various distros:
 144
 145     * RHEL/CentOS
 146
 147       qemu-nbd is only available in the old packages from lfarkas.org.
 148       Recompiling the RHEL5 kvm package to support NBD is quite
 149       straightforward.  RHEL6 doesn't have an NBD kernel module, so is
 150       harder to retrofit for NBD support - use guestfish instead.
 151
 152       Unless you can find an RPM for nbd-client you will need to
 153       download source from:
 154
 155         http://sourceforge.net/projects/nbd/
 156
 157       and build it.
 158
 159     * Fedora Core
 160
 161       qemu-nbd is in the qemu-kvm or kvm package.
 162
 163       nbd-client is in the nbd package.
 164
 165     * Ubuntu
 166
 167       qemu-nbd is in the qemu-kvm or kvm package.  In older releases
 168       it is called kvm-nbd, so you need to set the QEMU_NBD
 169       configuration variable.
 170
 171       nbd-client is in the nbd-client package.
 172
 173     * As mentioned above, nbd can be found at:
 174
 175         http://sourceforge.net/projects/nbd/
 176
 177  c) Environment and libvirt virtual networks
 178
 179     You will need to add the autocluster directory to your PATH.
 180
 181     You will need to configure the right kvm networking setup. The
 182     files in host_setup/etc/libvirt/qemu/networks/ should help. This
 183     command will install the right networks for kvm:
 184
 185        rsync -av --delete host_setup/etc/libvirt/qemu/networks/ /etc/libvirt/qemu/networks/
 186
 187     Note that you'll need to edit the installed files to reflect any
 188     changes to IPBASE, IPNET0, IPNET1, IPNET2 away from the defaults.
 189     This is also true for named.conf.local and squid.conf (see below).
 190
 191     After this you might need to reload libvirt:
 192
 193       /etc/init.d/libvirt reload
 194
 195     or similar.
 196
 197     You might also need to set:
 198
 199       VIRSH_DEFAULT_CONNECT_URI=qemu:///system
 200
 201     in your environment so that virsh does KVM/QEMU things by default.
 202
 203  2) You need a caching web proxy on your local network. If you don't
 204     have one, then install a squid proxy on your host. See
 205     host_setup/etc/squid/squid.conf for a sample config suitable for a
 206     virtual cluster. Make sure it caches large objects and has plenty
 207     of space. This will be needed to make downloading all the RPMs to
 208     each client sane.
 209
 210     To test your squid setup, run a command like this:
 211
 212       http_proxy=http://10.0.0.1:3128/ wget <some-url>
 213
 214     Check your firewall setup.  If you have problems accessing the
 215     proxy from your nodes (including from kickstart postinstall) then
 216     check it again!  Some distributions install nice "convenient"
 217     firewalls by default that might block access to the squid port
 218     from the nodes.  On a current version of Fedora Core you may be
 219     able to run system-config-firewall-tui to reconfigure the
 220     firewall.
 221
 222  3) Set up a DNS server on your host. See host_setup/etc/bind/ for a
 223     sample config that is suitable. It needs to redirect DNS queries
 224     for your virtual domain to your Windows domain controller.
 225
 226  4) Download a RHEL install ISO.
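
The software requirements from step 1 can be checked before going any
further.  The following is a rough sketch, not part of autocluster:
the function names missing_tools and check_host are made up for this
example.  It does not look for the KVM binary itself, because its name
and location vary between distributions (for example
/usr/libexec/qemu-kvm on RHEL6), and on older Ubuntu releases qemu-nbd
is called kvm-nbd.

```shell
# Hypothetical preflight helper, not part of autocluster.
# Prints the name of each given tool that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report what the host still lacks; returns non-zero if anything is missing.
check_host() {
    status=0
    # virsh is libvirt's command-line client; expect is used by "waitfor".
    for tool in $(missing_tools virsh expect); do
        echo "missing: $tool"
        status=1
    done
    # Disk images need guestfish, or both qemu-nbd and nbd-client.
    if [ -n "$(missing_tools guestfish)" ] &&
       [ -n "$(missing_tools qemu-nbd nbd-client)" ]; then
        echo "missing: guestfish (or qemu-nbd and nbd-client)"
        status=1
    fi
    return $status
}
```

Run check_host once after installing packages; no output means the
tools named in step 1 were all found.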
 227
 228
 229 CREATING A CLUSTER
 230 ==================
 231
 232 A cluster comprises a single base disk image, a copy-on-write disk
 233 image for each node and some XML files that tell libvirt about each
 234 node's virtual hardware configuration.  The copy-on-write disk images
 235 save a lot of disk space on the host machine because they each use the
 236 base disk image - without them the disk image for each cluster node
 237 would need to contain the entire RHEL install.
 238
 239 The cluster creation process can be broken down into 2 main steps:
 240
 241  1) Creating the base disk image.
 242
 243  2) Creating the per-node disk images and corresponding XML files.
 244
 245 However, before you do this you will need to create a configuration
 246 file.  See the "CONFIGURATION" section below for more details.
 247
 248 Here are more details on the "create cluster" process.  Note that
 249 unless you have done something extra special then you'll need to run
 250 all of this as root.
 251
 252  1) Create the base disk image using:
 253
 254       ./autocluster create base
 255
 256     The first thing this step does is to check that it can connect to
 257     the YUM server.  If this fails make sure that there are no
 258     firewalls blocking your access to the server.
 259
 260     The install will take about 10 to 15 minutes and you will see the
 261     packages installing in your terminal.
 262
 263     The installation process uses kickstart.  If your configuration
 264     uses a SoFS release then the last stage of the kickstart
 265     configuration will be a postinstall script that installs and
 266     configures packages related to SoFS.  The choice of postinstall
 267     script is set using the POSTINSTALL_TEMPLATE variable, allowing you
 268     to adapt the installation process for different types of clusters.
 269
 270     It makes sense to install packages that will be common to all
 271     nodes into the base image.  This saves time later when you're
 272     setting up the cluster nodes.  However, you don't have to do this
 273     - you can set POSTINSTALL_TEMPLATE to "" instead - but then you
 274     will lose the quick cluster creation/setup that is a major feature
 275     of autocluster.
 276
 277     When that has finished you should mark that base image immutable
 278     like this:
 279
 280       chattr +i /virtual/ac-base.img
 281
 282     That will ensure it won't change. This is a precaution as the
 283     image will be used as the basis for the per-node images, and if
 284     it changes your cluster will become corrupt.
 285
 286  2) Now run "autocluster create cluster" specifying a cluster
 287     name. For example:
 288
 289       autocluster create cluster c1
 290
 291     This will create and install the XML node descriptions and the
 292     disk images for your cluster nodes, and any other nodes you have
 293     configured.  Each disk image is initially created as an "empty"
 294     copy-on-write image, which is linked to the base image.  Those
 295     images are then accessed using guestfish or
 296     loopback-nbd-mounted, and populated with system configuration
 297     files and other potentially useful things (such as scripts).
 298
 299
 300 BOOTING A CLUSTER
 301 =================
 302
 303 At this point the cluster has been created but isn't yet running.
 304 Autocluster provides a command called "vircmd", which is a thin
 305 wrapper around libvirt's virsh command.  vircmd takes a cluster name
 306 instead of a node/domain name and runs the requested command on all
 307 nodes in the cluster.
 308
 309  1) Now boot your cluster nodes like this:
 310
 311       vircmd start c1
 312
 313     The most useful vircmd commands are:
 314
 315       start    : boot a node
 316       shutdown : graceful shutdown of a node
 317       destroy  : power off a node immediately
 318
 319  2) You can watch boot progress like this:
 320
 321        tail -f /var/log/kvm/serial.c1*
 322
 323     All the nodes have serial consoles, making it easier to capture
 324     kernel panic messages and watch the nodes via ssh.
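
To illustrate what "thin wrapper" means here, the sketch below shows
one way such a wrapper could be written.  It is not the real vircmd
script.  The function name cluster_cmd and the VIRSH override are
hypothetical, and the sketch assumes that node domains are named after
the cluster followed by "n" and a number (c1n1, c1n2, ...) and that
your virsh supports "list --all --name".

```shell
# Hypothetical re-implementation of the idea behind vircmd.
# Usage: cluster_cmd <virsh-command> <cluster-name>
cluster_cmd() {
    cmd="$1"
    cluster="$2"
    # VIRSH can be overridden for testing; it defaults to the real virsh.
    "${VIRSH:-virsh}" list --all --name |
    while read -r domain; do
        case "$domain" in
            # Assumption: node domains are named <cluster>n<number>.
            "${cluster}"n[0-9]*)
                # Keep virsh from consuming the domain list on stdin.
                "${VIRSH:-virsh}" "$cmd" "$domain" </dev/null
                ;;
        esac
    done
}
```

With this sketch, "cluster_cmd shutdown c1" would run "virsh shutdown"
once for every domain whose name matches the c1 pattern.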
 325
 326
 327 POST-CREATION SETUP
 328 ===================
 329
 330 Now you have a cluster of nodes, which might have a variety of
 331 packages installed and configured in a common way.  Now that the
 332 cluster is up and running you might need to configure specialised
 333 subsystems like GPFS or Samba.  You can do this by hand or use the
 334 sample scripts/configurations that are provided.
 335
 336  1)  Now you can ssh into your nodes. You may like to look at the
 337      small set of scripts in /root/scripts on the nodes. In
 338      particular:
 339
 340        mknsd.sh           :  sets up the local shared disks as GPFS NSDs
 341        setup_gpfs.sh      :  sets up GPFS, creates a filesystem etc
 342        setup_samba.sh     :  sets up Samba and many other system components
 343        setup_tsm_server.sh:  run this on the TSM node to set up the TSM server
 344        setup_tsm_client.sh:  run this on the GPFS nodes to set up HSM
 345
 346      To set up a SoFS system you will normally need to run
 347      setup_gpfs.sh and setup_samba.sh.
 348
 349  2)  If using the SoFS GUI, then you may want to lower the memory it
 350      uses so that it fits easily on the first node. Just edit this
 351      file on the first node:
 352
 353        /opt/IBM/sofs/conf/overrides/sofs.javaopt
 354
 355  3)  For automating the SoFS GUI, you may wish to install the iMacros
 356      extension to firefox, and look at some sample macros I have put
 357      in the imacros/ directory of autocluster. They will need editing
 358      for your environment, but they should give you some hints on how
 359      to automate the final GUI stage of the installation of a SoFS
 360      cluster.
 361
 362
 363 CONFIGURATION
 364 =============
 365
 366 Basics
 367 ======
 368
 369 Autocluster uses configuration files containing Unix shell style
 370 variables.  For example,
 371
 372   FIRSTIP=30
 373
 374 indicates that the last octet of the first IP address in the cluster
 375 will be 30.  If an option contains multiple words then they will be
 376 separated by underscores ('_'), as in:
 377
 378   ISO_DIR=/data/ISOs
 379
 380 All options have an equivalent command-line option, such
 381 as:
 382
 383   --firstip=30
 384
 385 Command-line options are lowercase.  Words are separated by dashes
 386 ('-'), as in:
 387
 388   --iso-dir=/data/ISOs
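
The naming rule can be expressed as a one-line transformation.  The
helper below is only an illustration of that rule; option_name is a
made-up function, not something autocluster provides.

```shell
# Hypothetical helper: derive the command-line option that corresponds
# to a configuration variable (lowercase, underscores become dashes).
option_name() {
    echo "--$(printf '%s' "$1" | tr 'A-Z_' 'a-z-')"
}

option_name ISO_DIR     # prints --iso-dir
option_name FIRSTIP     # prints --firstip
```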
 389
 390 Normally you would use a configuration file with variables so that you
 391 can repeat steps easily.  The command-line equivalents are useful for
 392 trying things out without resorting to an editor.  You can specify a
 393 configuration file to use on the autocluster command-line using the -c
 394 option.  For example:
 395
 396   autocluster -c config-foo create base
 397
 398 If you don't provide a configuration file then autocluster will
 399 look for a file called "config" in the current directory.
 400
 401 You can also use environment variables to override the default values
 402 of configuration variables.  However, both command-line options and
 403 configuration file entries will override environment variables.
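
The precedence between environment variables and configuration files
can be modelled in a few lines of shell.  This is an illustrative
model only and autocluster's real implementation may differ.  The
function resolve_firstip and the default value 20 are made up for the
example, and command-line options are not modelled.

```shell
# Illustrative model only, not autocluster code.
# Usage: resolve_firstip [config-file]
resolve_firstip() {
    # An environment variable replaces the built-in default
    # (20 is a made-up default for this sketch).
    FIRSTIP="${FIRSTIP:-20}"
    # A configuration file, if given, overrides the environment.
    if [ -n "${1:-}" ]; then
        . "$1"
    fi
    echo "$FIRSTIP"
}
```

Called with no file and no FIRSTIP in the environment, the model
prints the default.  With FIRSTIP=25 in the environment it prints 25,
and with a file containing FIRSTIP=30 it prints 30 regardless of the
environment.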
 404
 405 Potentially useful information:
 406
 407 * Use "autocluster --help" to list all available command-line options
 408   - all the items listed under "configuration options:" are the
 409   equivalents of the settings for config files.  This output also
 410   shows descriptions of the options.
 411
 412 * You can use the --dump option to check the current value of
 413   configuration variables.  This is most useful when used in
 414   combination with grep:
 415
 416     autocluster --dump | grep ISO_DIR
 417
 418   In the past we recommended using --dump to create an initial
 419   configuration file.  Don't do this - it is a bad idea!  There are a
 420   lot of options and you'll create a huge file that you don't
 421   understand and can't debug!
 422
 423 * Configuration options are defined in config.d/*.defconf.  You
 424   shouldn't need to look in these files... but sometimes they contain
 425   comments about options that are too long to fit into help strings.
 426
 427 Keep it simple
 428 ==============
 429
 430 * I recommend that you aim for the smallest possible configuration file.
 431   Perhaps start with:
 432
 433     FIRSTIP=<whatever>
 434
 435   and move on from there.
 436
 437 * Use the --with-release option on the command-line or the
 438   with_release function in a configuration file to get default values
 439   for building virtual clusters for releases of particular "products".
 440   Currently there are only release definitions for SoFS.
 441
 442   For example, you can set up default values for SoFS-1.5.3 by running:
 443
 444     autocluster --with-release=SoFS-1.5.3 ...
 445
 446   Equivalently you can use the following syntax in a configuration
 447   file:
 448
 449     with_release "SoFS-1.5.3"
 450
 451   So the smallest possible config file would have something like this
 452   as the first line and would then set FIRSTIP:
 453
 454     with_release "SoFS-1.5.3"
 455
 456     FIRSTIP=<whatever>
 457
 458   Add other options as you need them.
 459
 460   The release definitions are stored in releases/*.release.  The
 461   available releases are listed in the output of "autocluster --help".
 462
 463   NOTE: Occasionally you will need to consider the position of
 464   with_release in your configuration.  If you want to override options
 465   handled by a release definition then you will obviously need to set
 466   them later in your configuration.  This will be the case for most
 467   options you will want to set.  However, some options will need to
 468   appear before with_release so that they can be used within a release
 469   definition - the most obvious one is the (rarely used) RHEL_ARCH
 470   option, which is used in the default ISO setting for each release.
 471   If things don't work as expected use --dump to confirm that
 472   configuration variables have the values that you expect.
 473
 474 * The NODES configuration variable controls the types of nodes that
 475   are created.  At the time of writing, the default value is:
 476
 477     NODES="rhel_base:0-3"
 478
 479   This means that you get 4 nodes, at IP offsets 0, 1, 2, & 3 from
 480   FIRSTIP, all part of the CTDB cluster.  That is, with standard
 481   settings and FIRSTIP=35, 4 nodes will be created in the IP range
 482   10.0.0.35 to 10.0.0.38.
 483
 484   The SoFS releases use a default of:
 485
 486     NODES="tsm_server:0 sofs_gui:1 sofs_front:2-4"
 487
 488   which should produce a set of nodes the same as the old SoFS
 489   default.  You can add extra rhel_base nodes if you need them for
 490   test clients or some other purpose:
 491
 492     NODES="$NODES rhel_base:7,8"
 493
 494   This produces an additional 2 base RHEL nodes at IP offsets 7 & 8
 495   from FIRSTIP.  Since sofs_* nodes are present, these base nodes will
 496   not be part of the CTDB cluster - they're just extra.
 497
 498   For many standard use cases the nodes specified by NODES can be
 499   modified by setting NUMNODES, WITH_SOFS_GUI and WITH_TSM_NODE.
 500   However, these options can't be used to create nodes without
 501   specifying IP offsets - except WITH_TSM_NODE, which checks to see if
 502   IP offset 0 is vacant.  Therefore, for many uses you can ignore the
 503   NODES variable.
 504
 505   However, NODES is the recommended mechanism for specifying the nodes
 506   that you want in your cluster.  It is powerful, easy to read and
 507   centralises the information in a single line of your configuration
 508   file.
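
The arithmetic behind NODES and FIRSTIP can be worked through with a
small helper.  This is a sketch of the addressing rule described
above, not code from autocluster.  expand_nodes is a hypothetical
function, and the 10.0.0 network prefix is taken from the "standard
settings" example.

```shell
# Hypothetical helper: list the node type and IP address implied by a
# NODES value.  Usage: expand_nodes <FIRSTIP> <NODES> [network-prefix]
expand_nodes() {
    firstip="$1"
    nodes="$2"
    prefix="${3:-10.0.0}"
    for spec in $nodes; do
        type="${spec%%:*}"       # text before the colon
        offsets="${spec#*:}"     # text after the colon
        # Offsets are comma-separated; each is a number or a lo-hi range.
        for item in $(echo "$offsets" | tr ',' ' '); do
            lo="${item%-*}"
            hi="${item#*-}"
            i="$lo"
            while [ "$i" -le "$hi" ]; do
                echo "$type $prefix.$((firstip + i))"
                i=$((i + 1))
            done
        done
    done
}

expand_nodes 35 "rhel_base:0-3"
```

The call above prints four lines, from "rhel_base 10.0.0.35" to
"rhel_base 10.0.0.38", which matches the range quoted for FIRSTIP=35.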
 509
 510 iSCSI shared disks
 511 ==================
 512
 513 The RHEL5 version of KVM does not support the SCSI block device
 514 emulation.  Therefore, you can use either virtio or iSCSI shared
 515 disks.  Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
 516 virtio block devices are not supported by the version of multipath in
 517 RHEL5.  So this leaves iSCSI as the only choice.
 518
 519 The main configuration options you need for iSCSI disks are:
 520
 521   SHARED_DISK_TYPE=iscsi
 522   NICMODEL=virtio        # Recommended for performance
 523   add_extra_package iscsi-initiator-utils
 524
 525 Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
 526 iSCSI shared disks because KVM doesn't (need to) know about them.
 527
 528 You will need to install the scsi-target-utils package on the host
 529 system.  After creating a cluster, autocluster will print a message
 530 that points you to a file tmp/iscsi.$CLUSTER - you need to run the
 531 commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
 532 booting your cluster.  This will remove any old target with the same
 533 ID, and create the new target, LUNs and ACLs.
 534
 535 You can use the following command to list information about the
 536 target:
 537
 538   tgtadm --lld iscsi --mode target --op show
 539
 540 If you need multiple clusters using iSCSI on the same host then each
 541 cluster will need to have a different setting for ISCSI_TID.
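
The order of operations for an iSCSI cluster can be sketched as a
small wrapper.  The function name create_and_boot_iscsi is made up for
this example.  It assumes that autocluster and vircmd are in your
PATH, that a "config" file exists in the current directory, and that
it is run from the directory in which autocluster writes
tmp/iscsi.$CLUSTER.

```shell
# Hypothetical wrapper around the steps described above.
# Usage: create_and_boot_iscsi <cluster-name>
create_and_boot_iscsi() {
    cluster="$1"
    autocluster create cluster "$cluster" || return 1
    # autocluster writes the target setup commands to this file.
    setup="tmp/iscsi.$cluster"
    if [ ! -r "$setup" ]; then
        echo "expected $setup after cluster creation" >&2
        return 1
    fi
    # Set up the target, LUNs and ACLs before any node boots.
    sh "$setup" || return 1
    vircmd start "$cluster"
}
```

Running the iSCSI commands before "vircmd start" matters because the
nodes expect the shared disks to be available when they boot.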
 542
 543 Raw IDE system disks
 544 ====================
 545
 546 The RHEL5 version of KVM does not support the SCSI block device
 547 emulation.  Therefore, you can use virtio or IDE system disks.
 548 However, writeback caching, qcow2 and virtio are incompatible and
 549 result in I/O corruption.  So, you can use either virtio system disks
 550 without any caching, accepting reduced performance, or you can use IDE
 551 system disks with writeback caching, with nice performance.
 552
 553 For IDE disks, here are the required settings:
 554
 555   SYSTEM_DISK_TYPE=ide
 556   SYSTEM_DISK_PREFIX=hd
 557   SYSTEM_DISK_CACHE=writeback
 558
 559 The next problem is that RHEL5's KVM does not include qemu-nbd.  The
 560 best solution is to build your own qemu-nbd and stop reading this
 561 section.
 562
 563 If, for whatever reason, you're unable to build your own qemu-nbd,
 564 then you can use raw, rather than qcow2, system disks.  If you do this
 565 then you need significantly more disk space (since the system disks
 566 will be *copies* of the base image) and cluster creation time will no
 567 longer be pleasantly snappy (due to the copying time - the images are
 568 large and a single copy can take several minutes).  So, having tried
 569 to warn you off this option, if you really want to do this then you'll
 570 need these settings:
 571
 572   SYSTEM_DISK_FORMAT=raw
 573   BASE_FORMAT=raw
 574
 575 Note that if you're testing cluster creation with iSCSI shared disks
 576 then you should find a way of switching off raw disks.  This avoids
 577 every iSCSI glitch costing you a lot of time while raw disks are
 578 copied.
 579
 580 DEVELOPMENT HINTS
 581 =================
 582
 583 The -e option provides support for executing arbitrary bash code.
 584 This is useful for testing and debugging.
 585
 586 One good use of this option is to test template substitution using the
 587 function substitute_vars().  For example:
 588
 589   ./autocluster --with-release=SoFS-1.5.3 -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'
 590
 591 This prints templates/node.xml with all appropriate substitutions
 592 done.  Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
 593 given fairly arbitrary values but the various MAC address strings are
 594 set using the function set_macaddrs().
 595
 596 The -e option is also useful when writing scripts that use
 597 autocluster.  Given the complexities of the configuration system you
 598 probably don't want to parse configuration files yourself to determine
 599 the current settings.  Instead, you can ask autocluster to tell you
 600 useful pieces of information.  For example, say you want to script
 601 creating a base disk image and you want to ensure the image is
 602 marked immutable:
 603
 604   base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
 605   chattr -V -i "$base_image"
 606
 607   if autocluster -c $CONFIG create base ; then
 608     chattr -V +i "$base_image"
 609     ...
 610
 611 Note that the command that autocluster should run is enclosed in
 612 single quotes.  This means that $VIRTBASE and $BASENAME will be expanded
 613 within autocluster after the configuration file has been loaded.
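
The effect of the quoting can be demonstrated without autocluster.  In
the sketch below a child "sh -c" stands in for "autocluster -e", and
the values /outer and /virtual are made up: /virtual plays the role of
a value that only becomes known once the configuration is loaded.

```shell
VIRTBASE=/outer          # value in the calling shell

# Single quotes: the child shell expands $VIRTBASE using its own value.
inner=$(VIRTBASE=/virtual sh -c 'echo $VIRTBASE')

# Double quotes: the calling shell expands $VIRTBASE before the child runs.
outer=$(VIRTBASE=/virtual sh -c "echo $VIRTBASE")

echo "single quotes: $inner"    # prints "single quotes: /virtual"
echo "double quotes: $outer"    # prints "double quotes: /outer"
```

With double quotes the calling shell's value leaks into the command,
which is why the autocluster example uses single quotes.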