INTRODUCTION
============

Autocluster is a set of scripts for building virtual clusters to test
clustered Samba.  It uses libvirt and the Linux KVM virtualisation
engine.

Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly.  You can create a cluster from scratch in less than 30
minutes.  Once you have a base image you can recreate a cluster or
create new virtual clusters in minutes.

The current implementation creates virtual clusters of RHEL5 nodes.


CONTENTS
========

* INSTALLING AUTOCLUSTER

* HOST MACHINE SETUP

* CREATING A CLUSTER

* BOOTING A CLUSTER

* POST-CREATION SETUP

* CONFIGURATION

* DEVELOPMENT HINTS


INSTALLING AUTOCLUSTER
======================

Before you start, make sure you have the latest version of
autocluster.  To download autocluster, do this:

  git clone git://git.samba.org/tridge/autocluster.git autocluster

To update an existing copy, run "git pull" in the autocluster
directory.

You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.


HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.


 1) Install kvm, libvirt, qemu-nbd, nbd-client and expect.

    Autocluster creates virtual machines that use libvirt to run under
    KVM.  This means that you will need to install both KVM and
    libvirt on your host machine.  You will also need the qemu-nbd and
    nbd-client programs, which autocluster uses to loopback-nbd-mount
    the disk images when configuring each node.  Expect is used by the
    "waitfor" script and should be available for installation from
    your distribution.

    For various distros:

    * RHEL/CentOS

      For RHEL5/CentOS5, useful packages for both kvm and libvirt used
      to be found here:

        http://www.lfarkas.org/linux/packages/centos/5/x86_64/

      However, since recent versions of RHEL5 ship with KVM, 3rd party
      KVM RPMs for RHEL5 are now scarce.

      RHEL5.4 ships with KVM but it doesn't have the SCSI disk
      emulation that autocluster uses by default.  There are also
      problems when autocluster uses virtio on RHEL5.4's KVM.  See the
      "iSCSI shared disks" and "Raw IDE system disks" sections under
      CONFIGURATION below.  Also, to use the RHEL5 version of KVM you
      will need to set

        KVM=/usr/libexec/qemu-kvm

      in your configuration file.

      Unless you can find an RPM for nbd-client, you will need to
      download the source from:

        http://sourceforge.net/projects/nbd/

      and build it.

    * Fedora Core

      Useful packages ship with Fedora Core 10 (Cambridge) and later.

      qemu-nbd is in the kvm package.

      nbd-client is in the nbd package.

    * Ubuntu

      Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.

      qemu-nbd is in the kvm package but is called kvm-nbd, so you
      need to set the QEMU_NBD configuration variable.

      nbd-client is in the nbd-client package.

    For other distributions you'll have to backport distro sources or
    compile from upstream source as described below.

    * For KVM see the "Downloads" and "Code" sections at:

        http://www.linux-kvm.org/

    * For libvirt see:

        http://libvirt.org/

    * As mentioned above, nbd can be found at:

        http://sourceforge.net/projects/nbd/

    You will need to add the autocluster directory to your PATH.

    You will need to configure the right kvm networking setup.  The
    files in host_setup/etc/libvirt/qemu/networks/ should help.  This
    command will install the right networks for kvm:

      rsync -av --delete host_setup/etc/libvirt/qemu/networks/ /etc/libvirt/qemu/networks/

    Note that --delete removes any network definitions already in
    /etc/libvirt/qemu/networks/ that are not in the autocluster copy.
    You'll also need to edit the installed files to reflect any
    changes to IPBASE, IPNET0, IPNET1, IPNET2 away from the defaults.
    This is also true for named.conf.local and squid.conf (see below).

    After this you might need to reload libvirt:

      /etc/init.d/libvirt reload

    or similar.

    You might also need to set:

      VIRSH_DEFAULT_CONNECT_URI=qemu:///system

    in your environment so that virsh connects to the system QEMU/KVM
    hypervisor by default.

 2) You need a caching web proxy on your local network.  If you don't
    have one, then install a squid proxy on your host.  See
    host_setup/etc/squid/squid.conf for a sample config suitable for a
    virtual cluster.  Make sure it caches large objects and has plenty
    of space.  This is needed to make downloading all the RPMs to
    each node practical.

    To test your squid setup, run a command like this:

      http_proxy=http://10.0.0.1:3128/ wget <some-url>

    Check your firewall setup.  If you have problems accessing the
    proxy from your nodes (including from the kickstart postinstall
    script) then check it again!  Some distributions install nice
    "convenient" firewalls by default that might block access to the
    squid port from the nodes.  On a current version of Fedora Core you
    may be able to run system-config-firewall-tui to reconfigure the
    firewall.

 3) Set up a DNS server on your host.  See host_setup/etc/bind/ for a
    suitable sample config.  It needs to redirect DNS queries for your
    virtual domain to your Windows domain controller.

 4) Download a RHEL install ISO.
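
Before moving on, it can be worth confirming that the programs from
step 1 are actually on your PATH.  The following bash sketch does
that.  check_host_tools is a hypothetical helper written for this
example, not part of autocluster.  The command names you pass it depend
on your distribution (for example, Ubuntu's qemu-nbd is called kvm-nbd).

```bash
#!/bin/bash
# check_host_tools NAME...: hypothetical helper, not part of autocluster.
# Prints "missing: <name>" for each given command that is not on the
# PATH, and returns non-zero if anything is missing.
check_host_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "check_host_tools virsh qemu-nbd nbd-client expect" prints
nothing and succeeds when all four commands are installed.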


CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration.  The copy-on-write disk images
save a lot of disk space on the host machine because they all share
the base disk image.  Without them, the disk image for each cluster
node would need to contain the entire RHEL install.

The cluster creation process can be broken down into 2 main steps:

 1) Create the base disk image.

 2) Create the per-node disk images and corresponding XML files.

However, before you do this you will need to create a configuration
file.  See the "CONFIGURATION" section below for more details.

Here are more details on the "create cluster" process.  Note that
unless you have done something extra special, you'll need to run all
of this as root.

 1) Create the base disk image using:

      ./autocluster create base

    The first thing this step does is to check that it can connect to
    the YUM server.  If this fails, make sure that there are no
    firewalls blocking your access to the server.

    The install will take about 10 to 15 minutes and you will see the
    packages installing in your terminal.

    The installation process uses kickstart.  If your configuration
    uses a SoFS release then the last stage of the kickstart
    configuration will be a postinstall script that installs and
    configures packages related to SoFS.  The choice of postinstall
    script is set using the POSTINSTALL_TEMPLATE variable, allowing you
    to adapt the installation process for different types of clusters.

    It makes sense to install packages that will be common to all
    nodes into the base image.  This saves time later when you're
    setting up the cluster nodes.  However, you don't have to do this
    (you can set POSTINSTALL_TEMPLATE to "" instead), but then you
    will lose the quick cluster creation/setup that is a major feature
    of autocluster.

    When that has finished you should mark the base image immutable
    like this (adjust the path if you have changed VIRTBASE or
    BASENAME):

      chattr +i /virtual/ac-base.img

    That will ensure it won't change.  This is a precaution: the image
    is used as a basis file for the per-node images, and if it changes
    your cluster will become corrupt.

 2) Now run "autocluster create cluster", specifying a cluster
    name.  For example:

      autocluster create cluster c1

    This will create and install the XML node descriptions and the
    disk images for your cluster nodes, and any other nodes you have
    configured.  Each disk image is initially created as an "empty"
    copy-on-write image, which is linked to the base image.  Those
    images are then loopback-nbd-mounted and populated with system
    configuration files and other potentially useful things (such as
    scripts).
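
If you build clusters often, you can wrap the two steps and the chattr
precaution in a small script.  The bash sketch below is only an
illustration.  build_cluster and run are hypothetical helpers, you must
pass the base image path yourself (/virtual/ac-base.img by default, as
above), and real runs need root.  Setting DRY_RUN=1 prints the commands
instead of running them.

```bash
#!/bin/bash
# run: hypothetical helper that executes its arguments, or just prints
# them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_cluster CONFIG CLUSTER BASE_IMAGE: hypothetical wrapper, not part
# of autocluster, that performs the two creation steps in order.
build_cluster() {
    local config=$1 cluster=$2 base_image=$3
    # A base image left immutable by an earlier run cannot be rebuilt.
    if [ -e "$base_image" ]; then
        run chattr -i "$base_image" || return 1
    fi
    run autocluster -c "$config" create base || return 1
    # Protect the base image: the per-node copy-on-write images depend on it.
    run chattr +i "$base_image" || return 1
    run autocluster -c "$config" create cluster "$cluster"
}
```

For example, "DRY_RUN=1 build_cluster config-foo c1 /virtual/ac-base.img"
prints the commands without running them.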


BOOTING A CLUSTER
=================

At this point the cluster has been created but isn't yet running.
Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command.  vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.

 1) Now boot your cluster nodes like this:

      vircmd start c1

    The most useful vircmd commands are:

      start    : boot a node
      shutdown : graceful shutdown of a node
      destroy  : power off a node immediately

 2) You can watch boot progress like this:

      tail -f /var/log/kvm/serial.c1*

    All the nodes have serial consoles, making it easier to capture
    kernel panic messages and to watch the nodes via ssh.
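
A script that needs to wait until the cluster has booted can poll the
same serial logs.  This bash sketch is an illustration under stated
assumptions, not part of autocluster.  wait_for_login is hypothetical.
It assumes each node's log is named serial.<cluster>* under
/var/log/kvm (as in the tail example) and that a booted node prints a
line containing "login:".  It polls with grep rather than expect.

```bash
#!/bin/bash
# wait_for_login CLUSTER [LOGDIR] [TIMEOUT]: hypothetical helper, not
# part of autocluster.  Polls LOGDIR/serial.CLUSTER* every 5 seconds and
# returns 0 once every log contains "login:", or 1 after TIMEOUT seconds.
# Like the tail example, the glob for c1 would also match a cluster c10.
wait_for_login() {
    local cluster=$1 logdir=${2:-/var/log/kvm} timeout=${3:-600}
    local waited=0 pending log
    while :; do
        pending=0
        for log in "$logdir"/serial."$cluster"*; do
            # An unmatched glob stays literal, so a missing file counts as pending.
            if [ ! -f "$log" ] || ! grep -q 'login:' "$log"; then
                pending=1
            fi
        done
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}
```

For example, "wait_for_login c1 && echo up" waits for up to the default
600 seconds.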


POST-CREATION SETUP
===================

Now you have a cluster of nodes, which might have a variety of
packages installed and configured in a common way.  Now that the
cluster is up and running you might need to configure specialised
subsystems like GPFS or Samba.  You can do this by hand or use the
sample scripts/configurations that are provided.

 1)  Now you can ssh into your nodes.  You may like to look at the
     small set of scripts in /root/scripts on the nodes.  In
     particular:

       mknsd.sh            : sets up the local shared disks as GPFS NSDs
       setup_gpfs.sh       : sets up GPFS, creates a filesystem, etc.
       setup_samba.sh      : sets up Samba and many other system components
       setup_tsm_server.sh : run this on the TSM node to set up the TSM server
       setup_tsm_client.sh : run this on the GPFS nodes to set up HSM

     To set up a SoFS system you will normally need to run
     setup_gpfs.sh and setup_samba.sh.

 2)  If using the SoFS GUI, then you may want to lower the memory it
     uses so that it fits easily on the first node.  Just edit this
     file on the first node:

       /opt/IBM/sofs/conf/overrides/sofs.javaopt

 3)  For automating the SoFS GUI, you may wish to install the iMacros
     extension to Firefox, and look at some sample macros I have put
     in the imacros/ directory of autocluster.  They will need editing
     for your environment, but they should give you some hints on how
     to automate the final GUI stage of the installation of a SoFS
     cluster.
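
You can also start the two SoFS setup scripts from the host.  The
following is a minimal bash sketch with several assumptions:

* You have root ssh access to the node.
* setup_gpfs.sh should run before setup_samba.sh, the order in which
  they are listed above.

run_sofs_setup is hypothetical.  Check against the scripts themselves
which node to run them on and whether they need arguments.

```bash
#!/bin/bash
# run_sofs_setup NODE: hypothetical helper, not part of autocluster.
# Runs setup_gpfs.sh and then setup_samba.sh on NODE as root over ssh,
# stopping at the first script that fails.  Set SSH=echo to preview.
run_sofs_setup() {
    local node=$1 script
    for script in setup_gpfs.sh setup_samba.sh; do
        "${SSH:-ssh}" "root@$node" "/root/scripts/$script" || return 1
    done
}
```

For example, "SSH=echo run_sofs_setup c1n1" only prints what would be
run (c1n1 is a made-up node name).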


CONFIGURATION
=============

Basics
======

Autocluster uses configuration files containing Unix shell-style
variables.  For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30.  If an option contains multiple words then they will be
separated by underscores ('_'), as in:

  ISO_DIR=/data/ISOs

All options have an equivalent command-line option, such as:

  --firstip=30

Command-line options are lowercase.  Words are separated by dashes
('-'), as in:

  --iso-dir=/data/ISOs
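
The mapping from configuration variable to command-line option is
mechanical.  As a rough illustration, here it is in bash.
var_to_option is a hypothetical helper, not part of autocluster.

```bash
#!/bin/bash
# var_to_option NAME: hypothetical helper illustrating the naming rule
# only.  Lowercases NAME, turns '_' into '-' and adds the leading "--".
var_to_option() {
    local name
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
    printf '%s\n' "--$name"
}
```

For example, "var_to_option ISO_DIR" prints --iso-dir, matching the
example above.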

Normally you would use a configuration file with variables so that you
can repeat steps easily.  The command-line equivalents are useful for
trying things out without resorting to an editor.  You can specify a
configuration file to use on the autocluster command-line using the -c
option.  For example:

  autocluster -c config-foo create base

If you don't specify a configuration file then autocluster will look
for a file called "config" in the current directory.

You can also use environment variables to override the default values
of configuration variables.  However, both command-line options and
configuration file entries will override environment variables.

Potentially useful information:

* Use "autocluster --help" to list all available command-line
  options.  All the items listed under "configuration options:" are
  the equivalents of the settings for config files.  This output also
  shows descriptions of the options.

* You can use the --dump option to check the current value of
  configuration variables.  This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR

  In the past we recommended using --dump to create an initial
  configuration file.  Don't do this; it is a bad idea!  There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!

* Configuration options are defined in config.d/*.defconf.  You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.

Keep it simple
==============

* I recommend that you aim for the smallest possible configuration file.
  Perhaps start with:

    FIRSTIP=<whatever>

  and move on from there.

* Use the --with-release option on the command-line or the
  with_release function in a configuration file to get default values
  for building virtual clusters for releases of particular "products".
  Currently there are only release definitions for SoFS.

  For example, you can set up default values for SoFS-1.5.3 by running:

    autocluster --with-release=SoFS-1.5.3 ...

  Equivalently, you can use the following syntax in a configuration
  file:

    with_release "SoFS-1.5.3"

  So the smallest possible config file for a SoFS cluster would have
  something like this as the first line and would then set FIRSTIP:

    with_release "SoFS-1.5.3"

    FIRSTIP=<whatever>

  Add other options as you need them.

  The release definitions are stored in releases/*.release.  The
  available releases are listed in the output of "autocluster --help".

  NOTE: Occasionally you will need to consider the position of
  with_release in your configuration.  If you want to override options
  handled by a release definition then you will obviously need to set
  them later in your configuration.  This will be the case for most
  options you will want to set.  However, some options will need to
  appear before with_release so that they can be used within a release
  definition.  The most obvious one is the (rarely used) RHEL_ARCH
  option, which is used in the default ISO setting for each release.
  If things don't work as expected, use --dump to confirm that
  configuration variables have the values that you expect.

* The NODES configuration variable controls the types of nodes that
  are created.  At the time of writing, the default value is:

    NODES="rhel_base:0-3"

  This means that you get 4 nodes, at IP offsets 0, 1, 2 and 3 from
  FIRSTIP, all part of the CTDB cluster.  That is, with standard
  settings and FIRSTIP=35, 4 nodes will be created in the IP range
  10.0.0.35 to 10.0.0.38.

  The SoFS releases use a default of:

    NODES="tsm_server:0 sofs_gui:1 sofs_front:2-4"

  which should produce the same set of nodes as the old SoFS
  default.  You can add extra rhel_base nodes if you need them for
  test clients or some other purpose:

    NODES="$NODES rhel_base:7,8"

  This produces an additional 2 base RHEL nodes at IP offsets 7 and 8
  from FIRSTIP.  Since sofs_* nodes are present, these base nodes will
  not be part of the CTDB cluster; they're just extra.

  For many standard use cases the nodes specified by NODES can be
  modified by setting NUMNODES, WITH_SOFS_GUI and WITH_TSM_NODE.
  However, these options can't be used to create nodes without
  specifying IP offsets, except for WITH_TSM_NODE, which checks to see
  if IP offset 0 is vacant.  Therefore, for many uses you can ignore
  the NODES variable.

  However, NODES is the recommended mechanism for specifying the nodes
  that you want in your cluster.  It is powerful, easy to read and
  centralises the information in a single line of your configuration
  file.
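
To see how a NODES value maps to IP addresses, the following bash
sketch expands it as described above.  expand_nodes is a hypothetical
helper, not autocluster's own parser.  It assumes the 10.0.0 prefix from
the standard-settings example and handles only the "type:offset",
"type:a-b" and comma-separated forms shown here.

```bash
#!/bin/bash
# expand_nodes NODES FIRSTIP [PREFIX]: hypothetical helper, not
# autocluster's own parser.  Prints "<type> <ip>" for each node in a
# NODES value such as "rhel_base:0-3" or "rhel_base:7,8".
# PREFIX defaults to 10.0.0, the network in the example above.
expand_nodes() {
    local nodes=$1 firstip=$2 prefix=${3:-10.0.0}
    local spec type offsets part start end i
    local -a parts
    for spec in $nodes; do
        type=${spec%%:*}
        offsets=${spec#*:}
        # Offsets are comma-separated; each is a number or a range a-b.
        IFS=, read -r -a parts <<< "$offsets"
        for part in "${parts[@]}"; do
            if [[ $part == *-* ]]; then
                start=${part%-*}
                end=${part#*-}
            else
                start=$part
                end=$part
            fi
            for (( i = start; i <= end; i++ )); do
                printf '%s %s.%d\n' "$type" "$prefix" $(( firstip + i ))
            done
        done
    done
}
```

For example, 'expand_nodes "rhel_base:0-3" 35' prints the lines
"rhel_base 10.0.0.35" through "rhel_base 10.0.0.38", one per line.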

iSCSI shared disks
==================

The RHEL5 version of KVM does not support SCSI block device
emulation.  Therefore, you can use either virtio or iSCSI shared
disks.  Unfortunately, at the time of writing, virtio block devices
are not supported by the version of multipath in RHEL5.  This leaves
iSCSI as the only choice.

The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils

Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.

You will need to install the scsi-target-utils package on the host
system.  After creating a cluster, autocluster will print a message
that points you to a file, tmp/iscsi.$CLUSTER.  You need to run the
commands in this file (probably via "sh tmp/iscsi.$CLUSTER") before
booting your cluster.  This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.

You can use the following command to list information about the
target:

  tgtadm --lld iscsi --mode target --op show

If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.
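
If you keep one configuration file per cluster, a quick check for
clashing ISCSI_TID values can save some confusion.  This bash sketch is
hypothetical, not part of autocluster.  It assumes ISCSI_TID is set with
a plain "ISCSI_TID=<number>" line.  Values set any other way (command
line, environment or release definitions) are not seen.

```bash
#!/bin/bash
# find_duplicate_tids FILE...: hypothetical helper, not part of
# autocluster.  Takes the last plain "ISCSI_TID=<n>" assignment in each
# config file and prints any TID value used by more than one file.
find_duplicate_tids() {
    awk -F= '
        /^[[:space:]]*ISCSI_TID=/ {
            v = $2
            sub(/#.*/, "", v)           # drop trailing comments
            gsub(/["[:space:]]/, "", v) # drop quotes and spaces
            tid[FILENAME] = v
        }
        END { for (f in tid) print tid[f] }
    ' "$@" | sort | uniq -d
}
```

For example, "find_duplicate_tids config-c1 config-c2" prints nothing
if the two (hypothetically named) configs use different TIDs.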

Raw IDE system disks
====================

The RHEL5 version of KVM does not support SCSI block device
emulation.  Therefore, you can use virtio or IDE system disks.
However, the combination of writeback caching, qcow2 and virtio
results in I/O corruption.  So, you can either use virtio system disks
without any caching, accepting reduced performance, or use IDE system
disks with writeback caching, which gives good performance.

For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback

The next problem is that RHEL5's KVM does not include qemu-nbd.  The
best solution is to build your own qemu-nbd and stop reading this
section.

If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks.  If you do this
then you need significantly more disk space (since the system disks
will be *copies* of the base image) and cluster creation time will no
longer be pleasantly snappy (due to the copying time: the images are
large and a single copy can take several minutes).  So, having tried
to warn you off this option, if you really want to do this then you'll
need these settings:

  SYSTEM_DISK_FORMAT=raw
  BASE_FORMAT=raw

Note that if you're testing cluster creation with iSCSI shared disks,
you should find a way of switching off raw disks.  This avoids every
iSCSI glitch costing you a lot of time while raw disks are copied.
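
The choice described in this section can be expressed as a small
check.  This bash sketch is hypothetical, not part of autocluster.  It
prints the IDE settings above and adds the raw-format settings only if
the qemu-nbd command you name cannot be found.  Review the output before
adding it to your configuration file.

```bash
#!/bin/bash
# suggest_disk_settings [QEMU_NBD]: hypothetical helper, not part of
# autocluster.  Prints the RHEL5 IDE system disk settings, plus the raw
# format settings when the given qemu-nbd command is not available.
suggest_disk_settings() {
    local qemu_nbd=${1:-qemu-nbd}
    printf '%s\n' SYSTEM_DISK_TYPE=ide SYSTEM_DISK_PREFIX=hd SYSTEM_DISK_CACHE=writeback
    if ! command -v "$qemu_nbd" >/dev/null 2>&1; then
        printf '%s\n' SYSTEM_DISK_FORMAT=raw BASE_FORMAT=raw
    fi
}
```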

DEVELOPMENT HINTS
=================

The -e option executes arbitrary bash code within autocluster.  This
is useful for testing and debugging.

One good use of this option is to test template substitution using the
function substitute_vars().  For example:

  ./autocluster --with-release=SoFS-1.5.3 -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'

This prints templates/node.xml with all appropriate substitutions
done.  Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().

The -e option is also useful when writing scripts that use
autocluster.  Given the complexities of the configuration system, you
probably don't want to parse configuration files yourself to determine
the current settings.  Instead, you can ask autocluster to tell you
useful pieces of information.  For example, say you want to script
creating a base disk image and you want to ensure the image is
marked immutable:

  base_image=$(autocluster -c "$CONFIG" -e 'echo $VIRTBASE/$BASENAME.img')
  # Clear the immutable flag on any existing base image so it can be replaced.
  [ -e "$base_image" ] && chattr -V -i "$base_image"

  if autocluster -c "$CONFIG" create base ; then
    chattr -V +i "$base_image"
    ...
  fi

Note that the command that autocluster should run is enclosed in
single quotes.  This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded, rather than by your own shell.