clusterfork: a cluster admin tool
=================================
by Harry Mangalam <harry.mangalam@uci.edu>
v1.0, Aug 17, 2010
:icons:

//Harry Mangalam mailto:harry.mangalam@uci.edu[harry.mangalam@uci.edu]
// this file is converted to the HTML via the command:
// export fileroot="/home/hjm/nacs/clusterfork"; asciidoc -a toc -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]*   moo:~/public_html;  

// update svn from BDUC
// scp ${fileroot}.txt  hmangala@claw1:~/bduc/trunk/sge; ssh hmangala@bduc-login 'cd ~/bduc/trunk/sge; svn update; svn commit -m "new mods to clusterfork files"'

.Summary
*******************************************************************************
clusterfork is a command-line Perl script that issues the same command to many computers 
simultaneously via ssh, collates the results of that command by node, and presents 
those results in several ways so that you can judge whether the command succeeded.
*******************************************************************************


Introduction
------------
While modern cluster technologies (Perceus, ROCKS) should obviate the need for this kind of
utility, there are many non-Perceus/ROCKS clusters and even more cluster-like aggregations of nodes that often need this kind of tool.  Even in Perceus/ROCKS clusters there's often a need that the provisioning system doesn't meet: issuing a command to each node to evaluate hardware, search logs, check for memory errors, etc.

There are existing tools that do something similar:  

- http://code.google.com/p/parallel-ssh/[pssh] - pssh is a very nice set of tools written in Python.  It's fairly mature and has been packaged nicely.  It can use a configuration file (and in fact doesn't allow IP range specification from the commandline), and it can write the results to a dir (but doesn't write a summary or allow in-line viewing).  A nice example is http://www.linux.com/archive/feature/151340[described here].  Its IP range specification and grouping are not as easy as clusterfork's, and it is a set of tools rather than a single one.  But since it is available as both an RPM and a deb, it is very convenient to install.  If you're using pssh and are familiar with it, I'd suggest staying with it.
- http://clusterit.sourceforge.net/[ClusterIt] - this is a fairly large hammer when all I wanted was to send commands to a set of nodes.  ClusterIt is written in C, supposedly for speed, though since it is just issuing commands, speed of execution shouldn't be an issue.  It also tries to be a scheduling tool, which complicates the core functionality of what should be a pretty simple tool.
- http://sourceforge.net/apps/mediawiki/clusterssh/index.php?title=Main_Page[clusterssh aka cssh] is a similar tool (but requires tcl/tk).  As such it has some advantages: you can set up hosts and select or deselect them with the mouse.  However, you interact with the targets via 1 xterm per host, hardly an efficient use of your desktop.

Features
--------
So why use clusterfork rather than the tools noted above?

*clusterfork*:

- is config file-based and will write a config file template if one doesn't exist. You can also specify alternative config files.
- has an easy way to specify large, discontinuous IP # ranges with negations: i.e. 128.200.34.[23:45 -25 77:155 -100:-120] will send the cmd to the nodes (on net 128.200.34.0) 23 to 45 EXCEPT 25, and then 77 to 155 EXCEPT the nodes 100 to 120.  Such specifications can also be chained.
- can define IP ranges in the config file from the output of arbitrary scripts such as SGE's 'qhost'.
- can combine IP ranges into larger groups via 'named group addition' (GRP1 + GRP2 + GRP3)
- is pretty fast (can fork commands so that they execute in parallel).
- comes with at least a decent amount of documentation, both external (this file) as well as internal help (clusterfork -h) and perldoc: (perldoc clusterfork).
- code is pretty well-documented and easy to modify.
- provides a mechanism to evaluate the results of the command (like pssh, but better).
- can archive the results, although in a fairly primitive way.
- can be used to (crudely) monitor the status of a cluster's locally installed software.
- detects IP #s that appear more than once in a target specification, so that no node receives the same command twice.
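
The bracket syntax for IP # ranges is easier to see in code. What follows is a minimal Python sketch of the expansion, not clusterfork's own Perl. The function name `expand_range` is made up for illustration, and the sketch assumes that a leading '-' on a token always means "exclude" and that exclusions win regardless of where they appear in the brackets.

```python
import re

def expand_range(spec):
    """Expand 'A.B.C.[tokens]' into a sorted list of dotted-quad strings."""
    m = re.fullmatch(r"\s*(\d+\.\d+\.\d+)\.\[([^\]]*)\]\s*", spec)
    if m is None:
        return [spec.strip()]          # a plain address, no bracket expression
    prefix, body = m.groups()
    include, exclude = set(), set()
    for token in body.split():
        # A leading '-' marks a negation; '-100:-120' negates a whole range.
        negated = token.startswith("-")
        bounds = [int(part.lstrip("-")) for part in token.split(":")]
        low, high = bounds[0], bounds[-1]   # a bare number is a range of one
        (exclude if negated else include).update(range(low, high + 1))
    return [f"{prefix}.{n}" for n in sorted(include - exclude)]

ips = expand_range("128.200.34.[23:45 -25 77:155 -100:-120]")
print(len(ips), ips[0], ips[-1])
```

For the specification used in the feature list this gives 80 addresses: 23 to 45 without 25, plus 77 to 155 without 100 to 120.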

Known problems
--------------
clusterfork is a fairly new program and while it works pretty well, there are some known problems, mostly having to do with regular expressions.

- regular expressions passed as part of the remote command may suffer in the program-to-ssh-to-shell translation and arrive garbled on the remote end.  I'm looking into a few examples to nail this down.

- the 1st 20 characters of the commandline are used as part of the results directory name, and regular expressions that fall within those 20 characters may sometimes be garbled into impossible dir names which are rejected by the OS.  An error message should be emitted if this happens, and I'm trying to catch and retranslate these regexes.
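
To make the directory-name issue concrete, here is a small Python sketch that reproduces the names seen in the sample output later in this document (for example 'REMOTE_CMD-ls--lSh--gz-13.53.12_2010-08-17'). It is a guess at the scheme, not clusterfork's code: the function name `result_dir_name` is hypothetical, and the rule "replace everything except letters, digits, '_' and '-' with '-'" is an assumption that merely fits those two samples.

```python
import re
from datetime import datetime

def result_dir_name(command, when):
    """Build a results-directory name from a command and a timestamp."""
    head = command[:20]                             # only the 1st 20 characters
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", head)     # assumed sanitizing rule
    stamp = when.strftime("%H.%M.%S_%Y-%m-%d")      # time first, then date
    return f"REMOTE_CMD-{safe}-{stamp}"

print(result_dir_name("ls -lSh *gz", datetime(2010, 8, 17, 13, 53, 12)))
# prints: REMOTE_CMD-ls--lSh--gz-13.53.12_2010-08-17
```

A rule this blunt never produces an illegal name, but it also throws away most of a regex, which is the trade-off behind the problem described above.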

Prerequisites
-------------
clusterfork has a few Perl dependencies beyond the usual 'strict' and 'Env':

. 'Getopt::Long' to process options
. 'Config::Simple' to process the configuration file.
. 'Socket' to provide name resolution.

It also requires some apps that are installed on most Linux boxen anyway:

. 'ssh' for executing the commands (and obviously the nodes have to share ssh keys to provide passwordless ssh)
. 'mutt', for emailing out notifications (if desired)
. 'diff', for doing comparisons between files of IP #s
. 'yum' or 'apt-get' if you want to use it for updating / installing apps.
. 'mc', Midnight Commander, a Norton Commander-like clone to view/manipulate files
. whatever local apps, scripts, etc that you need to use to generate IP# lists if this is of interest (SGE's 'qhost' to see what nodes are alive, for example).
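
A quick way to see which of these apps are missing on a given machine is sketched below in Python. The helper name `missing_apps` is hypothetical; it only checks that each name resolves on the current `PATH` and says nothing about versions or about whether passwordless ssh actually works.

```python
import shutil

def missing_apps(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]

# 'yum' or 'apt-get' only matters if cf will be used to install packages.
print(missing_apps(["ssh", "mutt", "diff", "mc"]))
```

An empty list means all four were found.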


Installation
------------

For recent Ubuntu-based distros, the following will install the prerequisite packages:
----------------------------------------------------------------------
sudo apt-get install libgetopt-mixed-perl libconfig-simple-perl \
libio-interface-perl mc diff yum apt mutt 
----------------------------------------------------------------------

For CentOS5 and comparable RedHat-based systems:
----------------------------------------------------------------------
sudo yum install perl-Config-Simple.noarch perl-Getopt-Mixed.noarch \
perl-IO-Interface.<arch>  mc.<arch> \
diffutils.<arch> yum.noarch mutt.<arch>
----------------------------------------------------------------------
where <arch> is either 'i386' or 'x86_64'.

Beyond that, installation consists of the following steps:

. Download the http://moo.nac.uci.edu/~hjm/clusterfork/clusterfork.pl[clusterfork script itself].
. Move it to '/usr/local/bin' as 'clusterfork' (and optionally symlink it to 'cf').
. 'chmod' it to make it executable.
. Run it once to write a '.clusterforkrc' file to your '$HOME' (see below).
. Edit that file to adjust it to your local requirements.
. Start clusterforking.

If you want to use the email features, you need a working sendmail-like agent that mutt can talk to, such as http://www.exim.org/[exim] or http://www.postfix.org/[postfix].


Initialization
--------------
The 1st time you use clusterfork, you should get this message (unless you've already copied
a '~/.clusterforkrc' file from somewhere else).  Just follow the instructions.
-------------------------------------------------------------

$ clusterfork

        It looks like this is the 1st time you've run clusterfork
        as this user on this system.  An example .clusterforkrc file
        will be written to your home dir. Once you edit it to your
        specifications, run a non-destructive command with it
        (ie 'ls -lSh') to make sure it's working and examine the output
        so that you understand the workflow and the output.

        Remember that in order for clusterfork to work, passwordless ssh keys
        must be operational from the node where you execute clusterfork to the
        client nodes.  If you're going to use sudo to execute clusterfork, the
        root user public ssh key must be shared out to the clients.

        Typical cluster use implies a shared /home file system which means that
        the shared keys should only have to be installed once in
        /home/$USER/.ssh/authorized_keys.

        Please edit the ~/.clusterforkrc template that's just been written so that
        the next time things go smoother.

-------------------------------------------------------------


Some real-world examples
------------------------

To cause 'cf' to dump its help file into the 'less' pager
-------------------------------------------------------------
$ clusterfork -h  
-------------------------------------------------------------

Have 'cf' read the alternative config file './this_file' and list the groups that are defined 
there
-------------------------------------------------------------
$ clusterfork --config=./this_file  --listgroup
-------------------------------------------------------------

Have 'cf' read the default config file and target the group 'CLAWS' with the command
'ls -lSh'
-------------------------------------------------------------
$ clusterfork  --target=CLAWS  'ls -lSh'
-------------------------------------------------------------

Check the memory error counts for the nodes 192.168.1.15 thru 192.168.1.75 except 192.168.1.66
-------------------------------------------------------------
$ clusterfork --target=192.168.1.[15:75 -66] 'cd /sys/devices/system/edac/mc &&  grep [0-9]* mc*/csrow*/[cu]e_count'
-------------------------------------------------------------

Tell the nodes in the group ALL_ADC to send a single ping to the login node
-------------------------------------------------------------
$ clusterfork --target=ALL_ADC 'ping -c 1 bduc-login'
-------------------------------------------------------------

Ask the nodes [10.255.78.12 to 10.255.78.45] 
and [10.255.35.101 to  10.255.35.165] to dump the catalog of their installed packages.
-------------------------------------------------------------
$ clusterfork --target='10.255.78.[12:45] 10.255.35.[101:165]' 'dpkg -l'
-------------------------------------------------------------

Ask the nodes in the ADC_2X group to _serially_ dump their hardware memory configuration
-------------------------------------------------------------
$ sudo clusterfork --target=ADC_2X --nofork 'lshw -short -class memory'
-------------------------------------------------------------
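
As far as this document describes them, the forked and '--nofork' modes send the same command and differ in whether the nodes are contacted at the same time or one after another. The Python sketch below illustrates that difference only. It uses a thread pool where clusterfork itself forks, and `run_on_hosts`, its `launcher` parameter, and the choice of ssh's `BatchMode=yes` option (which makes ssh fail instead of prompting for a password) are assumptions made for illustration, not clusterfork's code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_hosts(hosts, remote_cmd,
                 launcher=("ssh", "-o", "BatchMode=yes"), parallel=True):
    """Run remote_cmd on every host; return {host: captured stdout}."""
    def run_one(host):
        proc = subprocess.run([*launcher, host, remote_cmd],
                              capture_output=True, text=True)
        return host, proc.stdout

    if parallel:
        # All nodes are contacted at once; results are still keyed by host.
        with ThreadPoolExecutor(max_workers=min(32, len(hosts) or 1)) as pool:
            return dict(pool.map(run_one, hosts))
    # Serial mode: each node finishes before the next one starts.
    return dict(map(run_one, hosts))

# Example (needs working passwordless ssh to the named nodes):
# run_on_hosts(["a64-179", "a64-180"], "lshw -short -class memory", parallel=False)
```

Keying the results by host is what makes the later per-node comparison possible, whichever mode produced them.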


The .clusterforkrc configuration file
-------------------------------------

The 'cf' config file is arranged like a Windows .INI file, with stanza headers written as [STANZA].  Each stanza can have an arbitrary number of entries, but only the stanzas shown below are supported by 'cf'.  Nothing prevents you from adding more, but you'll have to process them yourself.


The [IPRANGE] and [GROUPS] stanzas can be extended arbitrarily and 'cf' should pick up the new entries.  Additionally, if you specify groups that have overlapping IP ranges, 'cf' will detect the overlap and issue the command only once per IP #.


-----------------------------------------------------------------------------------
# This is the config file for the 'clusterfork' application (aka cf) which executes
# commands on a range of machines defined as below.  Use 'clusterfork -h'
# to view the help file

# Comments start with a pound ('#') sign and //cannot share the same line//
# with other configuration data.
# Strings do not need to be quoted unless they contain commas

[ADMIN]

  # RPMDB - file that lists the RPMs that cf has been used to install
  RPMDB = /home/hmangala/BDUC_RPM_LIST.DB
  
  # ALLNODESFILE holds a list of ALL the IP nodes that this will support.
  # this should actually be generated outside of cf and written out if required.
  ALLNODESFILE = /home/hmangala/ALLNODESFILE
  
  # emails to notify of newly installed packages
  # you do not need to escape the '@' in the list below.
  EMAIL_LIST = "hmangala@uci.edu, jsaska@uci.edu, lopez@uci.edu"
  
  # command to install apps - if this is found in the command, triggers a routine to
  # email admins with updated install info.
  INSTALLCMD = "yum install -y"

[SGE]
  CELL          = bduc_nacs
  JOB_DIR       = /sge62/bduc_nacs/spool/qmaster/jobs
  EXECD_PORT    = 537
  QMASTER_PORT  = 536
  ROOT          = /sge62

[APPS]
# these will probably not change much among distros, but YMMV
 yum   = /usr/bin/yum
 diff  = /usr/bin/diff
 mutt  = /usr/bin/mutt
 mc    = /usr/bin/mc


[IPRANGE]
# you //definitely// need to change these.
# use ';' as separators, not commas.  Spaces are ignored.
  ADC_2X = 10.255.78.[10:22 26 35:49]  ;  10.255.78.[77:90] ;  12.23.34.[13:25 33:44 56:75]
  ADC_4X = 10.255.78.[50:76]
  ICS_2X = 10.255.89.[5:44]
  CLAWS = 10.255.78.[5:9]

  # for a definition based on a script, the value must be in the form of:
  #   [SCRIPT:"whatever the script is"]
  # with required escaping being embedded in the submitted script
  # (see below for an example in QHOST)
  # the following QHOST example uses the host-local SGE 'qhost' and 'scut' binaries
  # to generate a list of hosts to process and filters only 'a64' hosts
  # which are responsive (don't have ' - ' entries).
  QHOST = SCRIPT:"qhost |grep a64 | grep -v ' - ' | scut --c1=0 | perl -e 's/\\n/ /gi' -p"
  
  # Set temporarily dead nodes in here if required.
  IGNORE = 10.255.78.12 ; 10.255.78.48 ; 12.23.34.[22:25]

[GROUPS]

# GROUPS can be composed of primary IPRANGE groups as well as other
# GROUP groups as long as they have been previously defined.
  ALL_2X = ICS_2X + ADC_2X
  CENTOS = ICS_2X + ADC_2X + ADC_4X
  ADC_ALL = ALL_2X + ADC_4X + CLAWS

-----------------------------------------------------------------------------------
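
How [GROUPS] entries build on [IPRANGE] entries, and how overlapping groups still yield one command per IP #, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than clusterfork's implementation: `resolve_group` is a hypothetical name, the IP ranges are given as already-expanded short lists, and the `EVERYTHING` group is invented here to force an overlap.

```python
def resolve_group(name, ipranges, groups, _path=frozenset()):
    """Return the de-duplicated list of IPs behind an IPRANGE or GROUPS name."""
    if name in _path:
        raise ValueError(f"group {name!r} is defined in terms of itself")
    if name in ipranges:
        members = list(ipranges[name])
    elif name in groups:
        members = []
        for part in groups[name].split("+"):        # 'GRP1 + GRP2 + GRP3'
            members += resolve_group(part.strip(), ipranges, groups,
                                     _path | {name})
    else:
        raise KeyError(f"unknown group or range: {name}")
    return list(dict.fromkeys(members))   # drop repeats, keep first-seen order

ipranges = {                       # shortened, already-expanded example ranges
    "ICS_2X": ["10.255.89.5", "10.255.89.6"],
    "ADC_2X": ["10.255.78.10", "10.255.78.11"],
    "ADC_4X": ["10.255.78.50"],
    "CLAWS":  ["10.255.78.5"],
}
groups = {
    "ALL_2X":  "ICS_2X + ADC_2X",
    "CENTOS":  "ICS_2X + ADC_2X + ADC_4X",
    "ADC_ALL": "ALL_2X + ADC_4X + CLAWS",
    "EVERYTHING": "CENTOS + ADC_ALL",      # hypothetical, overlaps heavily
}
print(resolve_group("EVERYTHING", ipranges, groups))
```

Although `CENTOS` and `ADC_ALL` share five addresses, the combined group resolves to six distinct IP #s, so each node would be contacted once.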

Output
------

When 'cf' runs in parallel mode, the nodes are simply listed as they are processed, and the output is written to a newly created directory whose name combines the command, the time, and the date:
-----------------------------------------------------------------------------------
$ ./clusterfork.pl --target=QHOST 'ls -lSh *gz'
INFO: Creating dir [REMOTE_CMD-ls--lSh--gz-13.53.12_2010-08-17]....success
INFO: Processing name: [QHOST]
======================================================
Processing [QHOST]
======================================================


...
a64-179 [10.255.78.88]:

a64-180 [10.255.78.89]:

a64-181 [10.255.78.90]:

  -------------============== CAUTION ==============-------------
Since this is going to be a variable length process with a number of nodes
writing into the dir over a network, you might want to wait a few seconds
before you continue to let all the nodes complete and the file handles close.

If not, the analysis may well be faulty as it will analyze the CURRENT file
state, not the FINAL file state.

Hit [Enter] when you want to complete the analysis of the results.

-----------------------------------------------------------------------------------

When you hit [Enter] the analysis of the command execution will be shown in 'less':

-----------------------------------------------------------------------------------
Analysis of contents for files in REMOTE_CMD-ls--lSh--gz-13.53.12_2010-08-17
Command: [ls -lSh *gz]
 ========================================================================
 line / word / chars  |              md5 sum                | # |  hosts ->
         21 189 1443       538ca54bd6f10af5da3872b3a6f14c3e  120  a64-001 a64-002 a64-003 ..
REMOTE_CMD-ls--lSh--gz-13.53.12_2010-08-17/Summary (END)   
-----------------------------------------------------------------------------------

In the above case, because of the shared dir structure, the result is identical on all 120 nodes, so they collapse into a single line. In the case below, where the output includes a measured network latency, the results vary much more from node to node.

-----------------------------------------------------------------------------------
Analysis of contents for files in REMOTE_CMD-ping--c-1-bduc-login-13.58.33_2010-08-17
Command: [ping -c 1 bduc-login]
 ========================================================================
 line / word / chars  |              md5 sum                | # |  hosts ->
            6 36 270       c86c8f74e14ef6e6b42d51d00ade483a    1  a64-021
            6 36 270       86e1ee2d631857ae7e65cbe6a7615fb8    1  a64-012
            6 36 270       19a407cf64c01f03ceaa508be0269c40    1  a64-024
            6 36 270       ceba5ef2b4cb4b647c36e7de9361ca46    2  a64-106 a64-139
            6 36 270       36be0140aaee1d45a6565a6c6783c06d    1  a64-145
            6 36 270       be484f980418af69358be51f3ef2184b    1  a64-179
            6 36 270       abbf9229964805d06cabb4a0b8a361ec    1  a64-123
            6 36 270       102f114c2607222132bfed67822fc57f    1  a64-016
            6 36 270       e78da0916e4b5aea70f1f1dd828a477b    2  a64-141 a64-161
            6 36 270       a73cb10e936aa0cf2061329c8e92c03b    2  a64-023 a64-028
            6 36 270       24d5417a54458bc9f502b79b514a8f2f    1  a64-010

-----------------------------------------------------------------------------------
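
The summary tables above group hosts whose output is byte-for-byte identical. A minimal Python sketch of that kind of grouping follows. It is not clusterfork's code: `summarize` is a hypothetical name, the per-host output is passed in as a dict rather than read from the results directory, and the character count is `len()` of the text, which matches a byte count only for plain ASCII output.

```python
import hashlib
from collections import defaultdict

def summarize(outputs):
    """Group hosts by identical output; outputs maps host name -> text."""
    grouped = defaultdict(list)
    for host, text in sorted(outputs.items()):
        digest = hashlib.md5(text.encode()).hexdigest()
        counts = (text.count("\n"), len(text.split()), len(text))
        grouped[(counts, digest)].append(host)
    # Largest groups first, so the odd ones out collect at the bottom.
    return sorted(grouped.items(), key=lambda item: -len(item[1]))

outputs = {"a64-001": "ok\n", "a64-002": "ok\n", "a64-003": "not ok\n"}
for (counts, digest), hosts in summarize(outputs):
    lines, words, chars = counts
    print(f"{lines:>6} {words} {chars}  {digest}  {len(hosts):>3}  {' '.join(hosts)}")
```

With identical line, word, and character counts on every row, as in the ping example, the checksum column is what shows that the outputs really differ.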

The last option of the clusterfork analysis lets you view the results in http://www.midnight-commander.org/[Midnight Commander] (aka 'mc').  The above results, shown in 'mc', look like this:

.clusterfork output viewed in Midnight Commander
image::clusterfork_mc_s.png[clusterfork output in mc]