1. What’s Perceus?

Perceus (Provision Enterprise Resources & Clusters Enabling Uniform Systems) is an Open Source provisioning system for Linux clusters developed by the creators of Warewulf, of which Perceus is the successor.

Perceus typically runs as a server process on an administrative node of a cluster and provides the Operating System to requesting nodes via the network. It is optimized for stateless systems - those in which the OS is not resident on-disk but freshly net-booted at each startup - but can also provision stateful systems, in which the OS is written to the nodes' disks. It can provision nodes to be completely homogeneous (as required for a compute cluster) or fairly heterogeneous (as for sets of application servers), with each OS image tuned to a particular service such as compute, storage, or interactive use. Perceus also provides utilities to modify the client OS images and push out changes, either immediately via rsync or saved to the client image to be applied at the next reboot.

As befits a tool for handling thousands of nodes, Perceus handles most things automatically. It will detect unidentified MAC addresses in the private network, add them into the default Perceus group, and provision a default image. Perceus can also set specific configurations for certain nodes based on MAC address.
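Node bookkeeping is done with the perceus command-line tool. A sketch of the kinds of commands involved (subcommand syntax may differ slightly between versions - check perceus --help and the User Guide; the node name n0000 is an example):

```shell
# list the nodes Perceus has auto-detected on the private network
perceus node list

# show the full record (MAC, group, VNFS, etc.) for one node
perceus node show n0000

# assign a specific VNFS to a particular node instead of the default
perceus node set vnfs gravityos-base n0000
```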

It should be noted that Perceus is well supported by Infiscale, the company formed to commercialize it. The User Guide for Perceus 1.6 should be used as the definitive introduction and guide to Perceus. This document differs in that it focuses more closely on installing Perceus on Debian-based systems and then integrating the Perceus server and the provisioned cluster with an already existing NIS/NFS/Kerberos campus system, which probably represents a large proportion of how Perceus will be installed.

2. Operating System Support

Currently, Perceus supports the following operating systems for both server and clients.

Perceus.org provides Quickstart Guides for some of these OSs.

Perceus supplies the client nodes with their OSs as modules: a stripped-down version of the OS (no Desktop GUI, minimal libraries & utilities, no applications) that provides the base functionality.

The OS modules included with Perceus (from Infiscale) are:

3. A Perceus Glossary

Since the Perceus approach is somewhat different than the typical static-OS-on-disk approach, we’ll dedicate some space to defining some Perceus terms:

3.1. Stateful Provisioning

In most computers, the OS is installed on the local disk. Stateful provisioning is the process of obtaining an OS image from a server and installing that OS to the local disk of the client node. This approach is useful when bandwidth is extremely limited or changes in OS image are expected to be infrequent. This is provided in Perceus v1.6 and above.

3.2. Stateless Provisioning

This is the opposite of stateful provisioning. Instead of installing the OS on the local disk, the provided image is installed into RAM, which frees the hard disk for other uses. The advantages of this approach are that the OS image is refreshed at each reboot and that any changes to the server-hosted image are propagated to each node. It also saves some disk space on each node (the nodes could be completely diskless, as long as the workloads did not require fast swap).

3.3. Virtual Node File System (VNFS)

The VNFS is essentially just a disk image that is provisioned to the nodes. The VNFS is split into 4 parts: VNFS capsule, VNFS rootfs, VNFS config files, and VNFS image.

3.3.1. VNFS Capsule

A VNFS capsule is a compressed base package of the OS. While you can make your own capsules, Perceus supplies sample capsules of GravityOS and CAOS, which are generally sufficient for real-world use.

3.3.2. VNFS rootfs

Once you’ve imported, uncompressed, and mounted a VNFS capsule on the server using the perceus utility commands, you can access the files of the image and make changes to the image. This appears as a complete root filesystem to a user on the server and can be cd’ed into, edited, upgraded as a chrooted filesystem, etc.

3.3.3. VNFS Config files

These files:

close*  configure*  livesync*      master-includes  nodescripts/  umount*  vnfs.img
config  hybridize   livesync.skip  mount*           rootfs/       vmlinuz

are located at the top of the VNFS file tree, typically at /etc/perceus/vnfs/<VNFSNAME>, and describe various options, conditions, filesystem mounts, etc. for each VNFS. Because of this, each VNFS can be configured quite differently from any other.

3.3.4. VNFS Image

The VNFS image is the actual image that is provisioned to the nodes. Once you mount and configure the VNFS rootfs, you have to unmount it to update the VNFS image.

3.4. Modules

Perceus provides utilities that can import and load modules to its nodes (SUCH AS??)

3.5. Import Nodes From Other Sources

Perceus can also import node definitions used with other provisioning systems such as Rocks, Warewulf, and Oscar (done with a simple import command).

4. The Provisioning Process

4.1. Step by step

As noted above, Perceus is a client/server process, with the clients requesting their entire OS to be given to them by the Perceus Server (in stateless mode). We assume the server is up and running with a Perceus daemon running and listening to the private interface.

  1. The client node is booted and requests an OS via PXE-boot.

  2. The Perceus daemon responds with the first stage boot image, which loads the very lightweight Perceus client daemon.

  3. The client daemon configures the node by getting a DHCP-provided IP # and initiating the PXE-boot.

  4. The client requests a VNFS capsule and preps the in-RAM filesystem to load the OS.

  5. Once the new kernel boots, the Perceus client daemon is purged and the RAM returned to the system.

  6. The system runs as normal, starting whatever services and mounting whatever filesystems the VNFS is configured to do.

4.2. Some Issues

Since in stateless mode the OS is net-booted each time, the node can run entirely diskless. However, this requires that it have enough RAM to hold the OS, all associated filesystems, and all of the application and user code in memory; that it never hit swap (since it doesn't have any); and that the only local working space (/scratch) be RAM-based.

Our nodes do have disks, but we partition them into a swap partition (since some of our codes balloon to fairly large sizes) and a /scratch partition so that user data can be pre-staged, avoiding heavy network traffic during a run. We provide a Perl init script for the VNFS, format_mount.pl, that takes care of detecting, partitioning, and mkfs'ing the disk before the node is made available to users.
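In outline, such a script does something like the following with standard tools (a simplified, hypothetical sequence - the real format_mount.pl adds detection logic and safety checks; the device name and sizes here are examples):

```shell
#!/bin/sh
# WARNING: destructive - sketch only. Assumes the node disk appears as /dev/sda.
DISK=/dev/sda
parted -s $DISK mklabel msdos
parted -s $DISK mkpart primary linux-swap 1MiB 8GiB   # swap partition
parted -s $DISK mkpart primary 8GiB 100%              # /scratch partition
mkswap ${DISK}1 && swapon ${DISK}1                    # enable the swap
mkfs.xfs -f ${DISK}2                                  # build the scratch filesystem
mkdir -p /scratch && mount ${DISK}2 /scratch
```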

5. Why Perceus for the BDUC cluster

The BDUC nodes currently use stateful provisioning through TFTP. The nodes get an entire OS initially (including many of the libs, utils, & apps) and, until the OS is manually refreshed, it does not receive any updates. Currently, these nodes suffer from bit rot: drift from the standard installation to a state at variance with the expected image, for many reasons. Chief among them is a node going down and missing an update or cluster-wide installation. This bit rot is typically handled on a case-by-case basis, which can entail significant admin time.

Perceus addresses the bit rot problem with a livesync or (worst case) a simple reboot. All installation and configuration tasks are handled by the Perceus server. If we need to make a quick change within a node OS, we can simply modify the VNFS rootfs and push the image changes to the cluster instead of going into every node and making the change. Beyond the Perceus features themselves, we can use the nodes' local disks more efficiently as scratch space since the OS no longer has to reside on-disk, making BDUC more efficient at no additional cost and with less manual intervention.

mailto:ppk@ats.ucla.edu[Prakashan Korambath]  and mailto:kjin@ats.ucla.edu[Kejian Jin] provided similar arguments for http://moo.nac.uci.edu/~hjm/ucla_perceus_test.pdf[using Perceus on UCLA's Hoffman2 cluster].

6. Getting Started

We will describe the Perceus installation and configuration for both a minimal setup similar to the one described in the Perceus User Guide and a Production Setup we will use to append a Perceus cluster to our current BDUC production cluster. The main difference between the two is that the basic setup will allow only root login to the nodes unless another user is added in the VNFS. The production version allows BDUC users to login to the Perceus-provisioned nodes and transparently access their files on the BDUC cluster via integration with the BDUC NIS/NFS and Kerberos system.

6.1. Perceus Components

6.1.1. Hardware

Perceus requires minimal hardware to test. It requires only:

  • A private network. This can be as few as 1 node connected to a small switch or hub. The faster the network hardware the better, but it can be as slow as 10Mb. We used a 24-port Netgear Gigabit Switch.

  • At least 1 Perceus Master Server with 2 interfaces, one for the external Internet and one facing the private network that services the cluster. For testing purposes, the Perceus server can be a small, slow machine; the most important part is the speed of the network adapters, although the CPU speed is relevant when compressing a modified VNFS. We used an AMD dualcore Opteron @ 1.4GHz, 4GB RAM, 60GB IDE HD, Ubuntu 10.04 (AMD64) Desktop OS, 2x Broadcom 1Gb interfaces.

  • At least 1 node whose BIOS has been configured to PXE-boot. It can also be a small, slow node, but it has to have enough RAM to hold the OS; I’d recommend no less than 1GB. We used 2 nodes, each having 2 AMD Opterons @ 2.4GHz, 8GB RAM, 320GB SATA HD, 2x Broadcom 1Gb interfaces (only 1 used).

  • A VNFS capsule containing the node OS to be provisioned from the server to the node. We used the Debian-derived gravityos module.

This is the simplest Perceus configuration; you can also use multiple Perceus servers with local or nonlocal shared filesystems. For example, in a production cluster, a single hefty server could be used as the login/head node, the Perceus server, and the storage server, although this puts a lot of eggs in a single basket. An alternative is to keep the head/login node separate and put the Perceus and storage servers on the same node. In the following schema, we will use a single server for everything; the rationale is that if one of the parts goes down, most of the cluster functionality is lost anyway.

6.1.2. Network Configuration

Install a Debian-based Linux OS if you haven’t done so. As stated above, we installed Ubuntu 10.04(LTS) Desktop (AMD64) on our Perceus server to take advantage of the GUI tools. Obviously the Desktop version isn’t necessary (and there are good reasons not to use it).

Since you’ll need an operating network to update the OS and obtain the optional packages, let’s address the network configuration 1st. I’ve never had a good experience with any default Network Manager. The alternative is to edit the /etc/network/interfaces file by hand.

Our /etc/network/interfaces file:

auto lo
  iface lo inet loopback
  # The primary network interface

auto eth0
  iface eth0 inet static
  address 128.200.34.147
  netmask 255.255.255.0
  network 128.200.34.0
  broadcast 128.200.34.255
  gateway 128.200.34.1
  # dns-* options are implemented by the resolvconf package, if installed
  dns-nameservers 128.200.1.201

auto eth1
  iface eth1 inet static
   address 192.168.1.1
   netmask 255.255.255.0
   network 192.168.1.0
   broadcast 192.168.1.255
   # if you want the cluster nodes to be able to see the
   #   public internet, include the following 2 lines
   gateway 128.200.34.1
   dns-nameservers 128.200.1.201

We also need to use IP Masquerade to enable the private 192.168.1.0 network to communicate with the public 128.200.34.0 network and gain access to the outside world. You can manipulate iptables directly to make this configuration, but we chose to use guidedog (part of the KDE Desktop), which accomplishes this transparently.
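For reference, the direct iptables equivalent looks roughly like this (a minimal sketch, assuming eth0 is the public and eth1 the private interface; guidedog generates a more complete ruleset):

```shell
# enable kernel IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# masquerade traffic from the private net out the public interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# let the private net out, and let established replies back in
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
```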

Restart the network to activate the new configurations and check that the OS thinks everything is fine.

/etc/init.d/networking restart
ifconfig
# should dump a configuration showing that eth0 is assigned 128.200.34.147
# and eth1 is assigned 192.168.1.1, and you should now be able to ping
# to remote hosts
ping www.google.com
PING www.l.google.com (66.102.7.99) 56(84) bytes of data.
64 bytes from lax04s01-in-f99.1e100.net (66.102.7.99): icmp_seq=1 ttl=53 time=2.94 ms
64 bytes from lax04s01-in-f99.1e100.net (66.102.7.99): icmp_seq=2 ttl=53 time=4.46 ms
 ... etc ...

6.1.3. Software

The Perceus server will need the following packages and files to run Perceus. Since we’ll be running Perceus on an Ubuntu server, the packages are referenced using the Ubuntu deb names. The client nodes need nothing, of course, since they will be fully provisioned by Perceus. The nodes do need to be of recent enough vintage that they can be configured to PXE-boot, which is set in the BIOS configuration (and unfortunately requires you to boot each node into the BIOS configuration screens one time to set this via the Boot or Startup screens).

Here are the packages and files needed to run the basic Perceus on the Ubuntu 10.04(LTS) server.

Files (not part of a Ubuntu distribution).

Deb packages (and dependencies, if not noted explicitly).

libnet-daemon-perl             nfs-kernel-server
libnet-pcap-perl               nasm
libplrpc-perl                  perl
libunix-syslog-perl            libdbi-perl
libyaml-perl                   libio-interface-perl
libyaml-syck-perl              libnet-arp-perl
openssh-server
guidedog        # depends on KDE; could also manipulate iptables directly

Install them all with:

sudo apt-get install libnet-daemon-perl nfs-kernel-server           \
   libnet-pcap-perl nasm libplrpc-perl perl libunix-syslog-perl     \
   libdbi-perl libyaml-perl libio-interface-perl libyaml-syck-perl  \
   libnet-arp-perl openssh-server guidedog

6.2. Installing and Configuring Perceus

Getting the necessary Perceus-specific packages and files to install Perceus on the main server:

cd ~
mkdir perceus-dist
cd perceus-dist
# now get the required debs from Infiscale
wget http://altruistic.infiscale.org/deb/perceus16.deb
wget http://altruistic.infiscale.org/~ian/gravityos-base.vnfs

# install Perceus
sudo dpkg -i perceus16.deb

# and start it.
sudo perceus start

When perceus start executes for the 1st time, it will ask some questions about how you want to configure the cluster. The questions are quite straightforward and usually the default answer is acceptable. The following demonstrates the questions, with comments prefixed by ##. Accepting the default is designated by <Enter>

Do you wish to have Perceus do a complete system initialization (yes/no)? yes

What IP address should the node boot address range start at?
(192.168.1.192)> 192.168.1.11
## the private net is going to be used ONLY for the cluster, so we only
## reserve the 1st 10 addresses for special-purpose servers.

What IP address should the node boot address range end at?
(192.168.1.254)> <Enter>

What domain name should be appended to the DNS records for each entry in
DNS? This won't require you to specify the domain for DNS lookups, but it
prevents conflicts from other non-local hostnames.
(nac.uci.edu)> <Enter>
## Perceus determines what local net you're on

What device should the booting node direct its console output to? Typically
this would be set to 'tty0' unless you are monitoring your nodes over the
serial port. A typical serial port option might be 'ttyS0,115200'.
note: This is a global option which will affect all booting nodes.
(tty0)> <Enter>

Creating Perceus ssh keys
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/perceus.
Your public key has been saved in /root/.ssh/perceus.pub.
The key fingerprint is:
cb:4e:bb:ee:6c:95:65:f9:a4:89:23:a7:f6:de:23:63 root@flip
The key's randomart image is:
+--[ DSA 1024]----+
|         ..      |
|     . . o.      |
|    . + o  . .   |
|        +.  +    |
|    o S   .  +   |
|   . . o o  o    |
|      . + ..     |
|     .E+.o.      |
|     ..oooo      |
+-----------------+
Created Perceus ssh host keys
Created Perceus ssh rsa host keys
Created Perceus ssh dsa host keys

Perceus is now ready to begin provisioning your cluster!
## pretty easy, no?

6.3. Importing a VNFS Capsule to Perceus

At this point, we’ll be importing a VNFS capsule created by the developers of Perceus. The VNFS capsule includes a Debian-based OS image of gravityos.

Locate the gravityos-base.vnfs OS capsule that you just downloaded and import it using the following shell command.

sudo perceus vnfs import /path/to/gravityos-base.vnfs

After importing the capsule, there will be a prompt asking you to create a root password for the VNFS image, gravityos-base in this case. This will be your only login for the basic node setup unless other users are added later on.

There will also be a series of configuration questions (mostly network) regarding the VNFS image. These questions are straightforward; we will add more details in later versions, if necessary.

Your modified VNFS files are located in /etc/perceus/vnfs; the rest of the Perceus configuration files are in /etc/perceus.

The file /etc/perceus/dnsmasq.conf is automatically configured based on answers provided during the installation process. If you misconfigured any network settings and need to fix them, this is the file to check. You’ll also find the dhcp boot range (the IP addresses provisioned to nodes) here.

The file /etc/perceus/defaults.conf holds a default set of configurations to provision nodes that were not explicitly identified in the Perceus cluster, including giving a default image to an unidentified node. The settings found in this configuration file will include:

  • default image

  • Starting IP # of the client nodes

  • default group

In the same /etc/perceus/defaults.conf, set "Vnfs Name = NAMEOFOS-base". In our test cluster, we set it to "Vnfs Name = gravityos-base".

The file /etc/perceus/perceus.conf is also automatically configured by Perceus during the installation process. Make sure the master network device is the ethernet port for the private network (eth1 in our case) and the VNFS transfer method is nfs.

Now power-on the nodes and the Perceus server should provide default settings and add new nodes to its database. This ends the basic setup of Perceus.

When the provisioning is complete you should have a set of nodes that starts from the starting IP# and increases up to the maximum number you set. You should be able to login to the nodes at the console as root (and only as root). You should also be able to ssh to the nodes as root from the Perceus master and poke around to verify that the node is a true compute node. Adding other user names is covered below in the Production Setup.
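A quick sanity check from the Perceus master might look like this (the node IP is the 1st in our boot range; any of the commands the stripped-down image provides will do):

```shell
# as root on the Perceus master
ssh root@192.168.1.11 'hostname; free -m; df -h'
```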

7. Reconfiguration of the VNFS

Once the Perceus clients are up and running, you will soon discover that you need other services and filesystem mounts available. While you can make these changes on a live node to verify that they work correctly, in order to make these changes permanent, you’ll have to make the changes on the Perceus-imported VNFS image on the Perceus server and then mirror the changes to the image. In most cases, it’s sufficient to make the changes in the image and then livesync the changes to the cluster, but you should designate 1 node as a test target and test the new image against that target before any changes are launched cluster-wide.

This is where having a fast Perceus server WILL make a difference, since the ~200MB image has to be processed and compressed each time it’s written out. While it’s a bit of a pain, the best approach is as described above: test the changes on a live node, immediately reiterate the change on the mounted image, then test the change in the image by rsyncing it to a designated stunt node.
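The edit cycle just described, in commands (the node-name argument to livesync, restricting the push to a test node, is an assumption - check perceus --help for the exact form):

```shell
sudo perceus vnfs mount gravityos-base           # expose the rootfs under /mnt/gravityos-base
sudo chroot /mnt/gravityos-base                  # make the changes in the chroot, then exit
sudo perceus vnfs livesync gravityos-base n0000  # push to the designated test node 1st
sudo perceus vnfs umount gravityos-base          # write out & compress the modified image
```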

8. Perceus Production Setup with BDUC

The additional features for the Production Setup are:

  • Kerberos (network authentication) to allow transparent centralized authorization to the cluster.

  • NIS (Network Information Service) to allow transparent user login to any node after Kerberos authorization

  • NFS (Network File System), in conjunction with NIS, allows users to access their files from any node in the cluster.

  • autofs / automount - allows the remote filesystems to be mounted on demand and unmounted when idle to prevent stale/locked NFS mounts.

  • format_mount.pl - detects, partitions, mkswap’s, and mkfs’s the node disk to allow swap and /scratch to be made & used.

For our cluster, we are using the campus Kerberos server for authorization - ie, the Perceus server is neither the Kerberos server nor the NIS/NFS server, so we can make use of those external services without configuring the Perceus server to supply them; it just has to be configured to consume these services.

To do this, you’ll need these additional packages (and dependencies).

krb5-clients                           autofs5
libpam-krb5                            parted
nis                                    krb5-kdc
binutils

#install them all with ..
sudo apt-get install krb5-clients libpam-krb5 nis  autofs5 krb5-kdc binutils parted

the krb5 realm is 'UCI.EDU'
the kerberos kdc server is   'kerberos.service.uci.edu'
the kerberos admin_server is 'kerberos.service.uci.edu'
the NIS domain we want to join is 'YP.bduc.uci.edu'
see bduc-login:/etc/yp.conf
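From those values, the relevant part of /etc/krb5.conf looks like this (a minimal fragment; the file copied from a working BDUC client will contain more):

```
[libdefaults]
    default_realm = UCI.EDU

[realms]
    UCI.EDU = {
        kdc = kerberos.service.uci.edu
        admin_server = kerberos.service.uci.edu
    }
```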

And these configuration files for your cluster (BDUC in our case)

Files from a NIS/NFS/Kerberos client in your cluster

/etc/yp.conf (from a NIS client)
/etc/ypserv.conf
/etc/nsswitch.conf
/etc/krb5.conf
/etc/autofs_ldap_auth.conf
/etc/auto.master
/etc/auto.misc # not used by BDUC
/etc/auto.net  # not used by BDUC
/etc/auto.smb  # not used by BDUC

These simply need to be copied to the same position on the Perceus server (after carefully making backups of the originals). This will allow us to access our campus LDAP server for login information and automount BDUC’s user and application filesystems.
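A sketch of that copy (assuming root ssh access to an existing client such as bduc-login; adjust hostnames to your site):

```shell
# back up the originals, then pull the cluster versions into place
for f in yp.conf ypserv.conf nsswitch.conf krb5.conf autofs_ldap_auth.conf auto.master; do
    sudo cp /etc/$f /etc/$f.orig 2>/dev/null   # backup (some may not exist locally yet)
    sudo scp root@bduc-login:/etc/$f /etc/$f
done
```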

Once the files are backed up and copied to the Perceus Server, the services have to be started.

# start the NIS services
sudo /etc/init.d/nis start
# the following line initializes the local Kerberos database.
# REMEMBER the password you set!! (should only need to be done the 1st time)
sudo kdb5_util create -s
# and then start the krb5 services.
sudo /etc/init.d/krb5-kdc start

8.1. Sun Grid Engine requirements

8.1.1. Perceus Master

For the Perceus master to be included usefully in the SGE domain, it must:

  • automount the exported SGEROOT

  • have the binutils installed (see above)

  • be added as an execution host to SGE

  • be added as a submission and/or admin host to SGE

  • for SGE jobs to be sent to the node, it has to be added to a host group which is servicing jobs (ie: @long).

8.1.2. Perceus Clients

For the Perceus client nodes to be included in the SGE domain, the vnfs module has to include the same configurations.

The same remote NFS mounts have to be automounted:

  • /home (mounted over existing /home, if need be)

  • /sge52

  • /apps (automounted on request)

8.2. NFS access to cluster files

To access your cluster files from the Perceus server, you’ll need the help of a BDUC admin to modify the /etc/exports file on all NFS servers that supply files to BDUC (bduc-login, bduc-sched). The file needs to be edited to allow the Perceus server to mount the exported files. Don’t forget to re-run exportfs on the NFS servers to activate the new configuration.
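The /etc/exports entry would look something like this (the exported path and options are examples - use whatever your NFS servers already export to cluster nodes, adding the Perceus server's address):

```
# on bduc-login, allow the Perceus server (128.200.34.147) to mount /home
/home   128.200.34.147(rw,sync,no_subtree_check)
```

followed by `sudo exportfs -ra` on the NFS server to activate it.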

Finally, test whether our Perceus server is connected to our NIS master on BDUC (bduc-sched) by executing the following on the Perceus server:

ypcat passwd
<dumps yp passwd info>

This command listed login information and directories of all users in the BDUC cluster, so it was a success!

Once you restart the networking on the Perceus server, you should be able to ssh to the Perceus main server with your UCINetID username and password and be able to read/write your BDUC files as if you were logged into BDUC.

We’re now done with the Perceus server; now onto the client nodes.

9. Configuring the Perceus Clients

In order for the Perceus clients to gain these same abilities, the above configuration files have to be copied to the chrooted, live-mounted VNFS image in the same location (/etc/..), and the same debs have to be chroot-installed into that image as described on pages 22-23 in the Perceus User Guide.

9.1. Mount the VNFS

On the Perceus server, we have to mount the VNFS image using

sudo perceus vnfs mount gravityos-base # or the name of the VNFS pkg used

The mount directory is /mnt/gravityos-base

We’ll have to chroot into the directory so we can install packages in the now-live image.

sudo chroot /mnt/gravityos-base

9.2. Install & Configure the Debs

As with the server, here are the packages we’ll need to install for the VNFS image:

krb5-kdc                               nis
krb5-clients                           autofs5
libpam-krb5                            parted

#install them all (still in the chroot, where you are already root) with ..
apt-get install krb5-kdc krb5-clients libpam-krb5 nis \
   autofs5 parted

# and exit the chroot
exit

Repeat the NIS and Kerberos configuration file copying as described in the production setup above. Essentially, you have to copy those files from the Perceus server to the VNFS image.

# as root on the Perceus master
cd /etc
cp yp.conf ypserv.conf nsswitch.conf krb5.conf autofs_ldap_auth.conf auto.master /mnt/gravityos-base/etc

# have to check that the autofs file is chmod'ed correctly so that
# the owner (root) can rw
sudo chmod u+rw   /mnt/gravityos-base/etc/autofs_ldap_auth.conf

# and group and other can't rwx
sudo chmod og-rwx /mnt/gravityos-base/etc/autofs_ldap_auth.conf

Once those files are copied and you verify that the init scripts are in place in the VNFS, you have to push those changes to the nodes. This can be done via the livesync option or by the entire export/reboot process. The livesync is much faster and involves using ssh and rsync to push all changes to the nodes while they’re still live.

sudo perceus vnfs livesync gravityos-base

Eventually, you’ll have to unmount the VNFS (which results in a significant delay as the image has to be compressed) and then reboot your test client to verify that it works from the ground state.

The umount is done via a specific perceus command

sudo perceus vnfs umount gravityos-base

and you can export it to save it as a backup or to make it available to others

sudo perceus vnfs export gravityos-base /path/to/gravityos-base_<mod_date>.vnfs

As above, check that after a network restart, the client node can automatically communicate to the campus Kerberos server and the BDUC NIS/NFS servers.

9.3. Automatic disk processing

The next step is to convince the format_mount.pl script to execute during boot time by incorporating it into the init script sequence.

Get the file format_mount.pl and chmod it so it becomes executable.

chmod +x format_mount.pl

Copy it to the /bin directory of the VNFS image. (You’ll have to re-mount the image if you’ve exported it).

sudo perceus vnfs mount gravityos-base   # re-mount, if the image was umounted/exported
sudo cp /path/to/format_mount.pl /mnt/gravityos-base/bin

Now we need to edit /etc/rc.local to pass arguments to the script so it runs at boot time. It runs quite late in the boot sequence, but since only users access the swap and scratch partitions, this won’t affect other processes.

Add this line to the VNFS’s /etc/rc.local file (/mnt/gravityos-base/etc/rc.local). Make sure it’s above the line that executes exit 0.

...
# we determined that the VNFS detected the disk as '/dev/sda' by examining
# dmesg output on 1st boot.

format_mount.pl sda 8000 xfs NODEBUG
exit 0

Now save and compress the modified image by unmounting the VNFS image:

sudo perceus vnfs umount gravityos-base

Once that’s finished, reboot the stunt node to test, and if it appears with new swap and a /scratch dir, the production setup is complete.

10. Integrating Perceus with the existing BDUC Environment

We have a production cluster (BDUC) and can’t bring it down for several days to re-do it as a Perceus cluster. We are therefore integrating a small (25 node) Perceus cluster with the existing CentOS cluster, and when we’ve debugged it to the point where it behaves, we’ll flip the entire cluster over a weekend day. As noted, we already have a small Ubuntu-based sub-cluster integrated with the main cluster, so using a Debian-based distro won’t be a completely new experience.

Our new Perceus master server, claw1, has the Ubuntu 10.04 distribution installed. We’ve already installed Perceus 1.6 using the Debian package downloaded from the main website. Integrating Perceus into our production environment is not hard; using the tutorial/process above, we discuss the differences below.

10.1. New Applications required

We needed to install TCL to support Modules:

# as root
perceus vnfs mount percdebian
chroot /mnt/percdebian
apt-get install tcl8.3

10.2. Hardware Changes

There are very few hardware changes needed to support Perceus. The only notable ones are to set the BIOS to request a netboot on the correct interface (the source of a number of initial errors) and to change the BIOS Chipset section so that the EDAC system will record ECC errors. This is BIOS-specific, so we’ll not go into it in depth.

10.3. Perceus Configuration file changes

10.3.1. /etc/perceus/perceus.conf

Originally, the controlport and VNFS transfer master were set to localhost. Since claw1 has both a public and a private IP address, we don’t want Perceus to use the wrong ethernet port, so we explicitly specified the private network IP address instead of localhost.

master network device = eth0
vnfs transfer method = nfs
vnfs transfer master = 10.255.78.5
vnfs transfer prefix =
database type = btree
database server = localhost
database name = perceus
database user = db user
database pass = db pass
node timeout = 600
controlport bind address = 10.255.78.5
controlport allow address = 10.255.78.5

10.3.2. /etc/perceus/defaults.conf

Node Name = n###
Group Name =
Vnfs Name = percdebian
Enabled = 1
First Node = 101

10.3.3. /etc/perceus/dnsmasq.conf

This file is generated after running sudo perceus init. You shouldn’t have to modify anything here besides the dhcp-range, if needed.

interface=eth0
enable-tftp
tftp-root=/usr/var/lib/perceus/tftp
dhcp-option=vendor:Etherboot,60,"Etherboot"
dhcp-boot=pxelinux.0
local=/
domain=bduc
expand-hosts
dhcp-range=10.255.78.100,10.255.78.254
dhcp-lease-max=21600
read-ethers

Note that further DNS changes are described below.

10.3.4. /etc/fstab

We have to configure the VNFS capsule to contact 10.255.78.5, the private IP address of claw1. Following the tutorial above, mount the VNFS capsule, chroot into the directory, and edit the /etc/fstab file. We need 2 NFS mounts from the Perceus master: the shared Perceus lib directory, and the master’s /usr tree to provide apps and libs without increasing the size of the VNFS.

The modifications should be similar to this:

# the perceus shared dir
10.255.78.5:/usr/var/lib/perceus /usr/var/lib/perceus nfs ro,soft,bg 0 0

# the claw1 /usr tree to share apps, libs with nodes (see text)
10.255.78.5:/usr                  /u                    nfs ro,soft,bg 0 0

These are permanent mounts with the accompanying pros (easy) and cons (will fail if claw1 NFS server locks up). We may switch to the automount process that we use for most other NFS mounts on the cluster if we have trouble with the permanent mounts.
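If we do switch, the automount equivalent would be a direct map along these lines (a hypothetical sketch using autofs direct-map syntax; the map-file name is our choice):

```
# in /etc/auto.master:
/-    /etc/auto.perceus

# in /etc/auto.perceus:
/usr/var/lib/perceus  -fstype=nfs,ro,soft  10.255.78.5:/usr/var/lib/perceus
/u                    -fstype=nfs,ro,soft  10.255.78.5:/usr
```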

10.3.5. /etc/profile

We have to add some paths under the above /u so the Perceus nodes can find the apps and libs it provides.

# near top of file
if [ "`id -u`" -eq 0 ]; then
   PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/u/bin:/u/local/bin:/u/sbin"
 else
   PATH="/usr/local/bin:/usr/bin:/bin:/u/bin:/u/local/bin"
 fi
export PATH

# ... set other non-Perceus-related things

# then set the LD_LIBRARY_PATH to direct to NFS mounts
export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib:/u/lib:/u/local/lib:/opt/lib:$LD_LIBRARY_PATH

# and do the same thing for a variety of other ENV-related variables (non-exhaustive)
export PERL5LIB=/u/share/perl5
export PYTHONPATH=/u/lib/python2.6

SGE wasn’t automounting at all. We traced the problem to the nodes being unable to contact bduc-sched, which controls the automount feature for SGE: the Perceus client nodes were trying to reach bduc-sched through its public IP address, which they cannot "see". To remedy this, we modified the /etc/hosts file to map bduc-sched to its private IP address.

10.3.6. /etc/hosts

To allow SGE to automount correctly from bduc-sched, we added its private IP number to the nodes' /etc/hosts file, along with claw1's.

127.0.0.1     localhost.localdomain     localhost
10.255.78.5   bduc-claw1.nacs.uci.edu   claw1
10.255.78.3   bduc-sched.nacs.uci.edu   bduc-sched sched

10.4. VNFS Changes

To use our preconfigured VNFS capsule from the remote Perceus install, we had to move the VNFS to the new Perceus master, claw1. There are two ways to do this. The first and recommended approach is to log onto the old Perceus master and export the VNFS capsule using sudo perceus vnfs export. Then copy the file onto the new Perceus master and import it using sudo perceus vnfs import. The other approach is to simply tar up the /etc/perceus/vnfs/VNFSNAME directory, copy it to the new Perceus master, and extract it (as root) in the /usr/var/lib/perceus/vnfs (symlinked to /etc/perceus/vnfs) directory.

For our setup, we went with the tarball and it seems to have worked correctly.
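The tarball route amounts to something like this sketch. VNFS_DIR and NAME are parameterized here so the mechanics can be tried anywhere; the copy-and-unpack step targets the real masters, so it is shown commented:

```shell
#!/bin/sh
# pack a VNFS capsule directory into a tarball for transfer between masters
VNFS_DIR=${VNFS_DIR:-$(mktemp -d)}   # stand-in for /usr/var/lib/perceus/vnfs
NAME=${NAME:-perdebian}
mkdir -p "$VNFS_DIR/$NAME"           # already present on a real master
tar -C "$VNFS_DIR" -czf "/tmp/$NAME.tgz" "$NAME"
echo "packed /tmp/$NAME.tgz"
# on the real systems, copy and unpack as root on the new master:
#   scp /tmp/$NAME.tgz claw1:/tmp/
#   ssh claw1 "tar -C /usr/var/lib/perceus/vnfs -xzf /tmp/$NAME.tgz"
```

Using tar -C keeps the paths in the archive relative, so the capsule unpacks cleanly into whatever vnfs directory the new master uses.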

10.5. Environment

The VNFS needs several environment-related changes:

  • modify the VNFS ld.so.conf.d to provide info about the new libs

  • modify the VNFS PATH to include /u/bin, /u/local/bin

  • modify the VNFS ENV variables for locale, etc.

  • change the /etc/apt/sources.list to use the same Ubuntu sources as the master

  • add the master node’s root public key to the VNFS’s /root/.ssh/authorized_keys so that you’ll be able to ssh in without a password

We need to modify any claw1 symlinks that use full paths to avoid redirecting to the node /usr tree.

i.e. we have to change links like this:

    /usr/lib/libblas.so.3 -> /usr/lib/libblas.so.3.0

to this relative format, so the link resolves inside whichever /usr is mounted:

    cd /usr/lib; ln -s libblas.so.3.0 libblas.so.3
    # giving: libblas.so.3 -> libblas.so.3.0
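A quick way to locate the offending links is to search the tree for symlinks whose targets are absolute paths. This is a sketch; the helper name and the example mount point are ours:

```shell
#!/bin/sh
# find_abs_links TREE : list symlinks whose targets are absolute paths -
# these are the candidates for relinking with relative targets.
find_abs_links() {
    find "$1" -type l -lname '/*' -printf '%p -> %l\n'
}
# on the master, with the VNFS mounted (example mount point):
#   find_abs_links /mnt/deb7.stls/usr/lib
```

The -lname '/*' test matches only links whose stored target begins with a slash, so relative links are skipped.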

10.6. Testing all Module applications

We now have about 140 Module-based apps and libs. Each one has to be run to verify that it works correctly on the new nodes. We suspect that only those that have a specific libc or kernel requirement will fail, but this has to be tested. Those that don’t run and can’t be addressed with symlinks to existing libs will have to be recompiled.
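A first-pass triage of the library failures can be automated by running ldd over each app binary and flagging unresolved shared libraries. This is a sketch: the checkapps helper and the /bin examples are ours, and on the nodes the list would come from the ~140 Module apps:

```shell
#!/bin/sh
# checkapps BIN... : flag binaries with unresolvable shared-library deps;
# those need symlinks to existing libs, or a recompile.
checkapps() {
    for app in "$@"; do
        if ldd "$app" 2>/dev/null | grep -q 'not found'; then
            echo "MISSING LIBS: $app"
        else
            echo "OK: $app"
        fi
    done
}
# example invocation with stand-in binaries:
checkapps /bin/ls /bin/cat
```

This only catches link-time problems; apps with libc-version or kernel requirements still have to be exercised by hand.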

10.7. Named / DNS Changes

Perceus was designed to run as a homogeneous cluster. Since we’re running it as a heterogeneous cluster, that presents us with some DNS problems. While the Perceus nodes know about each other and can resolve external-to-Perceus hosts, the other hosts in the cluster don’t know about the Perceus nodes. To allow this, the authoritative DNS server for the cluster (on the login node, not claw1) has to be explicitly updated with the Perceus node information. /etc/resolv.conf on claw1 points to bduc-login, which is the authoritative nameserver for the cluster. Because of that designation, we have to provide bduc-login with the correct IP-number-to-name mappings so that the other cluster nodes can resolve the Perceus nodes. This is especially important for the SGE scheduler.

To this end, we have written a Python script which watches the dhcp-leases file on the Perceus master and re-writes the named database files on bduc-login if there are any changes.

10.7.1. Named Modification Script

This Python script will:

  • monitor the Perceus dhcpd.leases file and on a change, will

  • re-write the named database files 10.255.78.db and bduc.db on claw1 (in /var/named/chroot/var/named)

  • make backups of those files on bduc-login

  • copy the new files into place on bduc-login, and

  • cause named to re-read the configuration files.
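The core of that workflow is simple change detection. A minimal single-pass shell equivalent, suitable for running from cron, could look like this; the function name, the stamp file, and the commented rndc reload step are our assumptions, and the production tool is the Python script described above:

```shell
#!/bin/sh
# check_leases LEASEFILE STAMPFILE : announce a rebuild when the lease
# file changes since the last run (single-pass sketch of the watcher).
check_leases() {
    new=$(md5sum "$1" 2>/dev/null | awk '{print $1}')
    old=$(cat "$2" 2>/dev/null)
    if [ -n "$new" ] && [ "$new" != "$old" ]; then
        echo "$new" > "$2"
        echo "leases changed: rebuild 10.255.78.db and bduc.db, push to bduc-login"
        # back up the old zone files, copy the new ones into place, then:
        #   ssh bduc-login rndc reload
    fi
}
# on claw1, e.g. from cron (the lease-file path is an assumption):
#   check_leases /var/lib/misc/dnsmasq.leases /var/tmp/leases.md5
```

Hashing the lease file rather than watching timestamps avoids false triggers when dnsmasq rewrites the file without changing its contents.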

10.8. Adding Perceus nodes to SGE

Many SGE problems stem from the named issues noted immediately above. The few remaining To Dos are:

VNFS Chroot
  1. Add the sgeexecd script to the /etc/init.d/ directory on the VNFS

  2. Configure /etc/init.d/sgeexecd and append the PATH variable with /u/bin

  3. Install sgeexecd with update-rc.d sgeexecd defaults

Important You SHOULD NOT have to make any modifications to the Perceus nodes' /etc/hosts file for SGE.
SGE Side
  1. Add the nodes as Execution Hosts with qconf -ae <template>. Or use the qmon GUI. It’s probably a wash as to speed.

Important Also, add the nodes as Submit Hosts with qconf -as <node.name>. This is probably faster from the command line. If they are not added as Submit Hosts, they will not be able to run the SGE utilities needed to support sgeexecd, and you’ll get this kind of error:
 $ qhost
error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
error: unable to contact qmaster using port 536 on host "bduc-sched.nacs.uci.edu"

which is difficult to debug since the message says nothing about Submit Hosts.

11. One genchroot to rule them all

Or, how to use Tim Copeland’s mint-buntu-deb_genchroot.sh script to make new vnfs packages based on Mint, Ubuntu, or straight Debian.

Tim Copeland wrote and supports a single genchroot script that downloads and creates vnfs capsules for the Debian-based distros.

It is very easy to use: just download it and execute it with the correct options. To generate a Debian 7 (Wheezy) based vnfs, just direct the master script to do so:

# as root
./mint-buntu-deb_genchroot.sh -D x86_64 -c wheezy -r 7.0 -m
 ...
# the script auto-names the above definition & puts the capsule in
# '/var/tmp/vnfs/debian-7.0-1.amd64' by default

# then make it stateless or stateful (haven't tried the stateful yet)
./chroot2stateless.sh /var/tmp/vnfs/debian-7.0-1.amd64 \
/usr/var/lib/perceus/opt_vnfs/deb7.stls.vnfs

# new vnfs has to be imported into perceus & modified for local config
perceus vnfs import /usr/var/lib/perceus/opt_vnfs/deb7.stls2.vnfs
# .. (answer a few simple questions.)
# ...
# VNFS Configuration Complete
#
# Un-mounting VNFS 'deb7.stls2'...
# This will take some time as the image is updated and compressed...
# VNFS 'deb7.stls2' has been successfully imported.

# config ends with the vnfs unmounted so it has to be mounted to chroot for
# further mods
# still as root
perceus vnfs mount deb7.stls
chroot /mnt/deb7.stls

# install the infiniband modules, tcl, some utils
apt-get install  bzip2 dapl2-utils dialog diffstat ibsim-utils ibutils \
 ibverbs-utils infiniband-diags joe less libclass-isa-perl libdapl2 \
 libfribidi0 libgdbm3 libibcm1 libibcommon1 libibdm1 libibmad1 \
 libibumad1 libibverbs1 libipathverbs1 libmlx4-1 libmthca1 libnewt0.52 \
 libopensm2  librdmacm1 libsdp1 libswitch-perl libumad2sim0 \
 module-assistant netbase opensm perftest  perl-modules \
 rdmacm-utils rds-tools   sdpnetstat srptools tcl8.4 tclreadline \
 whiptail   libpam-krb5 nis autofs5 parted sudo

# don't need krb5-kdc - that's the server; we only need the client pieces.
# NB:
# - the krb5 realm is 'UCI.EDU'
# - the kerberos kdc server is   'kerberos.service.uci.edu'
# - the kerberos admin_server is 'kerberos.service.uci.edu'
# - the NIS domain we want to join is 'YP.bduc.uci.edu'(see
#   'bduc-login:/etc/yp.conf')

And for the Modules system, we need to set up the VNFS:

# as root on the perceus master; NOT chrooted yet
VNFS=/your/VNFS/mount/point
# ie VNFS=/mnt/deb7.stls

and then paste the rest into a root shell:

cp /usr/bin/modulecmd     ${VNFS}/usr/bin
# local disk util to set up swap and /scratch on an unpartitioned disk
cp /usr/var/lib/perceus/format_mount.pl ${VNFS}/bin
# local rc.local to set up various module, format_mount.pl
cp /usr/var/lib/perceus/rc.local ${VNFS}/etc
# our local fstab
cp /usr/var/lib/perceus/fstab    ${VNFS}/etc
# our local hosts file
cp /usr/var/lib/perceus/hosts    ${VNFS}/etc
cd /usr/lib/
cp libX11.so.6            ${VNFS}/usr/lib/libX11.so.6
cp libxcb.so.1.1.0        ${VNFS}/usr/lib/libxcb.so.1
cp libXdmcp.so.6.0.0      ${VNFS}/usr/lib/libXdmcp.so.6
cp libXau.so.6.0.0        ${VNFS}/usr/lib/libXau.so.6

# cp /etc config files across to the VNFS
cd /etc
cp yp.conf ypserv.conf nsswitch.conf krb5.conf autofs_ldap_auth.conf auto.master ${VNFS}/etc
# and set the permissions
chmod u+rw     ${VNFS}/etc/autofs_ldap_auth.conf
chmod og-rwx   ${VNFS}/etc/autofs_ldap_auth.conf

# prep the SGE init script
cp /etc/init.d/sgeexecd   ${VNFS}/etc/init.d
# now chroot
chroot ${VNFS}
# and update the scripts
update-rc.d sgeexecd defaults

# need to add a symlink for the logger in the chroot (because of the way
# we've set up syslogging, on a CentOS server that provides the modules -
# it's complicated)
ln -s /usr/bin/logger /bin/logger
ln -s /apps/Modules   /usr/share/Modules
# then exit the chroot
exit
# unmount the vnfs to compact and prep it for distribution
perceus vnfs umount deb7.stls

So now the new, customized VNFS is ready to distribute to nodes. In order to distribute to a few nodes as a test, you have to define them as a group and then define that group to get the new VNFS.

First, let’s take a look at the current disposition of the nodes:

$ perceus node summary
HostName             GroupName    Enabled   Vnfs

n101                 (undefined)  yes       debuntu
n102                 (undefined)  yes       debuntu
n103                 (undefined)  yes       debuntu
...
n137                 (undefined)  yes       debuntu
n115                 debian6      yes       deb6.stls
n138                 debian7      yes       deb7.stls
n139                 debian7      yes       deb7.stls

You can see that most of the nodes have NOT been assigned to a group and get the default 'debuntu' VNFS. Of the ones that have, 'n115' is part of the 'debian6' group, which gets the 'deb6.stls' VNFS.


So let's define the nodes that we want to be in the test group.

# we’ll define a test group debian7 to include nodes n115 & n138

$ perceus node set group debian7 n115 n138

## Output is:
# Hostname             Group         NodeID
# --------------------------------------------------------------
# n115                 debian6       00:D0:68:12:09:D1
# n138                 (undefined)   00:25:90:58:57:7A

# note that node n115 was previously set to debian6 and will now be shifted
# to debian7

# Are you sure you wish to set group=debian7 on 2 nodes?
# Please Confirm [yes/no]> yes
# 2 nodes set group=debian7

Then set the new group to get the new VNFS:

perceus group set vnfs deb7.stls debian7

## Output is:
# Hostname             Group         NodeID
# ---------------------------------------------------------
# n138                 debian7       00:25:90:58:57:7A
# n115                 debian7       00:D0:68:12:09:D1
#
# Are you sure you wish to set deb7.stls on 2 nodes?
# Please Confirm [yes/no]> yes
# 2 nodes set vnfs=deb7.stls

So now the new group 'debian7', composed of 'n138 & n115', will get the newly created VNFS 'deb7.stls' on reboot.

Verify that you've done this correctly with:

$ perceus node summary
HostName             GroupName        Enabled   Vnfs

n101                 (undefined)      yes   debuntu
n102                 (undefined)      yes   debuntu
n103                 (undefined)      yes   debuntu
...
n137                 (undefined)      yes   debuntu
n115                 debian7          yes   deb7.stls <---
n138                 debian7          yes   deb7.stls <---
n139                 debian7          yes   deb7.stls <---

And then reboot node n115 and see what happens:

ssh n115 reboot

perceus node summary                     # see above
perceus node set group debuntu n134      # sets n134 to group debuntu
perceus group set vnfs debuntu debuntu   # sets the debuntu group to get the debuntu vnfs
perceus node set group debuntu n138      # sets n138 to be part of the debuntu group
perceus group set vnfs debuntu debuntu   # as above; resets all the debuntu group to get debuntu vnfs

12. Adding, replacing, masking nodes from the perceus system

To replace a node that has failed, instead of booting a new node and then having to modify the SGE configs, you can simply delete the failed node; when a new node is added or PXE-boots, it will take the first missing hostname in the defined sequence:

# n110 fails due to hardware
perceus node delete n110
perceus node add [new MAC Address - ie 00:d0:68:12:9d:c3]
# when the new node boots, it will take the 1st missing number in the available sequence.

To mask MACs that should be booting from ROCKS or another PXE/TFTP boot system, add their MAC addresses to the /etc/perceus/dnsmasq.conf file.

# following are all the MACs for all the a64 machines that should be masked from
# getting a perceus OS.
...
dhcp-host=00:E0:81:32:01:F6,ignore
dhcp-host=00:E0:81:30:26:18,ignore

13. Bugs, oddities, ToDos

  • sudo still fails on the Perceus nodes. We can su - root but we’ll have to figure this out.

  • after a chroot, changing some Perceus configuration files, unmounting the VNFS image, and then rebooting, some nodes would not get the correct network info until we did a sudo perceus init followed by a sudo perceus configure nfs. This only happened once.

  • in the /etc/resolv.conf file, localhost does not work, but 127.0.0.1 does. This is expected: nameserver entries in resolv.conf must be numeric IP addresses, not hostnames.

14. Appendix

14.1. Competitors / Alternatives

14.1.1. Scyld ClusterWare

Commercial HPC (High Performance Computing) cluster management solution. Supported on Red Hat Enterprise Linux only. Scyld Official Website

14.1.2. xCAT

Open Source software similar to Perceus but not as simple or automated. xCAT Official Website

14.1.3. Univa UD

Commercial Open Source cluster management software similar to Scyld and ROCKS (below).

14.1.4. ROCKS

Popular stateful provisioning middleware based on RPM, with prepackaged application packs called rolls, each of which adds a chunk of functionality. ROCKS Official Website.

14.1.5. Linux Terminal Server Project

The LTSP is not exactly a competitor of Perceus, but it uses a lot of the same or similar technology, and I could see Perceus being used to provision thin clients and application servers in the same way that it provisions cluster nodes.