//Perceus Debian Package 1.6 Tutorial/Report
//This report mainly covers the features of Perceus
//and explains why it should be incorporated into the
//BDUC cluster. This report will also provide tutorials
//and examples to set up Perceus.

//Run and create html :
// export fileroot="/home/hjm/nacs/Perceus-Report"; asciidoc -a toc -a toclevels=4 -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]* moo:~/public_html;
// export fileroot="/home/anthony/Dropbox/Work/Perceus-Report"; asciidoc -a toc -a toclevels=4 -a numbered ${fileroot}.txt;
// scp ${fileroot}.txt hmangala@claw1:~/bduc/trunk/sge; ssh hmangala@bduc-login 'cd ~/bduc/trunk/sge; svn update; svn commit -m "new mods to Perceus HOWTO"'

The Perceus Provisioning System
===============================
by Anthony Vuong & Harry Mangalam
v1.12, June 30, 2011

//(thanks to Kaz Okayasu for the loan of the Gb switch)

//This section will give a detailed description as to what Perceus has to offer.
//There will be detailed information about the exclusive features of Perceus.
//Also show how reliable, flexible, and scalable it is.

What's Perceus?
---------------
http://perceus.org/[Perceus] (Provision Enterprise Resources & Clusters
Enabling Uniform Systems) is an Open Source provisioning system for Linux
clusters developed by the creators of
http://en.wikipedia.org/wiki/Warewulf[Warewulf], of which Perceus is the
successor. Perceus typically runs as a server process on an administrative
node of a cluster and provides the Operating System to requesting nodes via
the network. It is optimized for stateless systems - those in which the OS is
not resident on-disk, but freshly net-booted at each startup - but it can also
provision stateful systems. It can provision nodes to be completely
homogeneous (as would be required for a compute cluster) or fairly
heterogeneous (as for sets of application servers), such that each OS image is
tuned to a particular service. Perceus also provides utilities to modify the
client OS images and push out changes either immediately via rsync or by
saving them to the client image to be applied at the next reboot.

As befits a tool for handling thousands of nodes, Perceus handles most things
automatically. It will detect unidentified MAC addresses on the private
network, add them into the default Perceus group, and provision a default
image. Perceus can also set specific configurations for certain nodes based
on MAC address.

It should be noted that Perceus is well supported by
http://www.infiscale.com/[Infiscale], the company formed to commercialize it.
There is a
http://altruistic.infiscale.org/docs/perceus-userguide1.6.pdf[User Guide for Perceus 1.6]
that should be used as the definitive introduction and guide to Perceus. This
document varies from that one in that it is more closely focused on installing
Perceus on Debian-based systems (specifically Ubuntu 10.04(LTS)) and then
integrating the Perceus server and the provisioned cluster with an existing
NIS/NFS/Kerberos campus system. This probably represents a large proportion
of how Perceus will be installed.

//Perceus, Rocks, Univaud, LSTP.org

Operating System Support
------------------------
Currently, Perceus supports the following operating systems for both servers
and clients.

- http://www.debian.org/[Debian] and Debian-derived distributions such as
http://www.ubuntu.com/[Ubuntu]. This report will describe the configuration
of a Ubuntu 10.04(LTS)-based cluster.
- http://www.redhat.com/rhel/[Red Hat Enterprise Linux 5] and newer, and other
RPM systems such as http://fedoraproject.org/[Fedora] and
http://www.centos.org/[CentOS]. Perceus.org provides
http://www.perceus.org/site/html/documentation.html[Quickstart Guides] for
some of these OSs.

Perceus supplies the client nodes with their OSs in 'modules': stripped-down
versions of the OS (no Desktop GUI, minimal libraries & utilities, no
applications) that provide the base functionality. The OS modules included
with Perceus are:

- GravityOS - a Debian-based Linux distribution.
- Caos NSA - an RPM-based Linux distribution that focuses on high-performance
computing.

A Perceus Glossary
------------------
Since the Perceus approach is somewhat different from the typical
static-OS-on-disk approach, we'll dedicate some space to defining some Perceus
terms:

Stateful Provisioning
~~~~~~~~~~~~~~~~~~~~~
In most computers, the OS is installed on the local disk. Stateful
provisioning is the process of obtaining an OS image from a server and
installing that OS to the local disk of the client node. This approach is
useful when bandwidth is extremely limited or changes to the OS image are
expected to be infrequent. Stateful provisioning is provided in Perceus v1.6
and above.

Stateless Provisioning
~~~~~~~~~~~~~~~~~~~~~~
This is the opposite of stateful provisioning. Instead of installing the OS
on the local disk, the provided image is installed into RAM, which allows the
hard disk space to be used for something else. The advantages of this
approach are that the OS image is refreshed at each reboot and that any
changes to the server-hosted image are propagated to each node. It also saves
some disk space on each node (the nodes could be completely diskless, as long
as the workloads did not require fast swap).

Virtual Node File System (VNFS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The VNFS is essentially just a disk image that is provisioned to the nodes.
The VNFS is split into three parts: VNFS capsule, VNFS rootfs, and VNFS image.

[[vnfscapsule]]
VNFS Capsule
^^^^^^^^^^^^
A VNFS capsule is a compressed base package of the OS. While you can make
your own capsules, Perceus supplies sample capsules of GravityOS and Caos
which are generally sufficient for real-world use.

VNFS rootfs
^^^^^^^^^^^
Once you've 'imported', uncompressed, and mounted a VNFS capsule on the server
using the 'perceus' utility commands, you can access the files of the image
and make changes to it. This appears as a complete root filesystem to a user
on the server and can be cd'ed into, edited, upgraded as a chrooted
filesystem, etc.

VNFS Image
^^^^^^^^^^
The VNFS image is the actual image that is provisioned to the nodes. Once you
mount and configure the VNFS rootfs, you have to unmount it to update the
VNFS image.

Modules
~~~~~~~
Perceus provides utilities that can import and load modules to its nodes.
// ?? SUCH AS ??

Import Nodes From Other Sources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Perceus can also import node definitions used by other provisioning systems
such as Rocks, Warewulf, and Oscar (done with a simple 'import' command).

The Provisioning Process
------------------------

Step by step
~~~~~~~~~~~~
As noted above, Perceus is a client/server process, with the clients
requesting their entire OS from the Perceus server (in stateless mode). We
assume the server is up and running, with the Perceus daemon listening on the
private interface.

. The client node is booted and requests an OS via PXE-boot.
. The Perceus daemon responds with the first-stage boot image, which loads the
very lightweight Perceus client daemon.
. The client daemon configures the node by getting a DHCP-provided IP # and
initiating the PXE-boot.
. The client requests a VNFS capsule and preps the in-RAM filesystem to load
the OS.
. Once the new kernel boots, the Perceus client daemon is purged and the RAM
returned to the system.
. The system runs as normal, starting whatever services and mounting whatever
filesystems the VNFS is configured to do.

Some Issues
~~~~~~~~~~~
Since in stateless mode the OS is net-booted each time, the node can run
'entirely diskless'. However, this requires that the node have sufficient RAM
to hold not only the OS and all associated filesystems in memory, but also all
of the application and user code; further, it can never hit swap (since it
doesn't have any) and the only local working space ('/scratch') is RAM-based.
Our nodes do have disks, but we partition them into a swap partition (since
some of our codes do balloon to fairly large sizes) and a '/scratch' partition
so that user data can be pre-staged, avoiding heavy network traffic during a
run. We provide a Perl init script for the VNFS,
http://moo.nac.uci.edu/~hjm/format_mount.pl[format_mount.pl], that takes care
of detecting, partitioning, and mkfs'ing the disk prior to the node being made
available to users.

//Bit Rot, Fresh OS Image

Why Perceus for the BDUC cluster
--------------------------------
The BDUC nodes currently use stateful provisioning through TFTP. The nodes
get an entire OS initially (including many of the libs, utils, & apps) and,
until the OS is manually refreshed, it does not receive any updates.
Currently, these nodes are suffering from 'bit rot': the gradual divergence of
a node's installation from the expected standard image, for many reasons.
Chief among them is a node going down and missing an update or a cluster-wide
installation. This 'bit rot' is typically handled on a case-by-case basis,
which can entail significant admin time.

Perceus addresses the 'bit rot' problem with a 'livesync' or (worst case) a
simple reboot. All installation and configuration tasks will be handled by
the Perceus server. If we need to make a quick change within a node OS, we
can just modify the VNFS rootfs and push the image changes to the cluster
instead of going into every node and making the change. Apart from the
Perceus features, we can use each node's local disk more efficiently as
scratch space since the OS will not have to reside on-disk, making BDUC more
efficient at no additional cost and with less manual intervention.

mailto:ppk@ats.ucla.edu[Prakashan Korambath] and
mailto:kjin@ats.ucla.edu[Kejian Jin] provided similar arguments for
http://moo.nac.uci.edu/~hjm/ucla_perceus_test.pdf[using Perceus on UCLA's Hoffman2 cluster].

Getting Started
---------------
We will describe the Perceus installation and configuration for both a
'minimal setup' similar to the one described in the
http://altruistic.infiscale.org/docs/perceus-userguide1.6.pdf[Perceus User Guide]
and a 'Production Setup' we will use to 'append' a Perceus cluster to our
current BDUC production cluster. The main difference between the two is that
the basic setup will allow only 'root' login to the nodes unless another user
is added in the VNFS.
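
If you do want an extra (non-root) login in the minimal setup, the user has to
be created inside the VNFS image itself. The commands for mounting and
modifying the image are covered in detail later in this document; as a
preview, a minimal sketch (assuming the 'gravityos-base' capsule used below;
the user name 'testuser' is purely illustrative) might look like:

-----------------------------------------------------------------------------
# mount the VNFS so its root filesystem is visible under /mnt
sudo perceus vnfs mount gravityos-base
# create the user inside the chrooted image ('testuser' is a made-up example)
sudo chroot /mnt/gravityos-base adduser testuser
# unmount to save & compress the image, then push it to any running nodes
sudo perceus vnfs umount gravityos-base
sudo perceus vnfs livesync gravityos-base
-----------------------------------------------------------------------------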

The production version allows BDUC users to log in to the Perceus-provisioned
nodes and transparently access their files on the BDUC cluster via integration
with the BDUC
http://en.wikipedia.org/wiki/Network_Information_Service[NIS]/http://en.wikipedia.org/wiki/Network_File_System_(protocol)[NFS]
and http://en.wikipedia.org/wiki/Kerberos_(protocol)[Kerberos] system.

[[perceuscomponents]]
Perceus Components
~~~~~~~~~~~~~~~~~~

Hardware
^^^^^^^^
Perceus requires minimal hardware to test. It requires only:

- A 'private network'. This can be as few as 1 node connected to a small
switch or hub. The faster the network hardware the better, but it can be as
slow as 10Mb. We used a 24-port Netgear Gigabit Switch.

- At least 1 'Perceus Master Server' with 2 interfaces, one for the external
Internet and one facing the private network that services the cluster. For
testing purposes, the Perceus server can be a small, slow machine; the most
important parts of it are the speed of the network adapters, although the CPU
speed is relevant when compressing a modified VNFS. We used an AMD dual-core
Opteron @ 1.4GHz, 4GB RAM, 60GB IDE HD, Ubuntu 10.04 (AMD64) Desktop OS, and
2x Broadcom 1Gb interfaces.

- At least 1 node whose BIOS has been configured to 'PXE-boot'. It can also
be a small, slow node, but it has to have enough RAM to hold the OS; we'd
recommend no less than 1GB. We used 2 nodes, each having 2 AMD Opterons @
2.4GHz, 8GB RAM, a 320GB SATA HD, and 2x Broadcom 1Gb interfaces (only 1
used).

- A link:#vnfscapsule[VNFS capsule] containing the node OS to be provisioned
from the server to the node. We used the Debian-derived gravityos module.

This is the simplest Perceus configuration; you can also use multiple Perceus
servers with local or nonlocal shared filesystems. For example, in a
production cluster, a single hefty server could be used as the login/head
node, the Perceus server, and the storage server, although this puts a lot of
eggs in a single basket. An alternative is to keep the head/login node
separate and put the Perceus and storage server on the same node. In the
following schema, we will use a single server for everything; the rationale is
that if one of the parts goes down, most of the cluster functionality is lost
anyway.

Network Configuration
^^^^^^^^^^^^^^^^^^^^^
Install a Debian-based Linux OS if you haven't done so. As stated above, we
installed Ubuntu 10.04(LTS) Desktop (AMD64) on our Perceus server to take
advantage of the GUI tools. Obviously the Desktop version isn't necessary
(and there are good reasons not to use it). Since you'll need an operating
network to update the OS and obtain the optional packages, let's address the
network configuration 1st. We've never had a good experience with any default
Network Manager; the alternative is to edit the '/etc/network/interfaces' file
by hand.

Our '/etc/network/interfaces' file:

-----------------------------------------------------------------------------
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 128.200.34.147
    netmask 255.255.255.0
    network 128.200.34.0
    broadcast 128.200.34.255
    gateway 128.200.34.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 128.200.1.201

auto eth1
iface eth1 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    # if you want the cluster nodes to be able to see the
    # public internet, include the following 2 lines
    gateway 128.200.34.1
    dns-nameservers 128.200.1.201
-----------------------------------------------------------------------------

We also need to use IP Masquerade to enable the private *192.168.1.0* network
to communicate with the public *128.200.34.0* network and gain access to the
outside world. You can directly manipulate iptables to make this
configuration, but we chose to use 'guidedog' (part of the KDE Desktop), which
accomplished this transparently.

Restart the network to activate the new configuration and check that the OS
thinks everything is fine.

-----------------------------------------------------------------------------
/etc/init.d/networking restart
ifconfig
# should dump a configuration that shows that eth0 is assigned 128.200.34.147
# and eth1 is assigned 192.168.1.1, and you should now be able to ping
# remote hosts
ping www.google.com
PING www.l.google.com (66.102.7.99) 56(84) bytes of data.
64 bytes from lax04s01-in-f99.1e100.net (66.102.7.99): icmp_seq=1 ttl=53 time=2.94 ms
64 bytes from lax04s01-in-f99.1e100.net (66.102.7.99): icmp_seq=2 ttl=53 time=4.46 ms
... etc ...
-----------------------------------------------------------------------------

Software
^^^^^^^^
The Perceus server will need the following packages and files to run Perceus.
Since we'll be running Perceus on a Ubuntu server, the packages are referenced
using the Ubuntu deb names. The client nodes need nothing, of course, since
they will be fully provisioned by Perceus. The nodes do need to be of recent
enough vintage that they can be configured to 'PXE-boot', which is set via the
'Boot' or 'Startup' screens of the BIOS configuration (and which unfortunately
requires you to boot each node into the BIOS configuration screens one time).

Here are the packages and files needed to run the basic Perceus on the Ubuntu
10.04(LTS) server.

Files (not part of a Ubuntu distribution):

- http://altruistic.infiscale.org/deb/perceus16.deb[Perceus Version 1.6 Debian Package]
- http://altruistic.infiscale.org/~ian/gravityos-base.vnfs[gravityos (base VNFS Image)]
- http://moo.nac.uci.edu/~hjm/format_mount.pl[format_mount.pl] - locally
written disk-formatting utility.

Deb packages (and dependencies, if not noted explicitly):

----------------------------------------------------------------
libnet-daemon-perl    nfs-kernel-server
libnet-pcap-perl      nasm
libplrpc-perl         perl
libunix-syslog-perl   libdbi-perl
libyaml-perl          libio-interface-perl
libyaml-syck-perl     libnet-arp-perl
openssh-server        guidedog  # depends on KDE; could also manipulate iptables directly
----------------------------------------------------------------

Install them all with:

----------------------------------------------------------------
sudo apt-get install libnet-daemon-perl nfs-kernel-server \
  libnet-pcap-perl nasm libplrpc-perl perl libunix-syslog-perl \
  libdbi-perl libyaml-perl libio-interface-perl libyaml-syck-perl \
  libnet-arp-perl openssh-server guidedog
----------------------------------------------------------------

Installing and Configuring Perceus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Get the necessary Perceus-specific packages and files and install Perceus on
the main server:

-----------------------------------------------------------------------------
cd ~
mkdir perceus-dist
cd perceus-dist
# now get the required debs from Infiscale
wget http://altruistic.infiscale.org/deb/perceus16.deb
wget http://altruistic.infiscale.org/~ian/gravityos-base.vnfs
# install Perceus
sudo dpkg -i perceus16.deb
# and start it.
sudo perceus start
-----------------------------------------------------------------------------

When 'perceus start' executes for the 1st time, it will ask some questions
about how you want to configure the cluster. The questions are quite
straightforward and usually the default answer is acceptable. The following
demonstrates the questions, with our comments prefixed by '##'. Accepting the
default is designated by '' (ie, just hitting Enter).

-----------------------------------------------------------------------------
Do you wish to have Perceus do a complete system initialization (yes/no)? yes

What IP address should the node boot address range start at?
(192.168.1.192)> 192.168.1.11
## the private net is going to be used ONLY for the cluster, so we only
## reserve the 1st 10 addresses for special-purpose servers.

What IP address should the node boot address range end at?
(192.168.1.254)>

What domain name should be appended to the DNS records for each entry in DNS?
This won't require you to specify the domain for DNS lookups, but it prevents
conflicts from other non-local hostnames.
(nac.uci.edu)>
## Perceus determines what local net you're on

What device should the booting node direct its console output to? Typically
this would be set to 'tty0' unless you are monitoring your nodes over the
serial port. A typical serial port option might be 'ttyS0,115200'.
note: This is a global option which will affect all booting nodes.
(tty0)>

Creating Perceus ssh keys
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/perceus.
Your public key has been saved in /root/.ssh/perceus.pub.
The key fingerprint is:
cb:4e:bb:ee:6c:95:65:f9:a4:89:23:a7:f6:de:23:63 root@flip
The key's randomart image is:
+--[ DSA 1024]----+
|      ..         |
|     . . o.      |
|    . + o . .    |
|       +. +      |
|      o S . +    |
|     . . o o o   |
|        . + ..   |
|        .E+.o.   |
|        ..oooo   |
+-----------------+
Created Perceus ssh host keys
Created Perceus ssh rsa host keys
Created Perceus ssh dsa host keys

Perceus is now ready to begin provisioning your cluster!
## pretty easy, no?
-----------------------------------------------------------------------------

Importing a VNFS Capsule to Perceus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At this point, we'll be importing a VNFS capsule created by the developers of
Perceus.
The VNFS capsule includes a Debian-based OS image of 'gravityos'. Locate the
'gravityos-base.vnfs' OS capsule that you just downloaded and 'import' it
using the following shell command.

-----------------------------------------------------------------------------
sudo perceus vnfs import /path/to/gravityos-base.vnfs
-----------------------------------------------------------------------------

After importing the capsule, there will be a prompt asking you to create a
root password for the VNFS image, gravityos-base in this case. This will be
'your only login' for the basic node setup unless other users are added later
on. There will also be a series of configuration questions (mostly network)
regarding the VNFS image. These questions are straightforward; we will add
more details in later versions, if necessary.

Your modified VNFS files are located in '/etc/perceus/vnfs'; the rest of the
Perceus configuration files are in '/etc/perceus'.

The file '/etc/perceus/dnsmasq.conf' is automatically configured based on
answers provided during the installation process. If you misconfigured
something regarding the network settings and need to fix it, this is the file
to check. You'll also find the dhcp boot range (the IP addresses provisioned
to the nodes) here.

// ?? the following is a bit confused ??
The file '/etc/perceus/defaults.conf' holds a default set of configurations
used to provision nodes that were not explicitly identified in the Perceus
cluster, including giving a default image to an unidentified node. The
settings found in this configuration file include:

- the default image
- the starting IP # of the client nodes
- the default group

In the same '/etc/perceus/defaults.conf', set "Vnfs Name = NAMEOFOS-base". In
our test cluster, we set it to "Vnfs Name = gravityos-base".

The file '/etc/perceus/perceus.conf' is also automatically configured by
Perceus during the installation process. Make sure the master network device
is the ethernet port for the private network ('eth1' in our case) and the VNFS
transfer method is 'nfs'.

Now power on the nodes and the Perceus server should provide default settings
and add the new nodes to its database. This ends the basic setup of Perceus.
When the provisioning is complete you should have a set of nodes whose
addresses start at the starting IP# and increase up to the maximum you set.
You should be able to log in to the nodes at the console as 'root' (and only
as 'root'). You should also be able to ssh to the nodes as 'root' from the
Perceus master and poke around to verify that the node is a true compute node.
Adding other user names is covered below in the 'Production Setup'.

Reconfiguration of the VNFS
---------------------------
Once the Perceus clients are up and running, you will soon discover that you
need other services and filesystem mounts available. While you can make these
changes on a live node to verify that they work correctly, to make them
permanent you'll have to make the same changes on the Perceus-imported VNFS
image on the Perceus server and then push the image changes out to the nodes.
In most cases, it's sufficient to make the changes in the image and then
'livesync' the changes to the cluster, but you should designate 1 node as a
test target and test the new image against that target before any changes are
launched cluster-wide. This is where having a fast Perceus server WILL make a
difference, since the ~200MB image has to be processed and compressed each
time it's written out.
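
To make that workflow concrete, here is a sketch of a typical reconfiguration
cycle, assuming the 'gravityos-base' VNFS imported above (the package name in
the example is purely illustrative):

-----------------------------------------------------------------------------
# expose the VNFS rootfs under /mnt/gravityos-base
sudo perceus vnfs mount gravityos-base
# make the change inside the chrooted image (package name is an example)
sudo chroot /mnt/gravityos-base apt-get install some-package
# write out and compress the modified image (this is the slow step)
sudo perceus vnfs umount gravityos-base
# push the changes to the running nodes
sudo perceus vnfs livesync gravityos-base
-----------------------------------------------------------------------------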

While it's a bit of a pain, the best approach is as described above - test the
changes on a live node, immediately repeat the change on the mounted image,
and then test the change in the image by rsyncing it to a designated 'stunt
node'.

[[productionsetup]]
Perceus Production Setup with BDUC
----------------------------------
The additional features for the 'Production Setup' are:

- Kerberos (network authentication) to allow transparent, centralized
authorization to the cluster.
- NIS (Network Information Service) to allow transparent user login to any
node after Kerberos authorization.
- NFS (Network File System), in conjunction with NIS, allows users to access
their files from any node in the cluster.
- autofs / automount - allows the remote filesystems to be mounted on demand
and unmounted when idle to prevent stale/locked NFS mounts.
- 'format_mount.pl' - detects, partitions, mkswap's, and mkfs's the node disk
to allow 'swap' and '/scratch' to be made & used.

For our cluster, we are using the campus 'Kerberos' server for authorization -
ie, the Perceus server is neither the Kerberos server nor the NIS/NFS server,
so we can make use of those external services without configuring the Perceus
server to supply them; it just has to be configured to consume them. To do
this, you'll need these additional packages (and dependencies):

----------------------------------------------------------------
krb5-clients   autofs5
libpam-krb5    gparted
nis            krb5-kdc
binutils

#install them all with ..
sudo apt-get install krb5-clients libpam-krb5 nis autofs5 krb5-kdc binutils parted
----------------------------------------------------------------

// the NIS domain we want to join is 'YP.bduc.uci.edu'
// see bduc-login:/etc/yp.conf
// the krb5 realm is 'UCI.EDU'
// the kerberos kdc server is 'kerberos.service.uci.edu'
// the kerberos admin_server is 'kerberos.service.uci.edu'

You'll also need these configuration files for your cluster (BDUC in our
case):

// if there's a choice, use the ones from ubuntu 10.04
// /etc/network ?? this is a directory

Files from a NIS/NFS/Kerberos client in your cluster:

---------------------------------------------------------
/etc/yp.conf                 (from a NIS client)
/etc/ypserv.conf
/etc/nsswitch.conf
/etc/krb5.conf
/etc/autofs_ldap_auth.conf
/etc/auto.master
/etc/auto.misc               # not used by BDUC
/etc/auto.net                # not used by BDUC
/etc/auto.smb                # not used by BDUC
---------------------------------------------------------

These simply need to be copied to the same positions on the Perceus server
(after carefully making backups of the originals). This will allow us to
access our campus LDAP server for login information and automount BDUC's user
and application filesystems. Once the files are backed up and copied to the
Perceus server, the services have to be started.

-----------------------------------------------------------------------------
# start the NIS services
sudo /etc/init.d/nis start

# the following line initializes the local Kerberos database.
# REMEMBER the password you set!! (should only need to be done the 1st time)
kdb5_util create -s

# and then start the krb5 services.
/etc/init.d/krb5-kdc start
-----------------------------------------------------------------------------

Sun Grid Engine requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perceus Master
^^^^^^^^^^^^^^
For the Perceus master to be included usefully in the SGE domain, it must:

- automount the exported SGEROOT
- have binutils installed (see above)
- be added as an execution host to SGE
- be added as a submission and/or admin host to SGE
- be added to a host group which is servicing jobs (ie: @long) so that SGE
jobs can be sent to it.

Perceus Clients
^^^^^^^^^^^^^^^
For the Perceus client nodes to be included in the SGE domain, the vnfs module
has to include the same configurations. The same remote NFS mounts have to be
automounted:

 /home     (mounted over the existing /home, if need be)
 /sge52
 /apps     (automounted on request)

NFS access to cluster files
~~~~~~~~~~~~~~~~~~~~~~~~~~~
To access your cluster files from the Perceus server, you'll need the help of
a BDUC admin to modify the '/etc/exports' file on all NFS servers that supply
files to BDUC (bduc-login, bduc-sched). The file needs to be edited to allow
the Perceus server to mount the exported files. Don't forget to 'exportfs'
the new configuration on the NFS servers.

Finally, test whether the Perceus server is connected to the NIS master on
BDUC (bduc-sched) by executing the following on the Perceus server:

-----------------------------------------------------------------------------
ypcat passwd
-----------------------------------------------------------------------------

This command listed the login information and directories of all users in the
BDUC cluster, so it was a success! Once you restart the networking on the
Perceus server, you should be able to ssh to the Perceus main server with your
UCINetID username and password and be able to read/write your BDUC files as if
you were logged into BDUC.

We're now done with the Perceus server; now on to the client nodes.

Configuring the Perceus Clients
-------------------------------
In order for the Perceus clients to gain these same abilities, the above
configuration files have to be copied to the chrooted, live-mounted VNFS image
in the same locations (/etc/..), and the same debs have to be chroot-installed
into that image, as described on pages 22-23 of the
http://altruistic.infiscale.org/docs/perceus-userguide1.6.pdf[Perceus User Guide].

Mount the VNFS
~~~~~~~~~~~~~~
On the Perceus server, we have to mount the VNFS image using

-----------------------------------------------------------------------------
sudo perceus vnfs mount gravityos-base  # or the name of the VNFS pkg used
-----------------------------------------------------------------------------

The mount directory is '/mnt/gravityos-base'. We'll have to 'chroot' into
that directory so we can install packages in the now-live image.

-----------------------------------------------------------------------------
sudo chroot /mnt/gravityos-base
-----------------------------------------------------------------------------

Install & Configure the Debs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As with the server, here are the packages we'll need to install in the VNFS
image:

----------------------------------------------------------------
krb5-kdc       nis
krb5-clients   autofs5
libpam-krb5    gparted

#install them all (still in the chroot, where we are already root) with ..
apt-get install krb5-kdc krb5-clients libpam-krb5 nis \
  autofs5 gparted

# and exit the chroot
exit
----------------------------------------------------------------
// ?? do we have to explicitly exit the chroot here ??

Repeat the NIS and Kerberos configuration file copying as described in the
production setup above. Essentially, you have to copy those files from the
Perceus server to the VNFS image.

----------------------------------------------------------------
cd /etc
sudo cp yp.conf ypserv.conf nsswitch.conf krb5.conf \
  autofs_ldap_auth.conf auto.master /mnt/gravityos-base/etc

# have to check that the autofs file is chmod'ed correctly
# owner (root) can rw
sudo chmod u+rw /mnt/gravityos-base/etc/autofs_ldap_auth.conf
# group and other can't rwx
sudo chmod og-rwx /mnt/gravityos-base/etc/autofs_ldap_auth.conf
----------------------------------------------------------------

Once those files are copied and you verify that the init scripts are in place
in the VNFS, you have to push those changes to the nodes. This can be done
via the 'livesync' option or by the entire export/reboot process. The
'livesync' is much faster and involves using ssh and rsync to push all changes
to the nodes while they're still live.

----------------------------------------------------------------
sudo perceus vnfs livesync gravityos-base
----------------------------------------------------------------

Eventually, you'll have to export the VNFS (which results in a significant
delay as the image has to be compressed) and then reboot your test client to
verify that it works from the ground state.

----------------------------------------------------------------
sudo perceus vnfs export gravityos-base /path/to/gravityos-base_.vnfs
----------------------------------------------------------------

As above, check that after a network restart, the client node can
automatically communicate with the campus Kerberos server and the BDUC NIS/NFS
servers.

Automatic disk processing
~~~~~~~~~~~~~~~~~~~~~~~~~
The next step is to convince the 'format_mount.pl' script to execute during
boot time by incorporating it into the init script sequence. Get the file
http://moo.nac.uci.edu/~hjm/format_mount.pl[format_mount.pl] and chmod it so
it becomes executable.

-----------------------------------------------------------------------------
chmod +x format_mount.pl
-----------------------------------------------------------------------------

Copy it to the /bin directory of the VNFS image. (You'll have to re-import
and re-mount the image if you've exported it.)

-----------------------------------------------------------------------------
sudo perceus vnfs import /path/to/gravityos-base.vnfs
sudo perceus vnfs mount gravityos-base
sudo cp /path/to/format_mount.pl /mnt/gravityos-base/bin
-----------------------------------------------------------------------------

Now we need to edit '/etc/rc.local' to pass in arguments to the script so it
can run during boot time. It'll run quite late, but only the user will be
accessing the swap and scratch partitions so it won't affect other processes.
Add this line to the VNFS's '/etc/rc.local' file
('/mnt/gravityos-base/etc/rc.local'). Make sure it's above the line that
executes 'exit 0'.

-----------------------------------------------------------------------------
...
# we determined that the VNFS detected the disk as '/dev/sda' by examining
# dmesg output on 1st boot.
format_mount.pl sda 8000 xfs NODEBUG

exit 0
-----------------------------------------------------------------------------

Now save and compress the modified image by unmounting the VNFS image:

-----------------------------------------------------------------------------
sudo perceus vnfs umount gravityos-base
-----------------------------------------------------------------------------

Once that's finished, reboot the stunt node to test; if it comes up with new
'swap' and a '/scratch' dir, the production setup is complete.

//Anthony edit begin
//hjm edit

Integrating Perceus with the existing BDUC Environment
------------------------------------------------------
We have a production cluster (BDUC) and can't bring it down for several days
to re-do it as a Perceus cluster. We are therefore integrating a small (25
node) Perceus cluster with the existing CentOS cluster and, when we've
debugged it to the point where it behaves, we'll flip the entire cluster over
a weekend day. As noted, we already have a small Ubuntu-based sub-cluster
integrated with the cluster, so using a Debian-based distro won't be a
completely new experience.

Our new Perceus master server, _claw1_, has the _Ubuntu_ 10.04 distribution
installed. We've already installed *Perceus 1.6* using the Debian package
downloaded from the http://www.perceus.org/site/html/download.html[main website].
Integrating Perceus into our production cluster is not hard. Using the
tutorial/process above, we discuss the differences below.

New Applications required
~~~~~~~~~~~~~~~~~~~~~~~~~
We needed to install TCL to support http://modules.sf.net[Modules]:

------------------------------------------------------
# as root
perceus vnfs mount percdebian
chroot /mnt/percdebian
apt-get install tcl8.3
------------------------------------------------------

Hardware Changes
~~~~~~~~~~~~~~~~
There are very few hardware changes needed to support Perceus. The only
notable changes are to set the BIOS to request a netboot on the correct
interface (the source of a number of initial errors) and to change the BIOS
Chipset section so that the EDAC system will record ECC errors. This is
BIOS-specific, so we'll not go into it in depth.

Perceus Configuration file changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/etc/perceus/perceus.conf
^^^^^^^^^^^^^^^^^^^^^^^^^
Originally, the controlport and VNFS transfer master were set to localhost.
Since claw1 has both a public and a private IP address, we don't want Perceus
to use the wrong ethernet port. Thus we explicitly specified the localhost
with the private network IP address.

------------------------------------------------------
master network device = eth0
vnfs transfer method = nfs
vnfs transfer master = 10.255.78.5
vnfs transfer prefix =
database type = btree
database server = localhost
database name = perceus
database user = db user
database pass = db pass
node timeout = 600
controlport bind address = 10.255.78.5
controlport allow address = 10.255.78.5
------------------------------------------------------

/etc/perceus/default.conf
^^^^^^^^^^^^^^^^^^^^^^^^^

------------------------------------------------------
Node Name = n###
Group Name =
Vnfs Name = perdebian
Enabled = 1
First Node = 101
------------------------------------------------------

/etc/perceus/dnsmasq.conf
^^^^^^^^^^^^^^^^^^^^^^^^^
This file is generated after running *sudo perceus init*. You shouldn't have
to modify anything here, besides the dhcp-range, if needed.

------------------------------------------------------
interface=eth0
enable-tftp
tftp-root=/usr/var/lib/perceus/tftp
dhcp-option=vendor:Etherboot,60,"Etherboot"
dhcp-boot=pxelinux.0
local=/
domain=bduc
expand-hosts
dhcp-range=10.255.78.100,10.255.78.254
dhcp-lease-max=21600
read-ethers
------------------------------------------------------

Note that more DNS changes are link:#DNS[noted below].

/etc/fstab
^^^^^^^^^^
We have to configure the VNFS capsule to contact 10.255.78.5, the private IP
address of 'claw1'. Following the tutorial above, mount the VNFS capsule,
chroot into the directory, and edit the '/etc/fstab' file. We need 2 NFS
mounts from the Perceus master: the shared Perceus lib and the master's '/usr'
tree, to provide apps and libs without increasing the size of the VNFS. The
modifications should be similar to this:

------------------------------------------------------
# the perceus shared dir
10.255.78.5:/usr/var/lib/perceus  /usr/var/lib/perceus  nfs  ro,soft,bg  0 0
# the claw1 /usr tree to share apps, libs with nodes (see text)
10.255.78.5:/usr                  /u                    nfs  ro,soft,bg  0 0
------------------------------------------------------

These are 'permanent mounts' with the accompanying pros (easy) and cons (they
will fail if the claw1 NFS server locks up). We may switch to the 'automount'
process that we use for most other NFS mounts on the cluster if we have
trouble with the permanent mounts.

/etc/profile
^^^^^^^^^^^^
We have to add some paths under the above '/u' mount to allow the Perceus
nodes to find the apps/libs it provides.

------------------------------------------------------
# near top of file
if [ "`id -u`" -eq 0 ]; then
  PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/u/bin:/u/local/bin:/u/sbin"
else
  PATH="/usr/local/bin:/usr/bin:/bin:/u/bin:/u/local/bin"
fi
export PATH

# ... set other non-Perceus-related things

# then set the LD_LIBRARY_PATH to direct to NFS mounts
export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib:/u/lib:/u/local/lib:/opt/lib:$LD_LIBRARY_PATH

# and do the same thing for a variety of other ENV-related variables (non-exhaustive)
export PERL5LIB=/u/share/perl5
export PYTHONPATH=/u/lib/python2.6
------------------------------------------------------

*SGE* wasn't automounting at all. We discovered the problem to be that the
nodes couldn't contact _bduc-sched_, which controls the automount feature for
SGE. It seems that the Perceus client nodes were trying to contact bduc-sched
through its public IP address, which they cannot "see". To remedy this, we
had to modify the /etc/hosts file and define bduc-sched by its private IP
address.

/etc/hosts
^^^^^^^^^^
To allow 'SGE' to automount correctly from 'bduc-sched', we had to add the
private IP number to the nodes' '/etc/hosts' file, along with 'claw1'.

------------------------------------------------------
127.0.0.1    localhost.localdomain localhost
10.255.78.5  bduc-claw1.nacs.uci.edu claw1
10.255.78.3  bduc-sched.nacs.uci.edu bduc-sched sched
------------------------------------------------------

VNFS Changes
~~~~~~~~~~~~
To use our preconfigured VNFS capsule from the remote Perceus install, we had
to move the VNFS to the new Perceus master, 'claw1'. There are two ways to do
this. The first and *recommended* approach is to log onto the old Perceus
master and export the VNFS capsule using *sudo perceus vnfs export*. Then
copy the file onto the new Perceus master server and import it using *sudo
perceus vnfs import*.
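
A sketch of that transfer, assuming the VNFS is named 'percdebian' (the
temporary file path and the root ssh access to 'claw1' below are only
examples):

------------------------------------------------------
# on the old Perceus master: export the capsule to a file
sudo perceus vnfs export percdebian /tmp/percdebian.vnfs
# copy it to the new master (claw1)
scp /tmp/percdebian.vnfs root@claw1:/tmp/
# on claw1: import it into the new Perceus installation
sudo perceus vnfs import /tmp/percdebian.vnfs
------------------------------------------------------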

The other approach is to simply tar up the '/etc/perceus/vnfs/VNFSNAME'
directory, copy it to the new Perceus master server, and extract it (as root)
in the '/usr/var/lib/perceus/vnfs' directory (symlinked to
'/etc/perceus/vnfs'). For our setup, we went with the tarball and it seems to
have worked correctly.

Other environment and configuration changes needed in the VNFS:

- modify the vnfs 'ld.so.conf.d' to provide info about the new libs.
- modify the vnfs PATH to include '/u/bin', '/u/local/bin'
- modify the vnfs ENV variables for 'locale', etc.
- change the '/etc/apt/sources.list' to use the same Ubuntu sources as the
master.
- add the master node's root public keys to the VNFS's
'/root/.ssh/authorized_keys' so that you'll be able to ssh in without a
password.

Symbolic Link Changes
~~~~~~~~~~~~~~~~~~~~~
We need to modify any claw1 symlinks that use full (absolute) paths, since on
a node those links would resolve into the node's own '/usr' tree rather than
the NFS-mounted '/u'.

------------------------------------------------------
ie: we have to change links like this (absolute target):
  libblas.so.3 -> /usr/lib/libblas.so.3.0

to this format (relative target):
  cd /usr/lib; ln -sf libblas.so.3.0 libblas.so.3
------------------------------------------------------

Testing all Module applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We now have about 140 'Module'-based apps and libs. Each one has to be run to
verify that it works correctly on the new nodes. We suspect that only those
that have a specific 'libc' or kernel requirement will fail, but this has to
be tested. Those that don't run and can't be addressed with symlinks to
existing libs will have to be recompiled.

[[DNS]]
Named / DNS Changes
~~~~~~~~~~~~~~~~~~~
Perceus was designed to run as a 'homogeneous' cluster. Since we're running
it as a 'heterogeneous' cluster, that presents us with some DNS problems.
While the Perceus nodes know about each other and can resolve
external-to-Perceus hosts, the other hosts in the cluster don't know about the
Perceus nodes. In order to allow this to happen, the authoritative DNS server
for the cluster (on the 'login' node, not 'claw1') has to be explicitly
updated with the Perceus node information.

'/etc/resolv.conf' on 'claw1' points to 'bduc-login', which is the
authoritative nameserver for the cluster. Because of that designation, we
have to provide 'bduc-login' with the correct IP# & name mappings so that the
other cluster nodes can resolve the Perceus nodes. This is especially
important for the SGE scheduler. To this end, we have written a Python script
which watches the 'dhcp-leases' file on the Perceus master and re-writes the
'named' database files on 'bduc-login' if there are any changes.

Named Modification Script
^^^^^^^^^^^^^^^^^^^^^^^^^
This Python script will:

- monitor the Perceus 'dhcpd.leases' file and, on a change,
- re-write the 'named' database files '10.255.78.db' and 'bduc.db' on 'claw1'
(in '/var/named/chroot/var/named')
- make backups of those files on 'bduc-login'
- copy the new files into place on 'bduc-login', and
- cause 'named' to re-read the configuration files.

http://moo.nac.uci.edu/~hjm/PerceusNotifier.py[The script is here.]

//Anthony edit end

Adding Perceus nodes to SGE
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many of the problems with SGE stem from the 'named' problems noted immediately
above. The few remaining ToDos are:

On the node / VNFS side (see the sketch after this list):

- add the 'sgeexecd' init script to the '/etc/init.d/' dir on the VNFS
- in the Perceus VNFS chroot, install the 'sgeexecd' with
*update-rc.d sgeexecd defaults*
- you 'SHOULD NOT' have to do any modification of the node '/etc/hosts' file
for SGE.
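
A minimal sketch of those node-side steps, assuming the 'percdebian' VNFS; the
'/sge52/...' source path below is an assumption - use the actual location of
the 'sgeexecd' script in your SGE installation
($SGE_ROOT/$SGE_CELL/common/sgeexecd):

------------------------------------------------------
# mount the VNFS and copy in the SGE execd init script
sudo perceus vnfs mount percdebian
# the source path is an assumption; adjust to your SGE tree
sudo cp /sge52/default/common/sgeexecd /mnt/percdebian/etc/init.d/
# register it to start at boot, inside the chroot
sudo chroot /mnt/percdebian update-rc.d sgeexecd defaults
# save & compress the image
sudo perceus vnfs umount percdebian
------------------------------------------------------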

On the SGE side:

- add the nodes as 'Execution Hosts' with *qconf -ae