1. Introduction
The text source of this doc is here. Download and edit it, then see the notes at the top of the text source for how to process it and put it back in place.
1.1. Basic Cluster info
It is really, REALLY easy to break ROCKS with one wrong update or configuration change. In the spirit of making life easier for us all, keeping mishaps to a minimum, and not having to learn all of the ROCKS XML ways of doing things, I am offloading as much as possible onto shell scripts which we can all follow and understand on the new cluster.
Obviously there are some things which can only be done via the ROCKS XML setup, like disk partitioning, so for those we have no choice but to follow the ROCKS method.
There is plenty of ROCKS documentation, so reference the ROCKS manuals for any ROCKS-type questions.
-
as of Oct 31, 2013, we are running ROCKS version XXXX.
-
CentOS 6.4 on all compute nodes
-
Kernel:
# of nodes | kernel version |
---|---|
26 | 2.6.32-358.18.1.el6.centos.plus.x86_64 |
40 | 2.6.32-358.18.1.el6.x86_64 |
1 | 2.6.32-358.6.2.el6.x86_64 |
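One way to regenerate this count (a sketch, assuming the clusterfork "allup" target used in the shutdown section below, and that cf prints one line of output per node):
cf --tar=allup 'uname -r' | sort | uniq -c | sort -rn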
-
the Gluster nodes are running Scientific Linux release 6.2.
-
the Fhgfs nodes are running CentOS release 6.3.
1.2. HPC Cluster Shutdown
When shutting down the cluster, it’s useful, if possible, to take it down in stages so that everything shuts down cleanly and can start up again without errors.
The order for shutting down the cluster is:
-
if possible, notify users via email, motd, and wall that the system will be going down. Repeat the motd and wall messages several times.
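For example (a minimal sketch; the exact wording and timing are up to whoever runs the shutdown):
# on the login nodes
echo "HPC will be shut down for maintenance on Friday at 17:00" >> /etc/motd
wall "HPC will be shut down for maintenance on Friday at 17:00"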
-
at the point of shutdown, create the /etc/nologin file on the login nodes to prevent new users from logging in.
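For example (a sketch, reusing the LOGINS clusterfork target from the steps below; the contents of /etc/nologin are shown to anyone who tries to log in):
cf --tar=LOGINS 'echo "HPC is down for maintenance" > /etc/nologin'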
-
Stop the SGE system so that jobs can be suspended where possible and checkpointing can write out files as necessary. Joseph, please expand this.
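Until that is written up, one possible way to quiesce SGE (a sketch using the standard qmod/qconf commands; the details of our checkpointing setup still need to be documented):
# run as an SGE manager on hpc-s
qmod -d '*'        # disable all queues so no new jobs start
qmod -sj <jobid>   # suspend jobs that support it, as needed
qconf -ke all      # shut down all execution daemons
qconf -km          # shut down the qmaster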
-
Shut down the compute nodes:
# via clusterfork on hpc-s as root
# (queries qhost to determine which nodes are up and powers them down)
cf --tar=allup 'sync; sync; poweroff'
# via tentakel: Joseph?
-
shut down the login nodes:
# via clusterfork on hpc-s as root
# (queries qhost to determine which nodes are up and powers them down)
cf --tar=LOGINS 'sync; sync; poweroff'
# via tentakel: Joseph?
-
unmount extra FSs from remaining nodes (nas-7-1)
# exec from hpc-s
ssh nas-7-1 'sync; sync; umount /gl; umount /data; umount /ffs'
-
shut down the gluster FS
ssh -t bs1 'gluster volume stop gl'
cf --tar=GLSRV '/etc/init.d/glusterd stop'
# if it looks like the gluster system has shut down smoothly, shut them all off
cf --tar=GLSRV 'sync; sync; poweroff'
-
shut down the Fraunhofer FS
cf --tar=FHSRV '/etc/init.d/fhgfs-storage stop'
ssh -t fs0 '/etc/init.d/fhgfs-meta stop; /etc/init.d/fhgfs-mgmtd stop; /etc/init.d/fhgfs-admon stop'
# if it looks like the fhgfs system has shut down smoothly, shut them all off
cf --tar=ALLFH 'sync; sync; poweroff'
-
unmount /data from hpc-s
umount /data
-
poweroff all the NAS machines
cf --tar=ALLNAS 'sync; sync; poweroff' # or tentakel ??
-
shut down hpc-s, then bduc-login, and dabrick
# on hpc-s
sync; sync; poweroff
# from bduc-login (often used as an ssh proxy)
sudo bash
# then
sync; sync; poweroff
ssh root@dabrick 'sync; sync; poweroff'
-
Now unplug everything at the PDU circuit level. Pull up the tiles, and uncouple the plugs from the main PDU feeds.
2. HPC Cluster Startup
Essentially the reverse of the section above.
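A rough outline, obtained by simply reversing the shutdown steps (a sketch; the exact commands still need to be written up):
# 1. power the PDU circuits back on, then power on dabrick, bduc-login, and hpc-s
# 2. power on the NAS machines and remount /data on hpc-s
# 3. power on the Fraunhofer nodes and start fhgfs-mgmtd, fhgfs-meta, fhgfs-storage, fhgfs-admon
# 4. power on the gluster nodes, start glusterd, and start the gl volume
# 5. remount /gl, /data, and /ffs on nas-7-1
# 6. boot the login nodes, then the compute nodes
# 7. bring SGE back up and remove /etc/nologin from the login nodes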
3. Directories
Directory Path | What lives here |
---|---|
/data/apps | All (most) software programs on the cluster, made available to all compute nodes. |
/data/apps/compilations | Compile scripts: the scripts that contain all of the instructions to compile a program from source code. I have about 20 examples available there. If possible, please keep the same format. |
/data/apps/sources | All source code. |
/data/shell-syswide-setup | Shell scripts that are read by all users on the cluster at login. We need these scripts in order to easily make changes that have system-wide impact on the user base. |
/data/node-setup | All scripts that configure the compute nodes. For example, "setup-ipmi.sh" sets up IPMI on each node. The main script is "node-setup.sh", which calls all other scripts as needed. Each script has plenty of internal documentation. |
/data/system-files | Location for system files like kernels, OFED drivers, etc. |
/data/head-node-scripts | Self-explanatory: scripts used on the head node. |
/data/modulefiles | Environment module files, segregated by type of code. Use /data/modulefiles/apps/module.skeleton as a template for creating new application modules so they behave consistently. |
/data/download | Downloaded software. Any RPM, tar file, etc. that you download and that is needed to compile/configure the cluster: please leave a copy here. This makes it easy to find a copy later instead of hunting it down when needed. |
/data/users | Home directories for all users. |
4. ROCKS and Imaging
4.1. IPMI with the compute nodes
If the node hardware supports it (all 64-core nodes do), you can use IPMI to reboot a node remotely, or to power-cycle it if it is stuck in, for example, a kernel panic. Here are some IPMI commands; ask Joseph for the password. Using compute-1-1 as an example:
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power status
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> chassis status
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> sensor
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power off
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power on
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power cycle
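For example, to check the power state of every node in rack 5 (a sketch; assumes 20 nodes and the naming scheme described in the next section):
for n in $(seq 1 20); do
    ipmitool -I lan -H compute-5-$n.ipmi -U ADMIN -P <password> power status
done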
4.2. HOWTO Image a node
Joseph, this was copied over from your admin guide, but it’s a little unclear how to do this.
Below are some basic instructions on how to image a node on the HPC cluster.
When imaging an HPC Rack, start at the bottom working up. Most compute node names on HPC have the format of "compute-r-n" where "r" is the rack number and "n" is the compute number. (This formal numbering system has decayed recently, but may be re-established once the Green Planet Transition is past.)
Set the BIOS to boot from the network first, then the local drive.
Also, set the BIOS to power up on a power failure.
Let’s say you want to image rack #5, which has 20 nodes. Log into hpc-s.oit.uci.edu, become root, and enter:
insert-ethers --rack=5 --rank=1
Now select "compute node".
The "rack=5" is rack #5 and "rank=1" is compute #1. After the node is imaged it will be named "compute-5-1".
Start with the bottom node and work your way up. Power up one node and WAIT until Rocks recognizes the node you powered up, with the correct MAC address. Once you verify this, power up the next node and repeat the process.
If you need to remove a node, for example, say compute-5-1, use:
insert-ethers --remove compute-5-1
The main node configuration script is /data/node-setup/node-setup.sh, a shell script that is called by "/etc/init.d/node-first-boot-setup" on all nodes at first boot.
node-setup.sh configures the node with our setup and reboots it, after which the node is ready for computation on the cluster.
To re-image a compute node on the cluster, from the HPC-S node (compute-1-1 as the example target):
rocks set host boot compute-1-1 action=install
rocks run host compute-1-1 reboot
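To re-image a whole rack, the same two commands can be wrapped in a loop (a sketch, assuming rack 5 with 20 nodes):
for n in $(seq 1 20); do
    rocks set host boot compute-5-$n action=install
    rocks run host compute-5-$n reboot
done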
4.3. HOWTO add default apps to the reimage script
To be filled.
5. SGE
To be filled.
5.1. Startup and shutdown
To be filled.
5.2. Common Misbehaviors
To be filled.
5.3. Upgrading
To be filled.
6. Networking
To be filled.
6.1. Ethernet
To be filled.
6.2. Infiniband
To be filled.
7. Storage & Filesystems
To be filled.
7.1. RobinHood
To be filled.
7.2. Gluster
To be filled.
7.2.1. Support / Help
To be filled.
7.2.2. bring down/up gluster
To be filled.
7.2.3. Gluster client installs
To be filled.
7.2.4. Upgrading
To be filled.
7.3. Fraunhofer
To be filled.
7.3.1. Support / Help
To be filled.
7.3.2. bring down/up Fhgfs
To be filled.
7.3.3. Fhgfs client installs
To be filled.
7.3.4. Monitoring with admon
To be filled.
7.3.5. Upgrading
To be filled.
7.4. GPFS
To be filled.
7.4.1. Testing Notes
To be filled.
7.5. NFS
To be filled.
7.5.1. Automount scripts
To be filled.
7.5.2. Common misbehaviors
To be filled.
7.6. Local RAIDs
To be filled.
7.6.1. mdadm scripts and checks
To be filled.
7.6.2. hardware RAID and scripts
To be filled.
7.7. RAID checks
To be filled.
8. Environment Modules
We are using environment modules version 3.29 to easily provide and set up the software environment for various types of software packages, and to support different shells such as bash, ksh, zsh, sh, csh, tcsh, etc.
One of the things I (and many users) have always liked in any computing environment is having a set of default software available; there is nothing worse than logging into a computing system and finding little or no software available. So I created a module called "Cluster_Defaults", which loads a set of non-conflicting software for the user at login.
Individual users can be excluded from having "Cluster_Defaults" loaded, but loading it will be the default. This also lets us upgrade every user's software packages by changing a single module: when a new version of the PGI compilers comes along, we update "Cluster_Defaults" and all users get the latest PGI compiler automatically.
The Cluster modules are located in the following locations:
/data/modulefiles                             <-- General module files location.
/data/modulefiles/software/Cluster_Defaults   <-- "Cluster_Defaults" module.
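From a user's point of view, the usual module commands apply; for example (a quick sketch):
module avail                      # list the modules available under /data/modulefiles
module list                       # show what Cluster_Defaults loaded at login
module unload Cluster_Defaults    # opt out of the defaults for this session
module load Cluster_Defaults      # load the defaults again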
8.1. HOWTO create a module file correctly
To be filled.
9. MPI
To be filled.
9.1. MPICH
To be filled.
9.2. OpenMPI
To be filled.
10. Compilers
To be filled.
10.1. GNU
To be filled.
10.2. Intel
To be filled.
10.3. PGI
To be filled.
11. Application Notes
To be filled.
11.1. R
To be filled.
11.2. Perl
To be filled.
11.3. Python
To be filled.
11.4. Gromacs
To be filled.
11.5. NAMD
To be filled.
11.6. AMBER
To be filled.
11.7. Galaxy
Galaxy currently runs on compute-3-5 and is stored in /data/apps/galaxy/. compute-3-5 acts as the frontend to Galaxy, with nginx as a reverse proxy. Galaxy itself runs on port 8080; nginx listens on port 80, caching static content and proxying the rest back to Galaxy. Galaxy should be accessed via http://galaxy-hpc.oit.uci.edu/
Galaxy uses PostgreSQL, which is also installed on compute-3-5.
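A quick way to sanity-check that the stack is up (a sketch; "service galaxy" is the init script mentioned in the upgrade section below, and the PostgreSQL init script name may differ on this host):
# on compute-3-5
service galaxy status
service postgresql status
curl -sI http://localhost:8080/ | head -1    # Galaxy itself
curl -sI http://localhost/ | head -1         # the nginx front end on port 80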
11.7.1. Upgrade Galaxy
To upgrade Galaxy, follow the information here: http://wiki.galaxyproject.org/Admin/Get%20Galaxy#Keep_your_code_up_to_date
Make sure the hg alias is set up correctly. Stop the Galaxy service and always back up the code before updating:
service galaxy stop
cp -Rv /data/apps/galaxy/dist /data/apps/galaxy/dist-backup
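After the backup, the update itself is done with mercurial as described on the wiki page above (a sketch; check that page for the exact branch or release tag to update to):
cd /data/apps/galaxy/dist
hg pull
hg update
# assumed counterpart of 'service galaxy stop' above
service galaxy start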