1. Introduction
The text source of this doc is here. Download and edit it, then see the notes at the top of the text source for how to process it and put it back in place.
1.1. Basic Cluster info
It is really, REALLY easy to break ROCKS with one wrong update or configuration change. In the spirit of making life easier for us all, keeping mishaps to a minimum, and not having to learn all of the ROCKS XML ways of doing things, I am offloading as much as possible onto shell scripts which we can all follow and understand on the new cluster.
Obviously there are some things which can only be done via the ROCKS XML setup, like disk partitioning, so for those we have no choice but to follow the ROCKS method.
There is plenty of ROCKS documentation, so reference the ROCKS manuals for any ROCKS-type questions.
-
as of Oct 31, 2013, we are running ROCKS version XXXX.
-
CentOS 6.4 on all compute nodes
-
Kernel:
# of nodes | kernel version |
---|---|
26 | 2.6.32-358.18.1.el6.centos.plus.x86_64 |
40 | 2.6.32-358.18.1.el6.x86_64 |
1 | 2.6.32-358.6.2.el6.x86_64 |
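One way to regenerate this count (a sketch, assuming the clusterfork "allup" target used in the shutdown section below, and that cf prints one line of output per node):
cf --tar=allup 'uname -r' | sort | uniq -c | sort -rn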
-
the Gluster nodes are running Scientific Linux release 6.2.
-
the Fhgfs nodes are running CentOS release 6.3.
1.2. HPC Cluster Shutdown
When shutting down the cluster, it’s useful, if possible, to take it down in stages so that everything shuts down cleanly and can start up again without errors.
The order for shutting down the cluster is:
-
if possible, notify users via email, motd, and wall that the system will be going down. Repeat the motd and wall messages several times.
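For example (a minimal sketch; the exact wording and timing are up to whoever runs the shutdown):
# on the login nodes
echo "HPC will be shut down for maintenance on Friday at 17:00" >> /etc/motd
wall "HPC will be shut down for maintenance on Friday at 17:00"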
-
at the point of shutdown, create the /etc/nologin file on the login nodes to prevent new users from logging in.
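For example (a sketch, reusing the LOGINS clusterfork target from the steps below; the contents of /etc/nologin are shown to anyone who tries to log in):
cf --tar=LOGINS 'echo "HPC is down for maintenance" > /etc/nologin'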
-
Stop the SGE system so that jobs can be suspended where possible and checkpointing can write out files as necessary. Joseph, please expand this.
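Until that is written up, one possible way to quiesce SGE (a sketch using the standard qmod/qconf commands; the details of our checkpointing setup still need to be documented):
# run as an SGE manager on hpc-s
qmod -d '*'        # disable all queues so no new jobs start
qmod -sj <jobid>   # suspend jobs that support it, as needed
qconf -ke all      # shut down all execution daemons
qconf -km          # shut down the qmaster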
-
Shut down the compute nodes:
# via clusterfork on hpc-s as root
# (queries qhost to determine which nodes are up and powers them down)
cf --tar=allup 'sync; sync; poweroff'
# via tentakel: Joseph?
-
shut down the login nodes:
# via clusterfork on hpc-s as root
# (queries qhost to determine which nodes are up and powers them down)
cf --tar=LOGINS 'sync; sync; poweroff'
# via tentakel: Joseph?
-
unmount extra FSs from remaining nodes (nas-7-1)
# exec from hpc-s
ssh nas-7-1 'sync; sync; umount /gl; umount /data; umount /ffs'
-
shut down the gluster FS
ssh -t bs1 'gluster volume stop gl'
cf --tar=GLSRV '/etc/init.d/glusterd stop'
# if it looks like the gluster system has shut down smoothly, shut them all off
cf --tar=GLSRV 'sync; sync; poweroff'
-
shut down the Fraunhofer FS
cf --tar=FHSRV '/etc/init.d/fhgfs-storage stop'
ssh -t fs0 '/etc/init.d/fhgfs-meta stop; /etc/init.d/fhgfs-mgmtd stop; /etc/init.d/fhgfs-admon stop'
# if it looks like the fhgfs system has shut down smoothly, shut them all off
cf --tar=ALLFH 'sync; sync; poweroff'
-
unmount /data from hpc-s
umount /data
-
poweroff all the NAS machines
cf --tar=ALLNAS 'sync; sync; poweroff' # or tentakel ??
-
shut down hpc-s, then bduc-login, and dabrick
# on hpc-s
sync; sync; poweroff
# from bduc-login (often used as an ssh proxy)
sudo bash
# then
sync; sync; poweroff
ssh root@dabrick 'sync; sync; poweroff'
-
Now unplug everything at the PDU circuit level. Pull up the tiles, and uncouple the plugs from the main PDU feeds.
2. HPC Cluster Startup
Essentially the reverse of the section above.
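A rough outline, obtained by simply reversing the shutdown steps (a sketch; the exact commands still need to be written up):
# 1. power the PDU circuits back on, then power on dabrick, bduc-login, and hpc-s
# 2. power on the NAS machines and remount /data on hpc-s
# 3. power on the Fraunhofer nodes and start fhgfs-mgmtd, fhgfs-meta, fhgfs-storage, fhgfs-admon
# 4. power on the gluster nodes, start glusterd, and start the gl volume
# 5. remount /gl, /data, and /ffs on nas-7-1
# 6. boot the login nodes, then the compute nodes
# 7. bring SGE back up and remove /etc/nologin from the login nodes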
3. Directories
Directory Path | What lives here |
---|---|
/data/apps | All (most) software programs on the cluster, made available to all compute nodes. |
/data/apps/compilations | Compile scripts: the scripts that contain all of the instructions to compile a program from source code. I have about 20 examples available there. If possible, please keep the same format. |
/data/apps/sources | All source code. |
/data/shell-syswide-setup | Shell scripts that are read by all users on the cluster at login. We need these scripts in order to easily make changes that have system-wide impact on the user base. |
/data/node-setup | All scripts that configure the compute nodes. For example, "setup-ipmi.sh" sets up IPMI on each node. The main script is "node-setup.sh", which calls all other scripts as needed. Each script has plenty of internal documentation. |
/data/system-files | Location for system files like kernels, OFED drivers, etc. |
/data/head-node-scripts | Self-explanatory: scripts used on the head node. |
/data/modulefiles | Environment module files, segregated by type of code. Use /data/modulefiles/apps/module.skeleton as a template for creating new application modules so they behave consistently. |
/data/download | Downloaded software. Any RPM, tar file, etc. that you download and that is needed to compile/configure the cluster: please leave a copy here. This makes it easy to find a copy later instead of hunting it down when needed. |
/data/users | Home directories for all users. |
4. ROCKS and Imaging
4.1. IPMI with the compute nodes
If the node hardware supports it (all 64-core nodes do), you can use IPMI to reboot a node remotely, or to power-cycle it if it is stuck in, for example, a kernel panic. Here are some IPMI commands; ask Joseph for the password. Using compute-1-1 as an example:
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power status
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> chassis status
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> sensor
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power off
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power on
ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power cycle
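For example, to check the power state of every node in rack 5 (a sketch; assumes 20 nodes and the naming scheme described in the next section):
for n in $(seq 1 20); do
    ipmitool -I lan -H compute-5-$n.ipmi -U ADMIN -P <password> power status
done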
4.2. HOWTO Image a node
Joseph, this was copied over from your admin guide, but it’s a little unclear how to do this.
Below are some basic instructions on how to image a node on the HPC cluster.
When imaging an HPC Rack, start at the bottom working up. Most compute node names on HPC have the format of "compute-r-n" where "r" is the rack number and "n" is the compute number. (This formal numbering system has decayed recently, but may be re-established once the Green Planet Transition is past.)
Set the BIOS to boot from the network first, then the local drive.
Also, set the BIOS to power up on a power failure.
Let’s say you want to image rack #5, which has 20 nodes. Log into hpc-s.oit.uci.edu, become root, and enter:
insert-ethers --rack=5 --rank=1
Now select "compute node".
The "rack=5" is rack #5 and "rank=1" is compute #1. After the node is imaged it will be named "compute-5-1".
Start with the bottom node and work your way up. Power up one node and WAIT until Rocks recognizes the node you powered up, with the correct MAC address. Once you verify this, power up the next node and repeat the process.
If you need to remove a node, for example, say compute-5-1, use:
insert-ethers --remove compute-5-1
The main node configuration script is /data/node-setup/node-setup.sh, a shell script that is called by "/etc/init.d/node-first-boot-setup" on all nodes at first boot.
node-setup.sh configures the node with our setup and reboots it, after which the node is ready for computation on the cluster.
To re-image a compute node on the cluster, from the HPC-S node (compute-1-1 as the example target):
rocks set host boot compute-1-1 action=install
rocks run host compute-1-1 reboot
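To re-image a whole rack, the same two commands can be wrapped in a loop (a sketch, assuming rack 5 with 20 nodes):
for n in $(seq 1 20); do
    rocks set host boot compute-5-$n action=install
    rocks run host compute-5-$n reboot
done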
4.3. HOWTO add default apps to the reimage script
To be filled.
5. SGE
To be filled.
5.1. Startup and shutdown
To be filled.
5.2. Common Misbehaviors
To be filled.
5.3. Upgrading
To be filled.
6. Networking
To be filled.
6.1. Ethernet
To be filled.
6.2. Infiniband
To be filled.
7. Storage & Filesystems
To be filled.
7.1. RobinHood
To be filled.
7.2. Gluster
To be filled.
7.2.1. Support / Help
To be filled.
7.2.2. bring down/up gluster
To be filled.
7.2.3. Gluster client installs
To be filled.
7.2.4. Upgrading
To be filled.
7.3. Fraunhofer
To be filled.
7.3.1. Support / Help
To be filled.
7.3.2. bring down/up Fhgfs
To be filled.
7.3.3. Fhgfs client installs
To be filled.
7.3.4. Monitoring with admon
To be filled.
7.3.5. Upgrading
To be filled.
7.4. GPFS
To be filled.
7.4.1. Testing Notes
To be filled.
7.5. NFS
To be filled.
7.5.1. Automount scripts
To be filled.
7.5.2. Common misbehaviors
To be filled.
7.6. Local RAIDs
To be filled.
7.6.1. mdadm scripts and checks
To be filled.
7.6.2. hardware RAID and scripts
To be filled.
7.7. RAID checks
To be filled.
8. Environment Modules
We are using environment modules version 3.29 to easily provide and set up the software environment for various types of software packages, and to support different shells such as bash, ksh, zsh, sh, csh, tcsh, etc.
One of the things I (and many users) have always liked in any computing environment is having a set of default software available; there is nothing worse than logging into a computing system and finding little or no software available. So I created a module called "Cluster_Defaults", which loads a set of non-conflicting software for the user at login.
Individual users can be excluded from having "Cluster_Defaults" loaded, but loading it will be the default. This also lets us upgrade every user's software packages by changing a single module: when a new version of the PGI compilers comes along, we update "Cluster_Defaults" and all users get the latest PGI compiler automatically.
The Cluster modules are located in the following locations:
/data/modulefiles                             <-- General module files location.
/data/modulefiles/software/Cluster_Defaults   <-- "Cluster_Defaults" module.
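From a user's point of view, the usual module commands apply; for example (a quick sketch):
module avail                      # list the modules available under /data/modulefiles
module list                       # show what Cluster_Defaults loaded at login
module unload Cluster_Defaults    # opt out of the defaults for this session
module load Cluster_Defaults      # load the defaults again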
8.1. HOWTO create a module file correctly
To be filled.
9. MPI
To be filled.
9.1. MPICH
To be filled.
9.2. OpenMPI
To be filled.
10. Compilers
To be filled.
10.1. GNU
To be filled.
10.2. Intel
To be filled.
10.3. PGI
To be filled.
11. Application Notes
To be filled.
11.1. R
To be filled.
11.2. Perl
To be filled.
11.3. Python
To be filled.
11.4. Gromacs
To be filled.
11.5. NAMD
To be filled.
11.6. AMBER
To be filled.
11.7. Galaxy
Galaxy currently runs on compute-3-5 and is stored in /data/apps/galaxy/. compute-3-5 acts as the frontend to Galaxy, with nginx as a reverse proxy. Galaxy itself runs on port 8080; nginx listens on port 80, caching static content and proxying the rest back to Galaxy. Galaxy should be accessed via http://galaxy-hpc.oit.uci.edu/
Galaxy uses PostgreSQL, which is also installed on compute-3-5.
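A quick way to sanity-check that the stack is up (a sketch; "service galaxy" is the init script mentioned in the upgrade section below, and the PostgreSQL init script name may differ on this host):
# on compute-3-5
service galaxy status
service postgresql status
curl -sI http://localhost:8080/ | head -1    # Galaxy itself
curl -sI http://localhost/ | head -1         # the nginx front end on port 80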
11.7.1. Upgrade Galaxy
To upgrade Galaxy, follow the information here: http://wiki.galaxyproject.org/Admin/Get%20Galaxy#Keep_your_code_up_to_date
Make sure the hg alias is set up correctly. Stop the Galaxy service and always back up the code before updating:
service galaxy stop
cp -Rv /data/apps/galaxy/dist /data/apps/galaxy/dist-backup
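After the backup, the update itself is done with mercurial as described on the wiki page above (a sketch; check that page for the exact branch or release tag to update to):
cd /data/apps/galaxy/dist
hg pull
hg update
# assumed counterpart of 'service galaxy stop' above
service galaxy start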