HPC Admin HOWTOs
================
by Harry Mangalam, Joseph Farran, Adam Brenner
v1.02 - Feb 10, 2014
:icons:

// last ed by hjm - HPC - shutdown notes
// Harry Mangalam mailto:harry.mangalam@uci.edu[harry.mangalam@uci.edu]
// Harry's copy (jf, aeb, add your own stanzas if you want it to be integrated into the main source).
// this file is converted to HTML via the command:
// fileroot="/home/hjm/nacs/HPC-ADMIN-HOWTO"; asciidoc -a icons -a toc2 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.html ${fileroot}.txt moo:~/public_html;
// scp ${fileroot}.html ${fileroot}.txt root@hpc.oit.uci.edu:/data/hpc/www;
// if at home
// ssh -t moo 'scp ~/public_html/HPC-ADMIN-HOWTO* root@hpcs:/data/hpc/www/'
// or on HPC, convert it in-place
// fileroot="/data/hpc/www/HPC-ADMIN-HOWTO"; asciidoc -a icons -a toc2 -a numbered -b html5 ${fileroot}.txt
// don't forget that the HTML equiv of '~' = '%7e'
// asciidoc cheatsheet: http://powerman.name/doc/asciidoc
// asciidoc user guide: http://www.methods.co.nz/asciidoc/userguide.html

== Introduction

The text source of this doc is http://hpc.oit.uci.edu/HPC-ADMIN-HOWTO.txt[here]. Download it, edit it, and then see the comments at the top of the text source for how to process it and put it back in place.

=== Basic Cluster info

It is really REALLY easy to break ROCKS with one wrong update or configuration change. In the spirit of making life easier for us all, keeping mishaps to a minimum, and not having to learn all of the ROCKS XML ways of doing things, I am offloading as much as possible onto shell scripts which we can all follow and understand on the new cluster. Obviously some things can only be done via the ROCKS XML setup, like disk partitioning, so for those we have no choice but to follow the ROCKS method. There is plenty of http://www.rocksclusters.org/rocks-documentation/4.1/getting-started.html[ROCKS documentation], so reference the ROCKS manuals for any ROCKS-type questions.

- as of Oct 31, 2013, we are running ROCKS version XXXX.
- CentOS 6.4 on all compute nodes
- kernel versions in use (node count | kernel version; see the sketch after this list for regenerating this tally):
+
---------------------------------------------------------------
 26   2.6.32-358.18.1.el6.centos.plus.x86_64
 40   2.6.32-358.18.1.el6.x86_64
  1   2.6.32-358.6.2.el6.x86_64
---------------------------------------------------------------
- the Gluster nodes are running Scientific Linux release 6.2.
- the Fhgfs nodes are running CentOS release 6.3.
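The kernel tally above can be regenerated from the head node. The sketch below is one way to do it, assuming a file 'nodes.txt' (hypothetical) with one reachable node name per line; build that list however you like, e.g. from 'qhost' output. It is a convenience sketch, not the canonical procedure.

---------------------------------------------------------------
#!/bin/bash
# tally kernel versions across the nodes listed in nodes.txt
# (hypothetical file, one hostname per line); run from hpc-s as root
for n in $(cat nodes.txt); do
    ssh -o ConnectTimeout=5 "$n" uname -r 2>/dev/null
done | sort | uniq -c | sort -rn
---------------------------------------------------------------

The output is '<count> <kernel version>', matching the table above.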
[[hpcclustershutdown]]
=== HPC Cluster Shutdown

When shutting down the cluster, if possible, it's useful to take it down in stages to ensure that things get shut down cleanly and can start up again without errors. The order for shutting down the cluster is:

- if possible, notify users via email, 'motd', and 'wall' that the system will be going down. Repeat the motd and wall messages several times.
- at the point of shutdown, create the /etc/nologin file on the login nodes to prevent new users from logging in.
- stop the SGE system to allow jobs to be suspended if possible, and checkpointing to write out files as necessary. *Joseph, please expand this.*
- shut down the compute nodes:

---------------------------------------------------------------
# via clusterfork on hpc-s as root
cf --tar=allup 'sync; sync; poweroff'
# queries qhost to determine which nodes are up and powers them down
# via tentakel
# Joseph?
---------------------------------------------------------------

- shut down the login nodes:

---------------------------------------------------------------
# via clusterfork on hpc-s as root
cf --tar=LOGINS 'sync; sync; poweroff'
# queries qhost to determine which nodes are up and powers them down
# via tentakel
# Joseph?
---------------------------------------------------------------

- unmount extra FSs from the remaining nodes (nas-7-1):

---------------------------------------------------------------
# exec from hpc-s
ssh nas-7-1 'sync; sync; umount /gl; umount /data'
sync; sync; umount /gl; umount /ffs
---------------------------------------------------------------

- shut down the gluster FS:

---------------------------------------------------------------
ssh -t bs1 'gluster volume stop gl'
cf --tar=GLSRV '/etc/init.d/glusterd stop'
# if it looks like the gluster system has shut down smoothly, shut them all off
cf --tar=GLSRV 'sync; sync; poweroff'
---------------------------------------------------------------

- shut down the Fraunhofer FS:

---------------------------------------------------------------
cf --tar=FHSRV '/etc/init.d/fhgfs-storage stop'
ssh -t fs0 '/etc/init.d/fhgfs-meta stop; /etc/init.d/fhgfs-mgmtd stop; /etc/init.d/fhgfs-admon stop'
# if it looks like the fhgfs system has shut down smoothly, shut them all off
cf --tar=ALLFH 'sync; sync; poweroff'
---------------------------------------------------------------

- unmount /data from hpc-s:

---------------------------------------------------------------
umount /data
---------------------------------------------------------------

- poweroff all the NAS machines:

---------------------------------------------------------------
cf --tar=ALLNAS 'sync; sync; poweroff'
# or tentakel ??
---------------------------------------------------------------

- shut down hpc-s, then bduc-login, and dabrick:

---------------------------------------------------------------
# on hpc-s
sync; sync; poweroff
# from bduc-login (often used as an ssh proxy)
sudo bash
# then
sync; sync; poweroff
ssh root@dabrick 'sync; sync; poweroff'
---------------------------------------------------------------

- now unplug everything at the PDU circuit level. Pull up the tiles, and uncouple the plugs from the main PDU feeds.

== HPC Cluster Startup

Essentially the reverse of the section above; a rough sketch of the order follows.
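Until a detailed startup procedure is written, the outline below simply inverts the shutdown steps above. It is a sketch under assumptions, not a tested script: the 'cf' targets and init scripts are the ones named in the shutdown section, the physical power-on of each group of machines is done by hand (front panel or IPMI), and the SGE and login-node steps still need Joseph's detail.

---------------------------------------------------------------
# rough startup order (reverse of the shutdown above); run from hpc-s as root

# 1. power on the NAS machines and hpc-s itself by hand, then remount:
mount /data

# 2. power on the Fraunhofer servers, then start the services
ssh -t fs0 '/etc/init.d/fhgfs-mgmtd start; /etc/init.d/fhgfs-meta start; /etc/init.d/fhgfs-admon start'
cf --tar=FHSRV '/etc/init.d/fhgfs-storage start'

# 3. power on the gluster servers, start glusterd, then start the volume
cf --tar=GLSRV '/etc/init.d/glusterd start'
ssh -t bs1 'gluster volume start gl'

# 4. remount /gl, /data, /ffs on nas-7-1, power on the login and compute
#    nodes, remove /etc/nologin, and restart SGE (Joseph, please expand)
---------------------------------------------------------------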
== Directories

[width="100%",cols="3,7",options="header"]
|========================================================================================
| Directory Path | What lives here
| /data/apps | all (most) software programs on the cluster, which are made available to all compute nodes.
| /data/apps/compilations | compile scripts. These are the scripts which have all of the instructions to compile a program from source code. I have several (20) examples available there. If possible, please keep the same format.
| /data/apps/sources | all source code.
| /data/shell-syswide-setup | shell scripts that are read by all users on the cluster at login. We need these scripts in order to easily make changes to the user base that have system-wide impact.
| /data/node-setup | all scripts that configure the compute nodes. For example, the script "setup-ipmi.sh" sets up IPMI on each node. The main script is "node-setup.sh" - this calls all the other scripts as needed. Each script has plenty of internal documentation.
| /data/system-files | location for system files like kernels, OFED drivers, etc.
| /data/head-node-scripts | self-explanatory.
| /data/modulefiles | environment module files, segregated by type of code. Use the file '/data/modulefiles/apps/module.skeleton' as a template for creating new application modules for similar behavior.
| /data/download | downloaded software. Any RPM, tar file, etc, that you download and is needed to compile/configure the cluster, please leave a copy in here. This makes it easy to get a copy instead of hunting it down later when needed.
| /data/users | home directories for all users.
|========================================================================================

== ROCKS and Imaging

=== IPMI with the compute nodes

You can use IPMI, if the node hardware supports it (all 64-core nodes do), to reboot a node remotely or to power cycle a node that is stuck with a kernel panic, for example. Here are some commands for IPMI, using compute-1-1 as the example; ask Joseph for the password to use in place of '<password>':

 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power status
 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> chassis status
 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> sensor
 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power off
 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power on
 ipmitool -I lan -H compute-1-1.ipmi -U ADMIN -P <password> power cycle

=== HOWTO Image a node

*Joseph, this was copied over from your admin guide, but it's a little unclear how to do this.*

Below are some basic instructions on how to image a node on the HPC cluster. When imaging an HPC rack, start at the bottom and work up. Most compute node names on HPC have the format "compute-r-n" where "r" is the rack number and "n" is the compute number. (This formal numbering system has decayed recently, but may be re-established once the 'Green Planet Transition' is past.)

Set the BIOS to boot from the network first, then the local drive. Also, set the BIOS to power up on a power failure.

Let's say you want to image Rack #5, which has 20 nodes. Log into 'hpc-s.oit.uci.edu', become root, and enter:

 insert-ethers --rack=5 --rank=1

Now select "compute node". "rack=5" is rack #5 and "rank=1" is compute #1; the resulting node will be named "compute-5-1" after it is imaged. Start with the bottom node and work your way up: power up one node and WAIT until Rocks recognizes the node you powered up as having the correct MAC address. Once you verify this, you can power up the next one and repeat the process.

If you need to remove a node, for example compute-5-1, use:

 insert-ethers --remove compute-5-1

The main node configuration script is '/data/node-setup/node-setup.sh', a shell script that is called by "/etc/init.d/node-first-boot-setup" on every node's first boot. 'node-setup.sh' configures the node with our setup and reboots it, after which the node is ready for computation on the cluster.

To re-image a compute node on the cluster, from the HPC-S node (compute-1-1 as the example target):

 rocks set host boot compute-1-1 action=install
 rocks run host compute-1-1 reboot
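To re-image a whole set of nodes, the same two rocks commands can be wrapped in a loop. This is only a sketch; the rack-5 node names are the hypothetical example from above, and you should confirm each node is drained of SGE jobs before rebooting it.

---------------------------------------------------------------
#!/bin/bash
# re-image compute-5-1 .. compute-5-20 (hypothetical rack 5 example);
# run on hpc-s as root, after confirming the nodes are idle
for n in $(seq 1 20); do
    node="compute-5-${n}"
    rocks set host boot "${node}" action=install   # next boot does a fresh install
    rocks run host "${node}" reboot                # kick off the reinstall now
done
---------------------------------------------------------------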
=== HOWTO add default apps to the reimage script

To be filled.

== SGE

To be filled.

=== Startup and shutdown

To be filled.

=== Common Misbehaviors

To be filled.

=== Upgrading

To be filled.

== Networking

To be filled.

=== Ethernet

To be filled.

=== Infiniband

To be filled.

== Storage & Filesystems

To be filled.

=== RobinHood

To be filled.

=== Gluster

To be filled.

==== Support / Help

To be filled.

==== bring down/up gluster

To be filled.

==== Gluster client installs

To be filled.

==== Upgrading

To be filled.

=== Fraunhofer

To be filled.

==== Support / Help

To be filled.

==== bring down/up Fhgfs

To be filled.

==== Fhgfs client installs

To be filled.

==== Monitoring with 'admon'

To be filled.

==== Upgrading

To be filled.

=== GPFS

To be filled.

==== Testing Notes

To be filled.

=== NFS

To be filled.

==== Automount scripts

To be filled.

==== Common misbehaviors

To be filled.

=== Local RAIDs

To be filled.

==== mdadm scripts and checks

To be filled.

==== hardware RAID and scripts

To be filled.

=== RAID checks

To be filled.

== Environment Modules

We are using modules version 3.29 to easily provide and set up the software environment for the various types of software packages, and to support different shell environments such as bash, ksh, zsh, sh, csh, and tcsh.

One of the things I have always liked (and many users as well) in any computing environment is to have a set of default software available; there is nothing worse than logging into a computing system and having little or no software available. So I created a module called "Cluster_Defaults", which loads a set of non-conflicting software for the user at login. Users can be excluded from having "Cluster_Defaults" loaded, but it will be the default.

This also enables us to automatically upgrade all users' software packages by simply changing one module. So when a new version of the PGI compilers comes along, we update "Cluster_Defaults" and all users get the latest PGI compiler automatically.

The cluster modules are located in the following locations:

 /data/modulefiles                             <-- general module files location
 /data/modulefiles/software/Cluster_Defaults   <-- "Cluster_Defaults" module

=== HOWTO create a module file correctly

To be filled.

== MPI

To be filled.

=== MPICH

To be filled.

=== OpenMPI

To be filled.

== Compilers

To be filled.

=== GNU

To be filled.

=== Intel

To be filled.

=== PGC

To be filled.

== Application Notes

To be filled.

=== R

To be filled.

=== Perl

To be filled.

=== Python

To be filled.

=== Gromacs

To be filled.

=== NAMD

To be filled.

=== AMBER

To be filled.

=== Galaxy

// added by AEB - Nov 5 10:06

Galaxy currently runs on compute-3-5 and is stored in /data/apps/galaxy/. Compute-3-5 acts as the frontend to Galaxy, with nginx as a reverse proxy: Galaxy itself runs on port 8080 and nginx serves cached static content on port 80. Galaxy should be accessed via http://galaxy-hpc.oit.uci.edu/

Galaxy uses PostgreSQL, which is also installed on compute-3-5.

==== Upgrade Galaxy

To upgrade Galaxy, follow the information here: http://wiki.galaxyproject.org/Admin/Get%20Galaxy#Keep_your_code_up_to_date

Make sure the 'hg' alias is set up correctly. Stop the galaxy service via 'service galaxy stop', and always back up the code first with 'cp -Rv /data/apps/galaxy/dist /data/apps/galaxy/dist-backup'.
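Putting those steps together, a rough upgrade sequence might look like the sketch below. It assumes the 'galaxy' init script and the Mercurial checkout in /data/apps/galaxy/dist described above; the 'hg pull -u' and 'sh manage_db.sh upgrade' steps are taken from the Galaxy wiki page linked above, so verify them against that page before running anything.

---------------------------------------------------------------
# sketch of a Galaxy upgrade on compute-3-5; check the wiki page above first
service galaxy stop                                          # stop the running instance
cp -Rv /data/apps/galaxy/dist /data/apps/galaxy/dist-backup  # back up the code first
cd /data/apps/galaxy/dist
hg pull -u                                                   # fetch and apply the upstream changesets
sh manage_db.sh upgrade                                      # only if the new release requires a schema migration
service galaxy start
---------------------------------------------------------------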