Here is a listing of some non-trivial tools I’ve written for HPC, plus some trivial ones that are still useful.
1. User Tools
These can be useful for mere users. Some have additional features for root.
1.1. mayday
mayday is a tool for users to report issues with HPC. It auto-collects as much info as possible and then lets the user fill in the bits that we typically need to know before sending the report.
(informational, generates email)
1.2. dfsquotas
dfsquotas (improved by Francisco) shows user and group quotas on any of the /dfs[123] filesystems, as well as the selective backup filesystem (/sbak, available to users on the interactive node compute-1-13), and allows root to change them simultaneously. Run bare, it dumps its help. Installed in /data/hpc/bin.
(informational)
1.3. qbetta
A merge of the more useful parts of SGE’s qhost and qstat, plus some additional info. Anything after the command is used as a regex to filter the output, so "qbetta bigmem" will show only lines containing bigmem. If you want to get fancy, run it alone and pipe the output to your favorite grep.
(informational)
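The filter step can be sketched roughly like this (the function name and the sample node listing are made up for illustration; qbetta’s real output merges qhost and qstat fields):

```shell
# Hypothetical sketch of qbetta's filtering: everything after the
# command is joined into one regex and applied to the merged listing.
filter_output() {
    pattern="$*"
    if [ -z "$pattern" ]; then
        cat                      # no argument: pass everything through
    else
        grep -E -- "$pattern"    # keep only lines matching the regex
    fi
}

# Simulated merged node/job listing
printf '%s\n' \
    'compute-1-1 16 0.25 free' \
    'bigmem-2-1 64 3.10 busy' \
    'compute-1-2 16 0.10 free' | filter_output bigmem
```

With no trailing argument the function passes everything through, which is why piping to your own grep works just as well.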
1.4. clusterload
Utility to show how loaded the cluster is (using the admittedly bad measures of loadavg and hyperthreaded cores) and then cycle thru each of the nodes in loaded order to show what’s running via top.
(informational)
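The ordering step amounts to a most-loaded-first sort; a minimal sketch with canned node/loadavg pairs (the real tool gathers live figures before cycling thru the nodes with top):

```shell
# Sort "node loadavg" pairs most-loaded-first -- the order in which
# clusterload would then visit each node. Sample data is canned.
by_load() {
    sort -k2,2 -rn
}

printf '%s\n' \
    'compute-1-1 0.25' \
    'compute-1-2 7.80' \
    'compute-1-3 3.10' | by_load
```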
1.5. profilemyjobs & pmj
profilemyjobs is a utility for long-run system profiling of apps and complex workflows; pmj is its wrapper. pmj can be used to submit the profiling with a batch job and visualize the results afterwards, or you can view the results in real time with an auto-refreshing gnuplot.
(informational, can generate large logs)
1.6. scut & cols
scut is a more flexible (but slower; it’s Perl) combination of cut and join.
cols is like column/columns, but easier to use and makes it clearer which columns are which.
(informational)
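One example of what scut allows but cut does not is field reordering; awk stands in for scut in this sketch, since scut’s actual option syntax isn’t assumed here:

```shell
# cut can only emit fields in their original order; scut (like awk
# here) can reorder them: select field 3, then field 1.
echo 'alpha beta gamma' | awk '{ print $3, $1 }'
```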
1.7. stats
stats provides most descriptive statistics of anything numeric fed to it, as well as 95% confidence intervals. It can also apply many transformations to the data, used internally or emitted for other apps: log10, ln, sqrt, x2, x3, 1/x, sin, cos, tan, asin, acos, atan, round, abs, exp, pass(thru), trunc (integer part), frac (decimal part).
Fairly trivial code, but I use it 20x a day to get a sense of a large numeric input.
(informational)
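A minimal sketch of the kind of summary it produces (sample mean, sample sd, and a normal-approximation 95% confidence interval; the real tool reports far more, and this awk stand-in is not its actual code):

```shell
# Compute n, mean, sample sd, and a 1.96-sigma 95% CI from a stream
# of numbers, one per line.
stats_line() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            mean = sum / n
            sd   = sqrt((sumsq - sum * sum / n) / (n - 1))
            ci   = 1.96 * sd / sqrt(n)
            printf "n=%d mean=%.3f sd=%.3f 95%%CI=%.3f..%.3f\n",
                   n, mean, sd, mean - ci, mean + ci
        }'
}

printf '%s\n' 2 4 4 4 5 5 7 9 | stats_line
# -> n=8 mean=5.000 sd=2.138 95%CI=3.518..6.482
```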
1.8. the qdirstat tools
qdirstat is one of the best filesystem visualizers available. diskover may be better, but I haven’t evaluated it much. Regardless, qdirstat is much easier to use and also comes with a great Perl filesystem recurser, qdirstat-cache-writer, that is both easy to use and easy to modify.
The following are tools I’ve written that add functionality to the qdirstat interface - see the doc kdirstat for Clusters. Unfortunately, kdirstat (now qdirstat) was written as a personal tool, so adding this functionality is on a user-by-user basis unless someone wants to mod the original tool. I’ve added this functionality for root on login-1-2 and nas-7-1.
- kds-qsub-tarchive.sh - a tarchiver that submits jobs to the staff Q on nas-7-1. Provides a fair amount of checking and oversight for avoiding catastrophe, but certainly not foolproof. (creates possibly large tarchives)
- kds-pigzem.sh - utility that uses pigz to compress individual compressible files in situ.
- kds-askLDAP.sh - answers the question: who owns this file, and are they still at UCI?
- kds-cleanup.sh - identifies several types of highly compressible (mostly bioinfo) files and offers to tarchive them.
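The in-situ compression pattern that kds-pigzem.sh automates looks roughly like this (gzip is used so the sketch runs anywhere; the real script uses pigz for multithreaded compression and has its own notion of which files are compressible):

```shell
# Find compressible file types under a tree and compress each in place.
# The sample tree and file types here are illustrative only.
tmpdir=$(mktemp -d)
printf 'ACGTACGT\n' > "$tmpdir/sample.fastq"

find "$tmpdir" -type f \( -name '*.fastq' -o -name '*.sam' \) \
    -exec gzip -v {} \;

ls "$tmpdir"    # the original file is replaced by sample.fastq.gz
```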
1.9. find/gzip compressibles
These two Perl scripts find and can then compress defined types of files. Largely superseded by kds-cleanup.sh.
1.10. parsyncfp
parsyncfp is a parallel wrapper for rsync that overcomes TCP delays and speeds up recursive indexing via the fpart file chunker. It has special options for GPFS and file lists.
(nondestructive; filters any --delete rsync options)
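The chunk-and-parallelize idea can be sketched like this (split stands in for fpart, which chunks by cumulative size rather than line count; the paths, host, and file names are hypothetical, and the rsync invocation is left as a comment):

```shell
# Split a file list into fixed-size chunks, then hand each chunk to
# its own rsync via --files-from, run in parallel in the background.
tmpdir=$(mktemp -d)
seq 1 10 | sed 's|^|dir/file|' > "$tmpdir/filelist"
split -l 4 "$tmpdir/filelist" "$tmpdir/chunk."

for chunk in "$tmpdir"/chunk.*; do
    # each chunk would get its own transfer, e.g.:
    #   rsync -a --files-from="$chunk" /src/ remotehost:/dest/ &
    echo "would rsync $(wc -l < "$chunk") files from $chunk"
done
# wait   # then wait for all background rsyncs to finish
```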
1.11. tacg
tacg is a grep for nucleic acids that can also generate restriction cuts, search for patterns of patterns, and lots more. Installed as a module.
(informational)
2. Admin
These tools are generally only useful for root users. This listing omits those mentioned above.
2.1. clusterfork
clusterfork (aka cf) is installed and working on hpc-s; I think Francisco uses it as well as me. A parallel ssh like pssh, tentakel, etc, but a lot better: it automatically groups output as similar or identical and deletes zero-content output. It can generate arbitrary groupings of servers, groupings of those groupings, and listings thereof.
(can issue any command and therefore can be enormously destructive; be careful what you ask for)
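The output-folding idea can be sketched as follows (hosts, kernel versions, and the simulated ssh are all canned for illustration; this is not clusterfork’s actual code):

```shell
# Run one command per host, then group hosts whose output is identical
# so each distinct result is shown only once.
run_on_host() {    # stand-in for: ssh "$1" uname -r
    case "$1" in
        n1|n2|n3) echo '5.14.0-el9' ;;
        n4)       echo '4.18.0-el8' ;;
    esac
}

fold_outputs() {
    for h in n1 n2 n3 n4; do
        printf '%s\t%s\n' "$(run_on_host "$h")" "$h"
    done | awk -F'\t' '
        { hosts[$1] = hosts[$1] " " $2 }
        END { for (o in hosts) printf "[%s]:%s\n", o, hosts[o] }' | sort
}

fold_outputs
# -> [4.18.0-el8]: n4
#    [5.14.0-el9]: n1 n2 n3
```

Folding identical output is what makes a 4000-node result readable: you see each distinct answer once, with the hosts that produced it.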
2.2. userstatus
Not quite production-ready, but provides a good overview of a user being considered for removal. Installed in /data/hpc/bin.
Checks and reports on:
- last login stats
- LDAP status
- group membership (with overlaps filtered out)
- data on all known main and NFS mounts, and the latest files in each
- latest 10 files in HOME
- optional listing of the full HOME tree, available as a termbin link
- optional tarchiving of HOME and any other dirs
- optional deletion of the user
- optional generation of an email to the PI about what to do with the data
(informational)
2.3. modulizer
Along with the template /data/modulefiles/module.skeleton, provides most of the steps to set up an environment module file, including instructions and the rarely-needed but sometimes critical additional steps.
(creates module files)
2.4. wipentest.sh
A small script to zero and SMART-test disks when verifying new disks or recovering them from EOL’ed arrays. Installed in /data/hpc/bin
(tho it tries to warn appropriately, it could completely destroy a disk; be very, VERY careful)
2.5. rdma-tcp-stat
A utility like ifstat for viewing both TCP and RDMA bytes in the same streaming format. Installed in /data/hpc/bin.
(informational)
2.6. slowjobs
A near-trivial but useful script to identify all slow jobs on the cluster; a "slow" job is one using less than 80% of a CPU. Bits were later incorporated into qbetta, but this is still useful.
(informational; generates email)
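The core test is simple enough to sketch; the real script restricts it to batch-job processes on compute nodes and mails a report, neither of which is shown here:

```shell
# Flag processes below the 80% CPU threshold. Input is "pid pcpu comm"
# lines, as produced by e.g.: ps -eo pid,pcpu,comm --no-headers
flag_slow() {
    awk '$2 < 80 { print }'
}

# Canned sample; on a live node you would feed real ps output instead.
printf '%s\n' '101 99.0 simulate' '202 3.5 orphaned' | flag_slow
# -> 202 3.5 orphaned
```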