= Harry's HPC tools
Harry Mangalam
v1.0, June 20, 2019
:icons:

// fileroot="/home/hjm/nacs/hpc/hjm-apps"; asciidoc -a icons -a toc2 -a toclevels=3 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.html ${fileroot}.txt moo:~/public_html

Here is a listing of some non-trivial tools I've written for HPC. Or trivial tools that are still useful.

== User Tools

These can be useful for mere users. Some have additional features for root.

=== mayday

http://moo.nac.uci.edu/~hjm/hpc/HPC-Mayday.html[mayday] is a tool for users to report issues with HPC. It autocollects as much info as possible and then allows the user to report the issue, filling in the bits that we typically need to know. (informational, generates email)

=== dfsquotas

'dfsquotas', run without arguments, will dump its help. Improved by Francisco, it shows user and group quotas on any of the /dfs[123] systems, as well as the selective backup filesystem (/sbak, available to users on the interactive node 'compute-1-13'), and allows root to change them simultaneously. Installed in /data/hpc/bin. (informational)

=== qbetta

A merge of the more useful parts of the SGE 'qhost' and 'qstat', plus some additional info. Anything after the command is used as a regex to filter the output, so "'qbetta bigmem'" will filter the output for any line containing 'bigmem'. If you want to get fancy, use it alone and then pipe the output to your favorite grep. (informational)

=== clusterload

Utility to show how loaded the cluster is (using the admittedly bad measures of loadavg and hyperthreaded cores) and then cycle thru each of the nodes in loaded order to show what's running via 'top'. (informational)

=== profilemyjobs & pmj

http://moo.nac.uci.edu/~hjm/hpc/profilemyjobs/profilemyjobs.html[profilemyjobs & pmj] is a utility and wrapper for long-run system profiling of apps and complex workflows. The 'pmj' wrapper can be used to submit the profiling with a batch job and visualize the results afterwards.
Or you can viz the results realtime with an auto-refreshing gnuplot. (informational, can generate large logs)

=== scut & cols

http://moo.nac.uci.edu/~hjm/scut_cols_HOWTO.html[scut] is a more flexible (but slower - Perl) combination of 'cut' and 'join'. 'cols' is like column/columns, but easier to use and gives you more control over which columns are which. (informational)

=== stats

stats will provide most https://en.wikipedia.org/wiki/Descriptive_statistics[Descriptive Statistics] of anything numeric fed to it, as well as 95% confidence intervals. It will also do many transformations on the data that can either be used internally or emitted for use in other apps: log10, ln, sqrt, x^2, x^3, 1/x, sin, cos, tan, asin, acos, atan, round, abs, exp, pass(thru), trunc (integer part), frac (decimal part). Fairly trivial code, but I use it 20x a day to get a sense of a large numeric input. (informational)

=== the qdirstat tools

https://github.com/shundhammer/qdirstat[qdirstat] is one of the best filesystem visualizers available. https://github.com/shirosaidev/diskover[diskover] may be better, but I haven't evaluated it much. Regardless, qdirstat is much easier to use and also comes with a great Perl filesystem recurser, 'qdirstat-cache-writer', that is both easy to use and modify. The following are tools I've written that add functionality to the qdirstat interface - see the doc http://moo.nac.uci.edu/~hjm/kdirstat/kdirstat-for-clusters.html[kdirstat for Clusters]. Unfortunately, kdirstat (now qdirstat) was written as a personal tool, so adding this functionality is on a user-by-user basis, unless someone wants to mod the original tool. I've added this functionality for root on 'login-1-2' and 'nas-7-1'.

- 'kds-qsub-tarchive.sh' - a tarchiver that submits jobs to the staff Q on nas-7-1. Provides a fair amount of checking and oversight for avoiding catastrophe, but certainly not foolproof. (creates possibly large tarchives)
- 'kds-pigzem.sh' - utility that uses pigz to compress individual compressible files in situ.
- 'kds-askLDAP.sh' - answers the question "who owns this file and are they still at UCI?"
- 'kds-cleanup.sh' - identifies several types of highly compressible (mostly bioinfo) files and offers to tarchive them.

=== find/gzip compressibles

These 2 Perl scripts find and then can compress defined types of files. Largely superseded by 'kds-cleanup.sh'.

=== parsyncfp

https://github.com/hjmangalam/parsyncfp[parsyncfp] is a parallel wrapper for rsync that overcomes TCP delays and speeds up recursive indexing via the 'fpart' file chunker. Special options for GPFS and file lists. (nondestructive; filters out any '--delete' rsync options)

=== tacg

https://github.com/hjmangalam/tacg[tacg] is a 'grep' for nucleic acids; it can also generate restriction cuts, search for 'patterns of patterns', and lots more. Installed in a module. (informational)

== Admin

These tools are generally only useful for 'root' users. This listing omits those mentioned above.

=== clusterfork

https://github.com/hjmangalam/clusterfork[clusterfork] (aka cf) is installed and working on hpc-s. I think Francisco uses it as well as me. A 'parallel ssh', like pssh, tentakel, etc, but a lot better - it automatically groups the output as identical or similar and deletes zero-content output. Can generate arbitrary groupings of servers, groupings of those groupings, and listings thereof. (can issue any command and therefore can be enormously destructive; be careful what you ask for)

=== userstatus

Not quite production-ready, but provides a good overview of a user being considered for removal. Installed in /data/hpc/bin.
Checks and reports on:

- last login stats
- LDAP status
- group membership (with overlaps filtered out)
- data on all known main and NFS mounts and the latest files in each
- latest 10 files in HOME
- optional listing of the full HOME tree, available as a termbin link
- optional tarchiving of HOME and any other dirs
- optional deletion of the user
- optional generation of email to send to the PI about what to do with the data

(informational)

=== modulizer

Along with the template '/data/modulefiles/module.skeleton', provides most of the steps to set up an environment module file, including instructions and the rarely-needed but sometimes critical additional steps. (creates module files)

=== wipentest.sh

A small script to zero and SMART-test disks when verifying new disks or recovering them from EOL'ed arrays. Installed in /data/hpc/bin. (tho it tries to warn appropriately, it could completely destroy a disk; be very, VERY careful)

=== rdma-tcp-stat

Utility like 'ifstat' to view both TCP and RDMA bytes in the same streaming format. Installed in /data/hpc/bin. (informational)

=== slowjobs

A near-trivial but useful script to identify all slow jobs on the cluster. A "slow" job is one that is using less than 80% of a CPU. Bits were later incorporated into qbetta, but this is still useful. (informational; generates email)
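The core test in slowjobs - flagging processes that use less than 80% of a CPU - can be sketched with standard ps and awk. This is a hypothetical single-node simplification (the threshold variable name and output format are mine); the real slowjobs script works cluster-wide and mails its results.

```shell
#!/usr/bin/env bash
# Sketch of the "slow job" test: list processes whose CPU usage, as
# reported by ps, is below a threshold (80%, per the slowjobs writeup).
# The real script runs this across all cluster nodes; this is a
# one-node illustration.
THRESH=80
ps -eo pid=,user=,pcpu=,comm= --sort=-pcpu |
awk -v t="$THRESH" '$3 < t {printf "%-8s %-12s %6.1f%%  %s\n", $1, $2, $3, $4}'
```

On an idle node nearly everything prints, so in practice a real filter would also exclude system daemons and restrict to batch-job users before generating email.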