1. Introduction

This HOWTO describes the HPC mayday script, which can be called from a terminal anywhere on the HPC system. It is meant to give us the system and context info we need but often do not get from the initial email.

2. Start it from the problem dir

To record as much relevant info as possible automagically, please cd to the directory where the problem happened and ONLY THEN start this mayday script.

3. Use byobu to open multiple terminals

If you need another shell window (to type into, or to get info from) while you fill out this bug report:

  • BEFORE you start the mayday script, type byobu to start the terminal multiplexer.

  • then open another window with ^ac (Ctrl+a, then c)

  • then cd to the problem dir

  • then start the bug report in the current window with mayday

  • then bounce back & forth with ^a ^a (Ctrl+a, Ctrl+a).

Confused? Here’s a good screen cheatsheet. (byobu is a convenience wrapper around screen).

4. Use termbin for long output

If the problem code generates pages of output that you think may be relevant, don’t include them in the Bug Report file. Instead, in the other terminal, use the termbin utility to generate a link and include THAT link in the file.

Template:

problem_command -opt1 -opt2 -opt3 < someinput | nc termbin.com 9999
                                              ^^^^^^^^^^^^^^^^^^^^^

Specific example:

ls -lR | nc termbin.com 9999  # try it!
       ^^^^^^^^^^^^^^^^^^^^^
http://termbin.com/xxxx   <- this is returned; the link to send us

Just append the ^-marked string above to your command and it will return a link to that output. Obviously, DO NOT use it if the output includes sensitive info such as passwords, SSNs, or magic incantations: anything sent to termbin is visible to the whole Internet.
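
The template above forwards only standard output, so error messages written to standard error would be missing from the link. A minimal sketch of a wrapper that sends both streams follows. The name to_termbin is made up for this example and is not part of mayday or termbin, and it assumes nc is installed and can reach termbin.com.

```shell
# to_termbin: hypothetical helper, not part of mayday.
# Runs the given command, merges stderr into stdout (2>&1),
# and pipes the combined output to termbin, which replies with a link.
to_termbin() {
  "$@" 2>&1 | nc termbin.com 9999
}
```

For example, `to_termbin ls -lR /no/such/dir` would include the ls error message in the pasted output. The same warning about sensitive info applies.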

5. What mayday emails to us

mayday collects as much real-time diagnostic info as possible about the user making the bug report, so that we don’t have to spend a string of back-and-forth emails requesting it.

The command or variable that produces each value in the email is shown in the [brackets] following the header.

User:  [$USER]
hmangala

Groups: [id $USER]
uid=785(hmangala) gid=200(staff) groups=48(apache),134(stata),140(som),142(biolinux),148(gene),164(stata13),174(comsol),398(fuse),399(x2gousers),401(macvaw),418(clc),423(hjmtest),508(stata14),200(staff)

HOME Quota: [quota $USER]
Disk quotas for user hmangala (uid 785): none

/dfs[12] Quotas (via termbin) [see the script (~line 70) for the BeeGFS commands]
http://termbin.com/5c4c

Working Dir: [pwd]
/data/users/hmangala

Hostname: [hostname]
hpc-s.oit.uci.edu

Uptime: [uptime]
 13:07:12 up 94 days, 59 min, 30 users,  load average: 0.30, 0.19, 0.12

Load Average: [/proc/loadavg]
0.30 0.19 0.12 2/1619 14917

RAM in use: [free -g]
             total       used       free     shared    buffers     cached
Mem:            62         47         15          0          0         43
-/+ buffers/cache:          3         58
Swap:          146          0        146

Modules: [module list]
No Modulefiles Currently Loaded.

Queued Jobs: [qstat -u $USER]
No Queued Jobs

=========================================================
==       Please leave the stanza above intact.         ==
== Expand on your problem as much as you'd like below: ==
== This is freeform text; just type in as much extra   ==
== info as you can and then exit this editor normally. ==
== ie, with 'nano', ^X, then 'y'; other editors differ ==
=========================================================

== What email address you'd prefer us to use, if not [hmangala@uci.edu]


== What OS are you connecting from?  (Windows,Mac,Linux)


== Are you connecting from ON-CAMPUS or from OFF-CAMPUS?
   If OFF, thru the campus VPN or via an on-campus machine?


== What application are you using to connect with? (Terminal, putty, x2go, etc)


== Overall Problem:


== Specific scripts, programs or files causing the problem:
   (if the files aren't in this dir, please provide their FULL PATH.)


== EXACT Program execution line causing the problem or error:


== EXACT Error Output:
(Please paste in the error output if short, or provide a 'termbin' link
to it as described in the HOWTO <https://goo.gl/ARIQRP>)
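
For reference, part of the header stanza above can be approximated by hand with a few lines of shell. This is only a sketch, not the actual mayday script: the helper names field and mayday_header_sketch are hypothetical, only commands named in the [brackets] above are used, and the quota, module, and qstat entries are left out because they depend on the local setup.

```shell
# field LABEL SOURCE COMMAND [ARGS...]
# Hypothetical helper: prints "LABEL [SOURCE]", then the command's output,
# then a blank line, mimicking the layout of the mayday header stanza.
field() {
  printf '%s [%s]\n' "$1" "$2"
  shift 2
  # Fall back to a placeholder if the command fails or is missing.
  "$@" 2>&1 || echo "(not available)"
  echo
}

# Hypothetical driver covering a subset of the fields shown above.
mayday_header_sketch() {
  field "User:"         '$USER'         printf '%s\n' "${USER:-$(id -un)}"
  field "Groups:"       'id $USER'      id "${USER:-$(id -un)}"
  field "Working Dir:"  'pwd'           pwd
  field "Hostname:"     'hostname'      hostname
  field "Uptime:"       'uptime'        uptime
  field "Load Average:" '/proc/loadavg' cat /proc/loadavg
  field "RAM in use:"   'free -g'       free -g
}
```

Running `mayday_header_sketch` from the problem dir prints these fields to the terminal, which can be handy for checking what mayday would report before sending anything.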