= Block Diagram of Initial HPC Backup System
Version 1.0, Feb 14, 2016
:icons:

// fileroot="/home/hjm/nacs/rci-vision/hpc-backup-system"; asciidoc -a icons -a toc2 -a toclevels=3 -b html5 -a numbered ${fileroot}.txt;

== Hardware

Below is a diagram of the system, placed in the ICS DC.

image:images/hpc-backup.png[Campus Storage Pool technical summary]

It is a small BeeGFS system and communicates via Lightpath with a single node of HPC, diagrammed as NAS-7-1, but it could also be a dedicated node. The backup node (NAS-7-1 in the diagram) communicates with the rest of HPC and all the FSs over QDR IB as normal.

The BeeGFS Backup System is currently diagrammed as multiples of an Active+Passive chassis pair. The Active chassis is similar to our Supermicro 36-slot chassis, and the Passive boxes are the equivalent 44-slot JBODs, driven by the controller in the Active box via a SAS expander cable. In both cases LSI HBAs drive ZFS pools that are aggregated to make up the BeeGFS. This gives us an expandable FS that can be used both for backup and, in emergencies, as active FS for HPC (to buffer chassis swaps, etc.). The Active boxes are essentially the same as our spare storage chassis.

Using 6TB disks, the hardware costs for the storage boxes are:

------------------------------------------------------------------------
Infrastructure                Storage      Total   Raw    Usable
Nodes                 Cost    disks        disks   TB     TB*
------------------------------------------------------------------------
1 Active:             $18.4K  36           36      216    162
1 Active, 1 Passive:  $36.4K  36+44        80      480    360
2 Active:             $36.8K  36+36        72      432    324
2 Active, 1 Passive:  $54.8K  (2x36)+44    116     696    522
3 Active:             $55.2K  3x36         108     648    486
2 Active, 2 Passive:  $57.4K  2x36+2x22    116     696    522  (Pass. 1/2 pop)
2 Active, 2 Passive:  $72.8K  2x(36+44)    160     960    720

* uncompressed; will gain back to ~raw capacity with lz4 compression
------------------------------------------------------------------------

(Prices assume 6TB disks; the disk counts can be modified to push the price down a bit.)

Missing from the above:

- Metadata server - can use a TGS server, possibly with more RAM
- Small 10G network switch (~$1500)
- Network cards, if needed (already have 2 spares for the MD server) (~$1000)
- SSDs for the MD server (4x500GB = 2x2 RAID 10) (~$500)
- Software, which will be either self-written or Open Source (Amanda or BackupPC - see below)

== Software

The script running on NAS-7-1 consults the RobinHood db (which is local to NAS-7-1) on a weekly basis and, based on a set of rules (yet to be determined), runs rsync/parsync commands to move data from the HPC FSs to the backup system.

An alternative is to use Amanda, which is OSS; we would have to re-evaluate it to see how it deals with the types of backups we need.

If not using Amanda, a reaper script will have to be running on the Backup system (perhaps on the Backup MDS) to:

- remove files that have expired
- move files to other systems or the cloud if requested
- notify users of backup changes

As I mentioned in the morning meeting with Joseph & Allen, the human time requirements for setting up the backup hardware are dwarfed by those for setting up and dealing with how the backups are run. Dana has decided that the backup will not be complete (I agree, because of ZOTfiles), so we have to create a ruleset that is 'fair & balanced' and can be scripted in a reasonable way. Here's a stab at it; please make it better.

=== Backup Rules

(A ZOTdir is one that contains more than 5000 files in it, regardless of their sizes.) (?? - asks whether this is a good idea) A sketch of how such dirs might be flagged is shown below.
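The following is a minimal sketch, in Python, of how ZOTdirs could be flagged. The 5000-file threshold, the starting path, and the choice to count only files directly inside each dir (not recursively) are assumptions to be confirmed; in practice the counts would more likely come from the RobinHood db than from a live walk of the FS.

[source,python]
------------------------------------------------------
#!/usr/bin/env python3
# Sketch only: flag "ZOTdirs" - dirs holding more than ZOT_THRESHOLD files.
# The threshold and starting path are assumptions, not settled policy.
import os
import sys

ZOT_THRESHOLD = 5000    # proposed cutoff; still under discussion


def find_zotdirs(root):
    """Yield (path, nfiles) for every dir containing > ZOT_THRESHOLD files."""
    for dirpath, dirnames, filenames in os.walk(root):
        if len(filenames) > ZOT_THRESHOLD:
            yield dirpath, len(filenames)


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, nfiles in find_zotdirs(top):
        print(f"ZOTdir: {path} ({nfiles} files)")
------------------------------------------------------

The proposed rules: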
- Every user gets their HOME dir backed up (50GB max); any ZOTdirs in it are ignored.
- Every user then gets another 100GB backed up, again excluding ZOTdirs.
- Groups (incl. Schools, Depts, Labs) who have bought compute nodes get an additional 3TB (?) per node. (How would this be sub-allocated? By group?)
- Groups who have contributed storage nodes get 50TB (?) per node.
- Users & Groups can buy extra backup storage at a one-off cost of $100/TB/3yrs [6TB disks currently cost about $270 (~$45/TB)].
- Backup storage is not directly writable by users (which prevents using it as active storage) but is readable by them.
- (??) For a file to exist on the Backup FS, it must exist on the primary FS. This prevents pushing data to backup which would then exist as the only copy.
- Backups are active only at night, but are not guaranteed to be daily (a full FS scan by RobinHood takes about 1 day, so an incremental backup would probably take at least a couple of days to complete).
- Only directly mounted files are backed up (to prevent people from remote-mounting their laptops to get them backed up).
- Users have the option of explicitly denoting dirs to back up and dirs to exclude by naming them in a config file in their HOME dirs, ie ~/.BACKUP.CFG, which might look like the following (a sketch of how the backup script might consume it appears at the end of this section):
+
------------------------------------------------------
BACKUP
/full/path/to/really/important/dir
/full/path/to/all/my/code
/full/path/to/my/thesis/plots

EXCLUDE
/full/path/to/dir/to/exclude1
/full/path/to/dir/to/exclude/maybe/deeper
/full/path/to/dir/to/exclude/in/another/dir
------------------------------------------------------

Otherwise everything is backed up, up to the user's limit, excluding ZOTdirs. That limit is not alphabetically deterministic: if a user has more than 100GB, the backup will simply stop at the first file that takes the running total over the limit.
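Below is a minimal sketch, again in Python, of how the weekly backup script might consume a user's ~/.BACKUP.CFG and turn it into rsync calls. The stanza keywords come from the example above; the destination path, the rsync options, and the function names are hypothetical, and parsync would likely replace the plain rsync invocation.

[source,python]
------------------------------------------------------
#!/usr/bin/env python3
# Sketch only: read ~/.BACKUP.CFG (format shown above) and run one rsync
# per requested dir, honoring the user's EXCLUDE list.  The destination
# path and rsync options are illustrative assumptions.
import os
import subprocess


def parse_backup_cfg(path):
    """Return (backup_dirs, exclude_dirs) parsed from a .BACKUP.CFG file."""
    backup, exclude, current = [], [], None
    with open(path) as cfg:
        for raw in cfg:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                     # skip blanks and comments
            if line.upper() == "BACKUP":
                current = backup
            elif line.upper() == "EXCLUDE":
                current = exclude
            elif current is not None:
                current.append(line)         # a path under the current stanza
    return backup, exclude


def rsync_user(home, dest):
    """Copy each requested dir to the backup FS, applying the excludes."""
    backup, exclude = parse_backup_cfg(os.path.join(home, ".BACKUP.CFG"))
    excl_args = ["--exclude=" + d for d in exclude]
    for src in backup:
        # --relative preserves the full source path under dest;
        # check=False lets the loop continue if one transfer fails.
        subprocess.run(["rsync", "-a", "--relative"] + excl_args + [src, dest],
                       check=False)


if __name__ == "__main__":
    # hypothetical destination on the BeeGFS backup system
    rsync_user(os.path.expanduser("~"), "/mnt/beegfs-backup/")
------------------------------------------------------

The quota and ZOTdir checks from the rules above would be applied at the same point, before any transfer starts.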