= A Reference Model Outline for Research Computing
by Harry Mangalam
v1.00 Nov 3, 2015

// export fileroot="/home/hjm/nacs/Reference_Model_UC_Research_Computing"; asciidoc -a icons -a toc2 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]* moo.nac.uci.edu:~/public_html

== Intro & Intended Value

This document was underway before the meeting of Oct 13th at SDSC, but that meeting made it seem more relevant. David Greenbaum (UCB) moderated a session on defining a 'Reference Model' for Research Computing, which took input from a number of UC ppl. This doc doesn't purport to represent all the views expressed, but a few are reflected herein.

Like many places, we're continually trying to define the optimum model for Research Computing, the optimum number of FTEs for its support, as well as ways to pay for both. This model is going to be different for every institution, but there will probably be a fair amount of overlap, and comparing models may even identify areas for collaboration.

The intent of this doc is to encourage you to mark it up as much as possible, use it as a series of discussion points, and then comment on it or send it back to me so I can incorporate the responses into a single doc. If you have additional docs that you've already prepared on this topic, I'd very much like to see them if they aren't proprietary.

If you have a better suggestion as to how to do this - wiki? BBS? Q&A? Doodle/Survey Monkey Poll? - please let me know. I'd be happy to try another format.

== What is Research Computing?

The lists below name areas that define (for me) the components of 'Research Computing'. Your take is probably different, so I'd encourage you to modify this list or comment on the points on which you have a strong opinion, especially the funding parts. Obviously, there's not a lot of content in this doc besides the bullet points, but that's something that will be filled out as ppl respond & critique.

== Brief description of your Institution

- name:
- # faculty:
- # staff:
- # undergrads:
- # grad students:
- Funding level as a % of total IT budget:
- # Research Computing Staff:
- Current mechanism of Research Computing Support (open-ended description of your RCS and/or links to existing docs)

.Short Example: UCI's Research Computing infrastructure
[NOTE]
=======================================================
- # faculty: 1,100
- # staff: 9,400
- # undergrads: 25,000
- # grad students: 6,000
- Funding level as a % of total IT budget: ~3%
- # Research Computing Staff: 1 GIS, 2.5 Cluster and Programming
- Current mechanism of Research Computing Support: Staff is paid out of OIT funding; hardware support is based on catastrophic failure. Cluster growth is funded by lab groups buying hardware with OIT buying rack, power, networking infrastructure. https://hpc.oit.uci.edu/HPC_Overview.html[See this overview]
=======================================================

== Computation

The mechanisms/methods/environment by which users do their computation. Which ones are growing, slowing, or static, both now and as you look ahead? Also, how important are they both for science and for jobs?

- via user desktop
- via user laptop
- via BYOD(?)
- via cluster/shared/server & the Linux ecosystem
- via cloud services that we provide interfaces to.
- 'Research Computing Desktop' (aka RCD) - a pre-defined Desktop option (defined as a VM or container) that could be accessed via an exported display (via x2go, NoMachine, or similar). Useful or wasteful?
- Command Line Interface (CLI) vs Graphical User Interface (GUI)
- analytical reproducibility and data provenance

== Storage

How we store, back up, and archive data.

- active files - the files in which the user is currently interested.
- object storage - similar to above, but via an object store instead of a POSIX filesystem.
- HPC storage: POSIX filesystems vs Hadoop vs other filesystem-like storage
- cloud storage, both private and public.
- sync & share - the way most faculty exchange file-based info now.
- web services - includes both file exchange and web-available code and services.
- LAN vs WAN distances - how the above services are made available to local and remote users.
- Backup (for research & teaching faculty)
* short term
* archive
* disaster recovery

== Networking

The parts of RC that allow ppl and machines to exchange data when the bits are in different places.

- Science DMZ / fast Internet channels for data
- Commodity Internet LANs
- Spending $ on fast external channels that benefit few vs increasing the endpoint speeds that benefit more (i.e. 100GbE backbones vs 1GbE endpoints).
- Bulk data movement, specialized transport nodes.
- Can networks be made fast enough to act as a data bus so that large data sets don't need to be transferred, but can be used in-place?
- Trunk vs Leaf networks: speeds and access

== Security

How to keep the good things in and the bad things out.

- Secure Storage & Computation. Important for any data related to Protected Health Information (PHI).
- What benchmarks have to be met? HIPAA/FISMA?
- Physical security, Firewalls and configs, active SELinux, Intrusion Detection Systems, multi-factor auth, encrypted filesystems, enforced encrypted data utilities?
- Firewalled vs unprotected/DMZ nets
- What platforms are more/less effective from a security POV?
- What other issues are affected by having to support less secure systems?
- How much $ could be saved on licensing, legal, and support costs by reducing our dependence on such systems?

== Software

How to assemble, compile, share, and organize the software we all need for easiest access and support.

- Proprietary SW & long term licensing costs - vs Open Source Software
- Packaging for various platforms
- Packaging for most efficient sharing
- Server vs desktop platforms
- Commandline vs Graphical
- Services that act as 'Collaboration Hubs' or 'Collaboratories', like Science Content Management Systems.
Examples include https://hubzero.org[HubZero], https://www.zimbra.com/[Zimbra], and mods to standard CMSs like https://www.joomla.org/[Joomla], and even project management sites like https://asana.com/[Asana] and https://www.huddle.com/[Huddle].

== Expertise / Support

The human expertise that makes everything work.

- concierge service (what resources are where and how to exploit them)
- programming
* short term/catalytic vs long term
* specialized technical programming and assistance (GIS, chemistry, GPU codes)
* web (data distribution and active data/cgi)
* database and other approaches to structured data.
* porting from one platform to another (Windows -> Linux)
* optimization & parallelization
- hardware
* cluster management
* individual machine config
* contracted sysadmin
- data movement, debugging
- Teaching / Instruction
* classes in Linux, Programming, Analysis, BigData, etc.
* should there be an undergrad/grad recommendation or requirement for Linux, stats, data, reproducibility, provenance? (Yes!)
* what's the minimum instruction required and how should it be included in the curriculum?

== Scaling RC & RC Support

What is the optimal scale for implementing RC?

- Departmental?
- School?
- Campus?
- multiCampus? (all of UC)
- National? (NSF / XSEDE)
- International? (Open Science Grid)

== Funding

The hard part. How do we create and continue the support for Research Computing?

- Who should pay for it?
- How much? How much of what? (total IT spend? research budget? other metric?)
- How much should the end-user pay vs the institution?
* free intro accounts and then per-user cost?
* per-week cost allocation (per CPU-hr, GB-mo, etc)
* coarser grained?
- Seed & Sustain: is this a good approach?
- how to sustain it?
* direct recharge (and how is that calculated?)
* administrative investment
* departmental input
* Faculty input via direct grant allocation
* Self-written or collaborative Grants.
- What is 'Return on Investment'?
* how is that calculated?

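
As one way to make the 'direct recharge' and per CPU-hr / GB-mo questions above concrete, here is a minimal Python sketch of a usage-based charge. Everything in it is hypothetical: the rates, the free intro allowance, and the names `Rates` and `recharge` are placeholders for discussion, not figures or code from UCI or any other campus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical recharge rates; every number here is a placeholder."""
    per_cpu_hour: float = 0.01      # $ per CPU-hour (assumed)
    per_gb_month: float = 0.005     # $ per GB-month of storage (assumed)
    free_cpu_hours: float = 1000.0  # free 'intro' allowance per period (assumed)


def recharge(cpu_hours: float, gb_months: float, rates: Rates = Rates()) -> float:
    """Return the charge in dollars for one billing period."""
    # Only CPU-hours above the free allowance are billed.
    billable_cpu = max(0.0, cpu_hours - rates.free_cpu_hours)
    total = billable_cpu * rates.per_cpu_hour + gb_months * rates.per_gb_month
    return round(total, 2)


if __name__ == "__main__":
    # A lab that used 2,000 CPU-hours and stored 2,000 GB for one month.
    print(recharge(cpu_hours=2000, gb_months=2000))
```

A coarser-grained model (flat per-user or per-lab fees) would replace the two rates with a single constant; which granularity is worth the accounting overhead is exactly the open question in the list above.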
== Oversight & Guidance

Who oversees the Research Computing group on campus?

- Who indeed?
* Academic faculty?
* IT staff?
* combination of above (and what ratio?)?
- Where should such a resource be placed?
* in an existing IT structure?
* as a separate org?
* as part of an existing dept or school?
- Length of appointment to oversight group
- Optimum meeting schedule
- reporting and responsibilities
* grant-writing assistance
* yearly reports to EVC and/or RC community