Contact: Harry Mangalam <hjmangalam@gmail.com>
Address: 1 Whistler Ct, Irvine, CA, 92617
Phone: 949 478-4487(c)
Immigration Status: Canadian Citizen, Permanent Resident of the US
View this document with sidebar Table of Contents at this link: http://moo.nac.uci.edu/~hjm/Mangalam_2016.html
1. What I do
I am familiar with a wide range of research computing domains and work with researchers to help them accelerate their work in data movement, storage, and analysis. I also assist in the areas of bioinformatics, evolutionary biology, high-throughput sequencing, large scale data storage in networked and parallel filesystems, data processing, compute cluster planning & implementation, system administration, and data center planning. I am comfortable dealing with research scientists in many fields (having been one and married one). I have led and assisted in successful grant preparation and applications for non-profit and academic institutions. Also, see below.
2. Education
PostDoc: Medical Research Council of Canada Post-Doctoral Fellowship, with John B Thomas of the Molecular NeuroBiology Laboratory, Salk Institute, La Jolla, 1991. Genetics and biochemistry of the Drosophila singleminded gene and its gene products.
PhD: with Michael G Rosenfeld, HHMI, UCSD. Physiology and Pharmacology, University of California at San Diego, La Jolla, Dec. 1989. Transcriptional Regulation of the Growth Hormone and Prolactin genes; Cloning and characterization of Pit-1, one of the first POU-homeo genes.
MSc: with David R Jones, Zoology, University of British Columbia, Vancouver, May, 1985. Physiology and pharmacology involved in the diving response in mammals and birds.
BSc: Zoology, University of British Columbia, Vancouver, May, 1980. Honours thesis with David R Jones.
3. Employment History
| Dates | Position | Organization |
|---|---|---|
| 5.2006-present | Research Computing Specialist | Office of Information Technology, UC Irvine |
| 2.2005-5.2006 | Research Associate | Earth System Science, UC Irvine |
| 7.2001-2.2005 | Principal | tacg Informatics LLC, Irvine, CA |
| 1.2002-7.2002 | Sr Bioinformatics Scientist | Acero Inc, Menlo Park, CA |
| 8.2001-12.2001 | Principal | tacg Informatics LLC, Irvine, CA |
| 2000-7.2001 | Sr Research Scientist & Project Manager (Gene Expression) | National Center for Genome Research, Santa Fe, NM |
Earth System Science: I worked with Charlie Zender of Earth System Science at UC Irvine on the tuning and analysis of the NetCDF Operators (NCOs), a suite of utilities for sub-setting and manipulating climate model data in the form of NetCDF formatted files. These data sets are among the largest in the world and the NCO programs are responsible for a significant amount of the data reduction used in analysing them. See publication.
tacg Informatics: I have done bioinformatics contract work for the Epidemic Outbreak Surveillance taskforce (now part of the Department of Homeland Security), GeneCodes, the CDC, Allergan, Accelrys, and startups.
Acero: I worked in the Science group on their Genomics Knowledge Platform (GKP), which provided both syntactic and semantic integration of biological information from a number of sources through their "Biological Object Model".
NCGR: I worked on the GeneX project, an Open Source database for storing and analyzing results from large-scale gene expression projects. See publication.
4. Previous Relevant Experience
4.1. HPC Compute Cluster
With 2.5 other SysAdmins, I help run UC Irvine’s HPC compute cluster, the physical characteristics of which are described here in more detail. I do most of the BeeGFS-related work (5 separate filesystems ranging from 1 to 6 subvolumes each, totalling about 1.5PB, on ext4, XFS, & ZFS). I also respond to most of the biology-related software questions and installations (we support about 1000 custom-installed software packages via environment modules), as well as the Python, Perl, R, and MATLAB installs and questions.
I’m comfortable coding and debugging bash, Perl, Python, C, and R, though I’ve been doing progressively less original programming (but see the section below for specific examples). I’ve also installed and maintained various web-based systems for tech support (Trac, Request Tracker) and the OwnCloud private cloud system.
I also participate in UC-wide Research Computing discussions and organizations, including the Research IT Committee and the UCSD-led Pacific Rim Platform, to create structures that allow faster and more effective cross-campus support and collaboration. I was responsible for writing and updating the UC Irvine Cyberinfrastructure Plan and also largely structured and edited UCI’s Research CyberInfrastructure Center proposal.
An increasing amount of modern research involves moving large chunks of data from one place to another. In this job, I’ve debugged many of the parameters of that movement: within programs, at the system level, and on the networks (both LAN and WAN). This work includes writing a parallel version of rsync and tnc, an improved mechanism to transfer data quickly, if insecurely. Recently, I’ve been involved in moving large amounts of data across WANs, mostly to Google’s cloud storage, using both Google’s gsutil and the rclone utility. This experience is currently being added to the data movement document listed below.
4.2. Consulting
In addition to Systems Administration, I consult with researchers at all levels and in all domains. Many such problems involve data movement, analytical workflows, getting applications to work (or to work together), debugging scheduler scripts, and occasional catalytic programming. By catalytic, I mean small amounts of programming that unblock a key step that is holding up an analysis. Catalytic also implies not being consumed in the process.
A frequent task is finding software to support a research need. Because this is research, there often isn’t a packaged or commercial system available, so I have to knit together multiple pieces of software into a pipeline or script, usually with Perl, Python, and R/Bioconductor. Where there are computational bottlenecks, I have experience with several profiling tools (OProfile, perf, HPCToolkit, Valgrind, etc.), can program in C, and am familiar with the GNU build toolchain.
4.3. Infrastructure
I also assist with general infrastructure problems such as campus-wide software distribution & licensing (for which I initiated the first completely electronic software distribution system on campus; example here), storage & backup issues, network problems, outreach to faculty to identify and resolve bottlenecks, and representation to campus groups and UC-wide discussion & planning groups.
5. Awards
5.1. Mediated Donations
Due to my ongoing relationship with TGS Management, our group has received 7 racks of compute nodes (300 cores/rack; 3 racks currently in operation as part of HPC, 2 racks donated to ICS), 6 multicore (32-128 core) servers, several tape robots, and a 396-port (72 ports active) QDR InfiniBand switch. The depreciated value of this equipment was approximately $250K.
5.2. Invited Presentations
- Sept 21, 2016, Basel Life Science Week, Data Security Section. Storage for Inforgs.
- Nov 15, 2016, Supercomputing16, BigData BOF, BeeGFS in Real Life.
5.3. Current
Co-author & co-editor (with the RCI Vision Workshop) of the document A Vision For Research CyberInfrastructure, which led to recent funding of about $1.4M from the UCI internal budget for the establishment of the RCIC, the hiring of several ongoing FTEs, and significant hardware upgrades for Research Computing at UCI.
Co-PI (with Dana Roode and Allen Schiano) of a 2-year, $400,000 NSF CyberInfrastructure grant to support an engineer to review and implement a number of networked storage technologies (Sept 2015 - Sept 2017).
5.4. Previous
Co-PI (with Drs. Carl Cotman, Ken Longmuir, Tatsuya Suda, David Walker, all of UC Irvine) of a Pacific Bell CalREN grant: Medical Informatics Use of Asynchronous Transfer Mode (ATM) service across the LA basin (June '94-September '97)
Co-PI (with Dr. Thomas Cesario) of an Irvine Health Foundation grant to implement ATM-based telemedicine applications using the infrastructure of the above CalREN grant (July, 1995 - September, 1997). This resulted in one of the first telemedicine multicast video broadcasts via the MBONE Internet system in 1994.
Medical Research Council of Canada Post-doctoral Fellowship to study Drosophila genetics with Dr. John Thomas at the Salk Institute (1990-1991).
6. Journal Publications
Available via Google Scholar (probably incomplete) or the longer, but older, version.
7. Coding and Technical Writing Examples
tacg, a grep for DNA
A comparison of the Fraunhofer and Gluster Parallel Filesystems
The HPC Cluster Users Manual - a HOWTO for Linux cluster users at UCI
Manipulating Data on Linux - an introduction to Data Processing for Linux novices
How to transfer large amounts of data via network
Mind your NegaBIT$ - opinion piece advocating for more use of Open Source Software at UC.
The Perceus Provisioning System
Short Guide on How to Evaluate Open Source Software for infrastructure use.
The Storage Brick - Fast, Cheap, Reliable Terabytes - technical evaluation of using cheap storage bricks for research storage.
clusterfork: a cluster admin tool
scut & cols - utilities to slice, re-order, join, and view columnar data
kdirstat for Clusters
An R Cheat Sheet
The economics of Open Source Software use in Municipal IT
A Reference Model Outline for Research Computing
8. References
Available on Request