Contact: Harry Mangalam
Address: 1 Whistler Ct, Irvine, CA, 92617
Phone: 949 285-4487(c)
Immigration Status: Canadian Citizen, Permanent Resident of the US
1. What I do
I am familiar with a wide range of research computing domains and work with researchers to help them accelerate their work in bioinformatics; evolutionary biology; high-throughput sequencing; large-scale data storage, movement, and processing; compute cluster planning and implementation; system administration; data center planning; and grant preparation. I am comfortable dealing with research scientists in many fields (having been one and having married one). I have led and assisted in successful grant preparation and application for non-profit and academic institutions. Also, see below.
2. Education
PostDoc: Medical Research Council of Canada Post-Doctoral Fellowship, with John B Thomas of the Molecular NeuroBiology Laboratory, Salk Institute, La Jolla, 1991. Genetics and biochemistry of the Drosophila singleminded gene and its gene products.
PhD: with Michael G Rosenfeld, HHMI, UCSD. Physiology and Pharmacology, University of California at San Diego, La Jolla, Dec. 1989. Transcriptional Regulation of the Growth Hormone and Prolactin genes; Cloning and characterization of Pit-1, one of the first POU-homeo genes.
MSc: with David R Jones, Zoology, University of British Columbia, Vancouver, May, 1985. Physiology and pharmacology involved in the diving response in mammals and birds.
BSc: Zoology, University of British Columbia, Vancouver, May, 1980. Honours thesis with David R. Jones.
3. Employment History
Dates | Position | Organization
5.2006-present | Research Computing Specialist | Office of Information Technology, UC Irvine
2.2005-5.2006 | Research Associate | Earth System Science, UC Irvine
7.2001-2.2005 | Principal | tacg Informatics LLC, Irvine, CA
1.2002-7.2002 | Sr Bioinformatics Scientist | Acero Inc, Menlo Park, CA
8.2001-12.2001 | Principal | tacg Informatics LLC, Irvine, CA
2000-7.2001 | Sr Research Scientist & Project Manager (Gene Expression) | National Center for Genome Research, Santa Fe, NM
4. Previous Relevant Experience
HPC Compute Cluster: With the help of 1.5 other sysadmins, I run UC Irvine’s HPC compute cluster, consisting of ~6500 64-bit cores, ~38TB aggregate RAM, QDR InfiniBand, and ~1PB storage, including 650TB in the parallel BeeGFS filesystem. We use Son of Grid Engine as the scheduler and Environment Modules to keep our ~500 self-compiled applications from stepping on each other, with ROCKS as the primary provisioning system (I am also familiar with Perceus).
In addition to the necessary sysadmin work, I consult with researchers at all levels and in all domains. Many of their problems have to do with data movement, analytical workflows, getting applications to work (or to work together), debugging qsub scripts, and occasional (usually catalytic) programming. By catalytic, I mean small amounts of programming that unblock a key step in a workflow that is holding up an analysis. I also teach the Linux part of an introductory HPC/Bioinformatics course called BioLinux roughly once a month, which includes both lecture and tutorial.
A frequent task is finding software to support a research need. Because this is research, quite often there isn’t a packaged or commercial system available, so I have to knit together multiple pieces of software into a pipeline or script, usually in Perl, Python, or R/Bioconductor. When the problem involves computational bottlenecks, I have experience with several profiling tools (OProfile, perf, HPCToolkit, Valgrind, etc.), can program in C, and am familiar with the GNU build toolchain. I also assist with general infrastructure problems such as campus-wide software distribution & licensing, storage & backup issues, network problems, outreach to faculty to identify and resolve bottlenecks, and representation to campus groups and UC-wide discussion & planning groups.
Earth System Science: I worked with Charlie Zender of Earth System Science at UC Irvine on the tuning and analysis of the NetCDF Operators (NCOs), a suite of utilities for subsetting and manipulating climate model data in the form of NetCDF formatted files. These data sets are among the largest in the world and the NCO programs are responsible for a significant amount of the data reduction used in analysing them. See publication.
tacg Informatics: I have done bioinformatics contract work for the Epidemic Outbreak Surveillance taskforce (now part of the Department of Homeland Security), GeneCodes, the CDC, Allergan, Accelrys, and startups.
Acero: I worked in the Science group on their Genomics Knowledge Platform (GKP), which provided both syntactic and semantic integration of biological information from a number of sources through their "Biological Object Model".
NCGR: I worked on the GeneX gene expression database project, an Open Source Gene Expression database for storing and analyzing results from large scale gene expression projects. See publication.
tacg: I created the sequence analysis application tacg and its CGI Web interface tacgi to make a small, fast (~30X faster than GCG or EMBOSS), free, and capable molecular biology tool available for Linux/Unix. See publication and the SourceForge site (though the code is now distributed from the GitHub site).
5. Awards
5.1. Current
Co-PI (with Dana Roode, Allen Schiano) of a 2-year NSF CyberInfrastructure grant to support an engineer to review and implement a number of networked storage technologies (Sept, 2015 - Sept, 2017).
5.2. Previous
Co-PI (with Drs. Carl Cotman, Ken Longmuir, Tatsuya Suda, David Walker, all of UC Irvine) of a Pacific Bell CalREN grant: Medical Informatics Use of Asynchronous Transfer Mode (ATM) service across the LA basin (June, 1994 - September, 1997).
Co-PI (with Dr. Thomas Cesario) of an Irvine Health Foundation grant to implement ATM-based telemedicine applications using the infrastructure of the above CalREN grant (July, 1995 - September, 1997).
Medical Research Council of Canada Post-doctoral Fellowship to study Drosophila genetics with Dr. John Thomas at the Salk Institute (1990-1991).
6. Journal Publications
7. Coding and Technical Writing Examples
A Reference Model Outline for Research Computing
A comparison of the Fraunhofer and Gluster Parallel Filesystems
The HPC Cluster Users Manual - a HOWTO for Linux cluster users at UCI
Manipulating Data on Linux - an introduction to Data Processing for Linux novices
How to transfer large amounts of data via network
Mind your NegaBIT$ - opinion piece advocating for more use of Open Source Software at UC.
The Perceus Provisioning System
Short Guide on How to Evaluate Open Source Software for infrastructure use.
The Storage Brick - Fast, Cheap, Reliable Terabytes - technical evaluation of using cheap storage bricks for research storage.
clusterfork: a cluster admin tool
scut & cols - utilities to slice, re-order, join, and view columnar data
kdirstat for Clusters
An R Cheat Sheet
The economics of Open Source Software use in Municipal IT
8. References
Available on Request
9. Latest Version
The latest version of this document can be retrieved here in HTML.