We are looking for a storage-oriented CyberInfrastructure Engineer to help fulfill a National Science Foundation grant.

1. Dates

As of today (Thursday, Sept 24, 2015), we are actively searching for a person to fill this opening.

This contract position is funded for 2 years by the NSF grant, and for at least 1 more year by the Office of Information Technology. In other words, it is a soft-money position that we hope will transition to a permanent one, but we cannot guarantee a further period at this time. The submitted grant (stripped of NSF administrivia) is available here as a PDF.

2. General Description

The job involves investigating, evaluating, and implementing a wide range of storage technologies, including large-scale primary file storage, backups, high-speed file exchange and syncing across both LANs and WANs, and using APIs from Google, Amazon, Backblaze, and other Cloud services to provide the same capabilities.
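To give you a flavor of that work, here is a minimal sketch of the kind of LAN/WAN and Cloud data movement involved (the hostnames, paths, and bucket names below are hypothetical placeholders, not our actual infrastructure):

    # Sync a results tree to a collaborator over the WAN, compressing
    # on the wire and resuming partial transfers (paths are hypothetical).
    rsync -avz --partial --progress /data/results/ user@remote.example.edu:/archive/results/

    # Push the same tree to Amazon S3 with the AWS CLI
    # ('my-lab-bucket' is a placeholder bucket name).
    aws s3 sync /data/results/ s3://my-lab-bucket/results/

    # Or to Google Cloud Storage (e.g. a Nearline-class bucket) with gsutil.
    gsutil -m rsync -r /data/results/ gs://my-lab-bucket/results/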

You would be joining a 4-person Research Computing team that admins a 6500-core, 1500-user, 1PB, QDR InfiniBand cluster. We provide programming help; install, update, and modify lots of (mostly Open Source) software; teach introductory classes on Linux and Research Computing techniques; and provide custom configuration for other compute systems. You would be contributing your expertise to this group and hopefully gaining some expertise from us.

You would also be responsible for communicating with research faculty about these services and assisting them in using them, via face-to-face meetings, presentations, web-based user documentation, email, and other assistive technologies.

3. Specific Responsibilities

You would be the person most (but not solely) responsible for fulfilling the goals of the NSF grant including:

  1. Enhancing the deployment and uptake of our core 10GbE DMZ by making its bandwidth more widely available and usable to researchers.

  2. Designing and evaluating, at small scale, a multi-protocol Campus Storage Pool.

  3. Designing and deploying a Hybrid (local and Cloud components) Backup system.

  4. Evaluating, implementing, and documenting tools to facilitate large-scale data movement, syncing, and sharing.

  5. Designing and evaluating a Cluster Cloudbursting protocol that can use public cloud services to address private cloud overloads (a rough sketch appears at the end of this section).

  6. Developing and teaching introductory-to-advanced training courses for faculty, staff, and students on Linux-based systems for data/BigData analysis, programming, and visualization.

  7. Reaching out to researchers and directly supporting their projects, especially using catalytic techniques - small amounts of effort that yield large effects, without yourself being exhausted in the process.

Admittedly, these short points are not entirely self-explanatory. Email me if you want clarification.
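To make point 5 slightly more concrete: the cloudbursting idea, in its crudest form, is that when the local scheduler's backlog of pending jobs exceeds some threshold, you rent extra capacity from a public cloud. A minimal sketch follows, in which the scheduler command, queue-state flag, AMI ID, instance type, and threshold are all hypothetical placeholders, not a working design:

    # Hypothetical cloudbursting sketch - all specifics are placeholders.
    THRESHOLD=100

    # Count pending jobs; 'qstat' and the 'qw' (queued/waiting) state here
    # stand in for whatever scheduler the cluster actually runs.
    PENDING=$(qstat -u '*' | grep -c ' qw ')

    if [ "$PENDING" -gt "$THRESHOLD" ]; then
        # Launch one extra worker node in EC2 via the AWS CLI
        # (ami-xxxxxxxx is a placeholder image ID).
        aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type c4.2xlarge
    fi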

4. Required Skills

  • Advanced Linux Sysadmin experience with CentOS & Debian-derived distros

  • Demonstrated strong skills (via existing projects) in several of the following programming languages, especially as applied to system calls, remote execution, networking, and sysadmin tasks: Python, Perl, Bash, Ruby, JavaScript, Java, R (code samples required).

  • Familiarity with multiple installation, configuration, and build tools (CMake, autoconf, easy_install, CPAN, apt/dpkg, yum/rpm). Experience with Environment Modules is a definite plus. Good understanding of library conflicts and their resolution: ldd, rpath, LD_LIBRARY_PATH, etc. (a short example follows this list).

  • Familiarity with the Linux kernel & drivers - how they are configured, built, modified, and integrated with each other.

  • Deep familiarity with Compute Cluster techniques including:

    • provisioning technologies like ROCKS, XCAT, Warewulf, Perceus

    • Linux filesystems such as btrfs, ext4, XFS, ZFS, etc.

    • admin and debugging experience with single-server networked filesystems such as NFS, SMB/CIFS, WebDAV, sshfs

    • experience with multi-server distributed filesystems such as FhgFS/BeeGFS, Gluster, Lustre, GPFS, Ceph, Swift

    • experience with Cloud server technologies such as OpenStack storage, Google’s commercial offerings (Nearline, Drive), Amazon’s AWS storage systems (S3, Glacier), and associated technologies.

    • admin and debugging experience with multi-switch networking technologies, including 10/1GbE, Infiniband, etc.

  • Hardware configuration and debugging at cluster scale - PDU, power strip, and copper and fiber cabling etiquette, disk controllers, SATA/SAS disk utilities, Infiniband, and all types of Ethernet.

  • Experience with data movement utilities such as rsync, scp, netcat, bbcp, tar, Aspera, Globus, GridFTP, etc.

  • Excellent communication and especially technical writing abilities (samples required).

  • Good quantitative science skills and the ability to communicate results succinctly and compellingly to faculty and other researchers who are brilliant in their fields but ignorant of the most basic informatics.

  • You bathe regularly.

  • You can converse and present in full sentences.

  • You can both take and provide constructive criticism without offense.
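As promised above, here are a few of the library-conflict diagnostics from the build-tools bullet, as one-liners ('./myapp' and the paths are hypothetical stand-ins):

    # Which shared libraries does the binary resolve, and are any missing?
    ldd ./myapp | grep 'not found'

    # Point the runtime linker at a privately built library tree for one run.
    LD_LIBRARY_PATH=/opt/mylibs/lib ./myapp

    # Inspect the RPATH/RUNPATH baked into the binary at link time.
    readelf -d ./myapp | grep -E 'RPATH|RUNPATH'

    # With Environment Modules, the same effect is packaged per-module
    # (the module name/version is a placeholder).
    module load gcc/4.8.2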

5. You might be a good match if ...

  • you were part of an academic or commercial cluster administration team.

  • you were involved in filesystem or distributed storage research.

  • you helped build the Google File System but now want to re-enter the academic arena.

  • you contributed to open source storage projects.

  • you have a multi-petabyte home media server.

6. To apply

Please send your resume in PDF format to:

harry.mangalam@uci.edu

We appreciate brevity, but we won’t penalize you if you expand on favorite projects or subjects.

Besides the usual summary of education, work experience, and expertise, we require examples of your code (your choice of presentation, your choice of languages), documentation you’ve written (ditto), references from (and/or contact info for) 3 professional associates unrelated to you, and, if it gets that far, consent to a background check.