1. Current Status

We present this class approximately every 2 months, based on demand and inquiries. The classes may be taken together (recommended if you don’t know Linux) or separately if you already have Linux experience. If you have specific questions about the Bioinformatics-related parts of the course, please contact Jenny Wu <jiew5@uci.edu> for more information. For Linux- or programming-related questions, please contact Harry Mangalam <harry.mangalam@uci.edu>.

2. General Information

These are full-day classes that teach the basics of Bioinformatics on UCI’s HPC compute cluster which runs the Linux Operating System. If you are only interested in Linux, you can skip the Bioinformatics session. Conversely, if you already know your way around Linux and the HPC cluster, you may wish to skip the Linux parts. You can tell if you are up to speed by browsing the Linux Lecture slides and Tutorial scripts below.

3. Class Lecture Slides and Tutorial Notes

You can determine if you’re interested in taking the class by reviewing the class slides and tutorial scripts below.

3.1. Linux

As preparation for the Linux part, we strongly suggest viewing the Software Carpentry introduction to the shell videos.

3.3. Tutorial Data

Input and example data files for the tutorials are stored here, which is a browsable directory. If there is a file called MANIFEST, please read it for descriptions of the files you find there.

4. Target Audience

This class was designed for faculty, postdocs and graduate students who are working on genomics and other large analytical projects and want a quick introduction to cluster computing with Linux and Bioinformatics. It assumes that the participants will be naive Linux users with some idea of the analysis they want to achieve. This time the course has been split into a Linux part (which may be of interest to non-BioSci students as well) and a Bioinformatics part.

Mixing metaphors, this is not a Computer Science course. We will not be teaching you how the engine works; we will be teaching you how to drive.

5. Class Calendar & Location

5.1. When

It depends on demand. The courses are usually held from 9am to 5pm on Tuesday and Thursday of the same week.

5.2. Where

Bren Hall 3011 Bldg 314 (H8 on the Campus Map), or here on Google Maps

Look for signs at the main doors.

6. Availability, Cost, & Deadlines

The class will consist of a morning lecture followed by an extended tutorial that will last through the afternoon. Both sessions are available to the entire UCI community, but YOU MUST BE REGISTERED TO ATTEND. The Linux session is free to the attendees (thanks to the Data Sciences program). The Bioinformatics session costs $50, so if your interest is in learning Linux on the HPC system, but not Bioinformatics, you can attend only the Linux part. Both sessions include coffee, mid-session snacks and lunch.

Sign up for both sessions, and pay if needed, at this link by Jan 16th to secure your place in the class.

The Linux class is offered more frequently than the Bioinformatics class and, when scheduled, is also offered via the Data Sciences Events page.

For the tutorial, you must also have an HPC cluster account, which is free to all UCI researchers. Send email to <hpc-support@uci.edu> to obtain one.

7. Day 1 - Linux and the HPC cluster

7.1. Lecture

Introduction to Linux and why you should use it. Overview of commands and getting around. What is/isn’t a cluster, logging in with ssh, setting up your environment, text editors, quotas, data management, graphics, useful bash shell commands, environment variables, pattern matching and regular expressions, programs: how to find them, find out about them, run them, simple debugging.

7.2. Tutorial

Logging in with ssh, command-line editing, setting your prompt, transferring data in, editing, de/compressing, unpacking, basic bash and utility commands, cluster status commands, software modules, Grid Engine commands. Introduction to simple data manipulation and scripting/programming with bash, Perl, and R.

Bring your specific problems to discuss with the Instructor & TAs.

8. Day 2 - Bioinformatics

8.1. Lecture

NGS workflow, general data analysis pipeline, data format, short read mapping, alignment software, general workflow for DNA-seq, RNA-seq and ChIP-seq and corresponding software resources. Data visualization tools.

RNA-seq data analysis: data normalization, spliced mapping, transcriptome assembly and abundance estimation. Novel transcripts. Tophat/cufflinks/cuffdiff parameter details. Open Source RNA-seq software with graphical user interface. ChIP-seq workflow and software.
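As a rough illustration of what "data normalization" means for RNA-seq abundance values, the sketch below computes the textbook FPKM figure (fragments per kilobase of transcript per million mapped fragments). The function name `fpkm` and its parameters are hypothetical, chosen for this example only. Tools such as cufflinks estimate abundance with a statistical model and do not simply apply this formula to raw counts, so treat it as a way to read the units, not as a description of what the software does.

```python
def fpkm(fragment_count: float,
         transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million
    mapped fragments.

    All three parameter names are hypothetical, for illustration only.
    """
    if transcript_length_bp <= 0:
        raise ValueError("transcript length must be positive")
    if total_mapped_fragments <= 0:
        raise ValueError("total mapped fragments must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2,000 bp transcript in a 10-million-fragment library
    print(fpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by transcript length and by library size is what makes values comparable between long and short transcripts and between samples sequenced to different depths.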

8.2. Tutorial

Getting reference genome sequences to HPC, getting reference annotation. FASTA and FASTQ data formats. Preparing reference sequence for alignment. Using tophat/bowtie to align short reads. Using the resulting alignments to run cuffmerge, cufflinks and cuffdiff. Examining cuffdiff output. Depending on the time and level of students, cummeRbund can also be covered. ChIP-seq with Galaxy if needed.
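For readers who have not seen the FASTQ format mentioned above, here is a minimal sketch of how a four-line FASTQ record is laid out and read. It assumes the common layout (one sequence line per record, no line wrapping) and Phred+33 quality encoding. The names `FastqRecord` and `read_fastq` are hypothetical and are not part of the course materials. For real work, an established parser is a safer choice.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading "@"
    sequence: str         # the bases, e.g. "ACGT"
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    Assumes Phred+33 quality encoding unless phred_offset says otherwise.
    """
    while True:
        header = handle.readline()
        if not header:           # end of file
            return
        header = header.rstrip("\r\n")
        if not header:           # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(ch) - phred_offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
        # read1 ACGT [40, 40, 20, 0]
```

A FASTA record is simpler: a header line starting with ">" followed by one or more sequence lines, with no quality information.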

9. Tutorial Data Sets

The data set directory is browsable here.

10. Prerequisites for the tutorial:

  • A Mac, PC, or Linux laptop with wifi, pre-registered with the UCI Mobile network.

  • If Mac:
    CyberDuck or other graphical file transfer program (GFTP) + the Mac x2go client. We have had problems with the 4.0 release; please use the 3.99.2.1 release linked above. Also, to get it working correctly with your X11 software, please start the X11 software first, THEN start x2go.

  • If Windows:
    the putty terminal program
    CyberDuck, WinSCP or other GFTP client.
    Optionally, the Windows x2go client, although we have discovered that there are some applications that refuse to work with it.

  • If Linux:
    The x2go client allows you to view graphical output from the HPC cluster.