HOWTO - CLCBio Genomics Workbench on the HPC cluster
=====================================================
by Harry Mangalam
v1.7 Oct 19th, 2015
:icons:
//Harry Mangalam mailto:harry.mangalam@uci.edu[harry.mangalam@uci.edu]
// this file is converted to the HTML via the command:
// export dir="/home/hjm/nacs/ghtf"; fileroot="${dir}/HOWTO-CLCBioOnHPC"; asciidoc -a toc2 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]* $dir/CLC_Network_Licensing.png moo:~/public_html;
// update svn from HPC
// scp ${fileroot}.txt hmangala@claw1:~/bduc/trunk/sge; ssh hmangala@bduc-login 'cd ~/bduc/trunk/sge; svn update; svn commit -m "new mods to CLCBio HOWTO"'
// and push it to Wordpress:
// blogpost.py update -c HowTos ${fileroot}.txt
// don't forget that the HTML equiv of '~' = '%7e'
// asciidoc cheatsheet: http://powerman.name/doc/asciidoc
// asciidoc user guide: http://www.methods.co.nz/asciidoc/userguide.html

Introduction
------------
The CLC software that UCI has licensed includes the Java-based http://www.clcbio.com/products/clc-genomics-workbench/[Genomics Workbench], which can be installed on your personal Mac, Windows, or Linux PC and provides a Graphical User Interface (GUI) for a number of frequently used analyses. In this respect, it is similar to http://www.dnastar.com/t-products-lasergene.aspx[DNASTAR's Lasergene software], which is available from BioSci (if interested, please contact mailto:mrm@uci.edu[Matthew Martinez]).

Setting the License Server
~~~~~~~~~~~~~~~~~~~~~~~~~~
To run this software 'on your personal Mac/PC', you will have to http://www.clcbio.com/products/clc-genomics-workbench-direct-download/[download the software directly from CLCBio], install it appropriately for your platform, and then point it at the Biochemistry license server, which will allow you to check out a token to run it.
The Workbench will ask you for the license server when it starts up. If you miss that prompt on the first startup or need to change the server later, you can edit the License Server configuration by clicking *HELP* -> *License Manager* -> *Configure Network License* -> *Manually specify license server* and providing the information shown below:

image:CLC_Network_Licensing.png[CLC Network Licensing screenshot]

If the image above is missing, the required info is shown below in text form:

------------------------------------------------------
[x] Enable license server connection
( ) Automatically detect license server
(o) Manually specify license server
      Hostname/IP-address [128.200.4.52]
      Port                [6200]
[ ] Disable license borrowing
      If you choose this option, users of this computer will not
      be able to borrow licenses from the License Server.
------------------------------------------------------

Another way in which the CLC package differs from the available Lasergene software is that UCI has also licensed the http://www.clcbio.com/index.php?id=1376[CLC Genomics Server Assembler], which provides integrated multicore assembly and supports the Illumina format, among others. The machine hosting it on the HPC cluster has 64 64-bit Opteron cores and 512 GB of RAM, so it should have sufficient resources to handle most assemblies.

There are 2 ways to use the CLC software:

. You can use it in standalone mode on your own machine, which supports all of the functionality. If you decide to run it on your own personal computer, just download and install http://clcbio.com/download_genomics[the Genomics Workbench], then, when starting it, direct it at the Biochemistry license server (128.200.4.52, port 6200) as described above.
. You can use it on the HPC cluster node. The GUI will run a bit slower, but you will have access to the hardware resources of that 64-core machine, which are larger than those of most personal machines.

The following describes how to run it on the HPC machine.
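For the standalone case (option 1 above), if the Workbench cannot check out a license, one quick test from a Mac or Linux terminal is whether TCP port 6200 on 128.200.4.52 accepts connections at all. The sketch below is an illustration, not part of the CLC software: the function name `license_port_open` is hypothetical, and it assumes bash (for its built-in `/dev/tcp` redirection) plus the coreutils `timeout` command, which is standard on Linux but must be installed separately on a Mac (where Homebrew's coreutils may name it `gtimeout`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: test whether HOST:PORT accepts a TCP connection.
# Relies on bash's /dev/tcp/HOST/PORT redirection (a bash feature, not a
# real file) and on coreutils 'timeout' to avoid hanging on a dropped packet.

# license_port_open HOST PORT [SECONDS]
# Returns 0 if the connection succeeds within SECONDS (default 5),
# non-zero if it is refused, times out, or cannot be made.
license_port_open() {
    local host="$1" port="$2" secs="${3:-5}"
    # A bare redirection opens (and immediately closes) the socket;
    # the inner bash exits 0 on success, 1 on failure.
    timeout "$secs" bash -c "</dev/tcp/${host}/${port}" 2>/dev/null
}

# Example use (commented out so that sourcing this file has no side effects):
# if license_port_open 128.200.4.52 6200; then
#     echo "license server port is reachable"
# else
#     echo "cannot reach 128.200.4.52:6200 (off campus without VPN? firewall?)"
# fi
```

A successful connection only shows that the port is open; it does not prove that a license token is free, which the Workbench's License Manager will still report on its own.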
Pre-Requisites
--------------
To use the CLCbio Genomics Workbench from the HPC server, you must first have an account on the HPC cluster. Mail mailto:harry.mangalam@uci.edu[Harry Mangalam] to request one if needed.

Also, your Mac or PC 'must be set up to use http://en.wikipedia.org/wiki/Secure_Shell[ssh] and support http://en.wikipedia.org/wiki/X_Window_System[X11 graphics]'. The CLCBio GUI is a Java application that runs on the HPC cluster and uses the above-mentioned 'X11 graphics' to display its windows on your own screen. To log into HPC, you 'must use ssh' configured to tunnel X11 graphics (on Linux and MacOSX, use 'ssh -Y'; on Windows, it must be explicitly configured as described below).

Windows
~~~~~~~
If you use *Windows*, you'll need to set up the x2go software http://moo.nac.uci.edu/~hjm/biolinux/Linux_Tutorial_12.html#_x2go[as described here].

Macintosh
~~~~~~~~~
If you use a *Mac*, you'll need to install the http://xquartz.macosforge.org[XQuartz software] (no longer bundled with the OS). All you have to do is install it and start it running in the background to accept the X11 windows ('Applications -> Utilities -> XQuartz'). You can then either use it standalone or use the http://moo.nac.uci.edu/~hjm/biolinux/Linux_Tutorial_12.html#_x2go[Mac version of x2go], which is faster (but requires XQuartz to work).

Linux
~~~~~
If you use *Linux*, you should be good to go already: ssh and X11 are already installed on all popular Linux distributions.

Logging in to the CLC server
----------------------------
You will have to first log into the 'HPC login node' and, from there, 'qrsh' into compute-3-5.

- First connect to 'hpc.oit.uci.edu', then connect to the 'compute-3-5' node via:

----------------------------------------------------------
qrsh -q ghtf@compute-3-5
----------------------------------------------------------

This will register an 'ssh -Y' session with the scheduler and connect you to node 'compute-3-5', where the CLCBio app is licensed.
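Because the Workbench window can only appear if X11 forwarding survived both hops (your machine to the login node, then 'qrsh' to compute-3-5), it can help to check the 'DISPLAY' variable at each step. With 'ssh -Y', sshd normally sets it to something like 'localhost:10.0'; if it is empty, no X11 window can be shown. The function below is a hypothetical convenience you could paste into your shell; 'yourUCInetID' in the comments is a placeholder for your own login name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether X11 forwarding looks active,
# judged only by whether DISPLAY is set and non-empty.

# x11_ready: prints DISPLAY and returns 0 if set, otherwise warns and returns 1.
x11_ready() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY=${DISPLAY}"
        return 0
    fi
    echo "DISPLAY is not set: reconnect with 'ssh -Y' (or use x2go)" >&2
    return 1
}

# Typical sequence (as comments; 'yourUCInetID' is a placeholder):
#   ssh -Y yourUCInetID@hpc.oit.uci.edu   # from your Mac or Linux machine
#   x11_ready                             # on the login node
#   qrsh -q ghtf@compute-3-5              # hop to the CLC node
#   x11_ready                             # check again before starting the GUI
```

A non-empty 'DISPLAY' is necessary but not sufficient: on a Mac, XQuartz must also be running locally to receive the windows.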
Using the x2go client
~~~~~~~~~~~~~~~~~~~~~
The http://moo.nac.uci.edu/~hjm/biolinux/Linux_Tutorial_12.html#_x2go[x2go free software] enables you to start and maintain a 'Terminal' connection to compute-3-5 that persists even when you disconnect (for example, to close your laptop and go home or even abroad). It's somewhat complicated to set up, but it gives usable performance even across continents (most of the time). When you start this connection, you'll have a terminal application (the '/usr/bin/gnome-terminal' described in the x2go setup); from there, continue as described below.

Starting the CLC Workbench
~~~~~~~~~~~~~~~~~~~~~~~~~~
Once you have qrsh'ed into 'compute-3-5', the process is identical for all clients. The system will print some informational lines and then identify itself as 'compute-3-5'.

----------------------------------------------------------
# at the compute-3-5 prompt, type:
clcgenomicswb8
----------------------------------------------------------

If you've done everything right, the CLCBio splash screen will pop up and shortly thereafter you'll see the whole application window. It has already been directed to the Biochemistry license server.

Genome data
-----------
Some genome reference data is stored in '/data/apps/commondata/', which currently holds 'human, rat, mouse, yeast, and C. elegans' genome sequences as compressed FASTA files, one file per chromosome.
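If you want a whole genome as a single reference file rather than one file per chromosome, the per-chromosome files can be decompressed and concatenated. This is a sketch under assumptions: the subdirectory layout under '/data/apps/commondata/' and the '*.fa.gz' naming are not documented here, so run 'ls /data/apps/commondata/' first and pass the real directory and pattern. The function name `merge_chromosomes` and the example paths are hypothetical, and the output should go to your own directory, not to the shared data area.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-chromosome gzipped FASTA files into one
# uncompressed FASTA file and print how many sequences it contains.

# merge_chromosomes SRC_DIR OUT_FILE [GLOB]   (GLOB defaults to '*.fa.gz')
merge_chromosomes() {
    local src="$1" out="$2" glob="${3:-*.fa.gz}"
    local restore
    local -a files
    # Enable nullglob temporarily so a non-matching pattern yields no files,
    # then restore the caller's setting.
    restore="$(shopt -p nullglob || true)"
    shopt -s nullglob
    files=( "$src"/$glob )   # expands in lexical order (chr1, chr10, chr2, ...)
    eval "$restore"
    if [ "${#files[@]}" -eq 0 ]; then
        echo "no files matching $src/$glob" >&2
        return 1
    fi
    # gzip -dc writes decompressed data to stdout and leaves the originals intact.
    gzip -dc "${files[@]}" > "$out" || return 1
    # Count FASTA header lines (those starting with '>').
    grep -c '^>' "$out"
}

# Example (the 'human' subdirectory is an assumed name; check with ls first):
#   merge_chromosomes /data/apps/commondata/human "$HOME/human_genome.fa"
```

The merged file is an ordinary FASTA file; keeping the chromosomes separate is equally valid if you only need a few of them.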