= How to Install Programs on Linux, [small]#Research Edition#.
Harry Mangalam v0.0 pre-alpha, Dec 19, 2017
:icons:

// fileroot="/home/hjm/nacs/How_to_Install_Programs_on_Linux"; asciidoc -a icons -a toc2 -a toclevels=3 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.html ${fileroot}.txt moo:~/public_html
// http://www.methods.co.nz/asciidoc/userguide.html
// http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html
// https://stackoverflow.com/questions/3996651/what-is-compiler-linker-loader

== THIS IS IN PROGRESS, INCOMPLETE (as of Dec 20, 2017)

== Introduction

The vast majority of program installations onto a Linux system will be for your own PC, and you'll almost certainly use the system software installer to do so. All of the >100 Linux distributions have graphical software managers that make it trivial to install repository software in binary form onto your PC. This usually includes a large amount of popular scientific software as well: R, Python/SciPy, Jupyter, gnuplot, Octave, Scilab, etc.

However, if you're reading this, you're probably involved in research, and as such, the software you want to use has almost certainly not reached your Linux distribution's repositories yet. (It's RESEARCH, fergadsake.) Also, for reasons of speed, RAM, and disk, you may very well be analyzing your data on a shared computing platform like a cluster where you do not have root permissions and the sysadmins may very well be straight out of a http://bofh.bjash.com/[BOFH episode]; ie: they may not be responsive, polite, caring, sober, or even awake. If this is the case, there may be a long time between a request to install a piece of software and its fulfillment.

The rest of this document will briefly review the mechanisms for installing software from repositories if you have root permissions, but will spend most of its content describing how to install software without being root. Sometimes the approaches apply to both.

== Font Conventions

- 'italic fonts' -> reserved for text emphasis ('Arghhhh')
- *bold fonts* -> programs or utilities (*ldconfig, nm*)
- *'bold & italic'* -> Environment variables (*'LD_LIBRARY_PATH'*)
- [underline]#underline font# -> something meaningful ([underline]#like this example#)
- [red]#Red text# -> a notable user ([red]#root, postgres#)
- [green]#Green text# -> files or paths in the text body ([green]#/usr/include, /data/users/hmangala#)
// - [fuchsia]#Fuchsia text# -> something meaningful ([fuchsia]#like this example#)
// - [blue]#Blue text# -> something meaningful ([blue]#like this example#)
// - [purple]#Purple text# -> something meaningful ([purple]#like this example#)
- and sometimes I'll ignore all the above to inject emphasis on non-Linux terms.

== How programs are distributed

Standard utilities and applications are made available to the various Linux distributions (distros) in different forms, but they all have similar functionality. All popular distros have graphical software installation managers, sometimes multiple ones, often related to the Desktop system that you have selected (GNOME, KDE, Cinnamon, MATE, etc). Using these so-called 'Graphical Software Managers (GSMs)', you can search the repository databases (repos) for https://en.wikipedia.org/wiki/Regular_expression[regular expression] patterns matching files, applications, utilities, etc. The search utility will return the package name, and then you can use the same GSM to download, unpack, install, and often configure the package for use.
All the top-level package managers also resolve dependencies, so they will download and install all the requirements of the end-target application as well. These GSMs are usually wrappers around the core Commandline Package Managers (CPMs) that handle the actual manipulation of the constituent binary packages. This includes *yum* for RedHat-derived distros and *apt* for Debian ones. These CPMs usually have a separate and specific Commandline Package Installer (CPI) (*rpm* for RedHat, *dpkg* for Debian) to install the specific packages. Generally users should never have to use those lower-level installers, but they can be quite powerful if you need to do specific gymnastics with an installation or fix a bungled one.

So the model is: 'Graphical Software Managers (GSMs)' -> 'Commandline Package Managers (CPMs)' -> 'Commandline Package Installers (CPIs)' (for those with OCD).

Also, the applications that these package managers install are 'binary installations' that are pre-compiled (much like a Windows package or Macintosh DMG). You will never have to compile a package downloaded in this way, altho sometimes the installation will need to do a small compilation to integrate it into your specific platform (unless you insist on using https://en.wikipedia.org/wiki/Gentoo_Linux[Gentoo Linux], which takes the masochistic POV that everyone should experience the sweet torture of compiling every single package that is installed on your system).

=== Distribution-Specific

There are at least 100 different https://distrowatch.com/[Linux Distributions], but most of them fall into variants of about 4 major delineations:

- Debian -> Ubuntu, Mint, Kali, Elementary, etc
- RedHat -> CentOS, Fedora, Scientific Linux
- Arch -> Manjaro, Antergos, etc
- OpenSUSE -> GeckoLinux, etc

I'll provide some information about the first 2 families, since we currently use CentOS on our HPC cluster at UCI and Debian variants are the most popular for personal use, especially Ubuntu and the derived Mint. The native mechanisms are preferred for personal installation since they can be used to install scripts, programs, libraries, and configuration files. However, they often cannot be used on large multi-user systems since they require root permissions that a normal user won't have on a large system.

==== Debian-derived (Debian, Ubuntu, Mint)

- the different distros often use different GSMs (Ubuntu's 'Software Center', Mint's 'mintinstall Software Manager'), but they all manipulate the underlying package managers to do the same thing (described above).
- *apt* (similar to https://itsfoss.com/apt-vs-apt-get-difference/[but different from] *apt-get*) is the CPM used to search for, download, and install applications, by default resolving dependencies along the way, such that a single command can install very complex applications such as *R/rstudio* (a statistical and data science-oriented language and its GUI) and various add-ons such as *Bioconductor* (a large R package collection for bioinformatics and biostatistics).
- *dpkg* is the part of the *apt* ecosystem that actually installs/removes the individual 'deb' packages, as Debian packages are called. If you are a normal user, you would probably never use *dpkg*, but if you're installing individual, non-repository packages, you might (see the aside at the end of the example below).
- GUI variants of the above:
- http://www.nongnu.org/synaptic/action.html[synaptic] - a full GTK GUI that wraps *apt* (requires an X11 graphics screen)
- https://wiki.debian.org/Aptitude[aptitude] - a curses/text-based UI that wraps *apt* (works in a terminal)

Here is how you would search for and install *xemacs*, a powerfully weird text editor, on an 'Ubuntu/Mint' system. First, find the package names that include the functionality you want.

--------------------------------------------------------------------------------------------
$ apt search xemacs
p   xemacs21                     - highly customizable text editor
v   xemacs21:i386                -
p   xemacs21-basesupport         - Editor and kitchen sink -- compiled elisp support files
p   xemacs21-basesupport-el      - Editor and kitchen sink -- source elisp support files
p   xemacs21-bin                 - highly customizable text editor -- support binaries
p   xemacs21-bin:i386            - highly customizable text editor -- support binaries
p   xemacs21-mule                - highly customizable text editor -- Mule binary
p   xemacs21-mule:i386           - highly customizable text editor -- Mule binary
p   xemacs21-mule-canna-wnn      - highly customizable text editor -- Mule binary compiled with Canna and Wn
p   xemacs21-mule-canna-wnn:i386 - highly customizable text editor -- Mule binary compiled with Canna and Wn
p   xemacs21-mulesupport         - Editor and kitchen sink -- Mule elisp support files
p   xemacs21-mulesupport-el      - Editor and kitchen sink -- source elisp support files
p   xemacs21-nomule              - highly customizable text editor -- Non-mule binary
p   xemacs21-nomule:i386         - highly customizable text editor -- Non-mule binary
p   xemacs21-support             - highly customizable text editor -- architecture independent support files
p   xemacs21-supportel           - highly customizable text editor -- non-required library files
--------------------------------------------------------------------------------------------

Then once you've identified the package ('xemacs21' in this case), install it using the *apt* command.

--------------------------------------------------------------------------------------------
$ sudo apt install xemacs21
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libcompfaceg1 xemacs21-basesupport xemacs21-bin xemacs21-mule
  xemacs21-mulesupport xemacs21-support
Suggested packages:
  xemacs21-supportel
The following NEW packages will be installed:
  libcompfaceg1 xemacs21 xemacs21-basesupport xemacs21-bin xemacs21-mule
  xemacs21-mulesupport xemacs21-support
...
Install pydb for xemacs21
Install systemtap-common for xemacs21
install/systemtap-common: Ignoring unsupported flavor xemacs21
Install dictionaries-common for xemacs21
install/dictionaries-common: Byte-compiling for emacsen flavour xemacs21
Compiling /usr/share/xemacs21/site-lisp/dictionaries-common/debian-ispell.el...
Wrote /usr/share/xemacs21/site-lisp/dictionaries-common/debian-ispell.elc
Compiling /usr/share/xemacs21/site-lisp/dictionaries-common/ispell.el...
Wrote /usr/share/xemacs21/site-lisp/dictionaries-common/ispell.elc
Compiling /usr/share/xemacs21/site-lisp/dictionaries-common/flyspell.el...
Wrote /usr/share/xemacs21/site-lisp/dictionaries-common/flyspell.elc
Done
Setting up xemacs21 (21.4.22-14ubuntu1) ...
update-alternatives: using /usr/bin/xemacs21 to provide /usr/bin/xemacs (xemacs) in auto mode

# xemacs is entirely installed and configured.
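
# (aside) if you ever need to install a standalone .deb that isn't in any
# repository, the lower-level dpkg installer handles it directly, e.g.
# (hypothetical file name):
#   $ sudo dpkg -i some-package_1.0_amd64.deb
#   $ sudo apt-get -f install   # then pull in any missing dependencies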
--------------------------------------------------------------------------------------------

==== RedHat-derived (RHEL, CentOS, Fedora)

- *yum* (from 'Yellowdog Updater, Modified') functions similarly to the *apt* utility above, but is typically a bit slower.
- *rpm* is the rough RedHat equivalent of *dpkg*.

NB: the Debian repos are larger by a factor of 2-10 than the RedHat repos (depending on which repos you query). This is especially significant for researchers who want access to more packages, and also for developers who need access to more, and more 'recent', libraries and development tools.

Here is how you would search for and install xemacs on a 'CentOS' system. First use *yum* to search for the appropriate package (and note that packages are usually named differently on RedHat and Debian-derived systems: 'xemacs21' on the Debian system above vs 'xemacs' on the CentOS system below).

--------------------------------------------------------------------------------------------
$ yum search xemacs
==================== N/S Matched: xemacs ===========================
flim-xemacs.noarch : Basic library for handling email messages for XEmacs
xemacs-common.x86_64 : Byte-compiled lisp files and other common files for XEmacs
xemacs-devel.i686 : Development files for XEmacs
xemacs-devel.x86_64 : Development files for XEmacs
xemacs-el.x86_64 : Emacs lisp source files for XEmacs
xemacs-erlang.noarch : Compiled elisp files for erlang-mode under XEmacs
xemacs-erlang-el.noarch : Elisp source files for erlang-mode under XEmacs
xemacs-filesystem.x86_64 : XEmacs filesystem layout
xemacs-info.x86_64 : XEmacs documentation in GNU texinfo format
xemacs-packages-base.noarch : Base lisp packages for XEmacs
xemacs-packages-base-el.noarch : Emacs lisp source files for the base lisp packages for XEmacs
xemacs-packages-extra.noarch : Collection of XEmacs lisp packages
xemacs-packages-extra-el.noarch : Emacs lisp source files for XEmacs packages collection
xemacs-packages-extra-info.noarch : XEmacs packages documentation in GNU texinfo format
xemacs-w3m.noarch : Compiled elisp files to run Emacs-w3m Under XEmacs
xemacs-w3m-el.noarch : Elisp source files for Emacs-w3m under XEmacs
xemacs.i686 : Different version of Emacs
xemacs.x86_64 : Different version of Emacs
xemacs-nox.x86_64 : Different version of Emacs built without X Windows support
xemacs-xft.x86_64 : Different version of Emacs built with Xft/fontconfig support

  Name and summary matches only, use "search all" for everything.
--------------------------------------------------------------------------------------------

And now use *yum* to install it:

--------------------------------------------------------------------------------------------
$ yum install xemacs
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package xemacs.x86_64 0:21.5.31-5.el6 will be installed
...
  Verifying : xemacs-filesystem-21.5.31-5.el6.x86_64                 2/7
  Verifying : xemacs-packages-base-20100727-1.el6.noarch             3/7
  Verifying : Canna-libs-3.7p3-28.el6.x86_64                         4/7
  Verifying : xemacs-21.5.31-5.el6.x86_64                            5/7
  Verifying : xemacs-common-21.5.31-5.el6.x86_64                     6/7
  Verifying : compface-1.5.2-11.el6.x86_64                           7/7

Installed:
  xemacs.x86_64 0:21.5.31-5.el6

Dependency Installed:
  Canna-libs.x86_64 0:3.7p3-28.el6           compface.x86_64 0:1.5.2-11.el6
  neXtaw.x86_64 0:0.15.1-14.el6              xemacs-common.x86_64 0:21.5.31-5.el6
  xemacs-filesystem.x86_64 0:21.5.31-5.el6   xemacs-packages-base.noarch 0:20100727-1.el6

Complete!
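
# (aside) as with dpkg on Debian, the lower-level rpm installer handles
# individual, non-repository packages directly, e.g. (hypothetical file name):
#   $ sudo rpm -ivh some-package-1.0-1.el6.x86_64.rpm
# and can query the installed-package database:
#   $ rpm -qa | grep xemacs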
--------------------------------------------------------------------------------------------

The above utilities require *root* access and only work to install applications from configured repositories.

==== Cross-Distro Utilities

- *alien* is a utility that can interconvert a number of 'alien' packages to Debian-friendly 'debs'. It currently supports Red Hat 'rpm', Debian 'deb', Stampede 'slp', Slackware 'tgz', and Solaris 'pkg' formats, interconverting in both directions, altho it's not recommended to use it on base system packages.

=== Distribution-Independent

There are a number of application sources that are not linked to a particular distribution, but have been invented strictly as a porting mechanism.

==== Linuxbrew

http://linuxbrew.sh[Linuxbrew] is a silly name for a very impressive package manager that was derived from the https://brew.sh/[MacOSX Homebrew], and it supports a surprising number of scientific packages, some more recent than even those in the Debian *apt* repos. *brew* is fairly easy to install and manage, altho it dives fairly deep into the guts of how Linux applications are executed.
//[example to follow]

==== pkgsrc

https://www.pkgsrc.org/[pkgsrc] was originally part of the BSD Unix platform and also supports the BSD-derived IllumOS (a Solaris spinoff) and MacOSX, but has been ported to Linux as well. It emphasizes a build-from-source approach, but there are binary distributions as well. It claims to support about http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/README-all.html[17K packages]. However, it is not a native application for any of the Linux distros (ie. you can't yet 'apt install pkgsrc') and needs to be manually bootstrapped. It also requires builds to stay within the *pkgsrc* source tree for reliability, so it's a fairly separate application tree. It is starting to become stable enough to use for large projects, but it's not for beginners. Jason Bacon maintains a very good *pkgsrc* document at http://uwm.edu/hpc/software-management/[UW-Milwaukee].
//[example to follow]

==== easybuild

https://easybuild.readthedocs.io/en/latest/[easybuild] is another build system, specifically for scientific and research software. Written in Python, it is similar to *pkgsrc* but more tilted toward research applications. It tries to break the software build process into formal chunks ('easyblocks') and uses such formalisms to make it easier to compile otherwise very complex software systems.
//[example to follow]

==== spack

https://spack.io/[Spack] is a package manager for predominantly scientific software and can compile software trees for both Linux and MacOSX. *spack* can be installed with a simple 'git clone' & configure operation, so it's relatively simple for such a system. *spack* is a relatively new system, and while it builds most mainline packages well, we are still seeing numerous failures in day-to-day operations.
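A minimal sketch of bootstrapping *spack* and building a package with it (the package name 'zlib' is just an example):

-------------------------------------------------------------------
# clone the spack repository and enable it in the current shell
$ git clone https://github.com/spack/spack.git
$ . spack/share/spack/setup-env.sh

# build a package (and all its dependencies) from source
$ spack install zlib

# list what spack has installed so far
$ spack find
-------------------------------------------------------------------

==== Bitnami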
https://bitnami.com/[Bitnami] packages large application stacks in an easy-to-install superpackage and makes the configuration easier than a manual installation. The superpackages are typically web stacks such as https://en.wikipedia.org/wiki/Content_management_system[Content Management Systems] like https://wordpress.org/about/[WordPress] & https://www.joomla.org/[Joomla], and eCommerce stacks such as https://www.prestashop.com/en[PrestaShop] and https://magento.com/[Magento] - not so much scientific applications, altho it also provides infrastructure for Linux Containers and some popular databases.

=== Linux Containers

https://en.wikipedia.org/wiki/Linux_containers[Containers] can generically be considered very lightweight Virtual Machines (VMs) that share the kernel of the host OS but little else, and in shedding that overhead, are usually considerably smaller than a true VM. In that manner, they can run variants of that OS. For example, you could run a Debian container on a CentOS base OS, as long as the Debian OS used the same kernel, but you COULD NOT run a 'Windows' container on a 'Linux' system of any kind. However, you COULD run a complete Windows VM on a Linux base OS (and vice versa), since the VM brings with it the entire kernel as well as utilities, etc. I'll be referring only to Linux containers going forward.

This sounds like (and is) an odd feature until you realize that for HPC, there is one OS that rules them all (Linux) but https://distrowatch.com/[zillions of variants] (some of which are referenced above). In reality, many complex scientific codes (http://neuro.debian.net/[NeuroDebian], http://environmentalomics.org/bio-linux/[BioLinux], http://www.thevirtualbrain.org/tvb/zwei[TVB], http://mriqc.readthedocs.io/en/stable/[mriqc]) are developed and released for one of these distributions, and if your cluster runs the wrong distribution (or even the wrong version of that distribution), it can require a huge amount of work to host the new software. Containers allow you to use a released package much more easily, and especially to keep a defined version of that software easily accessible and re-usable.

Like Linux itself, there are various flavors of containers and container orchestration tools, but the main container systems I'll touch on are https://www.docker.com/[Docker], https://en.wikipedia.org/wiki/LXC[LXC] (Linux Containers), and http://singularity.lbl.gov/[Singularity].

Docker is the overwhelming leader in container technology and has a number of things going for it. It has a 'git'-like organization, and it has a lot of infrastructure for use, search, and utility. It's also free enough that it has been integrated into the software repositories of most Linux distributions. However, in the HPC environment, it has some problems related to https://fosterelli.co/privilege-escalation-via-docker.html[root escalation] and http://singularity.lbl.gov/docs-security[other security problems]. Singularity is an alternative container system which mostly addresses those problems and can even 'consume' Docker images, providing a more secure environment for running them in HPC or multiuser environments.

In the UCI HPC environment, we are just starting to enable Singularity images to be run by users. If you have a containerized software system that you want to run, by all means talk to us and we'll try to arrange for it to be made into a Singularity image, or we can probably also install it natively. One advantage of running a large, multi-user system is that we have many dependencies already available.
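As a rough sketch of what 'consuming' a Docker image with Singularity looks like (Singularity 2.x syntax; the image name is just an example, and the resulting file name varies with the Singularity version):

-------------------------------------------------------------------
# pull an image from Docker Hub and convert it to a Singularity image file
$ singularity pull docker://ubuntu:16.04

# run a command inside the container, as yourself, with no root daemon involved
$ singularity exec ubuntu-16.04.simg cat /etc/os-release
-------------------------------------------------------------------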
== Installing Software

As noted above, a lot of the basic software you'll need is already available to you via the distro's native installation mechanism. However, when you need to install software that is outside of those repositories, you'll be responsible for doing the installation yourself. Below, I describe some of the more popular (or more accurately, 'frequently available') mechanisms for installing software.

=== By tarball

In the past and still quite often, software was made available in what we refer to as 'tarballs' or 'tarchives'. These terms describe collections of files, often with a single root directory, which contain all the code necessary to install or compile a piece of software. They are made available in the 'tar' format (derived from the ancient term 'tape archive') and are essentially a concatenation of all the files into one file. That 'tar' file is usually compressed as well, in a variety of formats (gzip -> .gz, bzip2 -> .bz2, xz -> .xz, etc). Sometimes, especially if the software is multi-platform, it may also be packaged as a 'zip' archive, which came from the PC/Windows world. Such tarballs (as well as 'zip' archives) can contain a few different formats for distributing the software.

==== Precompiled executables

Binary executable packages usually contain only the executable application plus some documentation. The executables are generally http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#staticdynamic[statically compiled] to run on the widest possible range of Linux distributions. When these tarballs are unpacked (link:#untarring[see below]), all that has to be done is to:

- *chown* the executable to the right ownership (if there are overlaps in User IDs between the packaging system and the unpacking system, and your https://www.cyberciti.biz/tips/understanding-linux-unix-umask-value-usage.html[umask] is incorrectly set, the package may end up owned by another user),
- http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#_what_makes_a_program_executable[chmod] it to be executable,
- and then *mv* it to a dir on your http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#PATH[PATH].

==== Scripts

Software based on interpreted scripts such as Python or Perl generally needs to be treated like the 'Precompiled executables' directly above. A difference that you may run into is that poorly packaged scripts may have a hard-coded http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#shebang[shebang] line that points to either the developer's preferred interpreter or an optional one you don't have. As http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#usrbinenv[described here], change the 'shebang' line to a universal one (ie: `#!/usr/bin/env python` rather than a hard-coded path to a particular interpreter).

==== Source Code installs

A source code installation is often much more involved than a binary or interpreted script installation and usually needs the expertise described in the document http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html[How Programs Work on Linux]. I'll describe an idealized installation that covers most of the steps needed to install a source code package, but certainly not all. I'll demonstrate using a package called *tacg*.

===== Get the tarball

The archive will usually be available via the Internet.
Use *wget* to obtain the tarball:

-------------------------------------------------------------------
export CT=compiletest
mkdir $CT
cd $CT
wget http://moo.nac.uci.edu/~hjm/tacg-4.6.0-src.tar.bz2
ls -l
total 960
-rw-rw-r-- 1 hjm hjm 631039 Nov  3 13:50 tacg-4.6.0-src.tar.bz2
-------------------------------------------------------------------

[[untarring]]
Don't immediately unpack the tarball. The 'correct/polite' way to create a tarball is to have it rooted in a top-level directory, so that when you unpack it, the files don't explode in the current dir. However, perhaps 5% of such tarballs are created so that everything is unpacked in the 'current' dir, creating a huge mess that has to be cleaned up. So check beforehand by using the '-t' (mnemonic 'tell') option to *tar* to show what's in the tarball BEFORE you untar it to disk.

-------------------------------------------------------------------
# we're going to use the following options:
# t = tell
# j = use the bzip2 decompression ('j' selects the same compression when 'creating')
# v = be verbose when operating
# f = use the following file as input (tacg-4.6.0-src.tar.bz2 in this case)
# the suffix (| head) only shows us the 1st 10 lines of output

$ tar -tjvf tacg-4.6.0-src.tar.bz2 | head
tacg-4.6.0-src/
tacg-4.6.0-src/Data/
tacg-4.6.0-src/Data/codon.data
tacg-4.6.0-src/Data/matrix.data
tacg-4.6.0-src/Data/rebase.dam+dcm.data
tacg-4.6.0-src/Data/rebase.dam.data
tacg-4.6.0-src/Data/rebase.data
tacg-4.6.0-src/Data/rebase.dcm.data
tacg-4.6.0-src/Data/regex.data
tacg-4.6.0-src/Data/rules.data
...
-------------------------------------------------------------------

The above shows us that the tarball is rooted in a separate dir (tacg-4.6.0-src), so we can go ahead and unpack it using a similar command:

-------------------------------------------------------------------
# the only change is t -> x (for extract)
$ tar -xjvf tacg-4.6.0-src.tar.bz2
tacg-4.6.0-src/
tacg-4.6.0-src/Data/
tacg-4.6.0-src/Data/codon.data
tacg-4.6.0-src/Data/matrix.data
...
# now look at what we have: the original tarball and the extracted dir.
$ ls
tacg-4.6.0-src/  tacg-4.6.0-src.tar.bz2
-------------------------------------------------------------------

Now to compile it.

-------------------------------------------------------------------
$ cd tacg-4.6.0-src
$ ls
AUTHORS             INSTALL        ReadEnzFile.c  config.guess*  seqio.c
COPYING             Makefile.am    ReadMatrix.c   config.sub*    seqio.h
COPYRIGHT           Makefile.in    ReadRegex.c    configure*     tacg.c
ChangeLog           MatrixMatch.c  RecentFuncs.c  configure.in   tacg.h
Cutting.c           NEWS           SeqFuncs.c     control*       tacgi4/
Data/               ORF.c          Seqs/          install-sh*    test/
Docs/               Proximity.c    SetFlags.c     missing
GelLadSumFrgSits.c  README         SlidWin.c      mkinstalldirs*
-------------------------------------------------------------------

The above is a fairly simple project - some configuration files (config*, Makefile*, install-sh), some identifying files (AUTHORS, COPYING, COPYRIGHT), some info files (README, the Docs dir), the C source code (*.c, *.h), and some other dirs that contain auxiliary information. In any such layout, the 1st thing to do is to actually READ the [green]#README# file. It's usually useful, in this case simply telling what *tacg* is good for. After you absorb the [green]#README#, and if you still want to compile the code, read the [green]#INSTALL# file, which should tell you how to do just that. In this case, it's pretty precise, but the main thing it says is to use the included [green]#configure# script to generate a [green]#Makefile# out of the [green]#Makefile.in# template.
Unless the author has subverted the *./configure* script, it will also have a useful '--help' function. Be sure to try that option first to see if there are any 'gotchas' or special options you should be aware of. Also, there is one *./configure* option that is critical to installing software on a shared system where you don't have root permissions: the '--prefix' option, which tells the [green]#Makefile# where to install the program once it's compiled. In the example below, I'm going to choose to install it in my *'HOME'* dir, so executables will go into [green]#~hjm/bin#, header files will go into [green]#~hjm/include#, libs will go into [green]#~hjm/lib#, manuals into [green]#~hjm/man#, and so on.

[[sameasgit]]
-------------------------------------------------------------------
$ ./configure --help | less  # how to compile & where to install the program

# then if there's nothing alarming or surprising to address, start it off with
# the appropriate options
$ ./configure --prefix=/home/hjm   # install it in my home dir
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
... etc
# checks for all the libs, utilities, compilers, etc it will need to compile tacg
# if it finds something wrong, it will stop with a fatal error and tell you how to fix it.
-------------------------------------------------------------------

Now everything is ready to compile - if you look at the files again, you'll notice 3 new ones.

-------------------------------------------------------------------
$ ls -lt | head
total 2084
-rw-rw-r-- 1 hjm hjm 25883 Nov  3 14:22 config.log
-rw-rw-r-- 1 hjm hjm 14494 Nov  3 14:22 Makefile
-rwxrwxr-x 1 hjm hjm 29839 Nov  3 14:22 config.status*
...
-------------------------------------------------------------------

- [green]#config.log# is the log from the configure script. If something went wrong, it will be described somewhere in that log.
- [green]#Makefile# is the input to GNU *make*, which will actually drive the compilation.
- [green]#config.status# is a shell script that will regenerate the current configuration; rarely needed.

Now we can initiate the actual *make*:

-------------------------------------------------------------------
$ make -j4   # '-j4' builds in 4 separate parallel processes
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"tacg\" -DVERSION=\"4.6.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STRUCT_STAT_ST_BLKSIZE=1 -DHAVE_ST_BLKSIZE=1 -DHAVE_LIBM=1 -DHAVE_LIBPCRE=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_FCNTL_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDLIB_H=1 -DHAVE_UNISTD_H=1 -DHAVE_SYS_PARAM_H=1 -DHAVE_GETPAGESIZE=1 -DHAVE_MMAP=1 -DHAVE_STRFTIME=1 -DHAVE_VPRINTF=1 -DHAVE_ALLOCA_H=1 -DHAVE_ALLOCA=1 -DHAVE_PUTENV=1 -DHAVE_STRSEP=1 -DHAVE_STRDUP=1 -DHAVE_STRSPN=1 -DHAVE_STRSTR=1 -DHAVE_UNAME=1 -I. -I. -DBUILD_DATE=\"Fri\ Nov\ \ 3\ 14:45:50\ PDT\ 2017\" -DUNAME=\"Linux\ stunted\ 4.4.0-21-generic\ #37-Ubuntu\ SMP\ Mon\ Apr\ 18\ 18:33:37\ UTC\ 2016\ x86_64\ x86_64\ x86_64\ GNU/Linux\" -DGCC_VER=\"gcc\ Ubuntu\ 5.4.0-6ubuntu1~16.04.5\ 5.4.0\ 20160609\" -g -O2 -Wall -c seqio.c
... etc
# depending on how the Makefile is written, it may show or hide much of the process shown above
...
gcc -g -O2 -Wall -o tacg seqio.o Cutting.o Proximity.o GelLadSumFrgSits.o ReadEnzFile.o SeqFuncs.o MatrixMatch.o ReadMatrix.o SetFlags.o ORF.o ReadRegex.o SlidWin.o tacg.o RecentFuncs.o -lpcre -lm
-------------------------------------------------------------------

The last line above shows the link step, where all the newly compiled object files are linked together with the required math (-lm) and regular expression (-lpcre) libraries, finally creating an application called *tacg*.

-------------------------------------------------------------------
$ ls -lt | head
total 5980
-rwxrwxr-x 1 hjm hjm 1484792 Nov  3 14:45 tacg*
-rw-rw-r-- 1 hjm hjm 1189296 Nov  3 14:45 seqio.o
-rw-rw-r-- 1 hjm hjm  144752 Nov  3 14:45 tacg.o
-rw-rw-r-- 1 hjm hjm   94304 Nov  3 14:45 RecentFuncs.o
...
-------------------------------------------------------------------

You'll notice that there are also object files ending in '.o' that correspond to each C source code file. Don't delete these yet. If you need to recompile, only the source code files that are 'newer' than their object files will be re-compiled. This can be quite significant in large projects.

If we run *ldd* on 'tacg', we'll see what libraries it needs to run:

-------------------------------------------------------------------
$ ldd tacg
        linux-vdso.so.1 =>  (0x00007ffd69500000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fa06ddd7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa06dace000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa06d703000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa06d4e6000)
        /lib64/ld-linux-x86-64.so.2 (0x0000556f8b73f000)
-------------------------------------------------------------------

So in addition to the explicit libraries that it linked to (libm & libpcre), it also uses:

- linux-vdso.so.1 -> the "virtual dynamic shared object", a small library the kernel maps into every process to speed up certain system calls
- libc.so.6 -> the GNU C library, on which nearly every Linux program depends
- libpthread.so.0 -> the GNU threading library (odd, bc tacg doesn't use explicit threading)
- ld-linux-x86-64.so.2 -> the dynamic linker/loader, which loads the program and its shared libraries at launch

So now, let's see what it does. A well-designed Linux application will tell you what it does if there's no input or output specified:

-------------------------------------------------------------------
$ ./tacg
type 'tacg -h' or 'man tacg' for more help on program use
or type: tacg -n6 -slLc -S -F2 < [your.input.file]
for an example of what tacg can do.
-------------------------------------------------------------------

Hmmm, let's try 'man tacg' to see if we can get more documentation:

-------------------------------------------------------------------
$ man tacg
No manual entry for tacg
See 'man 7 undocumented' for help when manual pages are not available.
-------------------------------------------------------------------

OK, that's because we haven't 'installed' tacg yet, so the man pages are in the document tree but not available, since they're not on the http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#_manpath[MANPATH]. We can either point 'man' directly to the manpage or modify *'MANPATH'* to include the dir where the man page is. The former is easier.

-------------------------------------------------------------------
$ man Docs/tacg.1
# shows the following:
..............................................................................
tacg(1)                     General Commands Manual                    tacg(1)

NAME
       tacg - finds short patterns and specific combinations of patterns in
       nucleic acids, translates DNA <-> protein.

SYNOPSIS
       tacg -flag [option] -flag [option] ... output.file

       tacg takes input from a file (--infile) or via stdin (| or <); spits
       output to screen (default), >file, | next command etc.
..............................................................................
-------------------------------------------------------------------

Now we want to test *tacg* to make sure it's built correctly. The *make test* (sometimes *make check*) functionality is not often built into distributions, but sometimes it is. Try both if it's not made explicit in the [green]#README# or [green]#INSTALL# files.

The last step, once you've checked the compiled application, is to install it. If you used the '--prefix' option during the *./configure* stage, all you have to do is run *make install*. If the compilation does not include an installation step, you will have to copy the critical files into place yourself. This usually involves copying any executables and scripts into your $HOME/bin (if on a multi-user system) or into [green]#/usr/local/bin# if you're using your own laptop and have [red]#root# access for the install.

=== by git

https://en.wikipedia.org/wiki/Git[git] is the world's most popular https://en.wikipedia.org/wiki/Version_control[Version Control] or Source Code Management (SCM) system, and https://en.wikipedia.org/wiki/GitHub[github] is a large, low-cost internet site that uses *git*, as well as other web technologies, to provide a kind of universal software nexus. There are many other SCMs (http://www.nongnu.org/cvs/[CVS], https://subversion.apache.org/[Subversion], http://www.bitkeeper.org/[Bitkeeper], https://www.mercurial-scm.org/[mercurial]), but *git* and 'github' are the dominant ones.

Extracting the source code from Github or another remote git-based site is slightly different from the source tarball mechanism described above. *git* is especially notable because when you "download" a git-based project, you're not only obtaining the source code and docs, you're also establishing a local mirror repository of the original site. The command used to do this is highly descriptive and accurate: *git clone*.

We'll use a different project called 'fpart' to demonstrate the *git* approach. When you view a github project page https://github.com/martymac/fpart[like this one for fpart], one of the distinctive features is a green button with 'Clone or download' text on it. Clicking it allows you to copy the *git* repository URL to your computer's clipboard. So now we're ready.

-------------------------------------------------------------------
# prep the directory
$ cd
$ export GD=git_test
$ mkdir $GD
$ cd $GD
$ pwd
/home/hjm/git_test

# now clone the git repository, where the last string (starting with https://)
# was copied from the github site.
$ git clone https://github.com/martymac/fpart.git
Cloning into 'fpart'...
remote: Counting objects: 1391, done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 1391 (delta 4), reused 8 (delta 3), pack-reused 1370
Receiving objects: 100% (1391/1391), 302.58 KiB | 0 bytes/s, done.
Resolving deltas: 100% (917/917), done.
Checking connectivity... done.
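
# (aside) if you want a specific release rather than the latest development
# code, you can clone a branch or tag directly, e.g.:
#   $ git clone -b fpart-1.0.0 https://github.com/martymac/fpart.git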
$ ls
fpart/
$ cd fpart
$ ls -lat
total 84
drwxrwxr-x 7 hjm hjm     0 Nov  3 16:00 ./
drwxrwxr-x 8 hjm hjm     0 Nov  3 16:00 .git/   <- the .git dir tracks checkouts, changes, etc
drwxrwxr-x 2 hjm hjm     0 Nov  3 16:00 src/
drwxrwxr-x 2 hjm hjm     0 Nov  3 16:00 tools/
-rw-rw-r-- 1 hjm hjm  4223 Nov  3 16:00 Changelog
-rw-rw-r-- 1 hjm hjm   110 Nov  3 16:00 Makefile.am
-rw-rw-r-- 1 hjm hjm 11739 Nov  3 16:00 README
-rw-rw-r-- 1 hjm hjm  1453 Nov  3 16:00 TODO
-rw-rw-r-- 1 hjm hjm  2451 Nov  3 16:00 configure.ac
drwxrwxr-x 3 hjm hjm     0 Nov  3 16:00 contribs/
drwxrwxr-x 2 hjm hjm     0 Nov  3 16:00 man/
-rw-rw-r-- 1 hjm hjm  1322 Nov  3 16:00 COPYING
drwxrwxr-x 3 hjm hjm     0 Nov  3 16:00 ../
-------------------------------------------------------------------

In a *git* source checkout, there's one special dir called '.git' which contains a number of files and dirs that help track the files, their versions, monitored files, file differences with the original repository, and a number of other variables. We're not going to worry about it except where it helps us sync with the parent repository.

In the listing of files above, you can see a few differences and similarities between this git repo and the tarball installation described above:

- the source code is tucked into its own 'src' dir.
- there is no *configure* script, altho there is a [green]#configure.ac# file.
- there is a [green]#README# file which provides the same kind of information as the [green]#README# and [green]#INSTALL# files in the tarball installation do, altho this layout is much more flexible than the more formal GNU toolchain format that *tacg* uses.
- there is another dir called 'tools' (specific to fpart) which has some additional utilities in it.

The consolidation of source code files into their own 'src' dir is a personal choice, altho it does help segregate the source code from the supporting files. Similarly, adding additional dirs to hold configuration files, extra utilities, files for testing build integrity, etc is completely normal.

It's only the first part of the *git* build that is slightly different. Because of the missing configure script, we have to build it anew using another GNU autoconf tool called *autoreconf*:

-------------------------------------------------------------------
# now we're in the fpart top-level dir
$ pwd
/home/hjm/git_test/fpart

# now launch autoreconf
$ autoreconf -i
configure.ac:7: installing './compile'
configure.ac:30: installing './config.guess'
configure.ac:30: installing './config.sub'
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
src/Makefile.am: installing './depcomp'

# now we're set to go since we now have a 'configure' script.
-------------------------------------------------------------------

From this point on, it's just like the link:#sameasgit[compile process described above], substituting project and file names as needed.

==== Updating a git repo

There is one aspect of git that is different from a tarball-based install: when the parent repository is updated, you can sync your copy with the parent via a simple 'git pull' executed in the main git dir (the one that contains the '.git' dir).

-------------------------------------------------------------------
$ pwd
/home/hjm/git_test/fpart

$ git pull    # after a few days
remote: Counting objects: 65, done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 65 (delta 35), reused 56 (delta 26), pack-reused 0
Unpacking objects: 100% (65/65), done.
From https://github.com/martymac/fpart
   9dd3784..71ccfab  master      -> origin/master
 * [new tag]         fpart-1.0.0 -> fpart-1.0.0
Updating 9dd3784..71ccfab
Fast-forward
 Changelog                                 |   3 +-
 README                                    | 211 +++++++++++++++++++++++++++++++++---------
 TODO                                      |   3 +
 configure.ac                              |   2 +-
 contribs/package/rpm/fpart.spec           |   6 +-
 docs/Solving_the_final_pass_challenge.txt | 261 ++++++++++++++++++++++++++++++++++++++++++++++++
 man/fpart.1                               |   2 +-
 man/fpsync.1                              |  28 ++++++-
 src/fpart.c                               |   2 +-
 src/fpart.h                               |   2 +-
 tools/fpsync                              |  14 ++--
 11 files changed, 450 insertions(+), 84 deletions(-)
 create mode 100644 docs/Solving_the_final_pass_challenge.txt

# and immediately afterwards,
$ git pull
Already up-to-date.    # no more changes to sync
-------------------------------------------------------------------

== Installing personal software

As noted in the introduction, there are times when you need to install software on shared computing platforms where you can't act as root. This section describes how to do that, first in general, and then briefly for several popular programming platforms.

=== In general

Since the only place you can 'write' on most shared platforms is *$HOME*, that's the place where you should root your installation. This means that if you use the command:

-------------------------------------------------------------
$ ./configure --prefix=$HOME/sw
-------------------------------------------------------------

in the configuration, then after building the project, *make install* will copy the pieces into place relative to your $HOME. ie:

- executables in [green]#~/sw/bin# (and then add that directory to your http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#PATH[PATH], ie: `export PATH=$HOME/sw/bin:$PATH`)
- include (*.h) files in [green]#~/sw/include#
- libraries in [green]#~/sw/lib#
- documents in [green]#~/sw/share#
- man pages in [green]#~/sw/man/manX# (X depending on what kind of package it is)

Since you can allow or prevent others from accessing the different parts of your *'$HOME'*, you can share or shield these components via the *chmod* command.

=== R

The core *R* packages will usually be installed in a central location, often accessed by the use of the *module* command (ie: *module load R/3.4.2*). If that's the case and you need to install a library to complete an analysis, first load the R version you need, to set up all the environment variables. After that, there are 2 ways to install the software:

- outside of R, using *R CMD INSTALL*, after downloading an R package (in this case Rmpi). Note the format for passing in the *configure* options, and note especially the line '--configure-args="--prefix=$HOME"', which directs installation into a dir that YOU own and can write into.

-------------------------------------------------------------
R CMD INSTALL --configure-vars='LIBS=-L/apps/openmpi/1.4.2/lib' \
   --configure-args="--prefix=$HOME" \
   --configure-args="--with-Rmpi-type=OPENMPI \
   --with-Rmpi-libpath=/apps/openmpi/1.4.2/lib \
   --with-mpi=/apps/openmpi/1.4.2 \
   --with-Rmpi-include=/apps/openmpi/1.4.2/include" \
   Rmpi_0.5-9.tar.gz
-------------------------------------------------------------

- inside of R, using *install.packages*:

-------------------------------------------------------------
# as root if installing for the whole platform
$ R
...
> install.packages("<packagename>", dependencies=TRUE, repos="http://cran.cnr.Berkeley.edu")
# eg:
> install.packages("ggplot2", dependencies=TRUE)
> install.packages("BiodiversityR", dependencies=TRUE)
-------------------------------------------------------------

However, if you're not root, then the installation will detect your inability to install into the system areas and offer an alternative:

-------------------------------------------------------------
# as a non-root user
$ R
...
> install.packages("ggplot2", dependencies=TRUE, repos="http://cran.cnr.Berkeley.edu")
Warning in install.packages("ggplot2", dependencies = TRUE, repos = "http://cran.cnr.Berkeley.edu") :
  'lib = "/data/apps/R/3.1.2/lib64/R/library"' is not writable
Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/x86_64-unknown-linux-gnu-library/3.1
to install packages into?  (y/n)
also installing the dependencies 'openssl', 'backports', 'Rcpp', 'viridisLite', 'rlang',
'rex', 'httr', 'crayon', 'sp', 'praise', 'knitr', 'yaml', 'htmltools', 'evaluate',
'rprojroot', 'stringr', 'gdtools', 'scales', 'tibble', 'covr', 'ggplot2movies', 'hexbin',
'mapproj', 'maps', 'maptools', 'testthat', 'rmarkdown', 'svglite'

trying URL 'http://cran.cnr.Berkeley.edu/src/contrib/openssl_0.9.9.tar.gz'
Content type 'application/x-gzip' length 1112927 bytes (1.1 Mb)
opened URL
==================================================
downloaded 1.1 Mb
...
> library("ggplot2")
# and there you are...
-------------------------------------------------------------

In the above scenario, the other libraries are also installed in your local R/VERSION lib dir, and unless you explicitly make them readable by other users (and they modify their environment variables to point to your installation), no one else will be able to make use of your *R* libraries. If you're using a shared resource like a cluster, it's worthwhile to ask your sysadmins to install popular packages centrally rather than everyone installing them by themselves.

=== Python

Python, in both its generic and its 'Freemium' distributions (see below), uses (or can use) 2 main installation mechanisms:

- https://pip.pypa.io/en/stable/[pip]
- http://peak.telecommunity.com/DevCenter/EasyInstall[easy_install]

(The differences between *pip* and *easy_install* https://packaging.python.org/discussions/pip-vs-easy-install/[are described here], but *pip* is generally recommended.)

==== Problems with packages with binary shared libs

In both cases, if the package includes pre-compiled shared libs (ie: tensorflow), your laptop is probably running a distribution recent enough to supply the http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html#GLIBC[GLIBC] version that the compiled libs require. However, machines running much older distributions (and hence older GLIBCs), such as cluster nodes, may not be, so that even if you can install the package without error, when you try to run it you'll hit the GLIBC error:

--------------------------------------------------------------------------
$ python
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/data/apps/anaconda/3.6-4.3.1/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/data/apps/anaconda/3.6-4.3.1/lib/python3.6/site-packages/tensorflow/python/
ImportError: /lib64/libc.so.6: version `GLIBC_2.16' not found (required by
/data/apps/anaconda/3.6-4.3.1/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
--------------------------------------------------------------------------

==== Freemium Pythons

Like many Open Source tools, Python has been 'Freemiumed' into multiple distributions. The 2 most popular are https://www.anaconda.com/[Anaconda] and https://www.enthought.com/product/canopy/[Enthought Canopy]. Each of these distributions has its own installation tool which queries against its own repositories; https://stackoverflow.com/questions/15762943/anaconda-vs-epd-enthought-vs-manual-installation-of-python[opinions vary] on which is better for what. They provide:

- https://conda.io/docs/[conda] for the https://www.anaconda.com/[Anaconda Python] series
- https://goo.gl/csvSYT[enpkg] for the https://www.enthought.com/product/enthought-python-distribution/[Enthought Python] series.

===== Anaconda and conda

https://www.anaconda.com/[Anaconda Python] is a science-targeted distribution and as such has a lot of performance-improved libs. It comes with its own installer, which should be used preferentially over the generic installation tools (altho they can be used as well). The *conda* utility is used very much like *pip* but has a slightly different syntax and searches the Anaconda repositories rather than the PyPI repositories.

==== Enthought Python / Canopy

=== MATLAB

=== Perl

Like the rest of these systems, Perl is usually installed via the distro-specific tools mentioned above (yum, apt). You can search for the appropriate set of modules and install them easily into your own system Perl installation. However, Perl also has its own installation tools, most based on https://www.cpan.org/[CPAN], the 'Comprehensive Perl Archive Network'.

==== by CPAN

The commandline *cpan* utility has an internal shell that allows you to search for and install all the component Perl modules, including dependencies, like any other well-written tool. The 3 most useful *cpan* commands are 'h' (help), 'i' (info), and 'install'.

------------------------------------------------------
$ cpan

cpan shell -- CPAN exploration and modules installation (v2.18)
Enter 'h' for help.
cpan[1]> h

Display Information                                                (ver 2.18)
 command  argument          description
 a,b,d,m  WORD or /REGEXP/  about authors, bundles, distributions, modules
------------------------------------------------------

Now search for what you want:

------------------------------------------------------
cpan[2]> i /regex/    # where 'regex' is any regular expression
cpan[2]> i /samtool/
Reading '/root/.local/share/.cpan/Metadata'
  Database was generated on Tue, 28 Nov 2017 02:17:03 GMT
Distribution    HARTZELL/Alien-SamTools-0.002.tar.gz
Distribution    LDS/Bio-SamTools-1.39.tar.gz
Distribution    LDS/Bio-SamTools-1.43.tar.gz
Module  < Alien::SamTools               (HARTZELL/Alien-SamTools-0.002.tar.gz)
Module  < Bio::Tools::Run::Samtools    (CJFIELDS/BioPerl-Run-1.007002.tar.gz)
Module  < Bio::Tools::Run::Samtools::Config (CJFIELDS/BioPerl-Run-1.007002.tar.gz)
Module  < Bio::Tradis::Samtools        (AJPAGE/Bio-Tradis-1.3.3.tar.gz)
7 items found
------------------------------------------------------

And now install it:

------------------------------------------------------
cpan[4]> install Bio::Tools::Run::Samtools
------------------------------------------------------

==== cpanminus

*cpanminus* (aka *cpanm*) is a very handy utility that you can install (via cpan) which allows you to install modules easily from the commandline without entering the *cpan* shell.

==== by tarball
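If a module isn't in CPAN, or you need a development version, you can install it from a source tarball much as in the C source example above. A minimal sketch, assuming the module uses the common ExtUtils::MakeMaker layout (a [green]#Makefile.PL# at its top level) - the module name below is hypothetical:

------------------------------------------------------
$ tar -xzvf Some-Module-1.0.tar.gz    # check first with 'tar -tzvf'
$ cd Some-Module-1.0
$ perl Makefile.PL PREFIX=$HOME/sw    # install under your home dir, not system-wide
$ make
$ make test
$ make install
------------------------------------------------------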