1. Introduction
This document is for (non-CS) graduate students who:
- are using Linux systems to do large-scale data analysis
- have little-to-no experience in computer programming
- need to know how programs work
- need to install programs written by others
- need to know how to fix common installation problems
- need to know how to bypass root requirements on large systems
- and otherwise want to accelerate their own analysis without compromising the security of the larger system.
Hence this is a meta-document, not about How to Program (there’s almost nothing about programming in it) but about how programs are built and work on Linux systems and how to fix common problems associated with running them.
2. Why install your own programs?
Why indeed? Well, altho you should always try to be as lazy as possible, sometimes the utilities and programs that are available on your system don’t do the thing that you want. Then again, the longer you program, the more you realize that there probably is an included utility that does pretty much what you want; you just have to find out about it - see Google.
That said, this is research, and quite often you do need a program that does something different and for that you can either beg someone else to do it, or do it yourself. Doing it yourself can mean anything from copying and modifying a friend’s program to a gigantic hairball of an installation that requires a number of other dependencies, libraries, versions, etc - aka dependency hell.
So you need to install a program. Well …
3. What is a program?
A program is a set of instructions that tells the computer what to do. It can be a human-readable set of instructions, like the (badly written) perl program below:
#!/usr/bin/env perl
# the line above defines the interpreter to use (it must stay on a line by itself)
while (<>) { # consume the input on STDIN, line by line, until it ends
    # split the incoming line (the default variable on which to operate), on space (/ /) tokens
    # into elements of the array @A and store the token count in $N
    $N = @A = split (/ /);
    # loop thru the variables, printing them each on their own line
    for ($i=0; $i<$N; $i++){
        print "Element [$i] = $A[$i]\n";
    }
    sleep 1; # pause for 1s between lines.
}
This kind of program has to be dynamically translated into machine code by an interpreter (like Perl, Python, R, often defined by the '#!/usr/bin/env … ' line in the example above) or with different source code, compiled into a set of machine instructions with a compiler (as for C, C++, Fortran).
If you try to view compiled C++/C/Fortran code in a text editor or pager like less, you’ll see something like this:
^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@P^D^H4^@^@^@d<F2>^L^@^@^@^@^@4^@ ^@^H^@(^@&^@#^@^F^@ ^H^D<AF> ^H<FC>^E^@^@^H6^@^@^F^@^@^@^@^P^@^@^B^@^@^@^X^_^F^@^X<AF> ^H^X<AF> ^H<D8>^@^@^@<D8>^@^@^@^F^@^@^@^D^@^@^@^D^@^@^@H^A^@^@H^D^HH^D^HD^@^@^@D^@^@^@^D^@^@^@^D^@^@^@ ^H^D<AF> ^H<FC>^@^@^@<FC>^@^@^@^D^@^@^@^A^@^@^@/lib/ld-linux.so.2^@^@^D^@^@^@^P^@^@^@^A^@^@^@GNU^@^@^@^@^@^B^@ ^H^@^@^@^@^P^@<F1><FF><A8>^A^@^@ <B5> ^H^D^@^@^@^Q^@^X^@N^@^@^@<C0><8F>^D^H^@^@^@^@^R^@^K^@1^@^@^@l ^@^@^@^@^R^@^N^@^V^B^@^@@<B5> ^H^D^@^@^@^Q^@^X^@^@libpcre.so.3^@__gmon_start__^@_Jv_RegisterClasses^@_fini^@pcre_exec^@pcre_compile <etc>
ie, unlike the source code, the compiled code gives little indication of the action of the program.
3.1. What makes a program executable?
For Linux, one thing that makes a program executable is that the file has its execute bit set. Without the execute permission bit set, even a compiled program will not be executed by the Operating System. ie:
$ ls -l tacg
-rwxr-xr-x 2 hjm hjm 1495148 Oct 27 20:20 tacg*
   ^  ^  ^
the '^' above indicates the execute bits for the owner, group, and other.

a permission line like this:
-rwxr--r-- 2 hjm hjm 1495148 Oct 27 20:20 tacg*
   ^
allows only the owner to execute it ('group' and 'other' can't)
In an interpreted script, what makes a program executable is the presence of the shebang line pointing to the interpreter that you want to process the program. In the Perl example above, the shebang line was the 1st line, which is prefixed by a hash(#) and an exclamation sign(!) (or bang in geek). Hence shebang.
Often, thru carelessness or ignorance, files will be chmoded to be executable when they aren’t valid programs. Libraries (libsomething.so) are often set to be executable when they don’t need to be.
Setting the execute bit does not by itself make a file a runnable program, but NOT having the execute bit set will prevent a program from running by calling its name. However, you CAN still execute a script that does not have the execute bit set by prefixing the name with the appropriate interpreter.
If irvinepines.pl is an otherwise functional perl program:
#!/usr/bin/env perl
print "The Irvine pines are sublime..\n";
sleep 1;
print "Until they burst into flames.\n";
Note that whether it is executable depends on the execution bits (as well as whether it’s a valid program).
# note no execute bits are set below
$ ls -l irvinepines.pl
-rw-rw-r-- 1 hjm hjm 116 Sep 19 15:54 irvinepines.pl

# so when we try to execute it, we can't
$ ./irvinepines.pl
bash: ./irvinepines.pl: Permission denied

# even tho 'file' identifies it as:
$ file ./irvinepines.pl
./irvinepines.pl: a /usr/bin/env perl script, ASCII text executable

# however, if we prefix a non-executable perl script with the interpreter name
$ perl irvinepines.pl
The Irvine pines are sublime..
Until they burst into flames.

# or we can make it executable
$ chmod +x irvinepines.pl
$ ls -l irvinepines.pl
-rwxrwxr-x 1 hjm hjm 118 Sep 19 15:56 irvinepines.pl*
#  ^  ^  ^  now everyone can execute it

# now we can execute it simply by calling its name.
$ ./irvinepines.pl
The Irvine pines are sublime..
Until they burst into flames.
The above example holds true for most interpreted scripts, but not for compiled programs. ie you can’t cause a compiled program called tustintrees to execute simply by prefixing the name with a compiler. gcc tustintrees will not work.
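The two ingredients discussed in this section (the execute bit and the shebang line) can be checked from the shell. The following is only a sketch: it builds a throwaway script in a temporary directory (the name demo.sh is made up for the illustration) and inspects it with the standard test, head, and chmod utilities.

```shell
# Build a throwaway script (hypothetical name) in a temp dir
workdir=$(mktemp -d)
script="$workdir/demo.sh"
printf '#!/bin/sh\necho "hello from demo"\n' > "$script"
chmod a-x "$script"    # make sure no execute bit is set to start with

# 'test -x' asks whether the current user may execute the file
if [ -x "$script" ]; then before=executable; else before=not-executable; fi

# The first two bytes of a script are the shebang '#!'
magic=$(head -c 2 "$script")

# Prefixing the interpreter works even without the execute bit
via_interp=$(sh "$script")

# After chmod +x the script can be called by name
chmod +x "$script"
if [ -x "$script" ]; then after=executable; else after=not-executable; fi
by_name=$("$script")

echo "before chmod: $before; first bytes: $magic; after chmod: $after"
```

The same checks applied to a compiled program would show the execute bit but no '#!' at the start of the file.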
4. Difference between scripts and programs
In this document, we’re going to call programs that require a separate interpreter a script (see Interpreter below) and those programs that are compiled (see Compiler below) into independent executable code a program.
The difference is apparent if you run file on them:
# this is a 'program'
$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=bd39c07194a778ccc066fc963ca152bdfaa3f971, stripped

# this is a 'script'
$ file ~/bin/clusterfork_1.75.pl
/home/hjm/bin/clusterfork_1.75.pl: Perl script, ASCII text executable
5. Interpreters
Interpreters are used by languages like Perl, Python, R, Java, Julia, etc. An interpreter is a large program that takes human-readable source code, translates it into executable code, and then executes it directly, as opposed to a compiler (see below) which compiles source code into a file of executable code. Because of this repeated, real-time translation, interpreted code generally runs slower than compiled code, but often the difference is trivial, especially since many interpreted languages have shortcuts or toolboxes of functionality that enable the writer to produce useful code faster than with a compiled language. Especially with one-off programs or those which deal with small datasets, the speed to utility outweighs the speed of execution.
Java (and Microsoft’s C#) have advantages over more traditional interpreted languages; instead of converting the source to machine code on every run and then executing it, they both can compile (with javac in the case of Java) their source code to an intermediate, platform-independent byte-code which is then executed by a platform-specific interpreter. This can speed up Java to the point where it is generally only a little slower than compiled programs for many things. Here’s a longer, but clear explanation.
All modern interpreted languages come with large libraries of extended functionality and often those libraries include compiled code such that any execution that uses them runs at compiled speeds. This is why Python, a relatively slow-to-execute language, is being used in many scientific areas, since the use of libraries like SciPy and NumPy enables compiled-speed execution times.
5.1. Scripts
The Perl (or Python or R) script is a file containing human-readable ASCII characters that the Interpreter can translate dynamically into executable code.
#!/usr/bin/env perl
print "Hello World\n";

#!/usr/bin/env perl
# print only those input lines that have more than 4 whitespace-separated fields
while (<>){
    my $N = (my @A = split);
    if ($N > 4) { print $_; }
}
Use /usr/bin/env
Often, the shebang line will be explicit, like #!/usr/bin/perl or #!/usr/bin/python. To make your program more portable, use the more flexible #!/usr/bin/env interpreter format, ie #!/usr/bin/env perl. This causes the interpreter to be chosen by the environment (the first match on your PATH) when you try to execute the program. This allows the program to be run both by the system default perl interpreter as well as another perl, perhaps loaded by a module command. NB: Typed into the bash shell, /usr/bin/env perl has almost no apparent effect since perl by itself is waiting for a script to act on, while /usr/bin/env python drops you into the python shell if there’s no code to interpret. In the same way, you can define the interpreters for Python, R, Julia, Java, Octave, etc.
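To see the PATH lookup that /usr/bin/env performs, here is a small sketch. Everything in it is made up for illustration: myinterp is a stand-in "interpreter" (really a two-line shell script) placed in a temporary directory, and demo is a script whose shebang line asks env to find it.

```shell
workdir=$(mktemp -d)

# A stand-in interpreter (hypothetical): it just reports which script it was handed
printf '#!/bin/sh\necho "myinterp was asked to run: $1"\n' > "$workdir/myinterp"
chmod +x "$workdir/myinterp"

# A script whose shebang asks env to look up 'myinterp' on the PATH
printf '#!/usr/bin/env myinterp\n' > "$workdir/demo"
chmod +x "$workdir/demo"

# With workdir at the front of PATH, env finds our private interpreter
env_out=$(PATH="$workdir:$PATH" "$workdir/demo")
echo "$env_out"
```

Swapping the directory at the front of PATH is, in effect, how a private or module-loaded interpreter gets picked up instead of the system default.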
6. Compilers
Compilers are used by languages like C, C++, Ada, Go, Fortran, etc. There are many compilers: Intel (icc/icpc/ifort) and PGI (pgcc/pgc++/pgfortran) make very good (and expensive) compilers, and there are also many free compilers (LLVM, which Julia uses, for example), but I’m going to stick to the free GNU compilers; the approach is similar for all of them.
gcc stands for the GNU Compiler Collection and it is an astonishingly sophisticated toolkit that allows many languages to be used as inputs to the same engine to be turned into compiled chunks of machine object code, then linked to required libraries of functions with ld to produce an executable file. (This can be confusing since gcc can act as both compiler and linker, carrying out the various functionalities depending on how it is called.)
While they are not formally part of the gcc application, there are a number of tools that are associated with the process of writing and compiling code. Examples of these are the editors with which you write the code (and can also be integrated into the process by syntax-checking and color-coding as you type, calling documentation about functions, etc), the configuration tools (autoconf, configure, GNU make, Cmake, etc), as well as useful outliers like the SWIG Interface Generator and the patchelf utility for mixing and matching libraries (see below for an extended example of using patchelf). Many of these are described in more detail below.
6.1. Compiled programs
Creating a compiled program starts with writing the source code which looks similar to a script, except that it doesn’t begin with a shebang line. Here’s a very short C program
#include <stdio.h>
#define STRING "Hello World"

int main(void)
{
    /* Using the macro defined above to print 'Hello World' */
    printf(STRING);
    return 0;
}
There are more lines than in the Perl script above and there’s much more to the creation of the executable code, but rather than duplicating Google, let me reference a page that describes it well. That link describes a process that you probably don’t need to know at this stage, but may well become useful if you start writing your own code.
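As a rough sketch of what "creating the executable" involves, the commands below compile the short C program in two separate steps, so that the compile and link stages are visible. The file names and the temporary directory are assumptions made for the illustration, and the sketch simply reports if gcc is not installed.

```shell
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
#define STRING "Hello World"
int main(void)
{
    printf(STRING);
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1; then
    # step 1: compile only (-c) into an object file of machine code
    gcc -c "$workdir/hello.c" -o "$workdir/hello.o"
    # step 2: link the object file with the C library into an executable
    gcc "$workdir/hello.o" -o "$workdir/hello"
    hello_out=$("$workdir/hello")
else
    hello_out="gcc not found"
fi
echo "$hello_out"
```

For a program this small, a single gcc hello.c -o hello does both steps at once; the split form is closer to what a Makefile does for larger projects.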
7. Utilities to build programs
There are typically several sets of tools used to build programs. Most of the ones described below are for compiling programs but some are often also used with interpreted programs, especially when they need a compiled library to provide functionality. Some languages, notably Java, are hybrids, composed of portable byte code and OS- and machine-specific interpreters.
7.1. Configuration tools
7.1.1. autogen & autoconf
These are the tools that take simple templates and generate full code to help generate the Makefile input templates and configure files described below. They are part of the GNU toolchain and are very useful (if initially confusing) in providing a semi-formal process for keeping nontrivial code projects under control. Here is a good description of how they work.
7.1.2. the ./configure script
The configure script is often supplied as part of a software distribution (either complete as configure or in template form as configure.ac - see below). It queries the system for the locations of libraries, tests the libraries to see if they provide the functions it needs, and, based on those results and the directives provided by the user, creates a Makefile that directs the compiler(s) to create the necessary libraries and applications. It often has many options which can be viewed by supplying the --help option to it:
./configure --help
If the application is fairly simple, running ./configure by itself may often be enough to generate a workable Makefile. However, a complex configure command might look like this:
./configure --prefix=/data/apps/octave/4.2.1 --with-openssl=auto \
  --with-java-homedir=/data/apps/java/jdk1.8.0_111 \
  --with-java-includedir=/data/apps/java/jdk1.8.0_111/include \
  --with-java-libdir=/data/apps/java/jdk1.8.0_111/jre/lib/amd64/server \
  --enable-jit \
  --with-lapack \
  --with-blas \
  --with-x --with-qt=5 \
  --with-OSMesa-includedir=/usr/include/GL/ \
  --with-OSMesa-libdir=/usr/lib64/ \
  --with-blas --with-lapack \
  --with-hdf5-includedir=/data/apps/hdf5/1.8.13/include \
  --with-hdf5-libdir=/data/apps/hdf5/1.8.13/lib \
  --with-fftw3-includedir=-I/data/apps/fftw/3.3.4-no-mpi/include \
  --with-fftw3-libdir=/data/apps/fftw/3.3.4-no-mpi/lib \
  --with-curl-includedir=/data/apps/curl/7.52.1/include \
  --with-curl-libdir=/data/apps/curl/7.52.1/lib \
  --with-magick=/data/apps/curl/7.52.1/lib \
  --with-openssl=no
Wikipedia has more information on it as well.
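For the common case of installing into your own directory without root, the usual configure / make / make install sequence can be wrapped as below. This is a sketch, not a recipe for any particular package: --prefix is the standard option of autoconf-generated configure scripts, but the helper name build_in_home, the DRY_RUN switch, and the example prefix are all assumptions made for the illustration.

```shell
# Hypothetical helper: configure, build, and install into a private prefix.
# Run it from the top of an unpacked source tree that contains ./configure.
build_in_home() {
    prefix=$1               # e.g. $HOME/apps/fpart/0.9.4 (an assumed layout)
    run=${DRY_RUN:+echo}    # if DRY_RUN is set, only print the commands
    $run ./configure --prefix="$prefix" &&
    $run make &&
    $run make install
}

# Show what would be run, without touching anything
plan=$(DRY_RUN=1 build_in_home "$HOME/apps/demo/1.0")
echo "$plan"
```

Because the prefix is inside your home directory, make install does not need root; the installed bin directory then has to be added to your PATH.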
7.1.3. Generating the Makefile and configure scripts
Often, especially in projects cloned from a github repository, the end-user configure and Makefile scripts don’t exist yet. Instead the git repository provides the template files configure.ac and Makefile.am that have to be converted into the usable scripts. If those precursor files do exist, the usual approach is to run autoreconf -i in that dir:
# Using 'fpart' utility as an example
$ git clone https://github.com/martymac/fpart.git
Cloning into 'fpart'...
remote: Counting objects: 1370, done.
remote: Total 1370 (delta 0), reused 0 (delta 0), pack-reused 1370
Receiving objects: 100% (1370/1370), 263.69 KiB | 0 bytes/s, done.
Resolving deltas: 100% (913/913), done.
Checking connectivity... done.

$ cd fpart
$ ls
COPYING  Changelog  Makefile.am  README  TODO  configure.ac  contribs/  man/  src/  tools/

$ autoreconf -i
configure.ac:7: installing './compile'
configure.ac:30: installing './config.guess'
configure.ac:30: installing './config.sub'
configure.ac:4: installing './install-sh'
configure.ac:4: installing './missing'
src/Makefile.am: installing './depcomp'

# note that the templates have been converted to 'Makefile.in' & 'configure' for the next stage
$ ls
COPYING      README           compile*       configure.ac  man/
Changelog    TODO             config.guess*  contribs/     missing*
Makefile.am  aclocal.m4       config.sub*    depcomp*      src/
Makefile.in  autom4te.cache/  configure*     install-sh*   tools/

# now run the configure script which generates the Makefile
$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating tools/Makefile
config.status: creating man/Makefile
config.status: executing depfiles commands

$ ls
COPYING      Makefile.in  autom4te.cache/  config.status*  contribs/    missing*
Changelog    README       compile*         config.sub*     depcomp*     src/
Makefile     TODO         config.guess*    configure*      install-sh*  tools/
Makefile.am  aclocal.m4   config.log       configure.ac    man/

# and now use 'make' to call the compiler to generate object code and link it all together.
$ make
Making all in src
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/src'
gcc -DPACKAGE_NAME=\"fpart\" -DPACKAGE_TARNAME=\"fpart\" -DPACKAGE_VERSION=\"0.9.4\" -DPACKAGE_STRING=\"fpart\ 0.9.4\" -DPACKAGE_BUGREPORT=\"ganael.laplanche@martymac.org\" -DPACKAGE_URL=\"\" -DPACKAGE=\"fpart\" -DVERSION=\"0.9.4\" -DHAVE_LIBM=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 ...
...
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/src'
Making all in tools
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/tools'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/tools'
Making all in man
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/man'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/man'
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart'
make[1]: Nothing to be done for 'all-am'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart'

# the 'src' dir is often where the source code is located and
# where the executable is left if the compile is successful
$ ls src
Makefile     file_entry.c        fpart-fpart.o      fpart.h    partition.c
Makefile.am  file_entry.h        fpart-options.o    fts.c      partition.h
Makefile.in  fpart*              fpart-partition.o  fts.h      types.h
dispatch.c   fpart-dispatch.o    fpart-utils.o      options.c  utils.c
dispatch.h   fpart-file_entry.o  fpart.c            options.h  utils.h

# note the 'fpart*' executable (the '*' indicates that the execute bit is set).
# check what it is - look! It's a 'real', compiled application
$ file src/fpart
src/fpart: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=39c88d115a6aa623694c0bf72ff08687d161aeb0, not stripped

$ ldd src/fpart   # it uses some std Linux shared libs
        linux-vdso.so.1 => (0x00007ffdb39fd000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7790f77000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7790bad000)
        /lib64/ld-linux-x86-64.so.2 (0x0000564e0a400000)

# the fact that it's 'not stripped' means that you can see all the
# function information with 'nm'
$ nm src/fpart
0000000000608e18 d _DYNAMIC
0000000000609000 d _GLOBAL_OFFSET_TABLE_
0000000000405d20 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
0000000000407f40 r __FRAME_END__
00000000004074f4 r __GNU_EH_FRAME_HDR
0000000000608e10 d __JCR_END__
...
<much more info omitted>
7.2. make
There are a number of utilities to make building programs easier. The most widely used are based on the make utility, a system that calculates dependencies and allows you to re-run only those parts of the build whose dependencies have changed or which failed previously. In addition, it can run these builds in parallel, which can tremendously speed up large compiles, especially when you’re in the debugging stages. See the section above for more info on generating usable Makefiles.
Alternative uses of Make
It can also be used to similarly automate and calculate the dependencies for any large system, such as an analytical tree for RNASeq or other arbitrary analysis, which allows for exact replication of analyses. See this search result for other takes on this approach.
There are several systems that use Makefiles. The 2 most popular are GNU Make and CMake, tho they use Makefiles in different ways: GNU Make consumes Makefiles (typically produced by the autoconf toolchain), while CMake produces Makefiles, which GNU Make then processes in the same way that it consumes Makefiles produced by other mechanisms.
7.2.1. GNU make
GNU Make is a core component of nearly every Linux system and is a build system - it takes Makefiles and directs the compilers to generate, test, and install code. If a Makefile was built correctly by configure or CMake or was supplied already with an application, you can type make in the Makefile directory and make will direct the appropriate compiler (defined in the Makefile or systemwide) to build the application or library.
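The dependency idea is easier to see with a toy Makefile than with a real build. In the sketch below (all file names are made up), out.txt is declared to depend on in.txt; make runs the recipe the first time, and a second check reports that nothing needs doing. The -C (change directory) and -q (question mode) options are standard in GNU Make; the sketch simply reports if make is not installed.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/in.txt"

# Rule: out.txt depends on in.txt. Recipe lines must start with a TAB (the \t).
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > "$workdir/Makefile"

if command -v make >/dev/null 2>&1; then
    # First run: out.txt does not exist yet, so make runs the recipe
    make -C "$workdir"
    # -q runs nothing; its exit status is 0 only if the target is up to date
    if make -q -C "$workdir"; then uptodate=yes; else uptodate=no; fi
else
    uptodate="make not found"
fi
echo "out.txt up to date: $uptodate"
```

Editing in.txt later makes it newer than out.txt, so the next make would run the recipe again; that timestamp comparison is the whole trick, whether the recipe is cp or a compiler.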
7.2.2. CMake
CMake is a system for building multiplatform build systems (a higher level than GNU Make), so if you were creating code that you wanted to run on Windows, Macs, and Linux, CMake would be a better choice than GNU Make. CMake does generate Makefiles that GNU Make can use, so from a Linux POV, you could think of CMake as roughly equivalent to the configure script described above.
You would typically use CMake to generate the Makefile and then have GNU make read that Makefile to generate the code, altho often CMake will do all that for you.
7.3. Java-specific "Makes"
Because Java is a hybrid system (usually partly compiled, partly interpreted), it uses tools from both the Interpreter world and the Compiler world. Ant, Maven, Gradle, and Gant are systems that work like Make/CMake, but are specific for Java, most often because of the multi-file aspect of Java and the startup time of the Java Compiler javac. C/C++/Fortran tends to have a flatter structure with fewer files and therefore work well with make/CMake. However Java projects tend to have lots of small files and calling javac on hundreds of small files has a substantial overhead. Ant/Maven and friends discover all the dependencies and requirements and call javac on all the files at the same time, sparing the file-by-file startup times.
7.4. Rake for Ruby
Rake is a make system for Ruby, another interpreted language.
7.5. the Preprocessor (cpp)
The preprocessor examines the preprocessor directives in various languages (the statements in C/C++ prefixed by "#" such as #define, #include, #undef, etc) and resolves them, notably the #include statements that point to the header files (*.h) that define interfaces for all external functions.
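You can watch the preprocessor work by asking gcc to stop after that stage. This is a minimal sketch with a made-up two-line source file; -E (preprocess only) and -P (omit line markers) are standard gcc options, and the sketch simply reports if gcc is not installed.

```shell
workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
#define STRING "Hello World"
const char *greeting = STRING;
EOF

if command -v gcc >/dev/null 2>&1; then
    # -E stops after preprocessing; -P drops the '# line' markers from the output
    cpp_out=$(gcc -E -P "$workdir/demo.c")
else
    cpp_out="gcc not found"
fi
echo "$cpp_out"
```

In the output the #define line is gone and STRING has been replaced by the quoted text. Adding an #include line to the file would paste the entire contents of that header into the output in the same way.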
7.6. the Lexer (lex/flex)
A lexer is a program that generates tokens: it scans a series of characters, extracts meaningful chunks, and assigns values to them as part of determining how a program should operate. For example, the process of determining the options fed to a program would require such lexing. See Wikipedia for more examples.
7.7. the Parser generator (yacc/bison)
The parser often operates with the lexer; a parser generator automatically generates the code to process the lexer tokens. Because of what it does, it’s often referred to as a compiler-compiler - it generates the structures and formalism to process an arbitrary language into computer code. The lexer and parser are often used in option processing and especially to parse the internal language or commands that a complex program might use. Gnuplot and all interpreters require such functionality. ie, in Gnuplot, the lexer/parser enables the program to differentiate between plot and plod.
7.8. the Linker / Loader (ld)
The linker ld is the program that resolves all the function calls to either code you wrote or to functions in the system libraries and imports them either entirely (in a static linkage) or as a reference or symbol (in a shared or dynamic library linkage). Read more at Wikipedia. The Loader (on Linux, the ld-linux*.so file that shows up in ldd output) is responsible for starting the program: loading it into RAM, notifying the kernel that a new process is running, resolving symbols & functions, requesting memory to run in, and making sure that the process obeys the rules for execution.
7.9. List Dynamic Dependencies (ldd)
ldd helps to debug program failures due to missing shared libraries. If a program cannot find a necessary library, it will fail; ldd will identify the missing library and possibly provide hints to where it should be. (see also the LD_LIBRARY_PATH environment variable and the RPATH entry stored in the executable)
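When a program has dozens of dependencies, it helps to filter the ldd output down to just the unresolved ones. The helper below is a sketch (the name missing_libs is made up); it reads ldd-style text on standard input, and the sample text is only there so the example runs anywhere.

```shell
# Hypothetical helper: read ldd output on stdin, print the names of unresolved libraries
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Sample text in the shape that ldd prints
sample='linux-vdso.so.1 => (0x00007ffef02c1000)
libpcre.so.3 => not found
libm.so.6 => /lib64/libm.so.6 (0x00007fa5225c9000)'

missing=$(printf '%s\n' "$sample" | missing_libs)
echo "missing: $missing"
```

On a real system the usage would be something like ldd ./yourprogram | missing_libs, where yourprogram is whatever executable is failing to start.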
8. GLIBC versions & patchelf
patchelf is a magical utility that allows you to execute programs that were built against different (usually newer) libc libraries than the ones installed on the Linux system you’re running on.
Some background: If you’re working on a cluster that typically uses an older kernel and libc than your laptop Linux, then running a compiled executable copied from your laptop to the cluster generally will not work. If you try to do this, you’ll get the dreaded GLIBC error.
In the example below, I’m using a program called tacg, compiled on my Ubuntu Xenial (16.04) laptop which supports GLIBC_2.23. I’m trying to get it running on our ancient CentOS 6.9 cluster which only supports GLIBC_2.12. The laptop-compiled executable tacg has been copied to the cluster so the files are bit-for-bit identical.
If I try to run the laptop-compiled executable on the cluster..
hmangala@cluster : $ ./tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
We can verify that with ldd as described above:
hmangala@cluster : $ ldd tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
        linux-vdso.so.1 => (0x00007ffef02c1000)
        libpcre.so.3 => not found     <<<<
        libm.so.6 => /lib64/libm.so.6 (0x00007fa5225c9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa522229000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa522881000)
So there are at least 2 problems indicated immediately here:
- the GLIBC_2.14 incompatibility
- not finding a libpcre.so.3 (Perl Compatible Regular Expressions)
The libpcre.so.3 problem can be addressed if our creaky CentOS repositories have such a library. Since it’s a very popular lib, it probably does exist. Hmmmmm. Yes it does, but the version is very old (libpcre.so.0 - compare with the libpcre.so.3 requirement) - too old for our use.
We could try a false-flag symlink (creating a falsely versioned symlink to the existing library), and it seems to have kinda/worked, or at least shut up the error.
# in this case done as root.
root@cluster: # ln -s /lib64/libpcre.so.0.0.1 /lib64/libpcre.so.3

hmangala@cluster : $ ldd tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
                                   ^^^^^^ this is still a problem
        linux-vdso.so.1 => (0x00007ffef7551000)
        libpcre.so.3 => /lib64/libpcre.so.3 (0x00007f2ad4601000)
        ^^^^^^^^^^^^ But now the libpcre is found, even tho it's really the wrong version
        libm.so.6 => /lib64/libm.so.6 (0x00007f2ad4371000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2ad3fd1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2ad4861000)
So how to address the GLIBC problem?
Some more background: Linux libraries are extremely backwardly compatible; newer libraries are very rarely incompatible with older ones (which is why the false-flag symlinking above often works). However, when a program runs, it checks the version of the libraries it’s using to verify that they’re new enough. In cases where the libraries are too old, the program refuses to run and generates the error seen directly above.
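The version check can be previewed before copying a binary anywhere. A dynamically linked executable records the versioned glibc symbols it needs, and those version strings show up in the output of strings (or objdump -T). The sketch below picks out the highest one. The helper name max_glibc is made up, the sample lines are illustrative rather than taken from tacg, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Hypothetical helper: read text on stdin, print the newest GLIBC_x.y version mentioned
max_glibc() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Illustrative sample of versioned symbol names
sample='memcpy@GLIBC_2.14
printf@GLIBC_2.2.5
__stack_chk_fail@GLIBC_2.4'

needed=$(printf '%s\n' "$sample" | max_glibc)
echo "newest glibc symbol version required: $needed"
```

Run as strings ./yourprogram | max_glibc, this gives a hint of the oldest glibc the program can tolerate. It would also explain why an error can name a version older than the glibc of the machine that compiled the program: only the versions of the symbols actually used are recorded.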
You CAN provide newer libraries for newer programs on older systems via the above-referenced patchelf, but you have to provide not only the new libc.so library, but ALL the core libraries that go along with it. These are stored in the /lib64 dir on RedHat distros and in the /lib/x86_64-linux-gnu dir on Debian distros. Fortunately, this is a fairly small set of libraries (about 30MB) altho you may have to provide additional newer libraries if the executable demands them (an example of that is also shown below).
These newer system libraries must be stored separately from the core GLIBC libraries. If you overwrite the standard libraries, you’ll probably end up with a dead kernel and system.
The accessory libraries - libpcre.so, libpthread.so, libbooger.so (those required by the application but NOT in the /lib64 or /lib dirs), are typically stored separately in your personal ~/lib dir and referenced via the LD_LIBRARY_PATH variable, set explicitly for the execution of the newer executable.
The patchelf utility allows you to change the loader (the ELF 'interpreter') called by the program to an alternative that you supply, as well as setting the RPATH entry in the executable to tell the program to search non-standard paths for the libc libraries.
Here’s an example using a generic program called tacg to show what I mean.
# 1st, let's note the size of the executable on my laptop:
hjm@laptop : $ ls -l tacg
-rwxr-xr-x 1 root root 1484760 May 17  2018 tacg*

# then check it, freshly copied on the cluster:
hmangala@cluster : $ ls -l tacg
-rwxr-xr-x 1 hmangala staff 1484760 Apr 13 14:00 tacg*

# looks like it's the same size, but is it?
hjm@laptop : $ md5sum tacg
e00aef6495d24dd622c3cb5d3d640c57  tacg

hmangala@cluster : $ md5sum tacg
e00aef6495d24dd622c3cb5d3d640c57  tacg
# yup, it is.

# now let's do the patchelf magic:
hmangala@cluster : $ patchelf \
  --set-interpreter /data/users/hmangala/tacg_dir/ld-linux-x86-64.so.2 \
  --set-rpath /data/users/hmangala/tacg_dir /data/users/hmangala/tacg

# now note what hash of the cluster copy is:
hmangala@cluster : $ md5sum tacg
e0b2c9f1cc901c4693a8cb651ad62f6b  tacg
# different than on the laptop above

# and now the loader specified is:
hmangala@cluster : $ strings tacg | grep ld-linux
/data/users/hmangala/tacg_dir/ld-linux-x86-64.so.2

# instead of the laptop version:
hjm@laptop : $ strings tacg | grep ld-linux
/lib64/ld-linux-x86-64.so.2

# now when we try to run tacg on the cluster
hmangala@cluster : $ ./tacg
./tacg: relocation error: /lib64/libpthread.so.0: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
Hurray!! (sort of).. An error, but NOT the GLIBC error. The one above is from a required library libpthread.so.0 that is not compatible with the libc.so that we copied in. The resolution of this requires the libpthread that is compatible with the newer GLIBC. Solving this requires copying the libpthread.so library to the cluster as well and placing it with the libpcre.so in my private ~/lib dir. This could have been foreseen if I had paid attention to the output of ldd initially on my laptop:
hjm@laptop : $ ldd tacg
        linux-vdso.so.1 => (0x00007ffd097db000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fc07353b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc073232000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc072e68000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc072c4a000)  <<<< Aha!
        /lib64/ld-linux-x86-64.so.2 (0x0000558564494000)
OK - I’ve found and copied in the new libpthread.so library (and also the newer libpcre.so) to my private ~/lib dir. Note that I copied them with their integral symlinks (using rsync -a).
hmangala@cluster : $ ls -l lib
total 588
lrwxrwxrwx 1 hmangala staff     17 Apr 13 13:49 libpcre.so.3 -> libpcre.so.3.13.2
-rw-r--r-- 1 hmangala staff 456632 Mar 24  2016 libpcre.so.3.13.2
-rwxr-xr-x 1 hmangala staff 138696 Feb  5 12:11 libpthread-2.23.so*
lrwxrwxrwx 1 hmangala staff     18 Apr 13 13:49 libpthread.so.0 -> libpthread-2.23.so*

# once we've adjusted the LD_LIBRARY_PATH variable
hmangala@cluster : $ export LD_LIBRARY_PATH=/data/users/hmangala/lib:$LD_LIBRARY_PATH

# we try to execute tacg again, and ...
hmangala@cluster : $ ./tacg
type 'tacg -h' or 'man tacg' for more help on program use
or type:
tacg -n6 -slLc -S -F2 < [your.input.file]
for an example of what tacg can do.
The above output is what is expected when tacg is executed without any options. ie, IT WORKED!! We’re now running a formerly incompatible executable on the old kernel.
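Since the private libraries are only meant for this one executable, it can be tidier to set LD_LIBRARY_PATH just for that command instead of exporting it for the whole session. A minimal sketch, assuming a private library directory such as ~/lib; the helper name run_with_libs is made up for the illustration.

```shell
# Hypothetical helper: run one command with an extra library dir, without a global 'export'
run_with_libs() {
    libdir=$1
    shift
    # prepend libdir; keep any existing LD_LIBRARY_PATH after it
    LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Demonstration: show what the child process sees
seen=$(run_with_libs "$HOME/lib" sh -c 'echo "$LD_LIBRARY_PATH"')
echo "$seen"
```

In the example above the call would be along the lines of run_with_libs /data/users/hmangala/lib ./tacg; other programs started from the same shell keep using the system libraries.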
9. Static vs Dynamic Linking
There are 2 types of compiled programs: statically linked and dynamically linked (aka shared lib or simply shared). A statically linked program has all the libraries and functions included in the binary package that you execute, resulting in a much larger file on disk. In return, this dramatically increases the probability that a statically linked program will run on a given OS. A dynamically linked program only contains the calls to the shared libraries, relying on the OS to provide the libraries and the user to provide the appropriate paths to them (via the LD_LIBRARY_PATH environment variable or the RPATH embedded in the executable).
So (assuming you cared) how do you tell whether a program is shared or static? You point the ldd (list dynamic dependencies) program at the program and it will tell you:
-
whether the program is static or dynamic
-
whether the immediate library requirements of a dynamic program are met by the current environment (if a shared library requires a further library, ldd will also identify it).
Dynamic linking is like going on vacation to Brazil and packing only your personal clothes, assuming that everything else will be provided for you. Static linking assumes that you need your clothes of course, but also the towels, sheets, kitchen utensils, furniture, and your car. Instead of traveling with a suitcase, you travel with a shipping container.
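The check described above can be sketched as a small shell function. The name link_type is hypothetical, and the sketch assumes that ldd exits with a non-zero status for anything that is not a dynamic executable (this is how the glibc ldd behaves, but it is worth verifying on your own system).

```shell
# link_type is a hypothetical helper, not a standard utility.
# It reports whether a file is a script, a dynamically linked binary,
# or something ldd cannot handle (a static binary or a non-executable).
link_type() {
    target="$1"
    # Scripts start with the two characters '#!' (the "shebang").
    if [ "$(head -c 2 "$target" 2>/dev/null)" = "#!" ]; then
        echo "script"
    # ldd exits 0 only when it can list shared-library dependencies.
    elif ldd "$target" >/dev/null 2>&1; then
        echo "dynamic"
    else
        echo "static or not an executable"
    fi
}

# Example: classify whatever 'ls' resolves to on this system.
link_type "$(command -v ls)"
```

On most Linux systems the last line should report that ls is dynamic, but the answer depends on how your distribution built it.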
Sometimes, especially with complex programs, the program you think you're executing is not the actual program. For a number of reasons, the actual executable is CALLED BY a wrapper script, often written in bash (the de facto command language of Linux). This allows the wrapper script to set a number of parameters based on how the program is being called, determine the exit status of the program, and do various supporting jobs based on how the job exited. If you've ever seen a popup window that says something to the effect of "Sorry. The program salmonspots has crashed. Would you like to send a crash report back to the developers?", then you're probably dealing with a program that has been called by a wrapper script (or is communicating on an application bus like dbus).
So should you run ldd on such a wrapper, you’ll get:
$ ldd /data/apps/R/3.4.1/bin/R
        not a dynamic executable
In this case, the only way to determine where the actual executable lives is to page thru the wrapper script. In the above case, we see:
#!/bin/sh
# Shell wrapper for R executable.

R_HOME_DIR=/data/apps/R/3.4.1/lib64/R
if test "${R_HOME_DIR}" = "/data/apps/R/3.4.1/lib64/R"; then
   case "linux-gnu" in
   linux*)
     run_arch=`uname -m`
     case "$run_arch" in
        x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
          libnn=lib64
          libnn_fallback=lib
<etc>
-
And far below, we find that the actual R executable is buried in:
R_binary="${R_HOME}/bin/exec${R_ARCH}/R"
# which in this case means /data/apps/R/3.4.1/lib64/R/bin/exec/R
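Rather than paging thru a long wrapper by eye, you can let grep point at the likely lines. The following is only a sketch: the stand-in wrapper, its paths, and the variable name REAL_BINARY are all made up so that the example is self-contained, and a real wrapper may name things quite differently.

```shell
# Build a tiny stand-in wrapper so the example can be run anywhere.
# The paths and the variable name REAL_BINARY are hypothetical.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Shell wrapper for a hypothetical program.
APP_HOME=/opt/example/app
REAL_BINARY="${APP_HOME}/bin/exec/app"
exec "${REAL_BINARY}" "$@"
EOF

# Show, with line numbers, the lines that assign a "*_binary" variable
# or that exec another program. -i makes the match case-insensitive.
wrapper_hits=$(grep -niE '_binary=|^[[:space:]]*exec[[:space:]]' "$wrapper")
echo "$wrapper_hits"
rm -f "$wrapper"
```

On a real system you would point the same grep at the wrapper itself, for example /data/apps/R/3.4.1/bin/R, and then follow the variables it names.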
If we run 'file' and then ldd on this 'terminal R', we find:
$ file R
R: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

$ ldd R
        linux-vdso.so.1 => (0x00007ffd5ed90000)
        libgfortran.so.3 => /data/apps/gcc/5.3.0/lib64/libgfortran.so.3 (0x00007fe1274b0000)
        libgomp.so.1 => /data/apps/gcc/5.3.0/lib64/libgomp.so.1 (0x00007fe127288000)
        libR.so => /data/apps/R/3.2.3/lib64/R/lib/libR.so (0x00007fe126c30000)
        libmpi.so.1 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libmpi.so.1 (0x00007fe126940000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe126720000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe126388000)
        libquadmath.so.0 => /data/apps/gcc/5.3.0/lib/../lib64/libquadmath.so.0 (0x00007fe126148000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe125ec0000)
        libgcc_s.so.1 => /data/apps/gcc/5.3.0/lib/../lib64/libgcc_s.so.1 (0x00007fe125ca8000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fe125aa0000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fe125898000)
        libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fe125640000)
        libreadline.so.6 => /lib64/libreadline.so.6 (0x00007fe1253f8000)
        libicuuc.so.42 => /usr/lib64/libicuuc.so.42 (0x00007fe1250a0000)
        libicui18n.so.42 => /usr/lib64/libicui18n.so.42 (0x00007fe124d08000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe1277d0000)
        libopen-rte.so.7 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libopen-rte.so.7 (0x00007fe124a80000)
        libopen-pal.so.6 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libopen-pal.so.6 (0x00007fe124798000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007fe124588000)
        libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007fe124378000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fe124170000)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fe123f48000)
        libicudata.so.42 => /usr/lib64/libicudata.so.42 (0x00007fe122df8000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fe122af0000)
The file output above shows that it is dynamically linked (uses shared libs), and the ldd output that follows it shows that all the required shared libs can be resolved in the current environment. If one could not be resolved, you would see an entry like this:
.. libicudata.so.42 => not found ..
which would send you off on a quest to find out how the damn thing had been built in the first place and where the missing library has gone to. (In many cases, the reason is that the module supplying the missing library wasn't loaded by the responsible module file.)
This is a case which will make you wish that you (or the author) had built it as a static executable.
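With a long dependency list, the unresolved entries are easy to overlook. A minimal sketch of a filter that prints only the missing libraries follows; the function name missing_libs and the program name ./myprog are hypothetical, and the canned sample output is there only so the example runs without a broken binary at hand.

```shell
# missing_libs is a hypothetical helper: it reads ldd output on stdin
# and prints only the names of libraries that could not be resolved.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use (./myprog is a placeholder for your own binary):
#   ldd ./myprog | missing_libs

# Self-contained demonstration on canned ldd-style output:
sample_ldd_output='    libm.so.6 => /lib64/libm.so.6 (0x00007fe125ec0000)
    libicudata.so.42 => not found
    libc.so.6 => /lib64/libc.so.6 (0x00007fe126388000)'
unresolved=$(printf '%s\n' "$sample_ldd_output" | missing_libs)
echo "$unresolved"
```

For the sample above this prints libicudata.so.42, which is the name to go looking for with locate or in your module files.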
9.1. Fixing Missing Libraries
Quite often, especially when you’ve copied a shared application from another Linux distribution, you will find that it complains about missing libs when you know that you already have that library or you have a similar lib that may be a version higher or lower than the specific one demanded.
9.2. Modifying LD_LIBRARY_PATH
If you know that you have the missing lib, your current LD_LIBRARY_PATH is probably missing or misconfigured. You can determine if this is so by first locating the missing lib and then checking the value of LD_LIBRARY_PATH. If the missing lib is called libpcre.so.3.12.1, try:
$ locate libpcre.so
/lib/i386-linux-gnu/libpcre.so.3
/lib/i386-linux-gnu/libpcre.so.3.13.2
/lib/x86_64-linux-gnu/libpcre.so.3
/lib/x86_64-linux-gnu/libpcre.so.3.13.2
/ohome/hjm/.singularity-cache/tacg/c/lib/x86_64-linux-gnu/libpcre.so.3
/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs/libpcre.so.3
/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs/libpcre.so.3.12.1
/usr/lib/x86_64-linux-gnu/libpcre.so

$ printenv LD_LIBRARY_PATH
<nothing>
The envar LD_LIBRARY_PATH is not set, so the system is relying on the default set of paths listed in the files in /etc/ld.so.conf.d. These files are editable only by root, so if you have root access and your libs are in a non-standard location, you can include that location by editing those files and then running ldconfig, which will add the libraries therein to the cache.
If you wanted to add the lib above to the LD_LIBRARY_PATH, you can do so by setting it explicitly:
# append the directory, adding the ':' separator only if LD_LIBRARY_PATH already has a value
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs"
The application would then be able to find the missing lib and execute (at least to the next failure).
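If you adjust LD_LIBRARY_PATH often, a small helper avoids the two usual mistakes: a missing ':' separator, and a stray empty element (which the loader treats as the current directory). This is a sketch only; prepend_lib_dir is a hypothetical name, and it prepends rather than appends so that your private copies are found first.

```shell
# prepend_lib_dir is a hypothetical helper. It puts a directory at the front
# of LD_LIBRARY_PATH, adding the ':' separator only when the variable
# already has a value.
prepend_lib_dir() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Start from an empty value for the demonstration only.
unset LD_LIBRARY_PATH
prepend_lib_dir "$HOME/lib"
echo "$LD_LIBRARY_PATH"
prepend_lib_dir /ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs
echo "$LD_LIBRARY_PATH"

# To change the search path for a single command only, set it on that
# command line instead of exporting it (./tacg is the example program above):
#   LD_LIBRARY_PATH="$HOME/lib" ./tacg
```

The one-command form in the final comment is often the safer choice, since it leaves the rest of your session untouched.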
9.3. Symlinking close matches
You can often fake a fix if you don’t have the specific lib for which the application is looking. Linux libs tend to be very conservative, so that if you need libpcre.so.3.13.2 and you have libpcre.so.3.12.1, the application may very well be able to work just fine. All you need to do is provide a symlink to the older lib.
ln -s /path/to/libpcre.so.3.12.1 /path/to/libpcre.so.3.13.2   # lie to the application
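The same trick can be rehearsed safely in a scratch directory before you try it on real libraries. In this sketch the empty file merely stands in for a real shared library, so it shows the mechanics of the symlink and nothing about whether the two versions are actually compatible.

```shell
# Work in a scratch directory so nothing real is touched; the empty file
# below merely stands in for a real shared library.
libdir=$(mktemp -d)
touch "$libdir/libpcre.so.3.12.1"

# Create the name the application asks for, pointing at the lib we have.
# A relative target keeps the link valid if the directory is moved or copied.
ln -s libpcre.so.3.12.1 "$libdir/libpcre.so.3.13.2"

# Verify where the new name points.
link_target=$(readlink "$libdir/libpcre.so.3.13.2")
echo "libpcre.so.3.13.2 -> $link_target"
rm -rf "$libdir"
```

Keep such links in a private directory (such as ~/lib) that you add to LD_LIBRARY_PATH, so the lie is told only to the applications you choose.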
9.4. GLIBC problems
GLIBC is the GNU C Library, the standard C library on Linux. It's part of nearly every Linux distribution and is very stable. However, if you build a program on one Linux system (especially a modern one, such as your laptop) and then copy that application to an older system (a cluster or larger system that may not be as up-to-date as your laptop), you'll often see this error:
./myprog-install: /lib/tls/libc.so.6: version `GLIBC_2.4' not found (required by ./myprog-install)
This is due to a GLIBC compatibility problem. GLIBC is almost always backward compatible (so you can almost always run an old program on a newer system) but can't be forward compatible (so running a new program on an old system will often yield this error). There are 2 somewhat easy solutions and one more difficult one. The easiest solution is to recompile your program on your laptop as a static executable. That will force your program to carry with it all the functionality that it will ever need. The other somewhat easy alternative is to recompile your program on the old system, which will resolve all the symbols against the older GLIBC. The harder alternative is to provide your current shared program with the correct GLIBC version by supplying the required libs separately and using [patchelf] to point the program at a newer loader that can find them. You can do this as a regular user.
You can also do a similar thing in a https://en.wikipedia.org/wiki/Chroot[chroot environment] that has the correct GLIBC or more conventionally, use a [container] such as [Docker] or [Singularity] to do the same thing.
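Before copying a binary to an older system, you can get a rough idea of the newest GLIBC it expects. The sketch below assumes GNU grep and GNU sort (for the -a, -o and -V options); the function name newest_glibc_needed is hypothetical, and the text file stands in for a real binary so the example runs anywhere.

```shell
# newest_glibc_needed is a hypothetical helper. It scans a binary for
# GLIBC_x.y version tags and prints the highest one it finds.
newest_glibc_needed() {
    grep -a -o 'GLIBC_[0-9][0-9.]*' "$1" | sort -u -V | tail -n 1
}

# Typical use (./myprog-install is the program from the error shown above):
#   newest_glibc_needed ./myprog-install
#   getconf GNU_LIBC_VERSION      # what this system provides, on glibc systems

# Self-contained demonstration on a text file that stands in for a binary:
fake_binary=$(mktemp)
printf 'GLIBC_2.2.5\nGLIBC_2.17\nGLIBC_2.4\n' > "$fake_binary"
needed=$(newest_glibc_needed "$fake_binary")
echo "$needed"
rm -f "$fake_binary"
```

If the version printed for your program is higher than the one the target system reports, expect the "version not found" error and plan on one of the workarounds described above.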
10. Environment Variables
Environment Variables (envars) are shell variables set at login or during an interactive session that define or change the behavior of your shell and the execution of programs that are started from it. You can set envars to be available to subshells (using the prefix export) or to be restricted to the local context by omitting export. In both cases, the values of GOOBERDIR and VEGGIEDIR will vanish when you log out and log in again unless you set them in a shell startup file such as ~/.bashrc:
export GOOBERDIR=/home/hjm/nuts/goober  # GOOBERDIR is now available to subshells and to programs started from those subshells
VEGGIEDIR=/home/hjm/veg/carrots         # VEGGIEDIR is only available to THE CURRENT shell, not to subshells
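You can see the difference for yourself by asking a child shell what it inherited. This sketch reuses the two variable names from the text and starts the child with sh -c; the word "unset" is just a placeholder printed for a variable the child never received.

```shell
export GOOBERDIR=/home/hjm/nuts/goober   # exported: inherited by child processes
VEGGIEDIR=/home/hjm/veg/carrots          # not exported: current shell only

# The single quotes stop the current shell from expanding the variables,
# so the child shell reports what it actually inherited.
child_view=$(sh -c 'echo "GOOBERDIR=${GOOBERDIR:-unset} VEGGIEDIR=${VEGGIEDIR:-unset}"')
echo "$child_view"
```

The child reports the value of GOOBERDIR but shows VEGGIEDIR as unset, which is exactly what a program started from your shell would see.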
Here are some critical envars that will change the behavior of programs that you try to execute.
10.1. PATH
PATH defines where the OS looks for executables. A default PATH is set when you log in, typically something like:
/home/hjm/bin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/X11R6/bin
which puts my (hjm) private bin dir in front of everything else. PATH can be expanded and modified arbitrarily. Here's what my laptop PATH looks like.
/home/hjm/bin:/home/hjm/eclipse:/usr/NX/bin:/usr/local/sbin:/usr/local/bin:/bin:\
/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/home/hjm/intel/bin:/linux86/bin
It can also be changed programmatically to point to alternative applications, especially in large systems that might use something like the environment modules or lmod systems, which also manipulate the following variables to the same end.
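Because PATH is searched from left to right, the first match wins. The following sketch shows this with a made-up command called mytool placed in a scratch directory; nothing here is specific to environment modules or lmod, which achieve the same effect by editing PATH for you.

```shell
# Make a scratch bin directory holding a made-up command called 'mytool'.
private_bin=$(mktemp -d)
printf '#!/bin/sh\necho "private mytool"\n' > "$private_bin/mytool"
chmod +x "$private_bin/mytool"

# Prepend it, so it is searched before the system directories.
PATH="$private_bin:$PATH"
export PATH

# 'command -v' prints the path the shell would actually execute.
resolved=$(command -v mytool)
echo "$resolved"
# Remove the scratch directory when you are done: rm -rf "$private_bin"
```

Running command -v (or type -a in bash) on a program name is a quick way to confirm which copy you are really executing after a module load.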
10.2. LD_LIBRARY_PATH
LD_LIBRARY_PATH defines the directories thru which the loader will search to find shared libraries. The following demonstrates the changes a module load can have on the LD_LIBRARY_PATH.
$ printenv LD_LIBRARY_PATH
# nothing shown above

$ module load R/3.4.1
# ...
R is a language optimized for statistics and mathematics, and with the BioConductor package (installed), Bioinformatics and genomics.
# ...

$ printenv LD_LIBRARY_PATH
/data/apps/R/3.4.1/lib64/R/lib:/data/apps/cern_root/5.34.36/lib:/data/apps/hdf5/1.8.11/lib:/data/apps/curl/7.52.1/lib:/data/apps/pcre/8.40/lib:/data/apps/xz/5.2.3/lib:/data/apps/bzip2/1.0.6/lib:/data/apps/zlib/1.2.8/lib:/data/apps/fftw/3.3.4-no-mpi/lib:/data/apps/tcl-tk/8.6.4/lib:/data/apps/gcc/5.3.0/lib64
10.3. RPATH
RPATH is a search path hard-coded into an executable or library when it's built, pointing to the libraries needed to resolve its missing symbols. Because it's hard-coded, it's a fairly fragile mechanism for resolving such symbols, and it is generally better to use LD_LIBRARY_PATH to point to library locations. If you have to use it, you can define the envar LD_RUN_PATH to be read by the linker ld, or supply it explicitly with -rpath=/path/to/libraries.
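To see whether a binary carries such a hard-coded path, you can inspect its dynamic section with readelf. The sketch below assumes gcc and readelf (from binutils) are installed and quietly skips itself if they are not; the directory /opt/example/lib and the file names are made up. When linking thru gcc, the linker option is passed as -Wl,-rpath,DIR, and depending on the toolchain the entry may be recorded as RUNPATH rather than RPATH.

```shell
rpath_check="skipped"
if command -v gcc >/dev/null 2>&1 && command -v readelf >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    printf 'int main(void) { return 0; }\n' > "$workdir/hello.c"
    # -Wl,-rpath,DIR asks gcc to hand '-rpath DIR' to the linker.
    # /opt/example/lib is a hypothetical library directory.
    if gcc -o "$workdir/hello" "$workdir/hello.c" -Wl,-rpath,/opt/example/lib 2>/dev/null; then
        # readelf -d lists the dynamic section, including any RPATH/RUNPATH entry.
        if readelf -d "$workdir/hello" | grep -Eq '\((RPATH|RUNPATH)\).*/opt/example/lib'; then
            rpath_check="found"
        else
            rpath_check="missing"
        fi
    fi
    rm -rf "$workdir"
fi
echo "rpath_check=$rpath_check"
```

The same readelf -d command, pointed at a binary someone else built, tells you which directories it will search regardless of your own environment settings.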
10.4. LDFLAGS,LIBS
LDFLAGS is an envar that often contains both the -lname and -Llocation of the libraries that need to be found in order to satisfy the compilation (and so can partly replace the LD_LIBRARY_PATH in the compilation phase, but NOT in the execution phase).
The name is constructed like -lname, where name is the distinguishing part of the library's file name (the file name minus the lib prefix and the .so suffix). So if the library were named libgomp.so.4.5, the LDFLAGS abbreviation would be -lgomp. The -lname specification is also often placed in the envar LIBS instead, depending on who is writing the code.
The location of the library is specified in LDFLAGS with the prefix -L/full/path/to/lib/dir.
# if the libraries of interest were the bzip2 and pcre libs, the envar would be set:
export LDFLAGS="$LDFLAGS -L/data/apps/bzip2/1.0.6/lib -lbz2 -L/data/apps/pcre/8.40/lib -lpcre"
# the above line adds the locations and libnames of libbz2.so and libpcre.so to the existing LDFLAGS envar
# (the space after $LDFLAGS keeps the new flags from running into the old ones)
10.5. CPPFLAGS
CPPFLAGS is used to provide paths to header (*.h) files that are not on the standard include path (usually /usr/include). The format used is similar to the LDFLAGS above:
# if the libraries of interest were the bzip2 and pcre libs, the envar would be set:
export CPPFLAGS="$CPPFLAGS -I/data/apps/bzip2/1.0.6/include -I/data/apps/pcre/8.40/include"
# the above line adds the locations of the relevant header files to the existing CPPFLAGS envar
# (the space after $CPPFLAGS keeps the new flags from running into the old ones)
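It can help to see where each of these variables ends up in an actual build. The sketch below assembles the two commands that a typical makefile would run for a single-file project; prog.c and prog are made-up names, and the split of -L into LDFLAGS and -l into LIBS follows the convention used by autoconf-style builds. The commands are printed rather than executed, since the library paths exist only on the cluster used in this document.

```shell
# Hypothetical single-file project; prog.c and prog are made-up names.
CPPFLAGS="-I/data/apps/bzip2/1.0.6/include -I/data/apps/pcre/8.40/include"
LDFLAGS="-L/data/apps/bzip2/1.0.6/lib -L/data/apps/pcre/8.40/lib"
LIBS="-lbz2 -lpcre"

# CPPFLAGS is consumed when source is compiled to object code ...
compile_cmd="gcc $CPPFLAGS -c prog.c -o prog.o"
# ... while LDFLAGS and LIBS are consumed when objects are linked.
# Libraries go after the objects that use them.
link_cmd="gcc prog.o $LDFLAGS $LIBS -o prog"

# Print the commands rather than running them.
echo "$compile_cmd"
echo "$link_cmd"
```

Note that none of this helps at run time: the finished prog would still need LD_LIBRARY_PATH (or an RPATH) to find libbz2 and libpcre when it is executed.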
10.6. MANPATH
MANPATH is simply the path that the man program should search in order to find the man pages for an entry.
# if you needed to add a specific path to find the man pages for bzip2 and pcre, the envar would be set:
export MANPATH=":/data/apps/bzip2/1.0.6/man:/data/apps/pcre/8.40/man"
# the above line adds the locations of the relevant man pages to the MANPATH envar
# note that the string starts with ':/data/apps...' With the man-db version of man,
# that leading colon appends the given paths to the system's default man path
# instead of replacing it
11. Debugging
This section remains to be written (or at least consolidated into this doc).
11.1. printf
Describe the use of print/printf in debugging
11.2. __LINE__
Associated with the __LINE__ CPP var.
11.3. gdb
The Gnu Debugger is awesome and awful.
11.4. ddd
ddd puts a friendly face on gdb.
11.5. valgrind
The monster memory debugger.
12. Improving a program
What programs can be optimized and by what mechanisms? When should you spend time optimizing?
12.1. Timing a program
12.1.1. time
12.1.2. /usr/bin/time
12.1.3. coding in high def timers
12.2. Profiling
The warning about premature optimization: don't engage in optimization until you find out where your program is spending its time.
12.2.1. oprofile
12.2.2. perf
12.2.3. PAPI
12.2.4. HPC Toolkit & Visualizer
13. How are programs distributed and installed
13.1. Via a distribution-dependent installer
These mechanisms are highly preferred for personal installation since they can be used to install scripts, programs, libraries, configuration files, and can also set up the installation, alias, and initialization files. They often cannot be used on large multi-user systems since they require root permissions that a normal user won’t have on a large system.
13.1.1. RedHat-derived
(RHEL, CentOS, Fedora)
-
rpm
-
yum
13.1.2. Debian-derived
(Debian, Ubuntu, Mint)
-
apt-get
-
synaptic
-
GUI variants of the above.
13.1.3. Alternatives
Alien, tarballs, etc
Scripts and programs are often bundled into archives of some kind, usually a tar or zip archive containing:
-
the executable script(s) or program(s) if they have been compiled
-
source code if they have not been compiled
-
the instructions for compiling and/or installing it
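Before unpacking an archive someone has given you, it is worth listing what is inside it. The sketch below first builds a tiny example tarball so that the commands can be tried anywhere; the name mytool-1.0 and its contents are made up.

```shell
# Build a tiny example archive so the commands can be tried anywhere;
# 'mytool-1.0' and its contents are made up.
workdir=$(mktemp -d)
mkdir -p "$workdir/mytool-1.0/src"
echo "Run ./configure, then make" > "$workdir/mytool-1.0/INSTALL"
echo "int main(void) { return 0; }" > "$workdir/mytool-1.0/src/main.c"
tar -czf "$workdir/mytool-1.0.tar.gz" -C "$workdir" mytool-1.0

# List the contents WITHOUT unpacking, to see what you were given.
archive_listing=$(tar -tzf "$workdir/mytool-1.0.tar.gz")
echo "$archive_listing"

# Unpack into a directory of your choosing.
mkdir "$workdir/unpacked"
tar -xzf "$workdir/mytool-1.0.tar.gz" -C "$workdir/unpacked"
```

The listing shows at a glance whether the archive holds source code, prebuilt executables, or both, and whether it unpacks into its own directory or would scatter files into the current one.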
14. What is a code repository?
Modern code development is often (and should always be) done using a code repository. This is a system for organizing, sharing, backing up, and cooperating on the development of the source code and associated documents. Increasingly, code is being distributed directly from code repositories.
-
from repositories
-
git
-
svn
-
cvs (ok, maybe not cvs)
15. How to install your own program
or contribute to this outline and I’ll write it.