How Programs Work on Linux
==========================
Harry Mangalam v1.1 alpha, April 14, 2019
:icons:

// fileroot="/home/hjm/nacs/How_Programs_Work_On_Linux_2"; asciidoc -a icons -a toc2 -a toclevels=3 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.html ${fileroot}.txt moo:~/public_html
// http://moo.nac.uci.edu/~hjm/HPC-Programs.HOWTO.html
// https://stackoverflow.com/questions/3996651/what-is-compiler-linker-loader
// patchelf
// https://stackoverflow.com/questions/847179/multiple-glibc-libraries-on-a-single-host

== Introduction

This document is for (non-CS) graduate students who:

- are using Linux to do large-scale data analysis
- have little-to-no experience in computer programming
- need to know how programs work
- need to install programs written by others
- need to know how to fix common problems in installation
- need to know how to bypass 'root' requirements on large systems
- and otherwise need to accelerate their own analysis without compromising the security of the larger system.

Hence this is a meta-document, not about 'How to Program' (there's almost nothing about programming in it) but about how programs are built and work on Linux systems and how to fix common problems associated with running them.

== Why install your own programs?

Why indeed? Well, altho you should always try to be http://threevirtues.com/[as lazy as possible], sometimes the utilities and programs that are available on your system don't do the thing that you want. Altho the longer you program, the more you realize that there probably *is* an included utility that does pretty much what you want; you just have to find out about it - see Google.

That said, this is research, and quite often you do need a program that does something different, and for that you can either beg someone else to do it, or do it yourself. 'Doing it yourself' can mean anything from copying and modifying a friend's program to a gigantic hairball of an installation that requires a number of other dependencies, libraries, versions, etc - aka 'dependency hell'. So you need to install a program. Well ...

== What is a program?

A program is a set of instructions that tells the computer what to do. It can be a human-readable set of instructions, like the (badly written) perl program below:

[source,perl]
-----------------------------------------------------------
#!/usr/bin/env perl
# the line above defines the interpreter to use
while (<>) { # consume the input file on STDIN, line by line, until it ends
   # split the incoming line (the default variable on which to operate) on space (/ /) tokens
   # into elements of the array @A and store the token count in $N
   $N = @A = split (/ /);
   # loop thru the variables, printing them each on their own line
   for ($i=0; $i<$N; $i++){ print "Element [$i] = $A[$i]\n";}
   sleep 1; # pause for 1s between lines.
}
-----------------------------------------------------------

This kind of program has to be 'dynamically' translated into machine code by an 'interpreter' (like Perl, Python, R, often defined by the '#!/usr/bin/env ... ' line in the example above) or, with different source code, compiled into a set of machine instructions with a compiler (as for C, C++, Fortran).
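To see what the interpreter does with it, save the script above as (say) 'splitter.pl' and feed it a line of text on STDIN. A minimal sketch - the filename and input are hypothetical:

-----------------------------------------------------------
$ echo "take up the cups" | perl splitter.pl
Element [0] = take
Element [1] = up
Element [2] = the
Element [3] = cups
-----------------------------------------------------------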
If you try to view *compiled* C++/C/Fortran code in a text editor or pager like 'less', you'll see something like this:

-----------------------------------------------------------
^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@P<95>^D^H4^@^@^@d^L^@^@^@^@^@4^@
^@^H^@(^@&^@#^@^F^@ ^H^D ^H^E^@^@^H6^@^@^F^@^@^@^@^P^@^@^B^@^@^@^X^_^F^@^X
 ^H^X ^H^@^@^@^@^@^@^F^@^@^@^D^@^@^@^D^@^@^@H^A^@^@H<81>^D^HH<81>^D^HD^@^@^@D^@^@^@^D^@^@^@^D^@^@^@
 ^H^D ^H^@^@^@^@^@^@^D^@^@^@^A^@^@^@/lib/ld-linux.so.2^@^@^D^@^@^@^P^@^@^@^A^@^@^@GNU^@^@^@^@^@^B^@
 ^H^@^@^@^@^P^@^A^@^@ ^H^D^@^@^@^Q^@^X^@N^@^@^@<8F>^D^H^@^@^@^@^R^@^K^@1^@^@^@l<82>
^@^@^@^@^R^@^N^@^V^B^@^@@ ^H^D^@^@^@^Q^@^X^@^@libpcre.so.3^@__gmon_start__^@_Jv_RegisterClasses^@_fini^@pcre_exec^@pcre_compile
-----------------------------------------------------------

ie, unlike the source code, the compiled code gives little indication of the action of the program.

=== What makes a program executable?

For Linux, one thing that 'makes a program' is that the file has its *execute bit* set. Without the http://moo.nac.uci.edu/~hjm/biolinux/Linux_Tutorial_12.html#_permissions_chmod_amp_chown[execute permission bit] set, even a compiled program will not be executed by the Operating System. ie:

-------------------------------------------------------------------------------
$ ls -l tacg
-rwxr-xr-x 2 hjm hjm 1495148 Oct 27 20:20 tacg*
  ^  ^  ^
the '^' above indicates the execute bits for the owner, group, and other.

a permission line like this:
-rwxr--r-- 2 hjm hjm 1495148 Oct 27 20:20 tacg*
  ^
allows only the owner to execute it ('group' and 'other' can't)
------------------------------------------------------------------------------

In an interpreted script, what makes a program 'executable' is the presence of the 'shebang' line pointing to the interpreter that you want to process the program. In the Perl example above, the 'shebang' line was the 1st line, which is prefixed by a hash (#) and an exclamation sign (!) (or 'bang' in geek). Hence 'shebang'.

Often, thru carelessness or ignorance, files will be https://en.wikipedia.org/wiki/Chmod[chmod]'ed to be executable when they aren't valid programs. Libraries ('libsomething.so') are often set to be executable when they don't need to be. Having the execute bit set doesn't make a file a valid program, but NOT having it set will prevent even a valid program from being run simply by calling its name. However, you CAN still execute a program that does not have the execute bit set by prefixing the name with the appropriate interpreter. If 'irvinepines.pl' is an otherwise functional perl program:

[source,perl]
-----------------------------------------------------------
#!/usr/bin/env perl
print "The Irvine pines are sublime..\n";
sleep 1;
print "Until they burst into flames.\n";
-----------------------------------------------------------

Note that whether it is executable depends on the execute bits (as well as whether it's a valid program).

-----------------------------------------------------------
# note no execute bits are set below
$ ls -l irvinepines.pl
-rw-rw-r-- 1 hjm hjm 116 Sep 19 15:54 irvinepines.pl

# so when we try to execute it, we can't
$ ./irvinepines.pl
bash: ./irvinepines.pl: Permission denied

# even tho 'file' identifies it as:
$ file ./irvinepines.pl
./irvinepines.pl: a /usr/bin/env perl script, ASCII text executable

# however, if we prefix a non-executable perl script with the interpreter name
$ perl irvinepines.pl
The Irvine pines are sublime..
Until they burst into flames.
# or we can make it executable
$ chmod +x irvinepines.pl
$ ls -l irvinepines.pl
-rwxrwxr-x 1 hjm hjm 118 Sep 19 15:56 irvinepines.pl*
  ^  ^  ^ now everyone can execute it

# now we can execute it simply by calling its name.
$ ./irvinepines.pl
The Irvine pines are sublime..
Until they burst into flames.
-----------------------------------------------------------

The above example holds true for most interpreted scripts, but *not for compiled programs*. ie you can't cause a compiled program called 'tustintrees' to execute simply by prefixing the name with a compiler; 'gcc tustintrees' will not work (see the compile-and-link sketch in the link:#compiler[Compiler] section below).

== Difference between scripts and programs

In this document, we're going to call programs that require a separate interpreter a *script* (see link:#interpreter[Interpreter] below) and those programs that are compiled (see link:#compiler[Compiler] below) into independent executable code a *program*. The difference is apparent if you run 'file' on them:

----------------------------------------------------------
# this is a 'program'
$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked
(uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=
bd39c07194a778ccc066fc963ca152bdfaa3f971, stripped

# this is a 'script'
$ file ~/bin/clusterfork_1.75.pl
/home/hjm/bin/clusterfork_1.75.pl: Perl script, ASCII text executable
----------------------------------------------------------

[[interpreter]]
== Interpreters

'Interpreters' are used by languages like https://www.perl.org/[Perl], https://www.python.org/[Python], https://www.r-project.org/[R], https://java.com/en/[Java], https://julialang.org/[Julia], etc. An interpreter is a large program that takes human-readable source code, translates it into executable code, and then executes it directly, as opposed to a compiler (see below) which compiles source code into a file of executable code. Because of this repeated, real-time translation, interpreted code generally runs slower than compiled code, but often the difference is trivial, especially since many interpreted languages tend to have shortcuts or toolboxes of functionality available that enable the writer to generate useful code faster than with compiled code. Especially with one-off programs or those which deal with small datasets, the 'speed to utility' overwhelms the 'speed of execution'.

Java (and Microsoft's C#) have advantages over more traditional interpreted languages; instead of progressing thru the traditional 'convert to machine code each time and then execute' cycle, they both can compile (with 'javac' in the case of Java) their source code to an intermediate, platform-independent 'byte-code' which is then executed by a 'platform-specific' interpreter. This can speed up Java to the point where it is generally only a little slower than compiled programs for many things. http://www.cs.cmu.edu/~jcarroll/15-100-s05/supps/basics/history.html[Here's a longer, but clear explanation.]

All modern interpreted languages come with large libraries of extended functionality, and often those libraries include compiled code such that any execution that uses them runs at compiled speeds. This is why Python, a relatively slow-to-execute language, is being used in many scientific areas, since the use of libraries like https://www.scipy.org/[SciPy] and http://www.numpy.org/[NumPy] enables compiled-speed execution times.
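To make the Java byte-code model mentioned above concrete, here is a minimal sketch, assuming a trivial 'Hello.java' containing a standard 'main' that prints a greeting (the 'file' output is representative and will vary with the Java version):

-----------------------------------------------------------
# compile the human-readable source into platform-independent byte-code
$ javac Hello.java

$ file Hello.class
Hello.class: compiled Java class data, version 52.0 (Java 1.8)

# execute the byte-code with the platform-specific interpreter (the JVM)
$ java Hello
Hello World
-----------------------------------------------------------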
=== Scripts

The Perl (or Python or R) script is a file containing human-readable ASCII characters that the Interpreter can translate dynamically into executable code.

.'Hello World' in Perl
[source,perl]
----------------------------------------------------------
#!/usr/bin/env perl
print "Hello World\n";
----------------------------------------------------------

.Conditional line printer in Perl
[source,perl]
----------------------------------------------------------
#!/usr/bin/env perl
while (<>){
  my $N = my @A = split;   # $N gets the count of the fields split into @A
  if ($N > 4) { print $_; }
}
----------------------------------------------------------

.Use /usr/bin/env
[NOTE]
==============================================================================
Often, the 'shebang' line will be explicit, like *\#!/usr/bin/perl* or *\#!/usr/bin/python*. To make your program more portable, use the more flexible *\#!/usr/bin/env interpreter* format, ie *#!/usr/bin/env perl*. This causes the interpreter to be chosen via the environment (your PATH) in effect when you try to execute the program. This allows the program to be run both by the system default 'perl' interpreter as well as another 'perl', perhaps loaded by a 'module' command.

NB: Typed into the bash shell, '/usr/bin/env perl' has almost no apparent effect since 'perl' by itself is waiting for a script to act on, while '/usr/bin/env python' drops you into the python shell if there's no code to interpret.

In the same way, you can define the interpreters for Python, R, Julia, Java, Octave, etc:

- #!/usr/bin/env python
- #!/usr/bin/env Rscript
- etc
==============================================================================

[[compiler]]
== Compilers

'Compilers' are used by languages like C, C++, Ada, Go, Fortran, etc. There are many compilers (https://software.intel.com/en-us/c-compilers[Intel's icc/icpc/ifort] and http://www.pgroup.com/[PGI's pgcc/pgc++/pgfortran] are very good (and expensive) compilers, and there are also many free compilers, such as https://llvm.org/[LLVM], which Julia uses), but I'm going to stick to the free GNU compilers; the approach is similar for all of them.

'gcc' stands for the 'GNU Compiler Collection' and it is an astonishingly sophisticated toolkit that allows many languages to be used as inputs to the same engine to be turned into compiled chunks of machine object code, then linked to required libraries of functions with 'ld' to produce an executable file. (This can be confusing since gcc can act as both compiler and linker, carrying out the various functionalities depending on how it is called.)

While they are not formally part of the gcc application, there are a number of tools that are associated with the process of writing and compiling code. Examples of these are the editors with which you write the code (and which can also be integrated into the process by syntax-checking and color-coding as you type, calling up documentation about functions, etc), the configuration tools (https://www.gnu.org/software/autoconf/[autoconf], https://airs.com/ian/configure/[configure], https://www.gnu.org/software/make/[GNU make], https://cmake.org/[Cmake], etc), as well as useful outliers like the http://www.swig.org/[SWIG Interface Generator] and the https://nixos.org/patchelf.html[patchelf] utility for mixing and matching libraries (link:#patchelf[see below] for an extended example of using 'patchelf'). Many of these are described in more detail below.
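As a concrete illustration of gcc acting as both compiler and linker (and of why prefixing a compiled binary with 'gcc', as in the 'tustintrees' example above, can't work), here is a sketch of the classic 2-stage build of a hypothetical C source file 'tustintrees.c':

-----------------------------------------------------------
# stage 1: compile the source into machine object code
$ gcc -c tustintrees.c            # produces tustintrees.o

# stage 2: gcc calls the linker 'ld' to resolve the function calls
# against the required libraries and emit an executable
$ gcc -o tustintrees tustintrees.o -lm    # '-lm' assumes it uses the math lib

# now it can be executed by calling its name
$ ./tustintrees
-----------------------------------------------------------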
=== Compiled programs

Creating a compiled program starts with writing the source code, which looks similar to a script, except that it doesn't begin with a 'shebang' line. Here's a very short C program:

.'Hello World' in C
[source,c]
----------------------------------------------------------
#include <stdio.h>
#define STRING "Hello World"

int main(void)
{
  /* Using the macro defined above to print 'Hello World' */
  printf(STRING);
  return 0;
}
----------------------------------------------------------

There are more lines than in the Perl script above and there's much more to the creation of the executable code, but rather than duplicating Google, let me reference a http://www.thegeekstuff.com/2011/10/c-program-to-an-executable/[page that describes it well]. That link describes a process that you probably don't need to know at this stage, but it may well become useful if you start writing your own code.

== Utilities to build programs

There are typically several sets of tools used to build programs. Most of the ones described below are for 'compiling' programs, but some are often also used with 'interpreted' programs, especially when they need a compiled library to provide functionality. Some languages, notably Java, are hybrids, composed of portable byte code and OS- and machine-specific interpreters.

=== Configuration tools

==== autogen & autoconf

These are the tools that take simple templates and generate the 'Makefile' input templates and 'configure' files described below. They are part of the https://en.wikipedia.org/wiki/GNU_toolchain[GNU toolchain] and are very useful (if initially confusing) in providing a semi-formal process for keeping nontrivial code projects under control. http://inti.sourceforge.net/tutorial/libinti/autotoolsproject.html[Here is a good description of how they work.]

==== the ./configure script

The 'configure' script is often supplied as part of a software distribution (either complete as 'configure' or in template form as 'configure.ac' - link:#makingmakefiles[see below]) and is used to query the system for the locations of libraries, test those libraries to see if they provide the functions it needs, and, based on those results and the directives provided by the user, create a 'Makefile' that directs the compiler(s) to create the necessary libraries and applications. It often has many options, which can be viewed by supplying the '--help' option to it:

  ./configure --help

If the application is fairly simple, running './configure' by itself may often be enough to generate a workable 'Makefile'.
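For a well-behaved application, the whole process can be as short as the following sketch; the '--prefix' directs the install into a directory you own, so no 'root' permissions are needed (the path is just an example):

-----------------------------------------------------------
$ ./configure --prefix=$HOME/apps/myprog   # query the system & write the Makefile
$ make                                     # compile and link
$ make install                             # copy the results under ~/apps/myprog
-----------------------------------------------------------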
However, a complex configure script might look like this: ---------------------------------------------------------- ./configure --prefix=/data/apps/octave/4.2.1 --with-openssl=auto \ --with-java-homedir=/data/apps/java/jdk1.8.0_111 \ --with-java-includedir=/data/apps/java/jdk1.8.0_111/include \ --with-java-libdir=/data/apps/java/jdk1.8.0_111/jre/lib/amd64/server \ --enable-jit \ --with-lapack \ --with-blas \ --with-x --with-qt=5 \ --with-OSMesa-includedir=/usr/include/GL/ \ --with-OSMesa-libdir=/usr/lib64/ \ --with-blas --with-lapack \ --with-hdf5-includedir=/data/apps/hdf5/1.8.13/include \ --with-hdf5-libdir=/data/apps/hdf5/1.8.13/lib \ --with-fftw3-includedir=-I/data/apps/fftw/3.3.4-no-mpi/include \ --with-fftw3-libdir=/data/apps/fftw/3.3.4-no-mpi/lib \ --with-curl-includedir=/data/apps/curl/7.52.1/include \ --with-curl-libdir=/data/apps/curl/7.52.1/lib \ --with-magick=/data/apps/curl/7.52.1/lib \ --with-openssl=no ---------------------------------------------------------- https://en.wikipedia.org/wiki/Configure_script[Wikipedia has more information] on it as well. [[makingmakefiles]] ==== Generating the Makefile and configure scripts Often, especially in projects 'cloned' from a github repository, the end-user 'configure' and Makefile scripts don't exist yet. Instead the git repository provides the template files 'configure.ac' and 'Makefile.am' that have to be converted into the usable scripts. If those precursor files do exist, the usual approach is to run 'autoreconf -i' in that dir: ---------------------------------------------------------- # Using 'fpart' utility as an example $ git clone https://github.com/martymac/fpart.git Cloning into 'fpart'... remote: Counting objects: 1370, done. remote: Total 1370 (delta 0), reused 0 (delta 0), pack-reused 1370 Receiving objects: 100% (1370/1370), 263.69 KiB | 0 bytes/s, done. Resolving deltas: 100% (913/913), done. Checking connectivity... done. $ cd fpart $ ls COPYING Changelog Makefile.am README TODO configure.ac contribs/ man/ src/ tools/ $ autoreconf -i configure.ac:7: installing './compile' configure.ac:30: installing './config.guess' configure.ac:30: installing './config.sub' configure.ac:4: installing './install-sh' configure.ac:4: installing './missing' src/Makefile.am: installing './depcomp' $ ls # note that the templates have been converted to 'Makefile.in' & 'configure' for the next stage COPYING README compile* configure.ac man/ Changelog TODO config.guess* contribs/ missing* Makefile.am aclocal.m4 config.sub* depcomp* src/ Makefile.in autom4te.cache/ configure* install-sh* tools/ # now run the configure script which generates the Makefile $ ./configure checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... /bin/mkdir -p ... configure: creating ./config.status config.status: creating Makefile config.status: creating src/Makefile config.status: creating tools/Makefile config.status: creating man/Makefile config.status: executing depfiles commands $ ls COPYING Makefile.in autom4te.cache/ config.status* contribs/ missing* Changelog README compile* config.sub* depcomp* src/ Makefile TODO config.guess* configure* install-sh* tools/ Makefile.am aclocal.m4 config.log configure.ac man/ # and now use 'make' to call the compiler to generate object code and link it all together. 
$ make
Making all in src
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/src'
gcc -DPACKAGE_NAME=\"fpart\" -DPACKAGE_TARNAME=\"fpart\" -DPACKAGE_VERSION=\"0.9.4\"
-DPACKAGE_STRING=\"fpart\ 0.9.4\" -DPACKAGE_BUGREPORT=\"ganael.laplanche@martymac.org\"
-DPACKAGE_URL=\"\" -DPACKAGE=\"fpart\" -DVERSION=\"0.9.4\" -DHAVE_LIBM=1 -DSTDC_HEADERS=1
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
...
...
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/src'
Making all in tools
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/tools'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/tools'
Making all in man
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart/man'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart/man'
make[1]: Entering directory '/home/hjm/Downloads/fpart/fpart'
make[1]: Nothing to be done for 'all-am'.
make[1]: Leaving directory '/home/hjm/Downloads/fpart/fpart'

# the 'src' dir is often where the source code is located and
# where the executable is left if the compile is successful
$ ls src
Makefile      file_entry.c        fpart-fpart.o      fpart.h      partition.c
Makefile.am   file_entry.h        fpart-options.o    fts.c        partition.h
Makefile.in   fpart*              fpart-partition.o  fts.h        types.h
dispatch.c    fpart-dispatch.o    fpart-utils.o      options.c    utils.c
dispatch.h    fpart-file_entry.o  fpart.c            options.h    utils.h

# note the 'fpart*' executable (the '*' indicates that the execute bit is set).
# check what it is - look! It's a 'real', compiled application
$ file src/fpart
src/fpart: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,
BuildID[sha1]=39c88d115a6aa623694c0bf72ff08687d161aeb0, not stripped

$ ldd src/fpart    # it uses some std Linux shared libs
	linux-vdso.so.1 =>  (0x00007ffdb39fd000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7790f77000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7790bad000)
	/lib64/ld-linux-x86-64.so.2 (0x0000564e0a400000)

# the fact that it's 'not stripped' means that you can see all the
# function information with 'nm'
$ nm src/fpart
0000000000608e18 d _DYNAMIC
0000000000609000 d _GLOBAL_OFFSET_TABLE_
0000000000405d20 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
0000000000407f40 r __FRAME_END__
00000000004074f4 r __GNU_EH_FRAME_HDR
0000000000608e10 d __JCR_END__
...
----------------------------------------------------------

=== make

There are a number of utilities to make building programs easier. The most widely used are based on the https://en.wikipedia.org/wiki/Make_(software)['make' utility], a system that calculates dependencies and allows you to re-run only those parts of the build whose dependencies have changed. In addition, it can run these builds in parallel, which can tremendously speed up large compiles, especially when you're in the debugging stages. See link:#makingmakefiles[the section above] for more info on generating usable Makefiles.

.Alternative uses of Make
[NOTE]
=============================================================
It can also be used to similarly automate and calculate the dependencies for any large system, such as an analytical tree for RNASeq or other arbitrary analysis, which allows for exact replication of analyses. https://goo.gl/pDQvRT[See this search result] for other examples of this approach.
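As a (completely hypothetical) sketch, a 2-step analysis pipeline expressed as a Makefile might look like the following; typing 'make results.txt' re-runs only the steps whose inputs have changed (note that recipe lines must start with a TAB):

-----------------------------------------------------------
# 'filtered.fq' depends on the raw data; remake it only if 'raw.fq' changes
filtered.fq: raw.fq
	quality_filter raw.fq > filtered.fq

# 'results.txt' depends on the filtered data
results.txt: filtered.fq
	analyze filtered.fq > results.txt
-----------------------------------------------------------

('quality_filter' and 'analyze' stand in for whatever programs your analysis actually uses.)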
=============================================================

There are several systems that use 'Makefiles'. The 2 most popular are 'GNU make' and 'CMake', tho they use Makefiles in different ways: 'GNU make' *consumes* Makefiles (typically produced by the autoconf toolchain), while 'CMake' *produces* Makefiles, which 'GNU make' then processes just as it would Makefiles produced by any other mechanism.

==== GNU make

https://www.gnu.org/software/make/[GNU Make] is a core component of every Linux system and is a *build system* - it takes 'Makefiles' and directs the compilers to generate, test, and install code. If a Makefile was built correctly by 'configure' or 'CMake' or was supplied already with an application, you could type 'make' in the Makefile directory and 'make' would direct the appropriate compiler (defined in the Makefile or systemwide) to build the application or library.

==== CMake

https://cmake.org/[CMake] is a *system for building multiplatform build systems* (a higher level than 'GNU Make'), so if you were creating code that you wanted to run on Windows, Macs, and Linux, 'CMake' would be a better choice than 'GNU Make'. 'CMake' does generate Makefiles that 'GNU Make' can use, so from a Linux POV, you could think of 'CMake' as roughly equivalent to the 'configure' script described above. You would typically use 'CMake' to generate the Makefile and then have 'GNU make' read that Makefile to generate the code, altho often 'CMake' will do all that for you.

=== Java-specific "Makes"

Because Java is a hybrid system (usually partly compiled, partly interpreted), it uses tools from both the 'Interpreter' world and the 'Compiler' world. http://ant.apache.org/[Ant], https://maven.apache.org/[Maven], https://gradle.org/[Gradle], and https://gant.github.io/[Gant] are systems that work like 'Make/CMake', but are specific to Java, most often because of the multi-file aspect of Java and the startup time of the Java compiler 'javac'. C/C++/Fortran projects tend to have a flatter structure with fewer files and therefore work well with make/CMake. However, Java projects tend to have lots of small files, and calling 'javac' on hundreds of small files has a substantial overhead. Ant/Maven and friends discover all the dependencies and requirements and call 'javac' on all the files at the same time, sparing the file-by-file startup times.

=== Rake for Ruby

https://en.wikipedia.org/wiki/Rake_(software)[Rake] is a 'make' system for Ruby, another interpreted language.

=== the 'Preprocessor' (cpp)

The 'preprocessor' examines the http://www.cplusplus.com/doc/tutorial/preprocessor/[preprocessor directives] in various languages (the statements in C/C++ prefixed by "#" such as '#define, #include, #undef', etc) and resolves them, notably the '#include' statements that point to the header files (*.h) that define the interfaces for all external functions.

=== the 'Lexer' (lex/flex)

A 'lexer' is a program that generates tokens (it scans a series of characters, extracts them, and assigns values to them) as part of determining how a program should operate. For example, the process of determining the options fed to a program would require such 'lexing'. https://en.wikipedia.org/wiki/Lexical_analysis[See Wikipedia for more examples.]

=== the 'Parser' generator (yacc/bison)

The 'parser' often operates with the lexer to automatically generate the code to process the lexer tokens.
Because of what it does, it's often referred to as a https://en.wikipedia.org/wiki/Compiler-compiler[compiler compiler] - it generates the structures and formalism to process an arbitrary language into computer code. The lexer and parser are often used in option processing, and especially to generate the parsing for an internal language or command set that a complex program might use. Gnuplot and all interpreters would require such functionality. ie, in Gnuplot, the lexer/parser would enable you to write routines that differentiate between 'plot' and 'plod'.

=== the 'Linker' / 'Loader' (ld)

The linker 'ld' is the program that resolves all the function calls to either code you wrote or to functions in the system libraries, and imports them either entirely (in a https://en.wikipedia.org/wiki/Static_build[static linkage]) or as a reference or symbol (in a https://en.wikipedia.org/wiki/Library_(computing)#Dynamic_linking[shared or dynamic library linkage]). https://en.wikipedia.org/wiki/Linker_(computing)[Read more at Wikipedia].

The loader is also responsible for starting the program: loading it into RAM, notifying the kernel that a new process is running, resolving symbols & functions, requesting memory to run in, and making sure that the process obeys the rules for execution.

=== 'List Dynamic Dependencies' (ldd)

'ldd' helps to debug program failures due to missing shared libraries. If a program cannot find a necessary library, it will fail; 'ldd' will identify the missing library and possibly provide hints as to where it should be. (see also the link:#rpath[RPATH environment variable])

[[patchelf]]
== GLIBC versions & *patchelf*

https://nixos.org/patchelf.html[patchelf] is a magical utility that allows you to execute programs that were built against a different (usually newer) 'libc' on systems whose native 'libc' is older.

*Some background:* If you're working on a cluster that typically uses an older kernel and 'libc' than your laptop Linux, then running a compiled executable copied from your laptop to the cluster generally will not work. If you try to do this, you'll get the dreaded *GLIBC* error. In the example below, I'm using a program called 'tacg', compiled on my Ubuntu Xenial (16.04) laptop which supports GLIBC_2.23. I'm trying to get it running on our ancient CentOS 6.9 cluster which only supports GLIBC_2.12. The laptop-compiled executable 'tacg' has been copied to the cluster so the files are bit-for-bit identical.

If I try to run the laptop-compiled executable on the cluster..

-----------------------------------------------------------------------------
hmangala@cluster : ./tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
-----------------------------------------------------------------------------

We can verify that with 'ldd' as described above:

-----------------------------------------------------------------------------
hmangala@cluster : $ ldd tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
	linux-vdso.so.1 =>  (0x00007ffef02c1000)
	libpcre.so.3 => not found            <<<<
	libm.so.6 => /lib64/libm.so.6 (0x00007fa5225c9000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa522229000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa522881000)
-----------------------------------------------------------------------------

So there are at least 2 problems indicated immediately here:

- the GLIBC_2.14 incompatibility
- not finding a 'libpcre.so.3' (Perl Compatible Regular Expressions) library

The 'libpcre.so.3' problem can be addressed if our creaky CentOS repositories have such a library.
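On a RedHat-derived system like this cluster, you can ask the package system which package supplies a given library. A sketch - the exact output will vary by distribution and release:

-----------------------------------------------------------------------------
$ yum whatprovides '*/libpcre.so*'
pcre-7.8-7.el6.x86_64 : Perl-compatible regular expression library
Repo        : base
Matched from:
Filename    : /lib64/libpcre.so.0.0.1
-----------------------------------------------------------------------------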
Since it's a very popular lib, it probably does exist. Hmmmmm. Yes it does, but the version is very old ('libpcre.so.0' - compare with the 'libpcre.so.3' requirement) - too old for our use. We could try a 'false-flag' symlink (creating a falsely versioned symlink to the existing library), and it seems to have kinda worked, or at least shut up the error.

-----------------------------------------------------------------------------
root@cluster: # ln -s /lib64/libpcre.so.0.0.1 /lib64/libpcre.so.3  # in this case done as root.

hmangala@cluster : $ ldd tacg
./tacg: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./tacg)
                          ^^^^^^ this is still a problem
	linux-vdso.so.1 =>  (0x00007ffef7551000)
	libpcre.so.3 => /lib64/libpcre.so.3 (0x00007f2ad4601000)
	^^^^^^^^^^^^ But now the libpcre is found, even tho it's really the wrong version
	libm.so.6 => /lib64/libm.so.6 (0x00007f2ad4371000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f2ad3fd1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2ad4861000)
-----------------------------------------------------------------------------

So how to address the GLIBC problem?

*Some more background:* Linux libraries are extremely backward compatible; newer libraries are very rarely incompatible with older ones (which is why the 'false-flag symlinking' above often works). However, when a program runs, it checks the version of the libraries it's using to verify that they're new enough. In cases where the libraries are too old, the program refuses to run and generates the error seen directly above.

You CAN provide newer libraries for newer programs on older systems via the above-referenced 'patchelf', but you have to provide not only the new 'libc.so' library, but *ALL the core libraries* that go along with it. These are stored in the '/lib64' dir on RedHat distros and in the '/lib/x86_64-linux-gnu' dir on Debian distros. Fortunately, this is a fairly small set of libraries (about 30MB), altho you may have to provide additional newer libraries if the executable demands them (an example of that is also shown below).

These newer system libraries *must* be stored separately from the core GLIBC libraries. If you overwrite the standard libraries, you'll probably end up with a dead system. The accessory libraries - 'libpcre.so', 'libpthread.so', 'libbooger.so' (those required by the application but NOT in the '/lib64' or '/lib' dirs) - are typically stored separately in your personal '~/lib' dir and referenced via the link:#ld_library_path[LD_LIBRARY_PATH] variable, set explicitly for the execution of the newer executable.

The 'patchelf' utility allows you to change the linker called by the program to an alternative that you supply, as well as setting the link:#rpath[RPATH] variable to tell the program to search non-standard paths for the 'libc' libraries. Here's an example using a generic program called 'tacg' to show what I mean.

-----------------------------------------------------------------------------
# 1st, let's note the size of the executable on my laptop:
hjm@laptop : $ ls -l tacg
-rwxr-xr-x 1 root root 1484760 May 17  2018 tacg*

# then check it, freshly copied on the cluster:
hmangala@cluster : $ ls -l tacg
-rwxr-xr-x 1 hmangala staff 1484760 Apr 13 14:00 tacg*

# looks like it's the same size, but is it?
hjm@laptop : $ md5sum tacg
e00aef6495d24dd622c3cb5d3d640c57  tacg

hmangala@cluster : $ md5sum tacg
e00aef6495d24dd622c3cb5d3d640c57  tacg
# yup, it is.
# now let's do the patchelf magic:
hmangala@cluster : $ patchelf \
   --set-interpreter /data/users/hmangala/tacg_dir/ld-linux-x86-64.so.2 \
   --set-rpath /data/users/hmangala/tacg_dir \
   /data/users/hmangala/tacg

# now note what the hash of the cluster copy is:
hmangala@cluster : $ md5sum tacg
e0b2c9f1cc901c4693a8cb651ad62f6b  tacg
# different than on the laptop above

# and now the loader specified is:
hmangala@cluster : $ strings tacg | grep ld-linux
/data/users/hmangala/tacg_dir/ld-linux-x86-64.so.2

# instead of the laptop version:
hjm@laptop : $ strings tacg | grep ld-linux
/lib64/ld-linux-x86-64.so.2

# now when we try to run tacg on the cluster
hmangala@cluster : $ ./tacg
./tacg: relocation error: /lib64/libpthread.so.0: symbol __vdso_clock_gettime,
version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
-----------------------------------------------------------------------------

Hurray!! (sort of).. An error, but NOT the GLIBC error. The one above is from a required library 'libpthread.so.0' that is not compatible with the libc.so that we copied in. Resolving this requires copying to the cluster the 'libpthread.so' that IS compatible with the newer GLIBC and placing it with the 'libpcre.so' in my private '~/lib' dir. This could have been foreseen if I had paid attention to the output of 'ldd' initially on my laptop:

-----------------------------------------------------------------------------
hjm@laptop : $ ldd tacg
	linux-vdso.so.1 =>  (0x00007ffd097db000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fc07353b000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc073232000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc072e68000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc072c4a000)   <<<< Aha!
	/lib64/ld-linux-x86-64.so.2 (0x0000558564494000)
-----------------------------------------------------------------------------

OK - I've found and copied the new 'libpthread.so' library (and also the newer 'libpcre.so') into my private '~/lib' dir. Note that I copied them with their integral symlinks (using 'rsync -a').

-----------------------------------------------------------------------------
hmangala@cluster : $ ls -l lib
total 588
lrwxrwxrwx 1 hmangala staff     17 Apr 13 13:49 libpcre.so.3 -> libpcre.so.3.13.2
-rw-r--r-- 1 hmangala staff 456632 Mar 24  2016 libpcre.so.3.13.2
-rwxr-xr-x 1 hmangala staff 138696 Feb  5 12:11 libpthread-2.23.so*
lrwxrwxrwx 1 hmangala staff     18 Apr 13 13:49 libpthread.so.0 -> libpthread-2.23.so*

# once we've adjusted the LD_LIBRARY_PATH variable
export LD_LIBRARY_PATH=/data/users/hmangala/lib:$LD_LIBRARY_PATH

# we try to execute tacg again, and ...
hmangala@cluster : $ ./tacg
type 'tacg -h' or 'man tacg' for more help on program use
or type: tacg -n6 -slLc -S -F2 < [your.input.file]
for an example of what tacg can do.
-----------------------------------------------------------------------------

The above output is what is expected when 'tacg' is executed without any options. ie, IT WORKED!! We're now running a formerly incompatible executable on the old kernel.

[[staticdynamic]]
== Static vs Dynamic Linking

There are 2 types of compiled programs: https://en.wikipedia.org/wiki/Static_build[statically linked] and https://en.wikipedia.org/wiki/Library_(computing)#Dynamic_linking[dynamically linked] (aka 'shared lib' or simply 'shared').
A 'statically' linked program has all the libraries and functions included in the binary package that you execute, resulting in a much larger file on disk. However, this dramatically increases the probability that a statically linked program will run on a different system. A 'dynamically' linked program only contains the 'calls' to the shared libraries, relying on the OS to provide the libraries and the user to provide the appropriate environment PATHs to them (via link:#ld_library_path[LD_LIBRARY_PATH] or link:#rpath[RPATH]).

So (assuming you cared) how do you tell whether a program is shared or static? You point the 'ldd' (list dynamic dependencies) program at the program and it will tell you:

- whether the program is 'static' or 'dynamic'
- whether the immediate library requirements of a 'dynamic' program are met by the current environment (if a shared library requires a further library, ldd will also identify it).

A 'dynamic' linking is like going on vacation to Brazil and packing only your personal clothes, assuming that everything else will be provided for you. A 'static' linking assumes that you need your clothes of course, but also the towels, sheets, kitchen utensils, furniture, and your car. Instead of traveling with a suitcase, you travel with a shipping container.

Sometimes, especially with complex programs, the program you think you're executing is not the actual program. For a number of reasons, the actual executable is CALLED BY a 'wrapper script', often written in bash (the 'de facto' command language of Linux). This allows the wrapper script to set a number of parameters based on how the program is being called, determine the exit status of the program, and do various supporting jobs based on how the job exited. If you've ever seen a popup window that says something to the effect of "Sorry. The program 'salmonspots' has crashed. Would you like to send a crash report back to the developers?", then you're probably dealing with a program that has been called with a wrapper script (or is communicating on an application bus like https://dbus.freedesktop.org/doc/dbus-tutorial.html#whatis[dbus]).

So should you run 'ldd' on such a wrapper, you'll get:

----------------------------------------------------------
$ ldd /data/apps/R/3.4.1/bin/R
	not a dynamic executable
----------------------------------------------------------

In this case, the only way to determine where the actual executable lives is to page thru the wrapper script. In the above case, we see:

[source,bash]
----------------------------------------------------------
#!/bin/sh
# Shell wrapper for R executable.

R_HOME_DIR=/data/apps/R/3.4.1/lib64/R
if test "${R_HOME_DIR}" = "/data/apps/R/3.4.1/lib64/R"; then
   case "linux-gnu" in
   linux*)
     run_arch=`uname -m`
     case "$run_arch" in
        x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
          libnn=lib64
          libnn_fallback=lib
----------------------------------------------------------
..
And far below, we find that the actual R executable is buried in:

  R_binary="${R_HOME}/bin/exec${R_ARCH}/R"

which in this case means

  /data/apps/R/3.4.1/lib64/R/bin/exec/R

If we run 'file' and then 'ldd' on this 'terminal' R, we find:

----------------------------------------------------------
$ file R
R: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked
(uses shared libs), for GNU/Linux 2.6.18, not stripped

$ ldd R
	linux-vdso.so.1 =>  (0x00007ffd5ed90000)
	libgfortran.so.3 => /data/apps/gcc/5.3.0/lib64/libgfortran.so.3 (0x00007fe1274b0000)
	libgomp.so.1 => /data/apps/gcc/5.3.0/lib64/libgomp.so.1 (0x00007fe127288000)
	libR.so => /data/apps/R/3.2.3/lib64/R/lib/libR.so (0x00007fe126c30000)
	libmpi.so.1 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libmpi.so.1 (0x00007fe126940000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe126720000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe126388000)
	libquadmath.so.0 => /data/apps/gcc/5.3.0/lib/../lib64/libquadmath.so.0 (0x00007fe126148000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe125ec0000)
	libgcc_s.so.1 => /data/apps/gcc/5.3.0/lib/../lib64/libgcc_s.so.1 (0x00007fe125ca8000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fe125aa0000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fe125898000)
	libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fe125640000)
	libreadline.so.6 => /lib64/libreadline.so.6 (0x00007fe1253f8000)
	libicuuc.so.42 => /usr/lib64/libicuuc.so.42 (0x00007fe1250a0000)
	libicui18n.so.42 => /usr/lib64/libicui18n.so.42 (0x00007fe124d08000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe1277d0000)
	libopen-rte.so.7 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libopen-rte.so.7 (0x00007fe124a80000)
	libopen-pal.so.6 => /data/apps/mpi/openmpi-1.8.8/gcc/5.3.0/lib/libopen-pal.so.6 (0x00007fe124798000)
	libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007fe124588000)
	libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007fe124378000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fe124170000)
	libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fe123f48000)
	libicudata.so.42 => /usr/lib64/libicudata.so.42 (0x00007fe122df8000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fe122af0000)
----------------------------------------------------------

The 'file' command above shows that it is 'dynamically linked (uses shared libs)' and the 'ldd' command below it shows that all the required shared libs have been resolved during the build process. If they had not been resolvable, you would see an entry like this:

----------------------------------------------------------
..
	libicudata.so.42 => not found
..
----------------------------------------------------------

which would send you off on a quest to find out how the damn thing had been built in the first place and where the missing library has gone to. (In many cases, the reason is because the module supplying the missing library wasn't loaded by the responsible module file.) This is a case which will make you wish that you (or the author) had built it as a 'static executable'.

=== Fixing Missing Libraries

Quite often, especially when you've copied a 'shared' application from another Linux distribution, you will find that it complains about missing libs when you know that you already have that library, or you have a similar lib that may be a version higher or lower than the specific one demanded.

=== Modifying LD_LIBRARY_PATH

If you know that you have the missing lib, your current LD_LIBRARY_PATH is probably missing or misconfigured.
You can determine if this is so by first locating the missing lib and then checking the value of LD_LIBRARY_PATH. If the missing lib is called 'libpcre.so.3.12.1', try:

----------------------------------------------------------
$ locate libpcre.so
/lib/i386-linux-gnu/libpcre.so.3
/lib/i386-linux-gnu/libpcre.so.3.13.2
/lib/x86_64-linux-gnu/libpcre.so.3
/lib/x86_64-linux-gnu/libpcre.so.3.13.2
/ohome/hjm/.singularity-cache/tacg/c/lib/x86_64-linux-gnu/libpcre.so.3
/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs/libpcre.so.3
/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs/libpcre.so.3.12.1
/usr/lib/x86_64-linux-gnu/libpcre.so

$ printenv LD_LIBRARY_PATH
----------------------------------------------------------

The envar LD_LIBRARY_PATH is not set, so the system is relying on the default set of paths set in the files in /etc/ld.so.conf.d. These files are editable by root, so if you have your libs in a non-standard location, you can include it by editing those files and then running 'ldconfig', which will add the libraries therein to the cache. If you wanted to add the lib above to the LD_LIBRARY_PATH, you can do so by explicitly setting it:

----------------------------------------------------------
export LD_LIBRARY_PATH=/ohome/hjm/Downloads/kdirstat-2.5.3/kdirstatlibs:$LD_LIBRARY_PATH
----------------------------------------------------------

The application would then be able to find the missing lib and execute (at least to the next failure).

=== Symlinking close matches

You can often 'fake' a fix if you don't have the specific lib for which the application is looking. Linux libs tend to be very conservative, so that if you need 'libpcre.so.3.13.2' and you have 'libpcre.so.3.12.1', the application may very well be able to work just fine. All you need to do is provide a symlink to the older lib:

----------------------------------------------------------
ln -s /path/to/libpcre.so.3.12.1 /path/to/libpcre.so.3.13.2   # lie to the application
----------------------------------------------------------

=== GLIBC problems

GLIBC is the GNU C library. It's part of every Linux distribution and is very stable. However, if you build a program on one Linux system (especially a modern one, such as your laptop) and then copy that application to an older system (a cluster or larger system that may not be as up-to-date as your laptop), you'll often see this error:

----------------------------------------------------------
./myprog-install: /lib/tls/libc.so.6: version `GLIBC_2.4' not found (required by ./myprog-install)
----------------------------------------------------------

This is due to a GLIBC compatibility problem. GLIBC is almost always backward compatible (so you can always run an old program on a newer system) but can't be forward compatible (so running a new program on an old system will often yield this error).

There are 2 somewhat easy solutions and one more difficult one. The easiest solution is to recompile your program on your laptop as a link:#staticdynamic[static executable]. That will force your program to carry with it all the functionality that it will ever need. The other somewhat easy alternative is to re-compile your program on the old system, which will resolve all the symbols against the older GLIBC. The harder alternative is to provide your current shared program with the correct GLIBC version by supplying the required libs separately and using link:#patchelf[patchelf] to supply a newer loader to point to them. You can do this as a regular user.
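Here's a sketch of the 'static' solution, using the generic 'tacg' example from above (the object files and library flags are placeholders for whatever your program actually needs):

-----------------------------------------------------------------------------
# on the NEWER machine, link everything into the binary itself
$ gcc -static -o tacg *.o -lpcre -lm

$ file tacg
tacg: ELF 64-bit LSB executable, x86-64, ... statically linked ...

$ ldd tacg
	not a dynamic executable

# this (much larger) binary can be copied to the older system and run as-is
-----------------------------------------------------------------------------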
You can also supply the correct GLIBC in a https://en.wikipedia.org/wiki/Chroot[chroot environment], or, more conventionally, use a 'container' system such as Docker or Singularity to do the same thing.

== Environment Variables

Environment Variables ('envars') are shell variables set at login or during an interactive session that define or change the behavior of your shell and the execution of programs that are started from it. You can set 'envars' to be available to subshells (using the prefix 'export') or to be restricted to the local context by omitting 'export'. In both cases, the values of GOOBERDIR and VEGGIEDIR (defined in the example below) will vanish when you logout & login again unless:

- you have made them permanent by editing them into your '~/.bashrc' or other startup file.
- you used http://byobu.co/[byobu], https://www.gnu.org/software/screen/[screen], https://github.com/tmux/tmux/wiki[tmux], or https://wiki.x2go.org/doku.php[x2go], which maintain the login session when you quit.

[source,bash]
----------------------------------------------------------
export GOOBERDIR=/home/hjm/nuts/goober
# GOOBERDIR is now available to subshells and programs started from those subshells

VEGGIEDIR=/home/hjm/veg/carrots
# VEGGIEDIR is only available to THE CURRENT shell, not to subshells
----------------------------------------------------------

Here are some critical 'envars' that will change the behavior of programs that you try to execute.

=== PATH

*PATH* defines where the OS looks for executables. A default PATH is set when you log in, typically something like:

----------------------------------------------------------
/home/hjm/bin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/X11R6/bin
----------------------------------------------------------

which prepends my (hjm) private 'bin' dir in front of anything else. PATH can be expanded and modified arbitrarily. Here's what my laptop PATH looks like:

----------------------------------------------------------
/home/hjm/bin:/home/hjm/eclipse:/usr/NX/bin:/usr/local/sbin:/usr/local/bin:/bin:\
/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/home/hjm/intel/bin:/linux86/bin
----------------------------------------------------------

It can also be changed programmatically to point to alternative applications, especially on large systems that might use something like the https://en.wikipedia.org/wiki/Environment_Modules_(software)[environment modules] or https://lmod.readthedocs.io/en/latest/[lmod] systems, which also manipulate the following variables to the same end.

[[ld_library_path]]
=== LD_LIBRARY_PATH

*LD_LIBRARY_PATH* defines the directories thru which the loader will search to find shared libraries. The following demonstrates the changes a 'module load' can have on the LD_LIBRARY_PATH.

[source,bash]
----------------------------------------------------------
$ printenv LD_LIBRARY_PATH

# nothing shown above

$ module load R/3.4.1
# ... R is a language optimized for statistics and mathematics, and with the
# BioConductor package (installed), Bioinformatics and genomics.
# ...
$ printenv LD_LIBRARY_PATH
/data/apps/R/3.4.1/lib64/R/lib:/data/apps/cern_root/5.34.36/lib:/data/apps/hdf5/1.8.11/lib:/data/apps/curl/7.52.1/lib:/data/apps/pcre/8.40/lib:/data/apps/xz/5.2.3/lib:/data/apps/bzip2/1.0.6/lib:/data/apps/zlib/1.2.8/lib:/data/apps/fftw/3.3.4-no-mpi/lib:/data/apps/tcl-tk/8.6.4/lib:/data/apps/gcc/5.3.0/lib64
----------------------------------------------------------

[[rpath]]
=== RPATH

https://en.wikipedia.org/wiki/Rpath[RPATH] is the search path hard-coded into a library or executable when it's built, to point to the required libraries needed to resolve missing symbols. Because it's hard-coded, it's a fairly fragile mechanism for resolving such symbols and it is generally better to use the 'LD_LIBRARY_PATH' to point to library locations. If you have to use it, you can define the envar *LD_RUN_PATH* to be read by the linker 'ld' or supply it explicitly with '-rpath=/path/to/libraries'.

=== LDFLAGS, LIBS

*LDFLAGS* is an envar that often contains both the '-lname' and '-Llocation' of the libraries that need to be found in order to satisfy the compilation (and so can partly replace the 'LD_LIBRARY_PATH' in the compilation phase, but NOT in the execution phase). The name is constructed like '-l*name*', where *name* is the distinguishing part of the library name. So if the library was named 'libgomp.so.4.5', the LDFLAGS abbreviation would be '-lgomp'. The specification of the '-lname' is also often associated with the envar 'LIBS', depending on who is writing the code. The 'location' of the library is specified in LDFLAGS with the prefix '-L/full/path/to/lib/dir'.

[source,bash]
----------------------------------------------------------
# if the libraries of interest were the bzip2 and pcre libs, the envar would be set:
export LDFLAGS+=" -L/data/apps/bzip2/1.0.6/lib -lbz2 -L/data/apps/pcre/8.40/lib -lpcre"
# the above line adds the locations and libnames of libbz2.so and libpcre.so
# to the existing LDFLAGS envar (note the leading space, so the new flags
# don't run into any existing ones)
----------------------------------------------------------

=== CPPFLAGS

*CPPFLAGS* is used to provide paths to header (*.h) files that are not on the standard include path (usually '/usr/include'). The format used is similar to the 'LDFLAGS' above:

[source,bash]
----------------------------------------------------------
# if the libraries of interest were the bzip2 and pcre libs, the envar would be set:
export CPPFLAGS+=" -I/data/apps/bzip2/1.0.6/include -I/data/apps/pcre/8.40/include"
# the above line adds the locations of the relevant header files to the CPPFLAGS envar
----------------------------------------------------------

=== MANPATH

*MANPATH* is simply the path that the 'man' program should search in order to find the man pages for an entry.

[source,bash]
----------------------------------------------------------
# if you needed to add a specific path to find the man pages for bzip2 and pcre,
# the envar would be set:
export MANPATH=":/data/apps/bzip2/1.0.6/man:/data/apps/pcre/8.40/man"
# the above line adds the locations of the relevant man pages to the MANPATH envar
# note that the string starts with ':/data/apps...'  That syntax appends the given
# MANPATH to the already defined one
----------------------------------------------------------

== Debugging

This section remains to be written (or at least consolidated into this doc).

=== printf

Describe the use of print/printf in debugging.

=== \_\_LINE__

Associated with the https://www.lemoda.net/c/line-file-func/[\_\_LINE__] CPP var.

=== gdb

The Gnu Debugger is awesome and awful.

=== ddd

ddd puts a friendly face on gdb.
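As a minimal taste of what gdb can do (ddd drives gdb underneath), a typical session might look like this sketch; 'myprog' and 'somevar' are hypothetical:

----------------------------------------------------------
$ gcc -g -o myprog myprog.c    # '-g' embeds the debugging symbols
$ gdb ./myprog
(gdb) break main               # set a breakpoint at the start of main()
(gdb) run                      # run until the breakpoint
(gdb) next                     # step over the next source line
(gdb) print somevar            # inspect a variable
(gdb) quit
----------------------------------------------------------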
=== valgrind

The monster memory debugger.

== Improving a program

What programs can be optimized and by what mechanisms? When should you spend time optimizing?

=== Timing a program

==== time
==== /usr/bin/time
==== coding with high-resolution timers

=== Profiling

The classic warning about premature optimization applies: don't engage in optimization until you find out where your program is spending its time.

==== oprofile
==== perf
==== PAPI
==== HPC Toolkit & Visualizer

== How are programs distributed and installed

=== Via a distribution-dependent installer

These mechanisms are highly preferred for personal installation since they can be used to install scripts, programs, libraries, and configuration files, and can also set up the installation, alias, and initialization files. They often cannot be used on large multi-user systems since they require root permissions that a normal user won't have.

==== RedHat-derived (RHEL, CentOS, Fedora)

* rpm
* yum

==== Debian-derived (Debian, Ubuntu, Mint)

* apt-get
* synaptic
* GUI variants of the above.

==== Alternatives

Alien, tarballs, etc. Scripts and programs are often bundled into archives of some kind, usually a 'tar' or 'zip' archive containing:

* the executable script(s) or program(s) if they have been compiled
* source code if they have not been compiled
* the instructions for compiling and/or installing it

== What is a code repository?

Modern code development is often (and should always be) done using a code repository. This is a system for organizing, sharing, backing up, and cooperating on the development of the source code and associated documents. Increasingly, code is also distributed directly from such repositories:

* git
* svn
* cvs (ok, maybe not cvs)

== How to install your own program

To be written - or contribute to this outline and I'll write it.
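In the meantime, the most common case - building a tool from a git repository into a directory you own, with no 'root' required - looks roughly like this sketch (the URL and paths are hypothetical):

----------------------------------------------------------
$ git clone https://github.com/someuser/sometool.git
$ cd sometool
$ autoreconf -i                            # only if 'configure' doesn't exist yet
$ ./configure --prefix=$HOME/apps/sometool
$ make
$ make install
$ export PATH=$HOME/apps/sometool/bin:$PATH   # make it findable in your PATH
----------------------------------------------------------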