scut version: 1.30, last mod: 04-09-10 by Harry Mangalam

scut v1.30 and above is released under the GNU General Public License v3.0.
That license can be found at:
I'd appreciate a note if you find it useful or find/fix a bug, or can offer a
suggestion.

scut has 2 purposes:

 1) printing fields from lines that have one field that matches a field from
    another file, in much the same way as the 'join' utility (explained
    below).

 2) slicing columns out of a file and (optionally) re-ordering them.

If you had a file, a line of which was:

   0  1   2    3   4   5    6       7    8    9    10  11   12    13  14
  "now is the time for all twisted wackos to wheeze on the snoots of coots"

and you only wanted fields 3, 5, 7, and 8, but in the order 5 8 7 3, you
could specify this with --c1='5 8 7 3', and that line would be output as:

  "all to wackos time"

This function is essentially a smarter 'cut'; it only REQUIRES the input (as
STDIN, not a file name) and the columns to print (--c1='# # # #'). If you
want it to break on something other than whitespace, you have to specify
that as well.

Usage: scut [options, below] > output_file

--f1=[file1]  - the shorter or 'needle' file. If using scut as a smarter
                cut, use STDIN.

--f2=[file2]  - the longer or 'haystack' file.

--xlf=[Excelfile] - can read and parse native binary Excel files with
                Spreadsheet::Excel, with the same options as used with
                STDIN. If there are multiple worksheets, all will be
                processed.

--k1=col#     - the key column from file1 (numbered from ZERO, not 1), i.e.
                the number of the column (starting from 0) that has the key
                column name for file1 (see example below). Use this to
                specify an ID column if you need one for the --stats flag
                (see below). Default = 0.

--k2=col#     - the key column from file2 (ditto).

--c1='# # ..' - the numbers of the columns from file1 that you want printed,
                in the order in which you want them. If you DON'T want any
                columns from the file, just omit the --c1 option completely.
                If you want the whole line, use --c1='ALL'.
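The --c1 slicing and re-ordering described above can be sketched in Python (a hypothetical equivalent for illustration only; scut itself is Perl, and the function name `pick_cols` is invented here). It assumes the default whitespace splitting and TAB output delimiter:

```python
def pick_cols(line, cols, delim=None, od="\t"):
    """Split 'line' on 'delim' (None = any whitespace, matching scut's
    default) and return the 0-based columns listed in 'cols', in the
    order given, joined by the output delimiter 'od'."""
    fields = line.split(delim)
    return od.join(fields[c] for c in cols)

line = "now is the time for all twisted wackos to wheeze on the snoots of coots"
# equivalent to --c1='5 8 7 3':
print(pick_cols(line, [5, 8, 7, 3]))
# prints the four fields TAB-separated: all, to, wackos, time
```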
                You can also use discontinuous ranges like '2:4 8:10' to
                print cols [2 3 4 8 9 10], and decreasing ranges like '8:4'
                to print cols [8 7 6 5 4]. You can also negate columns to
                remove them from a larger range: '9:12 -11' prints
                [9 10 12], and '12:1 -7:-4' prints [12 11 10 9 8 3]. You can
                also use the 'ALL' keyword to print all cols and negate the
                ones you don't want with negative ranges: 'ALL -8:-14'
                prints all columns EXCEPT 8-14.
                Notes:
                 1) #s are split on whitespace, not commas.
                 2) scut also supports Excel-style column specifiers, such
                    as --c1='A C F ..' (A, B, F, AD, BG, etc.) for up to 78
                    columns (-> BZ). If you want more, add them to the
                    %excel_ids hash in the code or create an algorithm that
                    does it right.

--c2='# # ..' - ditto for file2; or --c2='A C F ..'

--id1='...'   - the delimiter string for file1; defaults to whitespace.
                Specify TAB by using either '\t' or, much more simply,
                'TAB' [ --id1='TAB' ] (friggin shell escapes will bugger
                you every time). It can also be a multicharacter string,
                such as '_|_'.

--id2='...'   - ditto for file2.

--csv='delim' - sets the format for both file1 and file2 to process Excel-
                formatted CSV files (argument = delimiter char, with text
                enclosed in double quotes), e.g.:
                  7,"this is data 1","yadda badda",14.8,"my name isn't BOO"
                For the above, use --csv=','. Can use 'TAB' to indicate a
                tab delimiter, as with '--id1'.

--od='...'    - the delimiter string for the output (defaults to TAB).

--err         - generates lots of messages on stderr for debugging. (For
                large files, most of the CPU is dedicated to processing the
                STDERR text stream (thanks for stressing it, Peter), but if
                you need this output, you'll just have to deal with it.)

NB: the following 4 options (--begin, --end, --excl, --passthru) currently
only work with the single-file version (as a smarter cut, not the merging
functions).

--begin=[#|regex] - specifies the line to START processing data at (for
                example, if the file has 2 format sections and you only
                want to process one of them).
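The range syntax described above (increasing, decreasing, and negated ranges) could be expanded as in this Python sketch. This is a reconstruction of the semantics from the documented examples, not scut's actual parser, and it omits the 'ALL' keyword:

```python
def expand_ranges(spec):
    """Expand a scut-style column spec such as '2:4 8:10', '8:4', or
    '9:12 -11' into an ordered list of 0-based column numbers.
    Tokens prefixed with '-' mark columns for removal."""
    cols, removed = [], set()
    for tok in spec.split():
        neg = tok.startswith("-")
        t = tok.lstrip("-").replace(":-", ":")  # '-7:-4' -> '7:4'
        if ":" in t:
            a, b = (int(x) for x in t.split(":"))
            step = 1 if a <= b else -1
            rng = list(range(a, b + step, step))
        else:
            rng = [int(t)]
        if neg:
            removed.update(rng)
        else:
            cols.extend(rng)
    return [c for c in cols if c not in removed]

print(expand_ranges("2:4 8:10"))  # -> [2, 3, 4, 8, 9, 10]
print(expand_ranges("8:4"))       # -> [8, 7, 6, 5, 4]
print(expand_ranges("9:12 -11"))  # -> [9, 10, 12]
```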
                The argument to --begin can be either an integer value
                specifying the line number, or a non-repeating regular
                expression that unambiguously identifies the line.

--end=[#|regex] - as above, but specifies the line to STOP processing data
                at.

--excl        - if added to the arguments, excludes the lines specified by
                --begin and --end (in case you need to exclude the defining
                header lines).

--mod_col='#,[ab],text string' - allows you to modify the specified column
                # by adding the specified text string before or after the
                value. --mod_col='3,a,tail end' appends the string
                'tail end' to the value in column 3 (remember: 0-based
                counts).

--passthru    - if used, passes comments thru to the output unchanged.

--stats       - requests per-row descriptive stats of the output columns to
                be appended to each line. Includes mean, std_dev, sem,
                counts, and sum. Use the --k1 flag to define an ID col;
                defaults to 0. For per-column stats, pipe each column into
                'stats':  | scut --c1='4' | stats   (stats is at:)

--version     - gives the version of the software and dies.

--nocase      - makes the merging key case-INSENSITIVE.

--sync        - whether you want the output sync'ed on file2. The sync will
                insert blank lines where there are comments as well.

--help        - sends these lines to 'less' and dies on exit.

--debug       - generates lots of debugging info and expects file input via
                --f1 (not STDIN) to allow pausing.

Notes:

= There have to be the same number of columns in each line or scut will get
  confused. The matches are case-sensitive, unless you use the '--nocase'
  option to turn that off.

= scut sends its output to stdout, so if you want to catch the output in a
  file, use redirection '>' (see below); if you want to catch the stderr,
  you'll have to catch that as well ( >& out ).
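The per-row figures that the --stats flag above appends (mean, std_dev, sem, counts, sum) follow standard definitions; here is a Python sketch. Whether scut uses the sample (n-1) or population (n) standard deviation is an assumption, so the sample form below may differ from scut's output:

```python
import math

def row_stats(values):
    """Per-row descriptive stats as described for --stats: mean,
    standard deviation, standard error of the mean (sem), count,
    and sum. Uses the sample (n-1) std dev -- an assumption."""
    n = len(values)
    total = sum(values)
    mean = total / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    std_dev = math.sqrt(var)
    sem = std_dev / math.sqrt(n)
    return mean, std_dev, sem, n, total

mean, sd, sem, n, total = row_stats([2.0, 4.0, 6.0])
print(mean, sd, n, total)  # -> 4.0 2.0 3 12.0
```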
= scut ignores any line that starts with a '#', so you can document what the
  columns mean, add column numbering, etc., as long as those lines start
  with a '#'.

= scut always puts the matched key in the 1st column of the output.

= Under Win/DOS execution, you will probably need to run it with the perl
  prefix, i.e. 'perl scut [options]', and will also have to enclose the
  option strings in DOUBLE QUOTES ("opts") instead of single quotes
  ('opts').
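The comment conventions above ('#' lines skipped by default, or echoed unchanged with --passthru) might look like this in Python. This is a hypothetical sketch; the column selection is a stand-in for scut's real processing:

```python
def process(lines, passthru=False, cols=(0,)):
    """Skip lines starting with '#' (scut's comment convention);
    with passthru=True, copy them to the output unchanged instead.
    Other lines are split on whitespace and the 0-based columns in
    'cols' are emitted TAB-separated."""
    out = []
    for ln in lines:
        if ln.startswith("#"):
            if passthru:
                out.append(ln)
            continue
        fields = ln.split()
        out.append("\t".join(fields[c] for c in cols))
    return out

data = ["# id name", "7 alpha", "9 beta"]
print(process(data, passthru=True, cols=(1, 0)))
# -> ['# id name', 'alpha\t7', 'beta\t9']
```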