Extract 1.3 build: 9639 released: 2014-04-05 compiled with: Java 1.8.0_131 Jet jet12.0-pro-x86/1.8.0_131

Search files for matches or non matches using Strings or regexes.

Copyright: (c) 2009-2017 Canadian Mind Products.

Java application.
Download from: http://mindprod.com/products1.html#EXTRACT

----

Notes:

You must install the Java JRE to use this program.
See http://mindprod.com/jgloss/jgloss/jre.html

This program can only be used from the command prompt (or via a command
line style icon shortcut), e.g. under Windows cmd.exe or JPSoft
tcmd.exe, formerly called the DOS box. Just clicking the programs in a
directory listing will not do anything useful. Just typing the program
names at the command prompt will not work either.

This program requires a manual install! See below.

This program works with vanilla text files (e.g. ASCII files or UTF-8
Unicode files). You will need a text editor, not a word processor, to
create and view them, e.g. Notepad, Visual SlickEdit or another suitable
text editor http://mindprod.com/jgloss/editor.html. You must use a monospaced font
http://mindprod.com/jgloss/monospacedfonts.html (aka fixed pitch, aka
programmer font) to view your files, or they will not line up properly.

I put out an avalanche of free software into the world, and submit PAD
files to hundreds of distribution sites, but I rarely hear back from
anyone. What's happening? Does it all just work fine? Is it so
complicated that nobody can figure out how to use it and they give up on it?
Is it useful? Since everyone has the source, do people just fix the
programs to their liking themselves? Did you have trouble installing? Do
I presume you know too much? I would be happy to hear from you about your
experiences, positive or negative, and your requests for improvements. A
one-line email to roedy@mindprod.com would be great.

===> Free <===
Full source included.
You may even include the source code, modified or unmodified,
in free/commercial open source/proprietary programs that you write and distribute.
May be used freely for any purpose but military.
For more details on this restriction, see
http://mindprod.com/contact/nonmil.html
If you include any Canadian Mind Products code in your own applications,
your app too must be labelled non-military use only.
http://mindprod.com/contact/nonmil.html
All Java jars and source code are included. If you need the class files or Javadoc, you will have to build them yourself. To streamline the zip downloads, class files and Javadoc have been removed.

----

Prerequisites:

This program runs under any OS that supports Java,
(e.g. W2K/XP/W2003/Vista/W2008/W7-32/W7-64/W8-32/W8-64/Linux/LinuxARM/LinuxX86
/LinuxX64/Ubuntu/Solaris/SolarisSPARC/SolarisSPARC64/SolarisX86/SolarisX64/OSX/AIX...)
so long as you have
<><> Java version 1.8 <><> or later installed (32-bit or 64-bit Java).
See http://mindprod.com/jgloss/installingjava.html for details.

----

Installing on a PC:

Download source and compiled jar files to run on your own machine as an application.
First install a recent Java JDK or JRE.
See http://mindprod.com/jgloss/installingjava.html.
To install, extract the zip download with WinZip
(or a similar unzip utility) into any directory you please,
often J:\ -- ticking the "use folder names" option.
To run as an application, type:
java.exe %JAVA_OPTIONS% -ea -jar J:\com\mindprod\extract\extract.jar {put any parms here}

adjusting as necessary to account for where the jar file is.

----

Installing on a Macintosh:

Use Safari to download source and compiled jar files to run on your own machine as an application.
Safari will automatically unpack the zip into ~/Downloads (OS X 10.5)
[or onto the Desktop (OS X 10.4 and earlier)].
First install a recent Java JDK or JRE.
See http://mindprod.com/jgloss/installingjava.html.
You may optionally move the download tree to a permanent home.
I don't have a Macintosh, just a PC, so I can't test my Java programs for Mac compatibility.
In theory they should work without problems, but in practice that does not always happen.
If you have problems, please let me know, preferably with screenshots and complete verbatim error messages.
To run as an application without parameters, just double-click the jar file.

To run as an application with parameters, open Terminal.app and, in the bash shell, type:
cd ~/Downloads          (or cd ~/Desktop on OS X 10.4 and earlier)
java -ea -jar com/mindprod/extract/extract.jar {put any parms here}

adjusting as necessary to account for where the jar file is.

----

Rebuilding:

The zip already contains the necessary jar files, so unless you modify
the program, there is no need to recompile the source or rebuild the jar.
Change basedir="E:/" in rebuild.xml to the drive where your files are.
Use Ant and rebuild.xml, not build.xml, to recompile and recreate the jar.

----

Use:

A suite of four utilities that scan a set of files looking for a set of strings:
extract  - displays all lines in all files with a match on one or more strings, case-sensitive.    (precise)
extracti - displays all lines in all files with a match on one or more strings, case-insensitive.  (relaxed)
without   - displays all lines in all files with no match on any of the strings, case-sensitive.   (precise)
withouti  - displays all lines in all files with no match on any of the strings, case-insensitive. (relaxed)
They work on text and HTML files.
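As a rough illustration of how the four utilities relate, here is a minimal Java sketch, not taken from the Extract source: each utility is treated as a combination of two flags, invert (show the non-matching lines) and ignoreCase. The class ExtractModes, the Mode enum and the method displays are hypothetical names, and lower-casing both sides with Locale.ROOT is only an assumed way of doing the case-insensitive comparison; the real program may do it differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the four matching modes; not the actual Extract source.
public class ExtractModes {

    // Each utility is a combination of two flags.
    enum Mode {
        EXTRACT(false, false),  // lines matching any string, case-sensitive
        EXTRACTI(false, true),  // lines matching any string, case-insensitive
        WITHOUT(true, false),   // lines matching none of the strings, case-sensitive
        WITHOUTI(true, true);   // lines matching none of the strings, case-insensitive

        final boolean invert;
        final boolean ignoreCase;

        Mode(boolean invert, boolean ignoreCase) {
            this.invert = invert;
            this.ignoreCase = ignoreCase;
        }
    }

    // Returns true if the line should be displayed under the given mode.
    static boolean displays(Mode mode, String line, List<String> targets) {
        // Assumption: case-insensitivity is done by lower-casing both sides.
        String haystack = mode.ignoreCase ? line.toLowerCase(Locale.ROOT) : line;
        boolean anyHit = false;
        for (String target : targets) {
            String needle = mode.ignoreCase ? target.toLowerCase(Locale.ROOT) : target;
            if (haystack.contains(needle)) {
                anyHit = true;
                break; // one hit is enough to decide
            }
        }
        return mode.invert ? !anyHit : anyHit;
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("wombat", "zebra");
        String line = "A Wombat ate the hay.";
        for (Mode mode : Mode.values()) {
            // EXTRACT false, EXTRACTI true, WITHOUT true, WITHOUTI false
            System.out.println(mode + " -> " + displays(mode, line, targets));
        }
    }
}
```

Note how without is simply the negation of extract: a line is shown by exactly one of the two.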
To find matching lines in a single file, type:
  java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - myfile.html
With extract, everything is case-sensitive. The results display on the
console. You can redirect them in the usual way with
> results.txt
You can also list several files on the command line:
  java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - myfile.html C:\mydir\another.html
To search all the files in the current directory:
  java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - .
Sorry, no wildcards, just . and .. (the current and parent directories).
DON'T USE WILDCARDS (*.xxx) unless you deeply understand how
they work. See http://mindprod.com/jgloss/wildcard.html.
Windows expands them, not the utility, and feeds them to the
utility as a giant list of all the directories and files in
the current directory. The utility will thus tend to process
all the files in your directories, when you just meant to
process the files in the current directory.
The -s switch makes all subsequent directories be searched
recursively, including all their subdirectories.
e.g.
  java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - -s E:\mindprod
will scan all files in the mindprod directory tree.
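The effect of -s can be sketched with the standard java.nio.file.Files.walk method: without -s only a directory's own files are listed, with -s the whole tree is. This is a hypothetical helper (RecursiveScan.filesToScan), not the utility's own code, and it does not apply the extension filter described under EXTENSIONS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of plain vs. recursive (-s) directory scanning.
public class RecursiveScan {

    // recurse=false: only the directory's immediate files (like a plain ".").
    // recurse=true: every file in the whole tree (like -s).
    static List<Path> filesToScan(Path dir, boolean recurse) throws IOException {
        int depth = recurse ? Integer.MAX_VALUE : 1; // depth 1 = the dir plus its direct children
        try (Stream<Path> paths = Files.walk(dir, depth)) {
            return paths.filter(Files::isRegularFile) // skip the directories themselves
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : filesToScan(root, true)) {
            System.out.println(p);
        }
    }
}
```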
INVOKING
If you have the jar extension set up as executable, you can
abbreviate:
  C:\com\mindprod\extract\extract.jar "wombat" "zebra" - E:\mindprod
If you have Jet, you can compile the jars and abbreviate even further:
  extract.exe "wombat" "zebra" - E:\mindprod
Instead of extract, you can use one of the other utilities
that work the same way: extracti, without, withouti.
EXTENSIONS
Because extract is designed to work only with text files, it ignores all files
except those with the following extensions:
ans, asm, bat, batfrag, btm, btmfrag, c, cfrag, cmd, cpp,
cppfrag, css, cssfrag, csv, csvfrag, ctl, doc, dtd, dtdfrag,
e, h, hfrag, hpp, hppfrag, htm, html, htmlfrag, ih, ini,
java, javafrag, jnlp, jnlpfrag, jsp, jspfrag, list, log, look,
lst, mac, mft, pas, policy, prn, properties, ps, rh, sh,
site, sql, sqlfrag, tab, txt, use, wiki, xml, xmlfrag, xsd,
xsdfrag.
If you need more extensions, please ask, or add them yourself in Extract.java.
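If you want to see what the extension check amounts to before editing Extract.java, this hypothetical Java sketch keeps the list above in a HashSet and tests a bare file name against it. ExtensionFilter and isScannable are invented names, and treating extensions case-insensitively (so Notes.TXT counts) is an assumption of this sketch, not something stated above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the extension filter; not the actual Extract source.
public class ExtensionFilter {

    // The extensions listed above; files with any other extension are skipped.
    static final Set<String> TEXT_EXTENSIONS = new HashSet<>(Arrays.asList(
            "ans", "asm", "bat", "batfrag", "btm", "btmfrag", "c", "cfrag", "cmd", "cpp",
            "cppfrag", "css", "cssfrag", "csv", "csvfrag", "ctl", "doc", "dtd", "dtdfrag",
            "e", "h", "hfrag", "hpp", "hppfrag", "htm", "html", "htmlfrag", "ih", "ini",
            "java", "javafrag", "jnlp", "jnlpfrag", "jsp", "jspfrag", "list", "log", "look",
            "lst", "mac", "mft", "pas", "policy", "prn", "properties", "ps", "rh", "sh",
            "site", "sql", "sqlfrag", "tab", "txt", "use", "wiki", "xml", "xmlfrag", "xsd",
            "xsdfrag"));

    // True if the bare file name (no directory part) ends in a listed extension.
    static boolean isScannable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        // Assumption: extensions are compared case-insensitively.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + isScannable(name));
        }
    }
}
```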
SWITCHES
-all
You may use the -all switch on the command line to display a
line only if all the strings match, rather than any one of them.
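The difference between the default behaviour and -all is the difference between "any" and "every". A minimal sketch with Java 8 streams, using the case-sensitive matching of extract; AllSwitch and matches are hypothetical names, and how -all combines with without and withouti is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the -all switch; not the actual Extract source.
public class AllSwitch {

    // Default: the line matches if ANY target occurs in it.
    // With -all: the line matches only if EVERY target occurs in it.
    static boolean matches(String line, List<String> targets, boolean all) {
        return all
                ? targets.stream().allMatch(line::contains)
                : targets.stream().anyMatch(line::contains);
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("wombat", "zebra");
        String line = "the wombat sat";
        System.out.println(matches(line, targets, false)); // true: "wombat" occurs
        System.out.println(matches(line, targets, true));  // false: "zebra" is missing
    }
}
```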
-where
You may use the -where switch on the command line if you
want each output line to include, in CSV format, the name of
the file and the line number where it was found. The output
appears on the console; if you want to capture it to a file,
use > redirection.
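The exact columns -where writes are not spelled out above, so this sketch assumes a file,line,text layout with RFC 4180 style quoting: a field containing a comma, quote or line break is wrapped in double quotes, and embedded quotes are doubled. WhereCsv, csvField and whereRow are hypothetical names, not part of Extract.

```java
// Hypothetical sketch of a -where style CSV row; the column order is an assumption.
public class WhereCsv {

    // Quotes a field only when it contains a comma, quote, CR or LF (RFC 4180 style).
    static String csvField(String s) {
        boolean needsQuotes = s.indexOf(',') >= 0 || s.indexOf('"') >= 0
                || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return s;
        }
        return '"' + s.replace("\"", "\"\"") + '"'; // double any embedded quotes
    }

    // One output row: file name, line number, matching line.
    static String whereRow(String fileName, int lineNumber, String line) {
        return csvField(fileName) + "," + lineNumber + "," + csvField(line);
    }

    public static void main(String[] args) {
        // prints: E:\mindprod\index.html,42,"the ""wombat"", mostly"
        System.out.println(whereRow("E:\\mindprod\\index.html", 42, "the \"wombat\", mostly"));
    }
}
```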
-s
-s means include files in all subdirectories of a directory
mentioned.
-
A lone dash separates the search strings and regexes from the file names.
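One way the command line could be split at the dash is sketched below. It is hypothetical (ArgSplit and its fields are invented names): tokens left of the dash are search strings, or regex file names when they start with @; tokens right of the dash are files and directories, and everything after -s is marked for recursive search. For simplicity it ignores -all and -where, whose position on the command line the notes above do not specify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting the command line at the lone dash.
public class ArgSplit {
    final List<String> searchStrings = new ArrayList<>();
    final List<String> regexFiles = new ArrayList<>();       // @name tokens, with the @ removed
    final List<String> plainTargets = new ArrayList<>();     // files/dirs before any -s
    final List<String> recursiveTargets = new ArrayList<>(); // files/dirs after -s

    static ArgSplit parse(String... args) {
        ArgSplit r = new ArgSplit();
        boolean afterDash = false;
        boolean recurse = false;
        for (String arg : args) {
            if (!afterDash) {
                if (arg.equals("-")) {
                    afterDash = true;                    // the lone dash ends the search terms
                } else if (arg.startsWith("@")) {
                    r.regexFiles.add(arg.substring(1));  // @ is not part of the file name
                } else {
                    r.searchStrings.add(arg);
                }
            } else if (arg.equals("-s")) {
                recurse = true;                          // applies to every later directory
            } else {
                (recurse ? r.recursiveTargets : r.plainTargets).add(arg);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        ArgSplit r = parse("apple", "pear", "@myregex.txt", "-", "somefile1.txt", "-s", "somedir");
        System.out.println(r.searchStrings);    // [apple, pear]
        System.out.println(r.regexFiles);       // [myregex.txt]
        System.out.println(r.plainTargets);     // [somefile1.txt]
        System.out.println(r.recursiveTargets); // [somedir]
    }
}
```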
QUOTING
Depending on your operating system, there are a number of
characters that have magic meaning on the command line. They
won't necessarily be passed through to the program. Ones to
watch out for in Vista include " \ & ^ + | < > space.
Try enclosing awkward characters in quotes, e.g. "<& | >".
To put a " inside quotes, use \", e.g. "he said \"Hi\"".
To put a \ inside quotes, use \\, e.g.
"the \\ is called backslash; the / is called slash".
If these drive you to distraction, you might try using
regexes instead: they are read from files, so the OS does
not interfere with them in any way.
REGEXES
Regexes are an advanced feature primarily for programmers.
Each regex to search for goes in its own small file, which
may have any extension.
Read up on how they work at http://mindprod.com/jgloss/regex.html
java.exe -jar extract.jar "apple" "pear" @myregex.txt - somefile1.txt -s somedir
extract.jar "apple" "pear" @myregex.txt - somefile1.txt -s somedir
extract.exe "apple" "pear" @myregex.txt - somefile1.txt -s somedir
Because command shells interfere so much with special
characters on the command line, it would be highly confusing
to put regex strings directly on the command line. Instead,
you put your regexes in tiny separate files. You put the file
names on the command line to the left of the dash, each with a
leading @, which is not part of the filename itself.
The tiny files must be encoded in UTF-8. They don't include
Java string quoting, and they don't use \uxxxx escapes the way
Java strings do, so you just key your accented characters in directly.
Use the Quoter amanuensis
http://mindprod.com/applet/quoter.html to help you compose
your regex strings with just regex quoting, not Java string
quoting.
For example, if you wanted to search for either \ or / just
before the word html, HTML, hTMl... (case-insensitive), you
would code (?i)[\\/]html in your little regex file, not
"(?i)[\\\\/]html" as you would in a Java program.
It does not matter whether you use extract, extracti,
without or withouti; the regexes are all case-sensitive unless
you embed (?i) in the regex to turn on case-insensitive
mode. You can go back and forth in the same regex,
using (?-i) to flip back to case-sensitive mode.
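To see the two levels of quoting and the inline flags side by side, this small Java program, which is not part of Extract (the class name RegexFileSyntax is just for illustration), compiles both kinds of regex with java.util.regex.Pattern. The Java literal needs every backslash doubled, but its runtime value is exactly the text that belongs in your regex file.

```java
import java.util.regex.Pattern;

// Illustration of regex-file syntax using java.util.regex; not part of Extract.
public class RegexFileSyntax {

    // In Java source the string literal adds its own escaping, so each regex
    // backslash is doubled again. The runtime value is exactly the text you
    // would put in a regex file: (?i)[\\/]html
    static final Pattern SLASH_HTML = Pattern.compile("(?i)[\\\\/]html");

    // (?i) turns case-insensitive matching on; (?-i) turns it off again for the
    // rest of the regex, so " Zebra" must match exactly.
    static final Pattern MIXED_CASE = Pattern.compile("(?i)wombat(?-i) Zebra");

    public static void main(String[] args) {
        System.out.println(SLASH_HTML.pattern());                       // (?i)[\\/]html
        System.out.println(SLASH_HTML.matcher("docs\\HTML\\a").find()); // true
        System.out.println(SLASH_HTML.matcher("docs/hTMl/a").find());   // true
        System.out.println(SLASH_HTML.matcher("index.html").find());    // false
        System.out.println(MIXED_CASE.matcher("WoMbAt Zebra").find());  // true
        System.out.println(MIXED_CASE.matcher("WOMBAT zebra").find());  // false
    }
}
```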
You can mix and match as many regular search strings and
regexes as you can fit on the command line. You can speed
the program up slightly by putting the most likely matches
first. The results are the same no matter what order you put
them in.
Be careful that your editor does not trim or add any
trailing blanks on your regexes. It is safer to use
\p{Blank}, which matches a space or a tab, or \s, which
matches any whitespace character.
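The difference between \p{Blank} and \s in Java's regex engine shows up with a few control characters: \p{Blank} matches only a space or a tab, while \s, by default, also matches \n, \x0B, \f and \r. This is a standalone illustration (the class name BlankVsSpace is arbitrary), not part of Extract.

```java
import java.util.regex.Pattern;

// Standalone comparison of \p{Blank} and \s in java.util.regex.
public class BlankVsSpace {
    static final Pattern BLANK = Pattern.compile("\\p{Blank}"); // space or tab only
    static final Pattern SPACE = Pattern.compile("\\s");        // [ \t\n\x0B\f\r] by default

    public static void main(String[] args) {
        String[] samples = {" ", "\t", "\f", "\r"};
        for (String s : samples) {
            // space and tab: true/true; form feed and carriage return: false/true
            System.out.printf("U+%04X blank=%b whitespace=%b%n",
                    (int) s.charAt(0), BLANK.matcher(s).matches(), SPACE.matcher(s).matches());
        }
    }
}
```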
Any control characters in your regex files are stripped out
prior to use. This means you can break your regexes into
several lines in the file without penalty.
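Reading a regex file as described above could look like the following sketch. It assumes each @file holds a single regex, reads it as UTF-8, deletes every character matched by \p{Cntrl} (the ASCII control characters, which include CR, LF and tab) and compiles what is left. RegexFileLoader and load are hypothetical names, not the program's own code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical sketch of loading one regex from an @file; not the actual Extract source.
public class RegexFileLoader {

    // Reads a UTF-8 file, strips control characters, and compiles the result.
    static Pattern load(Path file) throws IOException {
        String raw = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // \p{Cntrl} is [\x00-\x1F\x7F], so line breaks vanish and split lines rejoin.
        String cleaned = raw.replaceAll("\\p{Cntrl}", "");
        return Pattern.compile(cleaned);
    }

    public static void main(String[] args) throws IOException {
        Pattern p = load(Paths.get(args[0]));
        System.out.println(p.pattern());
    }
}
```

A file containing (?i)[\\/] on one line and html on the next therefore compiles to the same regex as (?i)[\\/]html typed on a single line.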
LIMITATIONS
1. Extract just does searches, no replacements.
2. Extract does not show you the context of where it found
each line, just the line itself.
3. Extract is a batch, command line program. There is no GUI
and no interaction.
4. Extract offers no debugging tools to help you figure out
why your regexes fail to find the lines you expect them to.
5. Regexes cannot span lines.
6. Java-style regexes only, no Unix, Perl, Funduc, SlickEdit etc.
Why the haystack logo?  These utilities help you find your
needles in a haystack.


----

Version History:

 1.0 2009-02-26 initial release

 1.1 2009-02-27 -all switch means all strings must match;
                -where switch displays the file and line number where each line was found

 1.2 2009-06-03 fix bug, formerly ignored search strings one char long.

 1.3 2014-04-05 fix bug, formerly could not handle more than one regex string.

-30-