Pluck 1.0 build: 9639 released: 2008-06-04 compiled with: Java 1.8.0_131 Jet jet12.0-pro-x86/1.8.0_131

Scans a collection of files for instances of a regex pattern.

Copyright: (c) 2008-2017 Canadian Mind Products.

Java application.
Download from: http://mindprod.com/products1.html#PLUCK

----

Notes:

You must install the Java JRE to use this program.
See http://mindprod.com/jgloss/jgloss/jre.html

This program can only be used from the command prompt (or via a command
line style icon shortcut), e.g. under Windows cmd.exe or JPSoft
tcmd.exe, formerly called the DOS box. Just clicking the programs in a
directory listing will not do anything useful. Just typing the program
names at the command prompt will not either.

This program requires a manual install! See below.

This program works with vanilla text files (e.g. ASCII files or UTF-8
Unicode files). You will need a text editor to create and view them, not a word
processor, e.g. Notepad, Visual SlickEdit or other suitable text editor
http://mindprod.com/jgloss/editor.html. You must use a monospaced font
http://mindprod.com/jgloss/monospacedfonts.html (aka fixed pitch, aka
programmer font) to view your files, or they won't look properly aligned.

I put out an avalanche of free software into the world, and submit PAD
files to hundreds of distribution sites, but I rarely hear back from
anyone.  What's happening?  Does it all just work fine?  Is it so
complicated nobody can figure out how to use it and they give up on it?
Is it useful?  Since everyone has the source, do people just fix the
programs to their liking themselves?  Did you have trouble installing?  Do
I presume you know too much?  I would be happy to hear from you about your
experiences, positive or negative and your requests for improvements.  A
one-line email to roedy@mindprod.com would be great.

===> Free <===
Full source included.
You may even include the source code, modified or unmodified
in free/commercial open source/proprietary programs that you write and distribute.
May be used freely for any purpose but military.
For more details on this restriction, see
http://mindprod.com/contact/nonmil.html
If you include any Canadian Mind Products code in your own applications,
your app too must be labelled non-military use only.
All Java jars and source code are included. If you need the class files or Javadoc, you will have to build them yourself. To streamline the zip downloads, class files and Javadoc have been removed.

----

Prerequisites:

This program runs under any OS that supports Java,
(e.g. W2K/XP/W2003/Vista/W2008/W7-32/W7-64/W8-32/W8-64/Linux/LinuxARM/LinuxX86
/LinuxX64/Ubuntu/Solaris/SolarisSPARC/SolarisSPARC64/SolarisX86/SolarisX64/OSX/AIX...)
so long as you have
<><> Java version 1.8 <><> or later installed (32-bit or 64-bit Java).
See http://mindprod.com/jgloss/installingjava.html for details.

----

Installing on a PC:

Download source and compiled jar files to run on your own machine as an application.
First install a recent Java JDK or JVM.
See http://mindprod.com/jgloss/installingjava.html.
To install, extract the zip download with WinZip
(or similar unzip utility) into any directory you please,
often J:\ --  ticking off the "use folder names" option.
To run as an application, type:
java.exe %JAVA_OPTIONS% -ea -jar J:\com\mindprod\pluck\pluck.jar {put any parms here}

adjusting as necessary to account for where the jar file is.

----

Installing on a Macintosh:

Use Safari to download source and compiled jar files to run on your own machine as an application.
Safari will automatically unpack the zip into ~/Downloads (version 10.5)
[or on the Desktop (version 10.4 and earlier)].
First install a recent Java JDK or JVM.
See http://mindprod.com/jgloss/installingjava.html.
You may optionally move the download tree to a permanent home.
I don't have a Macintosh, just a PC, so I can't test my Java programs for Mac compatibility.
In theory they should work without problems, but in practice that does not always happen.
If you have problems, please let me know, preferably with screenshots and complete verbatim error messages.
To run as an application, without parameters, just double click the jar file.

To run as an application with parameters, open Terminal.app and in the bash shell type:
cd ~/Desktop
java -ea -jar com/mindprod/pluck/pluck.jar {put any parms here}

adjusting as necessary to account for where the jar file is.

----

Rebuilding:

The zip already contains the necessary jar files, so unless you modify
the program, there is no need to recompile the source or rebuild the jar.
In rebuild.xml, set basedir="E:/" to the drive where your files are.
Use ANT and rebuild.xml, not build.xml, to recompile and recreate the jar.

----

Use:

Scans multiple files looking for a REGEX pattern, and
summarises what it finds as a CSV file.
java.exe -jar C:\com\mindprod\pluck\pluck.jar  "\.[a-z]+\." E:\temp\temp.csv E:\somedir
or
java.exe -jar C:\com\mindprod\pluck\pluck.jar  "\.[a-z]+\." console E:\somedir\somefile.txt  C:\temp -s G:\myApps
adjusting as necessary to account for where the jar file is.
The first parameter is the regex pattern. See
http://mindprod.com/jgloss/regex.html for how to compose them.
The next parameter is where the output is to go.  Use the
word console to have the output appear on the console.
Then put a list of the files and directories you want to scan
on the command line, where -s means recursively include all
subdirectories for everything to the right of -s.
It will look only for *.html, *.htm, *.xml, *.txt
extensions.  You can't change that via the command line,
though you could modify the program.
The command line does not currently support wildcards, e.g.
ap*.txt or ff?.html.  You need to specify the full names of
files or directories, or . to mean all the files in the
current directory.
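
The file selection rules above can be sketched roughly as follows. This is
not Pluck's actual source, just an illustration. The class name FileSelection
and its methods are made up for the example. Treating extensions as
case-insensitive, and applying the extension filter to explicitly named files
as well, are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Pluck.
final class FileSelection {
    /** True if the name ends in one of the four extensions Pluck looks for. */
    static boolean hasScannedExtension(String fileName) {
        // Assumption: the comparison ignores case.
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm")
                || lower.endsWith(".xml") || lower.endsWith(".txt");
    }

    /**
     * Expands one command-line entry (a file or a directory) into the files to scan.
     * recurse corresponds to the entry appearing to the right of -s.
     */
    static List<Path> select(Path fileOrDir, boolean recurse) throws IOException {
        // Depth 1 visits only the directory's own entries; MAX_VALUE walks the whole tree.
        int depth = recurse ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(fileOrDir, depth)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> hasScannedExtension(p.getFileName().toString()))
                    .collect(Collectors.toList());
        }
    }
}
```

To scan other extensions you would change hasScannedExtension in a sketch
like this; in the real program you would modify the Pluck source.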
When you write your regex expression, you don't double your
\. You are not creating a Java string literal.
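
To illustrate the difference, here is a minimal sketch. The class
RegexFromCommandLine is made up for the example and is not part of Pluck. In
Java source code you must double each backslash to get the very same regex
that you type undoubled on the command line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not part of Pluck.
final class RegexFromCommandLine {
    /** Counts matches of a regex given exactly as it would arrive from the command line. */
    static int countMatches(String regexAsTyped, String text) {
        Matcher m = Pattern.compile(regexAsTyped).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // On the command line you type:  "\.[a-z]+\."
        // In a Java string literal the same regex needs doubled backslashes:
        String asJavaLiteral = "\\.[a-z]+\\.";
        System.out.println(asJavaLiteral);                                    // prints \.[a-z]+\.
        System.out.println(countMatches(asJavaLiteral, "www.mindprod.com")); // prints 1
    }
}
```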
It can be tricky to get various characters in your regex passed through
to the Pluck program.
Awkward Characters on the Windows Command Line
Char    Special Meaning         How to pass it through to the program
******  **********************  ***************************************
space   separates parameters    enclose the whole parameter in quotes "
<       input redirection       enclose the whole parameter in quotes "
>       output redirection      enclose the whole parameter in quotes "
|       pipe                    enclose the whole parameter in quotes "
'       none                    enclose the whole parameter in quotes "
"       parameter delimiter     write it as \" and enclose the whole parameter in quotes "
%       macro replace           write it as %%
If you are using Linux bash, or other Bourne compatible or
csh compatible shell, enclose your regex in single quotes,
'...'. Then the only character you need to worry about
inside the regex is ' itself which needs to be encoded as
'\'' i.e. apos, backslash, apos, apos. This seems rather
long winded. What you are doing is ending the string, doing
the quote (quoting it with \ much as you would in Java
string literals), then starting it up again, concatenating
all three pieces.
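
If you generate command lines from a program, the single-quote rule can be
applied mechanically. The following is a small sketch for Bourne compatible
shells only. The class BashQuote is made up for the example and is not part
of Pluck.

```java
// Hypothetical helper, not part of Pluck.
final class BashQuote {
    /** Wraps a regex in single quotes, encoding each ' as '\'' (apos, backslash, apos, apos). */
    static String quote(String regex) {
        // String.replace does literal, not regex, replacement.
        return "'" + regex.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("it's [a-z]+")); // prints 'it'\''s [a-z]+'
    }
}
```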
Pluck echoes what the command processor gives it for the
regex expression. Verify it did not mangle any of your
characters.
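
If you want to experiment with quoting without running a scan, a tiny
stand-alone program can show what the command processor delivers. This is
only a sketch. The class EchoRegex and its output format are made up and are
not what Pluck itself prints.

```java
// Hypothetical stand-alone checker, not part of Pluck.
final class EchoRegex {
    /** Describes the first command-line parameter, bracketed so stray spaces are visible. */
    static String describe(String[] args) {
        return args.length == 0 ? "no regex supplied" : "regex: [" + args[0] + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Compile it, run it with the same quoted regex you intend to give Pluck, and
compare what it prints with what you typed.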
If you are having trouble solving your problem with Pluck
and regexes, you might precondition the files by converting
all newlines and control chars to space, running the files
to be scanned through a tidy program, such as:
http://tidy.sourceforge.net/
or running HTML to be scanned through the Compactor program.
See:
http://mindprod.com/products1.html#COMPACTOR
You might consider sifting through malformed HTML with
TagSoup. See:
http://mindprod.com/jgloss/tagsoup.html
You might even write a full-blown parser. See:
http://mindprod.com/jgloss/parser.html


----

Version History:

 1.0 2008-06-04 initial release

-30-