Entities 3.1 build: 9470 released: 2011-09-02
Converts HTML/XML/XHTML entities to char and back, also strips HTML/XML/XHTML tags.
Copyright: (c) 2012 Canadian Mind Products.
Java application.
Download from: http://mindprod.com/products1.html#ENTITIES
===> Free <===
Full source included.
You may even include the source code, modified or unmodified
in free/commercial open source/proprietary programs that you write and distribute.
May be used freely for any purpose but military.
For more details on this restriction, see
http://mindprod.com/contact/nonmil.html
If you include any Canadian Mind Products code in your own applications,
your app too must be labeled non-military use only.
All Java jars and source code are included.
----
Prerequisites:
This program runs under any OS,
(e.g. Win2K/XP/Vista/W7x86/W7-64/OSX/Linux/Solaris/AIX...) so long as you have
<><> Java version 1.5 <><> or later installed (32-bit or 64-bit Java).
See http://mindprod.com/jgloss/installingjava.html for details.
----
Installing on a PC:
Download source and compiled class files to run on your own machine as an application.
First install a recent Java JDK or JVM.
See http://mindprod.com/jgloss/installingjava.html.
To install, extract the zip download with WinZip
(or similar unzip utility) into any directory you please,
often J:\ -- ticking off the "use folder names" option.
To run as an application, type:
java -ea -jar J:\com\mindprod\entities\entities.jar {put any parms here}
adjusting as necessary to account for where the jar file is.
----
Installing on a Macintosh:
Use Safari to download source and compiled class files to run on your own machine as an application.
Safari will automatically unpack the zip into ~/Downloads (version 10.5)
[or on the Desktop (version 10.4 and earlier)].
First install a recent Java JDK or JVM.
See http://mindprod.com/jgloss/installingjava.html.
You may optionally move the download tree to a permanent home.
I don't have a Macintosh, just a PC, so I can't test my Java programs for Mac compatibility.
In theory they should work without problems, but in practice that does not always happen.
If you have problems, please let me know, preferably with screenshots and complete verbatim error messages.
To run as an application, without parameters, just double click the jar file.
To run as an application with parameters, in bash shell type:
open Terminal.app
cd ~/Desktop
java -ea -jar com/mindprod/entities/entities.jar {put any parms here}
adjusting as necessary to account for where the jar file is.
----
Rebuilding:
The zip already contains the necessary class and jar files, so unless you modify
the program, there is no need to recompile the source or rebuild the jar.
In rebuild.xml, set basedir="E:/" to the drive where your files are.
Use ANT and rebuild.xml, not build.xml, to recompile and recreate the jar.
----
Use:
Consists of three utilities that let you convert/strip/insert HTML
entities such as &amp; and &quot; in files, converting them back and
forth to their equivalent single characters & and ". It handles the HTML
4 entities such as &hearts; as well as decimal &#123; and hexadecimal
&#x7b; entities.
You may use this package as standalone utilities, or use the classes in
your own programs to insert or strip entities from HTML.
You can use them like this:
REM to remove &amp; entities and HTML tags from two files and all files in somedir
REM converting entities back to characters
flatten.jar afile.html another.html -s somedir
REM to convert & to &amp; etc. entities in two files and all files in somedir
entify.jar afile.html another.html -s somedir
REM to convert entities in two files and all files in somedir back to UTF-8 chars
REM leaving all HTML tags as is.
deentify.jar afile.html another.html -s somedir
They come complete with Java source and jar files.
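If you are curious how this kind of conversion works in principle, here is
a minimal sketch. It is NOT the code shipped in the jar. The class name
EntitySketch, the methods decode and encode, and the five-entry table are
all hypothetical, invented for illustration. It scans with indexOf('&')
and copies whole runs of text, handles a few named entities plus decimal
and hexadecimal numeric ones, and leaves anything it does not recognise
(such as the malformed &x#123;) untouched.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; hypothetical names, not the shipped Entities classes. */
final class EntitySketch {
    // Tiny sample table (an assumption); the real HTML 4 set is far larger.
    private static final Map<String, Character> NAMED = new HashMap<String, Character>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("quot", '"');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("hearts", '\u2665');
    }

    /** Convert named, decimal and hex entities back to characters. */
    static String decode(String text) {
        if (text == null) return null;
        int amp = text.indexOf('&');
        if (amp < 0) return text;              // nothing to convert
        StringBuilder sb = new StringBuilder(text.length());
        int pos = 0;
        while (amp >= 0) {
            sb.append(text, pos, amp);         // copy the run before '&' in one go
            int semi = text.indexOf(';', amp + 1);
            int cp = semi < 0 ? -1 : lookup(text.substring(amp + 1, semi));
            if (cp >= 0) {
                sb.appendCodePoint(cp);
                pos = semi + 1;
            } else {
                sb.append('&');                // not an entity, keep it literally
                pos = amp + 1;
            }
            amp = text.indexOf('&', pos);
        }
        sb.append(text, pos, text.length());
        return sb.toString();
    }

    /** Return the code point for an entity body such as "amp", "#123" or "#x7b", or -1. */
    private static int lookup(String body) {
        if (body.startsWith("#x") || body.startsWith("#X")) return parse(body.substring(2), 16);
        if (body.startsWith("#")) return parse(body.substring(1), 10);
        Character c = NAMED.get(body);
        return c == null ? -1 : c;
    }

    private static int parse(String digits, int radix) {
        // Reject empty input and a leading sign, which Integer.parseInt would accept.
        if (digits.length() == 0 || Character.digit(digits.charAt(0), radix) < 0) return -1;
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? cp : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Convert markup-significant and non-ASCII characters to entities. */
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:
                    if (cp < 128) sb.append((char) cp);
                    else sb.append("&#").append(cp).append(';');  // decimal numeric entity
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Tom &amp; Jerry &#123;&#x7b; &x#123;"));
        System.out.println(encode("a < b & \"c\""));
    }
}
```

The sketch always writes decimal numeric entities for non-ASCII characters;
choosing named entities where they exist would need the full table.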
----
Version History:
1.0 2004-01-01 initial version
1.1 2004-01-01 optimise using
text.indexOf('&') and sb.append(string) rather
than processing character by character.
1.2 2004-07-21 add stripHTMLTags -
stripFile also strips tags - add stripNbsp
1.3 2005-06-20 fix bug in possEntityToChar
exposed possEntityToChar as public
1.4 2005-07-02 check for null input
1.5 2005-07-29 no longer needs entitiestochar.ser
file. Converted from JDK 1.5 back to 1.2
1.6 2005-09-05 faster code for stripHTMLTags that
returns original string if nothing changed.
1.7 2005-09-05
1.8 2007-02-26 fix bug. hex entity is &#xffff; not &x#ffff;
1.9 2007-03-26
2.0 2007-04-26
2.1 2007-05-10
2.2 2007-05-14 StripHTMLTags now strips applet, style, script pairs.
generate hex entities as comments in entitycase.javafrag
2.3 2008-07-29 refactor code for Entities, add notes to chars, add a few new missing Entities.
2.4 2008-08-05 add translateNbspTo parameter to several methods and deprecate the versions without it.
This allows you to directly control how &nbsp; is translated, usually ' ' or (char)160.
Renamed methods to make it clearer just what sort of input is expected.
2.5 2008-08-06 add ability to insert XML entities. Convert to JDK 1.5+, with generics, and for:each, StringBuilder
2.6 2009-04-05 StripEntities now leaves a space behind when it removes a