XML : Java Glossary
by Roedy Green ©1996-2008 Canadian Mind Products
XML
The primary function of XML is to consume RAM and data communication bandwidth. Presumably it was promoted to its current frenzy by companies who sell either RAM or bandwidth. Others promoting it have patents they hope to spring on the public once it is entrenched. XML is the biggest con game going in computers. As you have probably guessed, I am known for my rabid dislike of XML.
The Basics
Entities
Encoding
Schemas
Validation
Parsing
XML Benefits
XML Drawbacks
What Should Replace XML?
DTD
Schema
XML Serialization
Digitally Signing XML
Learning More
Links

The Basics

XML is the Extensible Markup Language, a W3C Recommendation. Like HTML, XML is based on SGML, an International Standard (ISO 8879) for creating markup languages. However, while HTML is a single SGML document type, with a fixed set of element type names (AKA "tag names"), XML is a simplified profile of SGML: you can use it to define many different document types, each of which uses its own element type names (instead of HTML’s html, body, h1, ol, etc.). For example, in XML, you can mark up an on-line transaction with tag names of your own choosing.
Fields that can occur at most once are usually specified as attributes, e.g. unit="box". Fields that can occur many times are enclosed in tags, e.g. <item>…</item>. Just like HTML, comments begin with <!-- and end with -->. You can abbreviate <mytag myattrib="something"></mytag> as <mytag myattrib="something" />.
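As a rough illustration of these rules, here is a minimal Java sketch that parses a small order document with the JDK's DOM parser. Only unit="box", <item> and <mytag> come from the text above; the order element, the qty attribute and the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BasicsDemo {
    // Hypothetical order: single-valued fields are attributes, the repeated field is an element.
    static final String ORDER =
          "<order id=\"42\">\n"
        + "  <!-- a comment, ignored by the application -->\n"
        + "  <item unit=\"box\" qty=\"3\">nails</item>\n"
        + "  <item unit=\"each\" qty=\"1\">hammer</item>\n"
        + "</order>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Counts the repeated item elements in the document. */
    static int countItems(String xml) throws Exception {
        return parse(xml).getElementsByTagName("item").getLength();
    }

    /** True if the two documents have equal root elements (same name, attributes and children). */
    static boolean sameRoot(String a, String b) throws Exception {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        NodeList items = parse(ORDER).getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            System.out.println(item.getAttribute("qty") + " " + item.getAttribute("unit")
                + " of " + item.getTextContent());
        }
        // The long form and the abbreviated form produce the same element.
        System.out.println(sameRoot("<mytag myattrib=\"something\"></mytag>",
                                    "<mytag myattrib=\"something\" />"));
    }
}
```

The parser treats the abbreviated empty-element form and the explicit start/end pair identically, so the choice is purely a matter of how much you want to type.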

XML was designed to make it easy to write a parser. I think this was an unfortunate decision. Only a handful of people in the world will ever write an XML parser, but hundreds of thousands have to compose XML. They should have designed it to be easy and terse to write. For example, its mandatory quotes around each attribute value are there solely for the convenience of the parser writer. The tag names in the </mytag> end tags are redundant, and should be optional. They are not needed at all in XML designed solely for machine consumption. Even in human-read XML, they add nothing on the innermost nest on a single line.

Entities

Just like HTML, various characters are reserved and have long forms called entities to use when they occur in the text as data: &amp;, &lt;, &gt;, &apos; and &quot;. Unlike HTML, XML predefines only those five entities. Character references take one of two forms: decimal references, such as &#8478;, and hexadecimal references, such as &#x211e;. Named character entities such as &eacute; don’t work unless you declare them yourself in a DTD. You can use any Unicode characters you want that are not part of the XML grammar, and UTF-8 deals with encoding them.
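A minimal Java sketch of the entity rules, assuming the JDK's built-in DOM parser. The class and method names are invented for illustration. The escape method produces the five predefined entities, and textOf shows the parser expanding entities and character references back into plain characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityDemo {
    /** Replaces the five reserved characters with XML's predefined entities. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Parses a document and returns the root element's text, with references expanded. */
    static String textOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement()
            .getTextContent();
    }

    /** True if the parser accepts the document as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            textOf(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String raw = "it's 1 < 2 & 3 > 2";
        System.out.println(escape(raw));
        System.out.println(textOf("<t>" + escape(raw) + "</t>").equals(raw));
        // Decimal 8478 and hex 211e name the same character.
        System.out.println(textOf("<t>&#8478;</t>").equals(textOf("<t>&#x211e;</t>")));
        // An undeclared named entity makes the document ill-formed.
        System.out.println(isWellFormed("<t>&eacute;</t>"));
    }
}
```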

Encoding

UTF-8 is the default encoding, but unfortunately the encoding could be any ruddy encoding ever invented. Using other encodings destroys XML as an interchange format. Don’t do it!
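To make the encoding problem concrete, here is a small Java sketch, assuming the JDK's DOM parser; the class and method names are invented. It feeds the parser the same text as UTF-8 bytes with no declaration, as ISO-8859-1 bytes with a matching declaration, and as ISO-8859-1 bytes with no declaration at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingDemo {
    static final String BODY = "<t>caf\u00e9</t>";

    /** Parses raw bytes, letting the parser work out the encoding, and returns the root text. */
    static String textOf(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xmlBytes))
            .getDocumentElement()
            .getTextContent();
    }

    /** True if the bytes parse without error. */
    static boolean parses(byte[] xmlBytes) {
        try {
            textOf(xmlBytes);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    static byte[] utf8NoDeclaration() {
        return BODY.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1WithDeclaration() {
        return ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + BODY)
            .getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] latin1NoDeclaration() {
        // Mislabelled data: the parser assumes UTF-8 and meets an invalid byte sequence.
        return BODY.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf(utf8NoDeclaration()));
        System.out.println(textOf(latin1WithDeclaration()));
        System.out.println(parses(latin1NoDeclaration()));
    }
}
```

The receiver can only decode the second form because the sender declared the encoding and the receiver happens to support it, which is exactly the interchange hazard.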

Schemas

You describe your little XML subgrammar by writing a DTD (Document Type Definition) file. Optionally, you can include the DTD inline inside your XML file. There are other more elaborate schema grammars including RELAX NG, Schematron, XSD and various other schemes.

Validation

Each schema language has its own technique for checking that an XML file conforms to it. If you use a DTD, a validating parser does the checking as it parses.
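One way to validate against a DTD in Java is sketched below, using the JDK's DocumentBuilderFactory with validation switched on. The note/to/body DTD and the class name are made up for the example, and the DTD is included inline so the sketch is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Hypothetical inline DTD: a note must contain exactly one to followed by one body.
    static final String DTD =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to, body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    /** Returns the validity problems found; an empty list means the document is valid. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD while parsing
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<note><to>someone</to><body>hello</body></note>"));
        System.out.println(validate(DTD + "<note><body>hello</body></note>"));
    }
}
```

Validity errors are reported through the ErrorHandler rather than thrown, so without a handler of your own it is easy to miss them.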

Parsing

There are two popular parsing techniques: SAX (Simple API for XML), which hands you each field as it parses, and the W3C DOM (Document Object Model), which creates a complete parse tree you can prune and repeatedly scan.
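The difference between the two styles can be sketched in Java with the JDK's own SAX and DOM parsers. The sample document and the class and method names are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParsingDemo {
    static final String XML = "<list><item>a</item><item>b</item><item>c</item></list>";

    /** SAX: the parser calls back on each start tag as it streams past; nothing is retained. */
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equals(tag)) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }

    static Document parseToDom(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** DOM: the whole tree is in memory, so it can be scanned as often as you like. */
    static int countWithDom(String xml, String tag) throws Exception {
        return parseToDom(xml).getElementsByTagName(tag).getLength();
    }

    /** DOM: prune the first matching element, then scan the same tree again. */
    static int countAfterPruningFirst(String xml, String tag) throws Exception {
        Document doc = parseToDom(xml);
        Node first = doc.getElementsByTagName(tag).item(0);
        first.getParentNode().removeChild(first);
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countWithSax(XML, "item"));
        System.out.println(countWithDom(XML, "item"));
        System.out.println(countAfterPruningFirst(XML, "item"));
    }
}
```

SAX suits large files you only need to pass over once; DOM costs memory for the whole tree but lets you revisit and edit it.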

I personally detest XML; however, it has caught on like a cocaine wave. It must have some redeeming features.

XML Benefits

XML Drawbacks

Using XML to transmit data is the analog of insisting that all code be passed around as triple-spaced Java source files, with added dummy comments, rather than as binary byte code. There is no guarantee a source file is even syntactically correct. It is impossible to create a syntactically incorrect byte code file. Byte code files can be processed without time-consuming parsing. In byte code, repeating strings are naturally specified only once. XML, as it stands, suffers from all those analogous drawbacks and more.

What Should Replace XML?

The characteristics include:

One possible candidate for the XML replacement job is the Java serialised object format. It can handle just about any data structure imaginable. It is platform independent. It has a simple DTD — Java source code for the corresponding class. Some claim it is Java-only. Not so. It is no more difficult for C++ to parse than any other similar newly concocted protocol. It is not tied to any hardware or OS. It is just that Java has a head start implementing it. Java can implement it with no extra overhead.
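For readers who have not used it, here is a minimal sketch of a round trip through the Java serialised object format. The Square class and the helper names are invented for the example; the class source plays the role the DTD plays for XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    /** Hypothetical data class; its fields define the layout of the stream. */
    static class Square implements Serializable {
        private static final long serialVersionUID = 1L;
        final int width;

        Square(int width) {
            this.width = width;
        }
    }

    /** Serialises any Serializable object graph to a byte array. */
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    /** Reconstitutes the object graph from the bytes. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Sends a Square through the stream and returns the width that comes out the other end. */
    static int roundTripWidth(int width) throws Exception {
        Square copy = (Square) fromBytes(toBytes(new Square(width)));
        return copy.width;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTripWidth(100));
    }
}
```

Note that the int travels as a binary int, not as text that has to be parsed and range-checked on arrival.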

There have been dozens of efforts to patch up the shortcomings of XML. XML is no longer simple. It is a raggedy patchwork quilt. People were sucked in by the initial simplicity, then discovered that it was not really all that useful in its simple form. Schema was added to allow specifying types (though the values themselves are still transmitted only as strings). Yes, we need a standard interchange format, but XML was only a back-of-the-envelope stab at it. XML was destined to fail since it ignored so many of the factors that go into a good design.

One such effort is the Virtual Token Descriptor (VTD). A VTD record is a 64-bit integer that encodes the starting offset, length, type and nesting depth of a token in an XML document. Because VTD records don’t contain data fields, they work alongside the original XML document, which is maintained intact in memory by the processing model.
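The idea of packing a token description into one 64-bit integer can be sketched in Java as follows. The field widths and positions here (32-bit offset, 20-bit length, 4-bit type, 8-bit depth) are assumptions chosen purely for illustration and are not the actual VTD-XML record layout; the class and method names are likewise invented.

```java
class TokenRecordDemo {
    // Assumed layout for illustration only, NOT the real VTD-XML layout:
    // bits 0-31 offset, bits 32-51 length, bits 52-55 type, bits 56-63 depth.

    /** Packs the four fields into one long, masking each to its assumed width. */
    static long pack(int offset, int length, int type, int depth) {
        return ((long) (depth & 0xFF) << 56)
             | ((long) (type & 0xF) << 52)
             | ((long) (length & 0xFFFFF) << 32)
             | (offset & 0xFFFFFFFFL);
    }

    static int offset(long record) { return (int) (record & 0xFFFFFFFFL); }
    static int length(long record) { return (int) ((record >>> 32) & 0xFFFFF); }
    static int type(long record)   { return (int) ((record >>> 52) & 0xF); }
    static int depth(long record)  { return (int) ((record >>> 56) & 0xFF); }

    /** The record holds no data itself; the token text is read from the intact document. */
    static String token(String xml, long record) {
        return xml.substring(offset(record), offset(record) + length(record));
    }

    public static void main(String[] args) {
        String xml = "<a><b>hi</b></a>";
        long textToken = pack(6, 2, 1, 2); // the text "hi" starts at character 6 and is 2 long
        System.out.println(token(xml, textToken));
        System.out.println(depth(textToken));
    }
}
```

Because each record is a plain long, an array of them is compact and needs no per-token object allocation.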

Due to the stupidity, duplicity and/or greed of those promoting XML, we will likely be stuck with some committee-patched variant of it forever — something that will make even HTML look clean. We need a common data interchange format, but not so inept.

DTD

You need to compose a DTD file that describes the format of the XML file. The <!ELEMENT statement is used to list the various tags you will use, which tags may be used inside which tags, how often and in which order. The <!ATTLIST statement is used to list the various attributes (mandatory and optional) of each tag. The <!ENTITY statement lets you make up your own abbreviations.

Here is a simple example:

DTD:

<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
The CDATA means the value of the field is a string.

XML:

<square width="100"></square>
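A small Java sketch of what the "0" in the ATTLIST does, assuming the JDK's DOM parser and the square DTD above placed inline in the document. The class and method names are invented. When the attribute is omitted, the parser supplies the default from the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SquareDtdDemo {
    // The DTD from the example above, wrapped in an inline DOCTYPE declaration.
    static final String DTD =
          "<!DOCTYPE square [\n"
        + "  <!ELEMENT square EMPTY>\n"
        + "  <!ATTLIST square width CDATA \"0\">\n"
        + "]>\n";

    /** Parses the square element together with its DTD and returns its width attribute. */
    static String widthOf(String squareXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(DTD + squareXml)));
        return doc.getDocumentElement().getAttribute("width");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(widthOf("<square width=\"100\"></square>"));
        // No width given: the value comes from the default declared in the ATTLIST.
        System.out.println(widthOf("<square />"));
    }
}
```

The value is still just a string; CDATA gives the parser no way to reject a width of "banana".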

Schema

A schema is a document that describes what constitutes a legitimate XML document. It might be very generic, describing all XML documents, or it might describe some particular class of XML documents, say invoices for the XYZ company. The original XML schema language was the DTD, inherited from SGML. It was clumsy and did not allow very tight specification. It basically just let you specify the names of the tags and attributes. Since then there have been several other flavours of schema: RELAX NG, Schematron and a newer one from W3C called XML Schema (XSD). DTDs look nothing like XML itself, whereas XML Schema is itself a flavour of XML. XML Schema is a major advance over DTD. It is described in three documents: Primer, Structures and Datatypes. It can define datatypes, ranges, enumerations, dates and complex datatypes to specify much more rigidly what constitutes a valid XML file.
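As a sketch of the tighter checking XML Schema allows, the following Java example validates a document with the JDK's SchemaFactory, Schema and Validator classes. The schema itself is hypothetical: it requires a square element whose width attribute is an integer from 0 to 1000. The class and method names are invented.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema: width is required and must be an int in the range 0 to 1000.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"square\">"
        + "<xs:complexType>"
        + "<xs:attribute name=\"width\" use=\"required\">"
        + "<xs:simpleType>"
        + "<xs:restriction base=\"xs:int\">"
        + "<xs:minInclusive value=\"0\"/>"
        + "<xs:maxInclusive value=\"1000\"/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "</xs:attribute>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** True if the document conforms to the schema above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler installed, the first validity error is thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<square width=\"100\"/>"));
        System.out.println(isValid("<square width=\"banana\"/>"));
        System.out.println(isValid("<square width=\"5000\"/>"));
    }
}
```

Unlike the DTD's CDATA, the schema rejects both the non-numeric width and the out-of-range one.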

XML Serialization

There is another form of serialization that produces XML instead of the binary stream of an ObjectOutputStream. It uses the java.beans.XMLEncoder class. It does not use the Serializable interface, but writes ordinary Objects that have JavaBean-style getter and setter methods and a no-arg constructor. It does not persist fields, but rather properties (in the Delphi sense, not System.setProperty), implemented with get/set. Basically it looks for all the getXXX methods, calls them, and emits a stream of tags named after the properties. To reconstitute, XMLDecoder instantiates an Object of the class, and calls the corresponding setXXX methods with the values in the XML stream. The source and target classes need not have matching code the way they do with true serialization. Most trouble using this feature comes from thinking it behaves like ordinary serialization. They have almost nothing in common.
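A minimal sketch of a round trip through java.beans.XMLEncoder and XMLDecoder. The Square bean and the helper names are invented for the example. Note that the bean does not implement Serializable; it only needs to be public, with a public no-arg constructor and get/set pairs.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class BeanXmlDemo {
    /** Hypothetical bean with a single property, width. */
    public static class Square {
        private int width;

        public Square() { }

        public int getWidth() { return width; }

        public void setWidth(int width) { this.width = width; }
    }

    /** Writes the bean's properties as XML. */
    static byte[] encode(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject(bean);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the bean by instantiating it and calling its setters. */
    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    /** Sends a Square through XML and returns the width that comes back. */
    static int roundTripWidth(int width) {
        Square square = new Square();
        square.setWidth(width);
        return ((Square) decode(encode(square))).getWidth();
    }

    public static void main(String[] args) {
        Square square = new Square();
        square.setWidth(100);
        System.out.println(new String(encode(square), StandardCharsets.UTF_8));
        System.out.println(roundTripWidth(100));
    }
}
```

Printing the encoded bytes shows that the stream is essentially a script of constructor and setter calls, which is why private fields without accessors never appear in it.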

Digitally Signing XML

You would think XML would be a nightmare for digital signing, with its variable amounts of whitespace, variable newline characters and lax attitude toward the encoding. However, W3C has invented a slick scheme to let you digitally sign various fields in an XML document (by specifying #xxxx HTML-like targets) and embed the signature in the document. You can also sign documents external to the XML file. The secret is canonicalisation. You use an algorithm to tidy the document to standard form. The transforms leave embedded, leading and trailing whitespace on fields intact, but collapse the rest to standard patterns. The scheme allows for various canonicalisation transforms and various signing algorithms. As you would expect from XML, the signature block is gargantuan.
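The JDK also ships its own javax.xml.crypto.dsig API for this. The sketch below, which is one possible approach and not necessarily what the Apache classes do, signs a whole document with an enveloped signature and then verifies it. The sample document, the freshly generated RSA key pair and the class and method names are assumptions for illustration only.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SigningDemo {
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the signature API needs namespace-aware DOM
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Appends a Signature element covering the whole document (URI "") to the root. */
    static void sign(Document doc, KeyPair keys) throws Exception {
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        Reference wholeDocument = fac.newReference(
            "",
            fac.newDigestMethod(DigestMethod.SHA256, null),
            Collections.singletonList(
                fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
            null, null);
        SignedInfo info = fac.newSignedInfo(
            fac.newCanonicalizationMethod(
                CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
            fac.newSignatureMethod(RSA_SHA256, null),
            Collections.singletonList(wholeDocument));
        XMLSignature signature = fac.newXMLSignature(info, null);
        signature.sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));
    }

    /** Checks the embedded signature against the document's current content. */
    static boolean verify(Document doc, KeyPair keys) throws Exception {
        Node signatureNode = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
        DOMValidateContext context = new DOMValidateContext(keys.getPublic(), signatureNode);
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        return fac.unmarshalXMLSignature(context).validate(context);
    }

    /** Signs a sample document, optionally alters it afterwards, and reports whether it verifies. */
    static boolean signThenVerify(boolean tamper) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keys = generator.generateKeyPair();
        Document doc = parse("<order><item>nails</item></order>");
        sign(doc, keys);
        if (tamper) {
            doc.getElementsByTagName("item").item(0).setTextContent("screws");
        }
        return verify(doc, keys);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(signThenVerify(false));
        System.out.println(signThenVerify(true));
    }
}
```

The enveloped transform excludes the Signature element itself from the digest, which is what lets the signature live inside the document it signs.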

Apache has written classes to make the work easier.

Learning More

Sun’s Javadoc on the Schema class
Sun’s Javadoc on the SchemaFactory class
Sun’s Javadoc on the Validator class
Sun’s Javadoc on the XMLConstants class
Sun’s Javadoc on the SAXParser class
Sun’s Javadoc on the XMLEncoder class
AELfred
Alphaworks: an XML parser in pure Java called XML4J
Altova XMLSpy
Ant: XML validator
ASN.1
binaphobia
Binary XML: unfortunately still in the it-would-be-a-good-idea stage
Castor
Cladonia Exchanger XL Lite XML Editor: free
cooktop
Crimson
Developer Life Tutorials
Digitally Signing XML
Digitally Signing XML documents
DOM 1 spec
DTD attributes
DTD: a language for describing XML file layouts
Fluffiness of various file formats: student project
Generic XML syntax checker: when you have no DTD
IBM’s tutorial
IBM’s XML page
JAXB
JAXP: Sun’s XML manipulating classes
JDOM
JNLP (Java Web Start’s XML configuration language)
JSON
JUntotal: a more compact XML alternative
Liquid XML: code generator to read/write XML given schema
Mistakes with XML
NotXMLProposal: SDL streamlined XML proposal
RDF
RefleX: (XSLT and XQuery)
RELAX NG: a language for describing XML file layouts
SAX
Schematron: an XML description and pattern finding language
Serialization
Stylus Autogen: figures out a schema from sample XML
Stylus Schema Editor
Stylus XML tools
Sun’s Fast Web Services Project
UBDDL (a Yahoo group working to define a more efficient replacement for XML)
UDDI
VTD-XML: faster, more efficient XML parsing
W3schools: XML tutorials
Wattle XML editor and schema converter
x->Jen
Xerces
XHTML
XML 1.0 spec
XML Compactor
XML databases
XML inventors
XML Pitstop
XML validator on-line
XML Validator tools
XML validator: requires an XSD
XML well formedness checker: requires no DTD/XSD
XML.ORG
xmlfiles.com (has lots of examples and tutorials)
XMLFox: free Windows XSD/XML editor/validator
XMLGlobal has some tutorials and information
XMLsucks.org
XPath
XQuery
XSD
XSLT
XTP
XUL

You can get a fresh copy of this page from http://mindprod.com/jgloss/xml.html