XSD (XML Schema Definition). W3C’s XML (Extensible Markup Language) Schema language, which is itself a form of
XML. It is often simply called XML Schema.
It offers much finer control of XML document content than the
older DTD (Document Type Definition)-style schema inherited from SGML. There is even a schema written
in XSD that is used to validate other schemas. You can download it.
It is 88K. Unfortunately,
by default, Opera displays it as raw text, while IE nicely lists it with colours.
The two main advantages of XSD over DTD for specifying an
XML grammar are:
- XSD is easier to learn than DTD, because an XSD schema is just a flavour of XML
rather than a separate syntax.
- XSD lets you restrict in much more detail just what constitutes a valid file.
Sample XSD Schemas
The academics who wrote the XSD spec were more interested in impressing you than informing
you. Therefore there are few examples, and hardly anything remotely like English-language descriptions of what the
various grammatical elements are for. Your only hope of making sense of it is to find example documents. Even the
primer is fairly tough slogging. Keep looking at the example XML and
XSD to clarify the text. People learn languages much more easily from a set of heavily commented examples
that gradually add features than from descriptions of the grammar in some esoteric
formal notation.
Note how XSD allows forward and backward references to permit a top-down description of the document. In typical
XML fashion, it is revoltingly verbose. Oddly, you declare the attributes of a tag after
you describe all the nested tags that tag may enclose, even though when you write the actual
XML the schema describes, the attributes come first. NMTOKEN is
an atomic string without spaces, often the name of an enumeration value. The 2-letter country codes would be
NMTOKENs. XSD lets you specify the types of the fields with a rich
set of built-in types, including bounded integers, float, double, fixed decimal, dates, times, strings, URIs,
hex, Boolean, durations… You can set up enumerated types where you give a list of the legal values of a
field. There is even a pattern scheme, similar to Perl regex, for describing legal string values.
XSD also allows you to enforce the ordering of fields.
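Here is a minimal sketch of those features working together, using the JDK’s built-in javax.xml.validation classes (Java 8 or later). The address/country/postcode schema is hypothetical, made up for illustration. It shows an enumeration based on NMTOKEN, a regex pattern, a sequence that enforces order, and an attribute declared after the nested tags. The embedded XML uses single-quoted attributes, which XML permits, to avoid escaping quotes in Java strings.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical schema: an <address> holding a country code, then a postcode, in that order.
class AddressSchemaDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='address'>"
      + "  <xs:complexType>"
      + "   <xs:sequence>"                                  // children must appear in this order
      + "    <xs:element name='country'>"
      + "     <xs:simpleType>"
      + "      <xs:restriction base='xs:NMTOKEN'>"         // enumeration of 2-letter codes
      + "       <xs:enumeration value='CA'/>"
      + "       <xs:enumeration value='US'/>"
      + "      </xs:restriction>"
      + "     </xs:simpleType>"
      + "    </xs:element>"
      + "    <xs:element name='postcode'>"
      + "     <xs:simpleType>"
      + "      <xs:restriction base='xs:string'>"          // pattern is implicitly anchored at both ends
      + "       <xs:pattern value='[A-Z][0-9][A-Z] [0-9][A-Z][0-9]'/>"
      + "      </xs:restriction>"
      + "     </xs:simpleType>"
      + "    </xs:element>"
      + "   </xs:sequence>"
      + "   <xs:attribute name='kind' type='xs:NMTOKEN'/>" // attributes come after the nested tags
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    /** Returns true if xml conforms to XSD. Compiles the schema on each call for simplicity. */
    static boolean isValid(String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) { // with no ErrorHandler set, the first violation is thrown
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<address kind='home'><country>CA</country><postcode>V8W 1A1</postcode></address>")); // true
        System.out.println(isValid("<address><postcode>V8W 1A1</postcode><country>CA</country></address>")); // false: wrong order
        System.out.println(isValid("<address><country>UK</country><postcode>V8W 1A1</postcode></address>")); // false: not enumerated
    }
}
```

In real code you would compile the Schema once and reuse it; Schema objects are thread-safe, Validators are not.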
XSD allows you to specify the minimum and maximum number of times a field may appear
with the minOccurs="0" and maxOccurs="unbounded" attributes.
You can specify the types of fields with attributes such as type="xsd:positiveInteger", type="xsd:string"
or type="xsd:anyURI".
You can specify the allowable low and high bounds on a numeric field with minInclusive and maxInclusive
(or minExclusive and maxExclusive).
There is a scheme, the unique element, to insist a data value be unique.
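A sketch of occurrence limits, numeric bounds and a uniqueness constraint, again using the JDK’s javax.xml.validation classes (Java 8 or later). The order/item/sku/qty names and the 1 to 99 range are hypothetical, chosen only to illustrate minOccurs, maxOccurs, minInclusive, maxInclusive and unique.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical <order> schema: occurrence limits, numeric bounds and a uniqueness rule.
class OrderSchemaDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='order'>"
      + "  <xs:complexType>"
      + "   <xs:sequence>"
      + "    <xs:element name='note' type='xs:string' minOccurs='0'/>"  // optional
      + "    <xs:element name='item' maxOccurs='unbounded'>"            // one or more
      + "     <xs:complexType>"
      + "      <xs:attribute name='sku' type='xs:NMTOKEN' use='required'/>"
      + "      <xs:attribute name='qty' use='required'>"
      + "       <xs:simpleType>"
      + "        <xs:restriction base='xs:integer'>"
      + "         <xs:minInclusive value='1'/>"                          // 1 <= qty <= 99
      + "         <xs:maxInclusive value='99'/>"
      + "        </xs:restriction>"
      + "       </xs:simpleType>"
      + "      </xs:attribute>"
      + "     </xs:complexType>"
      + "    </xs:element>"
      + "   </xs:sequence>"
      + "  </xs:complexType>"
      + "  <xs:unique name='oneLinePerSku'>"                             // no two items share a sku
      + "   <xs:selector xpath='item'/>"
      + "   <xs:field xpath='@sku'/>"
      + "  </xs:unique>"
      + " </xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) { // first violation is thrown when no ErrorHandler is set
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<order><item sku='A1' qty='2'/><item sku='B7' qty='99'/></order>")); // true
        System.out.println(isValid("<order/>"));                                                        // false: no item
        System.out.println(isValid("<order><item sku='A1' qty='100'/></order>"));                       // false: qty > 99
        System.out.println(isValid("<order><item sku='A1' qty='1'/><item sku='A1' qty='3'/></order>")); // false: duplicate sku
    }
}
```

Note that minOccurs and maxOccurs sit on the local element declarations inside the sequence, and the unique constraint sits on the enclosing order element, after its complexType.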
Sometimes the files are peppered with xs: and sometimes with xsd:. This is an arbitrary prefix, declared with an xmlns: attribute,
that abbreviates the namespace http://www.w3.org/2001/XMLSchema. You can make it anything you like, so long
as you use it consistently. It lets the parser know that a word is a keyword. This way you can
use keyword names such as element or sequence as field names without confusion.
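To check that the prefix really is arbitrary, here is a small sketch, assuming the JDK’s javax.xml.validation API (Java 8 or later). It compiles the same schema twice, once with xs: and once with a made-up foo: prefix, and declares a tag literally named element to show that keywords and field names don’t collide.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PrefixDemo {
    // The same schema with two different prefixes bound to the XSD namespace.
    static final String WITH_XS =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='element' type='xs:string'/>"    // a tag literally named 'element'
      + "</xs:schema>";
    static final String WITH_FOO =
        "<foo:schema xmlns:foo='http://www.w3.org/2001/XMLSchema'>"
      + " <foo:element name='element' type='foo:string'/>" // foo: means exactly what xs: meant
      + "</foo:schema>";

    static boolean isValid(String xsd, String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<element>hi</element>";
        System.out.println(isValid(WITH_XS, doc));  // true
        System.out.println(isValid(WITH_FOO, doc)); // true
    }
}
```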
Understanding an XSD Schema
Understanding the keywords used in schemas, and comparing a schema with a known valid
XML/JNLP file, will be almost all you need to make sense of the schema. After you read the
XSD tutorial, this
list will refresh your memory.
- all
- each of a group of tags may appear at most once, in any order. By default, each must appear exactly once.
- attribute
- a keyword="value" modifier on a tag.
- choice
- A group of possible tags. You can specify only one of them.
- complexType
- A tag that contains other tags nested inside it, or that has attributes.
- dateTime
- A date/time in the form 2009-12-13T12:25:00.0000000-08:00
- default
- Specifies the default value for an attribute or element.
- element
- declares a tag: its name and its type. Within a sequence group, elements must appear in the order listed.
- enumeration
- describes one possible value of an attribute that has only a limited set of legal values.
- length
- the precise length for this field in characters. More commonly you use minLength and maxLength.
- maxLength
- the maximum length for this field in characters.
- maxOccurs
- the maximum number of times this tag can appear, "unbounded" for no upper limit. Oddly, it must go
on the ref, not on the global element declaration the ref points to.
- minLength
- the minimum length for this field in characters.
- minOccurs
- the minimum number of times this tag can appear, "0" for optional. Oddly, it must go on the
ref, not on the global element declaration the ref points to.
- NCName
- a “non-colonized” XML name: letters, digits, periods, hyphens and underscores, starting with a letter or underscore. No colons and no spaces.
- NMTOKEN
- a single token made of letters, digits, periods,
colons, hyphens and underscores, but no commas or spaces. It is often the base type for an enumerated field whose value
must be selected from a list. You can also base enumerated values on string, which does allow spaces.
- restriction
- derives a new type from a base type by limiting its legal values, e.g. with enumeration, pattern, minLength or maxInclusive.
- sequence
- a group of tags that must appear in a particular order.
- simpleType
- a type for text-only content, with no nested tags or attributes. It is used for attributes and plain-text tags, and usually restricts a built-in type.
- string
- a field that can include any character legal in XML, i.e. almost any Unicode character.
- use
- "required", "optional" (the default) or "prohibited". It
applies to attributes, not tags.
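Several of these keywords in one place: a hypothetical contact schema using all, choice, an NCName attribute and use="required". It is a sketch using the JDK’s javax.xml.validation classes (Java 8 or later); the tag and attribute names are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical <contact> schema illustrating all, choice, NCName and use.
class ContactSchemaDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='contact'>"
      + "  <xs:complexType>"
      + "   <xs:all>"                                   // each child once, in any order
      + "    <xs:element name='name' type='xs:string'/>"
      + "    <xs:element name='reach'>"
      + "     <xs:complexType>"
      + "      <xs:choice>"                             // exactly one of these
      + "       <xs:element name='email' type='xs:string'/>"
      + "       <xs:element name='phone' type='xs:string'/>"
      + "      </xs:choice>"
      + "     </xs:complexType>"
      + "    </xs:element>"
      + "   </xs:all>"
      + "   <xs:attribute name='id' type='xs:NCName' use='required'/>"
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // all: reach may come before name
        System.out.println(isValid("<contact id='c1'><reach><phone>555</phone></reach><name>Ann</name></contact>")); // true
        // NCName may not start with a digit
        System.out.println(isValid("<contact id='1c'><name>Ann</name><reach><phone>555</phone></reach></contact>")); // false
    }
}
```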
Stylus Studio has a wizard that will take a well-formed
XML document and compose an approximate
XSD schema for it. You
can then fine-tune it. This greatly speeds up the work of composing schemas. You can keep validating your
schema as you work. As you type, it shows you multiple choices for what you most likely want to type next.
Validating an XML File Conforming to an XSD Schema
Here is an example of validating XML with an XSD schema. This
schema describes a valid JNLP (Java Network Launching Protocol) 1.0 XML file. You can check that your JNLP file is valid
using an XSD schema originally from Vampqh. You must copy the JNLP 1.0
XSD schema posted below into the current directory as file jnlp1.xsd (or use the JNLP 6.0 XSD, jnlp6.xsd), then
run the Java validation program posted below with:
java.exe ValidateJNLP jnlp6.xsd C:\mydir\myapp.jnlp
The above validator is not user-friendly. If all is OK, it prints nothing. If there is a problem,
you get a cryptic exception. You can get a good idea of what it is looking for by reading the
XSD file. I have composed three schemas for it: jnlp1.xsd,
jnlp5.xsd and jnlp6.xsd. Use the one that
corresponds to the version of your JNLP. XSDs
are a bit like a BNF (Backus-Naur Form)
description of JNLP, written by someone with a terrible lexical stutter.
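The ValidateJNLP source itself isn’t reproduced here. As a rough stand-in, here is a sketch of a general-purpose command-line validator using Oracle’s SchemaFactory and Validator classes (Java 8 or later). The class name ValidateXml is hypothetical. Instead of dying on the first cryptic exception, it installs an ErrorHandler that collects every problem with its line and column, which makes a bad JNLP file much easier to fix.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical validator. Usage: java ValidateXml jnlp6.xsd C:\mydir\myapp.jnlp
class ValidateXml {
    /** Formats a problem as "line:column: message". */
    static String describe(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }

    /** Validates xml against xsd and returns every problem found; an empty list means valid. */
    static List<String> problems(File xsd, File xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(xsd);
        Validator validator = schema.newValidator();
        final List<String> found = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { found.add(describe(e)); }
            // Schema violations: record and keep going, so you see them all at once.
            @Override public void error(SAXParseException e) { found.add(describe(e)); }
            // Malformed XML: record it, then stop.
            @Override public void fatalError(SAXParseException e) throws SAXException {
                found.add(describe(e));
                throw e;
            }
        });
        try {
            validator.validate(new StreamSource(xml));
        } catch (SAXParseException e) {
            if (!found.contains(describe(e))) {
                found.add(describe(e)); // avoid reporting a fatal error twice
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ValidateXml schema.xsd file.xml");
            System.exit(2);
        }
        List<String> found = problems(new File(args[0]), new File(args[1]));
        for (String problem : found) {
            System.out.println(args[1] + ":" + problem);
        }
        System.exit(found.isEmpty() ? 0 : 1);
    }
}
```

Like the original, it prints nothing when the file is valid, so it can be used in scripts; the non-zero exit code signals failure.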
Extracting Data From an XML File
Extracting data from an XML file with an XSD schema is a verbose
undertaking. Unfortunately, the ranges, defaults etc. in the XSD schema are all ignored
when you extract information from a conforming XML file. They are just used for
validating. Here are some of the classes you will need: Schema, SchemaFactory and Validator (in javax.xml.validation),
XMLConstants (in javax.xml) and SAXParser (in javax.xml.parsers).
There are dozens of classes in other packages with identical or similar names. You have to make sure you use
matching classes. Your IDE (Integrated Development Environment) might automatically import the wrong classes
if you are not careful.
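Here is a minimal sketch of validating while extracting, assuming the JDK’s SAXParserFactory.setSchema (Java 8 or later). The config/lineWidth/indent file format is hypothetical, not the actual HtmlReflow configuration. Two things worth noticing: DefaultHandler silently ignores validation errors unless you override error(), and the values still come out as plain strings, so converting them (and any defaults) is your job, consistent with the schema being used only for validating.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical config file: <config><lineWidth>100</lineWidth><indent>4</indent></config>
class ConfigReader {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='config'><xs:complexType><xs:sequence>"
      + "  <xs:element name='lineWidth'><xs:simpleType><xs:restriction base='xs:int'>"
      + "   <xs:minInclusive value='40'/><xs:maxInclusive value='200'/>"
      + "  </xs:restriction></xs:simpleType></xs:element>"
      + "  <xs:element name='indent' type='xs:nonNegativeInteger'/>"
      + " </xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Parses and validates xml in one pass, returning each child tag's text keyed by tag name. */
    static Map<String, String> read(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // validate while parsing
        SAXParser parser = factory.newSAXParser();
        final Map<String, String> values = new LinkedHashMap<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
            }
            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                if (!localName.equals("config")) {
                    values.put(localName, text.toString().trim());
                }
            }
            // DefaultHandler.error() does nothing, so schema violations would pass silently.
            @Override public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return values;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> cfg = read("<config><lineWidth>100</lineWidth><indent>4</indent></config>");
        int lineWidth = Integer.parseInt(cfg.get("lineWidth")); // conversion is still up to you
        System.out.println(lineWidth);
    }
}
```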
I am in the process of writing an HtmlReflow utility, to reflow and tidy HTML (Hypertext Markup Language), that uses an
XSD schema to describe an XML configuration file. You can have a
look at my code in progress:
Learning More
- Oracle’s Javadoc on the Schema class
- Oracle’s Javadoc on the SchemaFactory class
- Oracle’s Javadoc on the Validator class
- Oracle’s Javadoc on the XMLConstants class
- Oracle’s Javadoc on the SAXParser class