XML Schema

From W3C Wiki

The W3C XML Schema 1.0 Recommendation defines an XML schema language. Its salient characteristics are:

  • Unlike the DTD language defined in XML 1.0 (and in ISO 8879,
 the defining document for SGML), it uses XML syntax rather
 than a special non-XML syntax.  XML Schema documents are
 thus easier to process using standard XML tools than DTDs are.
  • It defines a set of simple types or datatypes for use in attribute
 values and simple element content (i.e. for elements with only
 character children); these include all the types most commonly
 found in programming languages and in database management
 systems, as well as a few others included for historical or
 other reasons.
  • It also provides a notion of complex types (for use with
 elements which may contain child elements); complex types
 can be constrained by attribute declarations and content models.
  • It distinguishes systematically between the generic identifiers
 (element type names, or 'tag names') written in angle brackets
 in the XML source, on the one hand, and the types assigned to 
 elements, on the other.  (This is sometimes referred to as
 'the tag/type distinction'.)
  • It allows for explicit relations between types. New
 simple types may be derived by restricting existing simple types.
 New complex types may be restrictions, or extensions, of
 existing types.
  • It defines explicitly what information generated as a by-product
 of validation may be made accessible to downstream applications;
 since this information is described as a set of augmentations
 to the input XML information set, the result of schema-validity
 assessment is described as a post-schema-validation infoset
 or PSVI.
  • It provides for wildcards in content models, which can match
 any element at all, any element in a particular namespace,
 any element in a namespace other than the target namespace of the
 schema document, etc.  Wildcards may be 'black-box' wildcards
 (no examination or validation of their contents), 'white-box'
 wildcards (all contents must be declared and valid), or
 'lax' wildcards (if a child element has a declaration, it will
 be validated; if not, it's not an error).
  • Instead of treating validity of documents as a simple all-or-nothing
 Boolean value, it provides discrete validity information for
 each element and attribute validated.
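
Several of these points can be illustrated with a small sketch. The schema document below is hypothetical (the `po` namespace and the names `Quantity`, `Item`, `item` and `backorderedItem` are invented for this example and come from no real vocabulary). It shows a simple type derived by restriction, a complex type with a content model and an attribute declaration, a lax wildcard, and two element names sharing one type. Because a schema document is itself XML, an ordinary XML parser (here Python's standard `xml.etree.ElementTree`) can inspect it. The script only reads the schema document; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document, invented for illustration only.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:po="http://example.org/po"
           targetNamespace="http://example.org/po"
           elementFormDefault="qualified">

  <!-- Simple type derived by restricting a built-in datatype. -->
  <xs:simpleType name="Quantity">
    <xs:restriction base="xs:positiveInteger">
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Complex type: a content model plus an attribute declaration. -->
  <xs:complexType name="Item">
    <xs:sequence>
      <xs:element name="quantity" type="po:Quantity"/>
      <!-- Lax wildcard: foreign elements are validated only if declared. -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="partNum" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- Two element names (tags) that share one type. -->
  <xs:element name="item" type="po:Item"/>
  <xs:element name="backorderedItem" type="po:Item"/>
</xs:schema>
"""


def summarize(schema_text):
    """Return (named types, element-name -> type-name map, wildcard modes)."""
    root = ET.fromstring(schema_text)
    # Top-level named type definitions, simple and complex alike.
    types = sorted(
        child.get("name")
        for child in root
        if child.tag in (f"{{{XS}}}simpleType", f"{{{XS}}}complexType")
    )
    # Top-level element declarations: the tag name and the type it is bound to.
    elements = {
        child.get("name"): child.get("type")
        for child in root.findall(f"{{{XS}}}element")
    }
    # processContents setting of every wildcard anywhere in the document.
    wildcards = [w.get("processContents") for w in root.iter(f"{{{XS}}}any")]
    return types, elements, wildcards


types, elements, wildcards = summarize(SCHEMA)
print(types)
print(elements)
print(wildcards)
```

In schema-document syntax, the 'black-box', 'white-box' and 'lax' wildcards described above correspond to the `processContents` values `skip`, `strict` and `lax`. The element-to-type map makes the tag/type distinction visible: `item` and `backorderedItem` are different generic identifiers, but both are assigned the type `Item`.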

Note: 'XML Schema' is the name of the language defined by the W3C Recommendation. 'XML schema' is a common noun in English denoting a schema (in whatever formalism) for an XML vocabulary. To avoid confusion between the two, some people prefer to use the names 'XSD' or 'WXS' (W3C XML Schema) for the language defined in the Recommendation, or to use the full name 'W3C XML Schema' whenever confusion might otherwise arise.

What follows is a sketch of a possible skeleton set of topics related to XML Schema, to help encourage the development of a useful wiki on the subject.

All of these pages need to be drafted. You can help!

XML Schema software

Different kinds of software may be 'schema-aware'. It would be useful to have separate wiki pages with discussions of each type and pointers to specific software of the type.

Among the most obvious classes of schema-aware software are:

  • schema-based validators
  • schema-aware XML editors and editing tools
  • data binding tools (for marshalling and de-marshalling
 between XML and programming-language data structures)
  • form generators
  • schema-writing and maintenance tools
  • schema conversion tools
  • tools for exploring or displaying information about schemas
  • schema analyzers
  • schema-aware XSLT and XQuery engines
  • general toolkits

For more detail, see the page on XML Schema software.

Interoperability issues

Any schema-aware software is likely to be aimed at a particular application type or application domain; language features that don't match up neatly with the assumptions of the particular domain may be omitted or neglected. Among the features which are either unsupported or supported less conveniently than other features are:

  • mixed content: for want of any good structures for
 representing mixed content, many data-binding tools either
 don't support mixed content at all, or support it poorly
 or grudgingly.
  • recursion: some tools have trouble with recursion.  (It's
 not obvious why this should be a problem, but it appears to be.)
  • choices: these pose a problem for many data-binding tools;
 in languages with variant record types, there is a natural
 representation, but in others?
  • substitution groups: many tools offer only poor support for
 substitution groups; they can be represented using
 class/subclass relations, but that entails writing
 classes not only for all types, but also for all elements,
 which may be undesirable for other reasons.
  • determinism of content models: some tools for generating
 schemas fail to check for this and generate non-deterministic
 content models; other tools feel compelled to accept such
 illegal schema documents.
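
To make two of these difficulties concrete, here is a minimal sketch using only Python's standard library. The element names (`p`, `code`, `em`, `payment`, `creditCard`, `bankTransfer`) and the classes are hypothetical, invented for illustration. The first part shows why mixed content fits awkwardly into ordinary data structures: the natural representation is an ordered list that interleaves strings and elements, not a record with one named field per child. The second part shows one possible binding for a choice, using a union of classes as a stand-in for a variant record. It is not the approach of any particular data-binding tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Union

# --- Mixed content: text and child elements interleaved -----------------

para = ET.fromstring("<p>Use <code>xs:any</code> with <em>care</em>.</p>")


def mixed_children(elem):
    """Flatten mixed content into an ordered list of str and Element."""
    parts = []
    if elem.text:
        parts.append(elem.text)       # text before the first child
    for child in elem:
        parts.append(child)           # the child element itself
        if child.tail:
            parts.append(child.tail)  # text following that child
    return parts


parts = mixed_children(para)
# Show the interleaving: strings as-is, elements by tag name.
shape = [p if isinstance(p, str) else f"<{p.tag}>" for p in parts]
print(shape)  # ['Use ', '<code>', ' with ', '<em>', '.']

# --- Choice: bound here to a union of classes ---------------------------


@dataclass
class CreditCard:
    number: str


@dataclass
class BankTransfer:
    iban: str


# Stand-in for a variant record: a payment is exactly one of the two.
Payment = Union[CreditCard, BankTransfer]


def bind_payment(elem) -> Payment:
    """Bind <payment> whose content is a choice of two child elements."""
    child = elem[0]
    if child.tag == "creditCard":
        return CreditCard(child.text)
    if child.tag == "bankTransfer":
        return BankTransfer(child.text)
    raise ValueError(f"unexpected choice branch: {child.tag}")


payment = bind_payment(
    ET.fromstring("<payment><bankTransfer>XX00-0000</bankTransfer></payment>")
)
print(payment)
```

In a language without union or variant types, the same choice would typically have to be flattened into a class with one optional field per branch, which loses the guarantee that exactly one branch is present.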

Schema technical issues

Perhaps some of these should be discussed in this page; others should probably be in separate pages.

 history, consequences, rationale

Other schema languages

Other languages that may be used to constrain data in XML or other forms:

  • DTDs
  • RELAX NG
  • Schematron
  • SQL Schemas

Some languages appear to be of mostly historical interest now (some of these may belong in the list above, if they are still actively used and developed):

  • XML Data and XML Data Reduced (XDR)
  • SOX (Schema for Object-Oriented XML)
  • DCD (Document Content Description for XML)
  • DDML (Document Definition Markup Language)
  • TREX

Resources: