MarkupValidator/XML Limitations

From W3C Wiki

This page contains the draft text for a new page that may be added to the MarkupValidator website. It is being placed on this wiki at the suggestion of Olivier Thereaux so that others from the validator mailing list can contribute to its development. Please feel free to join in.

The Issue

Currently, the MarkupValidator's XHTML validation results page contains the following line:

Note: The Validator XML support has some limitations.

The link points to a page on the OpenJade website that contains highly technical information on OpenSP's limitations. Rather than having the link point directly to the OpenJade website, I think it would be less confusing to the people who use the validator (and post questions to the mailing list ;-) ) if the link pointed to a more user-friendly page hosted on the validator website. The page would list some of the validator's more commonly encountered XML limitations in a way that is easy for users of the validator to understand. This intermediate page would then link to the OpenJade website.

Draft text

In addition to validating documents against their specified DTD, the W3C validator also tests documents that use an XHTML doctype for XML well-formedness. At present, the validator's support for checking XML well-formedness has some known limitations. This means that certain documents that are labeled as valid by the validator will cause conforming XML user agents to throw fatal errors. Most notably, many web browsers will refuse to load pages that are served as XML (using the application/xhtml+xml MIME type) if they are not well-formed, and will display an error instead. Below are some of the better-known limitations.

The use of "&" and "<" as data is marked as a warning. It should be an error.

In XML and SGML, the ampersand and left angle bracket characters have special meaning: they are used to introduce entity or character references (e.g. &quot; or &#52;) and tags (e.g. <p>). However, unlike SGML (which forms the basis of HTML4), XML requires that these two characters be used literally only as markup delimiters; if they need to be used in any other way, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The validator incorrectly treats this usage as a warning rather than an error.

Examples:

"We need more R & D" should be replaced with "We need more R &amp; D"

"Everyone knows that 1 < 2" should be replaced with "Everyone knows that 1 &lt; 2"

Please refer to the XML Specification for more detailed information.
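The difference between the escaped and unescaped forms can be checked with any conforming XML parser. The following is a minimal sketch using Python's standard library (which wraps the expat parser, not OpenSP); the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the validator.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the expat-based parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw_text = "We need more R & D, and 1 < 2"
# escape() replaces "&", "<" and ">" with "&amp;", "&lt;" and "&gt;".
escaped_text = escape(raw_text)

print(escaped_text)
print(is_well_formed(f"<p>{raw_text}</p>"))      # unescaped character data
print(is_well_formed(f"<p>{escaped_text}</p>"))  # escaped character data
```

A conforming parser rejects the first paragraph and accepts the second, which is the behaviour the validator should mirror by reporting an error instead of a warning.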

XML declaration

If the XML declaration is present, it must appear at the very beginning of the document, i.e. it must not be preceded by anything, not even by whitespace or comments, and it must match the production for XMLDecl.

However, the W3C validator considers <?xml encoding="utf-8" version="1.0"?> to be valid, even though the XMLDecl production requires version to come before encoding.
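As a rough illustration, the sketch below feeds a few XML declarations to Python's expat-based parser and reports whether each is accepted. The helper parse_error and the sample documents (including the placeholder root element) are assumptions made for this example; the exact error messages depend on the expat version and are therefore not shown.

```python
import xml.etree.ElementTree as ET


def parse_error(document: bytes):
    """Return the parser's error message, or None if the document is well-formed."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as exc:
        return str(exc)


samples = {
    # version before encoding, as the XMLDecl production requires
    "version first": b'<?xml version="1.0" encoding="utf-8"?><root/>',
    # the ordering that the validator currently accepts
    "encoding first": b'<?xml encoding="utf-8" version="1.0"?><root/>',
    # whitespace before the declaration
    "leading whitespace": b' <?xml version="1.0" encoding="utf-8"?><root/>',
}

for label, document in samples.items():
    print(label, "->", parse_error(document) or "well-formed")
```

Only the first sample should be reported as well-formed.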

Adjacent attribute specifications

In XML documents, attribute specifications must be separated by whitespace, according to the productions for EmptyElemTag and STag.

However, the W3C validator considers a start tag with no whitespace between its attribute specifications, such as <p id="a"class="b">, to be valid.
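A short sketch of how a conforming parser treats adjacent attribute specifications, again using Python's expat-based parser as a stand-in. The accepts helper and the sample element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def accepts(markup: str) -> bool:
    """Return True if the parser treats the markup as well-formed (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(accepts('<p id="a" class="b"/>'))  # attributes separated by whitespace
print(accepts('<p id="a"class="b"/>'))   # attributes with no separator
```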

"--" in comments

"--" must not appear inside a comment, according to the production for Comment.

However, the W3C validator considers a comment that contains "--", such as <!-- foo -- bar -->, to be valid.
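The Comment production can be approximated with a regular expression: the comment body is a sequence of either a non-hyphen character or a hyphen followed by a non-hyphen character. The sketch below is an assumption-laden simplification (it ignores the restriction to legal XML characters), and the names COMMENT_RE and is_valid_comment are illustrative only.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->', ignoring the Char range check
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_valid_comment(text: str) -> bool:
    """Return True if the whole string matches the simplified Comment production."""
    return COMMENT_RE.fullmatch(text) is not None


print(is_valid_comment("<!-- a single - hyphen is fine -->"))
print(is_valid_comment("<!-- foo -- bar -->"))   # "--" inside the comment
print(is_valid_comment("<!-- trailing hyphen --->"))  # "--->" at the end
```

Note that the production also rules out a comment that ends in "--->", since the body may not end with a hyphen.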

Character encoding declaration in meta element only

The XHTML 1.0 specification states that, even for XHTML documents that are delivered as text/html, it is insufficient to declare their character encoding in a meta element alone. The W3C validator accepts such a declaration anyway.

Namespace declaration

The XHTML 1.0 specification requires a strictly conforming document to contain an xmlns declaration for the XHTML namespace. The W3C validator does not enforce this criterion.
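One way to check this criterion outside the validator is to parse the document and inspect the namespace of the root element. The sketch below assumes Python's xml.etree.ElementTree; the function name and sample documents are hypothetical. It tests whether the root html element ends up in the XHTML namespace, which is a close proxy for the presence of the xmlns declaration, not a literal check for the attribute.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_in_xhtml_namespace(document: str) -> bool:
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(document)
    # ElementTree expands qualified names to the form "{namespace}localname".
    return root.tag == f"{{{XHTML_NS}}}html"


with_ns = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)
without_ns = "<html><head><title>t</title></head><body/></html>"

print(root_in_xhtml_namespace(with_ns))
print(root_in_xhtml_namespace(without_ns))
```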

System identifier in PUBLIC doctype declaration

In XML, unlike SGML, a PUBLIC doctype declaration cannot consist of a formal public identifier (FPI) alone; it must also include a system identifier (SI). In XML mode, OpenSP (and thus the validator) reports a missing system identifier as a warning, when it should be an error.
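For comparison, the sketch below passes a doctype declaration with and without a system identifier to Python's expat-based parser, which does not fetch the external DTD but does check the declaration's syntax. The helper name and the minimal html documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the expat-based parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


FPI = "-//W3C//DTD XHTML 1.0 Strict//EN"
SI = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
BODY = '<html xmlns="http://www.w3.org/1999/xhtml"/>'

fpi_and_si = f'<!DOCTYPE html PUBLIC "{FPI}" "{SI}">{BODY}'
fpi_only = f'<!DOCTYPE html PUBLIC "{FPI}">{BODY}'

print(is_well_formed(fpi_and_si))  # public and system identifier
print(is_well_formed(fpi_only))    # public identifier only
```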

More XML limitations

The software component currently being used to test XML well-formedness is called OpenSP. More details on the validator's current limitations can be found on the OpenSP website.