Last week I attended XML Europe in sunny Amsterdam at the RAI and made notes in some sessions I attended. So far I've not been able to get on with live blogging, rather I've been doing live talking to people during the conference.
This article reports on the presentations I saw along with some of the notes I took at the time, as well as a recording of the closing keynote by Edd Dumbill. (3,300 words).
Jeff Barr of Amazon gave the opening keynote, Driving Innovation with Web Services at Amazon.com, despite having had all his luggage lost by the airline. He described how they have been making a programmable web site with web services. I heard some of this last year at the O'Reilly Open Source Conference, but it is still good to know that it is working out well for them. In particular, 80% of their developers still use the REST API rather than the SOAP one, mostly because of the XSLT transform they allow and its ease of use. Jeff mentioned several novel applications that have been built with the Amazon API, things that Amazon would never have got round to or thought of. Their goal from the start was to provide the API and think about the business model of how both parties, Amazon and the API users, can make money.
Steven Pemberton gave the second keynote On The Design of Notations which was an entertaining and thoughtful consideration of the silly things humans do to make life easy for computers. How backwards is that? His presentation went from the notation for US state codes (GA, FL, ME) which have no good relation to the state name, to other notations such as programming languages like ABC and Python (both started by people at CWI, Amsterdam where Steven is based). XML was never designed for ease of authoring and looking ahead, some things such as xhtml2 in 2010 may be very complex. The markup produced by authoring tools is still embarrassing to read, so as people are involved, he recommended making it primarily easy for people, as parsing markup is a relatively much easier job (for coders).
Then into the conference itself where I went to Architecture Principles of the WWW, given by Chris Lilley of W3C, going over the new Architecture of the World Wide Web, First Edition (a W3C Working Draft). One goal of this is to write down formally what used to be formed from ideas in Tim Berners-Lee's head, some email messages and a few documents scattered around. This state was something that not everyone using or developing the web was aware of, could read or even access. Specifications can't conform to an underdocumented architecture. The document covered a lot of areas that Chris outlined, but there were a few questions about revising the URI specification in RFC2396bis and IRIs, and admonitions against adding URI schemes such as itunes: and subscribe: that just wrap HTTP, mostly caused by issues of accessing the MIME processing stack between browsers and helper applications. Somebody in the audience said the US government has endorsed the determinedly non-networked info: URI scheme. The document now discusses safe interactions such as using HTTP GET, which led to some changes to other W3C work such as XForms and XML Protocol work to provide GET when possible and/or record safe operations. Chris was pleased about the new xml:id draft as one early example of the TAG getting somebody else to solve a problem for them!
Immediately after that was Take REST: An Analysis of Two REST APIs by Paul Prescod of Blast Radius. Paul said that he was an XML guy and interested in data, not service oriented APIs. Most Web Services (WS) are about sharing information, not in a web of information but in silos with their own different and incompatible WS APIs. Instead of WS, Paul promoted a Resource Access Interface for REST. He was proud that his prediction that Amazon's REST service would be very popular turned out correct. Giving examples of Amazon's REST API, he noted that the identifiers were ASINs and not URIs (so not resource-centric) but Amazon may have some good reasons for that:
"a URI that works just for you, is not as good as one that works for anybody" [i.e a web resource]
The SOAP and REST APIs are rather different in their identifying model and it isn't clear why the XML URIs need to be so complex.
The second API he talked about was the Atom API, "a process with nobody in charge" which is still evolving and may become an exemplar of REST for 2004, as Amazon was 2 years ago. Both Atom and Amazon rejected the RPC model as too fragile and not easy to extend.
Later on I went to Michael Kay's XSLT and XPath optimisation with Saxon, which looked like impressive work, and Jonathan Robie's talk SQL/XML, XQuery, and Native XML Programming Languages, covering SQL/XML, W3C XQuery and XQuery for Java (XQJ), of which I was aware only of XQuery. The last talk I saw on Monday was by Liam Quinn on Lessons From an XML Query-Driven SVG+XHTML Web Site for dealing with his Antiquarian books collection which has lots of really great pictures. He was using RDF(ish) metadata, and the acronyms in the title, to run the web site.
I chaired a session with two talks. The first was XML Design Principles for Form and Function by Uche Ogbuji of Fourthought which (after some laptop death problems) went over some of his considered thoughts on designing XML as a very experienced XML consultant, developer and writer. The paper goes into more detail, so here's a summary of the points I grabbed:
The second talk in the session I chaired was by Brandon Jockman on Test-Driven XML Development: Building Rapid Change Management into XML Systems applying the ideas of unit tests and suites of tests to XML, so whenever you need to change things, you can refactor the XML (document, schemas, tool chain) and still ensure the system is working with the evolving test suite. He was using XSLTUnit and XMLUnit for some of this work inside, with some warnings: "XSLT is complex enough to allow you to do very bad things"
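To make the idea concrete, here is a minimal sketch of a unit-test suite over an XML document. It uses only Python's standard `unittest` and `xml.etree.ElementTree` rather than the XSLTUnit or XMLUnit tools mentioned in the talk, and the `CATALOGUE` document, element names and test names are all hypothetical. The tests assert invariants of the markup instead of an exact serialisation, so the document can be refactored while the suite keeps passing.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical document under test; in practice this would be loaded
# from the file or tool chain output being refactored.
CATALOGUE = """
<catalogue>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalogue>
"""


class CatalogueTests(unittest.TestCase):
    def setUp(self):
        self.root = ET.fromstring(CATALOGUE)

    def test_every_book_has_a_title(self):
        for book in self.root.iter("book"):
            self.assertIsNotNone(book.find("title"))

    def test_book_ids_are_unique(self):
        ids = [book.get("id") for book in self.root.iter("book")]
        self.assertEqual(len(ids), len(set(ids)))


def run_suite():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CatalogueTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Each new requirement on the markup becomes another test method, and the suite is rerun after every change to the document, schema or transform.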
After lunch, I went to the Knowledge Technologies track to see Semantic blogging: Spreading the semantic web meme by Steve Cayzer of HP Labs, Bristol. This was reporting some work he did for the SWAD Europe project I also work for. I had seen some of this before at project meetings but this time Steve was wearing a lovely "Life Life, Don't Blog" T-shirt (see Edd's picture).
Next up was Crawling the Semantic Web by my friend Matt Biddulph (speaking for himself, although he works for the BBC). Crawling linked stuff in RDF and searching it is mostly done over the web of rdfs:seeAlso properties, the RDF equivalent of <a href="..."> in HTML. Crawling also needs to consider the distribution effort, merging items by URIs, and in particular tracking the provenance and trust found: who is behind the data and the source of the data. He found that there were two really useful bits of OWL worth implementing for this application: 1) a owl:sameAs b, done using renaming, and 2) a owl:inverseFunctionalProperty b. He also tried an experiment with the PlanetRDF blogroll in RDF, crawling from that to the webpages, and found that 30% of the PlanetRDF authors have links to their FOAF in their webpage - linking the HTML web to the RDF web.
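As a rough illustration of the crawling and merging described above (not Matt's actual code), the sketch below follows rdfs:seeAlso links breadth-first, records the source document of every triple for provenance, and then merges nodes by renaming when they are linked by owl:sameAs or share a value for an inverse functional property. The `fetch` callable, the `DOCS` data and the prefixed-name strings are all assumptions made for the example; a real crawler would fetch and parse RDF from the network.

```python
from collections import deque

SEE_ALSO = "rdfs:seeAlso"
SAME_AS = "owl:sameAs"


def crawl(start, fetch):
    """Breadth-first crawl over rdfs:seeAlso links.

    `fetch` is a hypothetical callable: document URL -> list of
    (subject, predicate, object) triples. Returns quads
    (s, p, o, source) so every triple keeps its provenance.
    """
    seen, queue, quads = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        for s, p, o in fetch(url):
            quads.append((s, p, o, url))
            if p == SEE_ALSO and o not in seen:
                seen.add(o)
                queue.append(o)
    return quads


def smush(quads, ifps):
    """Merge nodes by renaming, in a single pass over the quads."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            keep, drop = sorted((ra, rb))  # deterministic choice of name
            parent[drop] = keep

    first_with_value = {}
    for s, p, o, _ in quads:
        if p == SAME_AS:
            union(s, o)
        elif p in ifps:
            # Two subjects sharing an IFP value denote the same thing.
            if (p, o) in first_with_value:
                union(s, first_with_value[(p, o)])
            else:
                first_with_value[(p, o)] = s
    return [(find(s), p, find(o), src) for s, p, o, src in quads]


# Hypothetical in-memory "web" of two documents.
DOCS = {
    "http://example.org/a.rdf": [
        ("_:a1", "foaf:mbox", "mailto:someone@example.org"),
        ("_:a1", "foaf:name", "Someone"),
        ("_:a1", "rdfs:seeAlso", "http://example.org/b.rdf"),
    ],
    "http://example.org/b.rdf": [
        ("_:b1", "foaf:mbox", "mailto:someone@example.org"),
        ("_:b1", "foaf:weblog", "http://example.org/blog"),
    ],
}

quads = crawl("http://example.org/a.rdf", lambda url: DOCS.get(url, []))
merged = smush(quads, ifps={"foaf:mbox"})
```

After merging, both blank nodes carry the same name, while the fourth element of each quad still says which document the statement came from.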
The next session in the KT track was where I was speaking but first up was another friend, Damian Steer, speaking about Treehugger: The RDF Model Meets XPath which he said was designed for two audiences:
He saw the existing RDF/XML as more like a macro language for triples (after a quote by Patrick Stickler) but designed treehugger to produce an RDF/XML-like XML syntax, with "striping", implemented as extension functions for SAXON for XQuery and XSLT returning a document root (with versions that do interfacing). It generates an XML tree with some duplicate subtrees, a possibly infinite graph generated lazily. The basic idea was to turn RDF graph arcs into elements, with the predicate names sort of capturing the entire original RDF triple they came from (the parent element subject, the child element object). He claimed that if you design an XPath query that matches an RDF/XML document, it should work for any other equivalent one, as long as it doesn't break the striping. There were several syntax hacks, including using rdf:Description as a wildcard, the use of the rdf:about and rdf:nodeID attributes, and rdf:li iterating over the rdf:_n items automagically. But all of that was a compromise to get it pretty much working without much user thought needed.
I spoke next on Modernising Semantic Web Markup about designing a simple XML format for RDF triples, called RXR. I'll have to see if anyone else recorded what I said during the presentation.
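The striping idea can be sketched in a few lines. This is not treehugger itself: it is an assumed, simplified stand-in that builds the tree eagerly with a depth cut-off (where treehugger generates it lazily), uses Python's `xml.etree.ElementTree` instead of Saxon extension functions, and treats any object that also appears as a subject as a resource. The `striped` function and the example triples are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


def striped(triples, subject, depth=3):
    """Build a node element for `subject` whose children are property
    elements (arcs), each holding either literal text or another node
    element. `depth` bounds the expansion because the graph may loop.
    """
    node = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    if depth == 0:
        return node
    subjects = {s for s, _, _ in triples}
    for s, p, o in triples:
        if s != subject:
            continue
        arc = ET.SubElement(node, p)  # p is already in {namespace}local form
        if o in subjects:
            arc.append(striped(triples, o, depth - 1))
        else:
            arc.text = o
    return node


# Hypothetical cyclic graph: alice knows bob, bob knows alice.
triples = [
    ("http://example.org/alice", f"{{{FOAF}}}name", "Alice"),
    ("http://example.org/alice", f"{{{FOAF}}}knows", "http://example.org/bob"),
    ("http://example.org/bob", f"{{{FOAF}}}name", "Bob"),
    ("http://example.org/bob", f"{{{FOAF}}}knows", "http://example.org/alice"),
]

root = striped(triples, "http://example.org/alice")
ns = {"rdf": RDF, "foaf": FOAF}
friends = [e.text for e in root.findall("foaf:knows/rdf:Description/foaf:name", ns)]
friends_of_friends = [
    e.text
    for e in root.findall(
        "foaf:knows/rdf:Description/foaf:knows/rdf:Description/foaf:name", ns
    )
]
```

The paths alternate property and node steps, which is the striping, and the second query shows the duplicate subtree that appears when the graph loops back on itself.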
I chaired another session with two papers both about XForms, first A New Methodology for Designing Electronic Forms Promoting Reuse of Information Carriers by Jeanine Lilleng and Even Thorbergsen describing work on a large project using XForms in Norwegian government to reduce times of form filling and aim for filling a form once only. It seems to be pretty successful already and they have already planned an updated information model for the next version.
The second talk was by another SWAD Europe friend, Charles McCathieNevile (W3C), on RDF, XForms, and the Law - Staying Out of Gaol. He has been using RDF and XForms for a foaf-a-matic style replication to make RDF data, for EARL and later on for his own use managing a legal problem across 3 languages and 2 continents. He found the basics of XForms quite simple for somebody who can do HTML but at the time needed an XForms validator, which has now appeared.
Finally I attended the closing plenary session: The State of XML by Edd Dumbill, which has appeared in a different form as an article The State of XML on XML.com. My notes here are from what I recorded he said, written before the article was published.
After his closing keynote at XML2000 in Paris, called the State of XML, Edd closed the conference, describing how it seems more than ever presumptuous to sum up XML. He then went on to describe his more personal opinions. After 5 years of bedding down, XML has disrupted as well as changing things more for the better than not. Nowadays people say more "why aren't you using XML?" than "why?". He encouraged people to still challenge the status quo in XML and keep proposing alternatives.
The State of XML is pretty good.
Edd Dumbill
Edd saw XML as the spider in the center of a web of technologies and talked about some of the technologies closer to the spider than further out. Attention has been given to the lower layers of XML recently which has made him happy, after previous concentration on higher layers from the core. Maybe the energy had moved on to that, and the W3C had followed the industry needs there. Or maybe it was easier if the lower layers were stable for a while.
Two things in particular have recently pleased Edd:
So the core of XML is not being neglected.
We've seen spec development move beyond the W3C, but successful ones outside the W3C are more rare. This has given freedom to challenge the status quo. One of the big successes has been RELAX NG, whose model and ease of authoring has been compelling. Even inside Microsoft, RELAX NG is used as the scribble format for schemas. The XML schema languages are more of a design by individuals versus a design by committee thing than a conflicting set of organisations.
The bad issue Edd saw outside the W3C is the loss of coordination with web architecture and the WG process. However, W3C specs are not guarantees for success. Some things in particular are not looking good in the pile of Web Services frameworks specifications; lots of divergence and maybe moving towards chaos.
A conundrum, some of the things that are exciting - interoperability across applications. But some, hateful things - overblown hype, poor specifications, escalating complexity remain. This is now mostly one step removed from XML. However, the desktop is less of an island as Jeff Barr mentioned in the opening keynote. REST as a very webby API for scripting and web environment is very pleasing and accessible - feels more like how the web came about. With HTTP you can build services and compose them. The Web Services world has descended into some dull, but maybe necessary, complexity. Some of the components we had in CORBA are being reimplemented again in angle brackets. It's a bit disturbing reimplementing what we had before.
As a developer Edd feels unhappy with a monolithic pile of technologies that can only be used from either Java or .Net, rather than the simple model of HTTP.
On the bright side, he saw a lot more useful stuff to be done with document based web services, and in particular with XForms. Forms can be decentralised and passed around.
Changes have led to new problems to solve and new issues, and one of the hardest is about metadata, needing new processes. Systems developers are now seeing more of the need for this, rather than it being seen as just for librarians. Nowadays we all need to manage this, as we all have email, photos, word documents, electronic forms, PDFs. Even at the low level vendors are seeing the advantage of metadata storage and manipulation. He referred to Microsoft's upcoming WinFS - a filesystem enriched with metadata facilities. We have the tools for this now - RDF, OWL, Topic Maps, XML Schema - all great. But the biggest issue is "which schema should we use?" even for the simple things like names and addresses. There is more work to be done here in selling this. We can't always get everyone using the same set of terms.
Another challenge, more outside our normal experience, is the user interfaces to ensure that metadata enriches our content. As well as terms, we need techniques for translations to map between the metadata schemas - the XSLT for ontologies.
There are now new constituencies such as mobile. On the desktop, IE is the anchor but the mobile world is emerging from the disaster of their own creation - special versions of HTTP, HTML for WAP - similar, but not the same, and not clear on control. They are moving away from this redundancy and their needs are more being addressed, such as binary XML for their constrained environment.
The web user interface platform isn't getting better in terms of usability, so we are seeing technologies outside the browser. Microsoft's .Net XAML is a windows description for widgets in XML, which allows mixing code. What's so great about this?, asked Edd. Answer: It's easy and webby. Some people think that mostly graphical tools are used, but you know text editors/raw XML are still used. These interfaces can be trivially connected to the web and web services. A lot of people are concerned that this might replace the desktop browser, and stop an open interface for web applications - the browser.
On Intellectual Property (IP) issues, some of us have spent a long time in the XML space not looking outside as things changed . Edd is still somewhat surprised by seeing URLs on TVs. Web resources are nearly as real as houses, cars, the street, things in the real world. And these are getting laws and policy like the real world. Some people have realised this and are getting in on the law making, and not all for the common good. Copyright is good, but restrictions on a means of expression isn't so good. And what are schemas but methods of expression?
The great strengths of XML are its intimate relationship to the Internet and the web. It has unlocked a large number of possibilities. And its relationship to the URI has facilitated both elegant and practical applications. Another one is the human readable and editable form - and you cannot underestimate this. A conference theme. XML.com asked developers how they edited their XML and over 80% said they used a text editor (maybe also other things - 45% used an XML editor also). It fascinated Edd that generating markup that is easy to read and write, such as simple document types like HTML and RSS, leads to success. Microsoft gets this one also, like in their new XML formats such as XAML and WinFS technology, and after a sort of firestorm with their early XML work have moved to better formats.
Edd briefly mentioned RDF and syntaxes, saying it was better to think of them as less of an XML thing and that it is OK to have a non-XML syntax. Thinking of this, RELAX NG has had a non-XML format for a long time.
Edd paid tribute to the dedicated bunch of XML people who have kept the flame of XML burning and told people to feel inspired, showing an "Inspiration ... XML!" slogan over a glorious sunset picture.
Where else do you get a huge variety of technologies (docs, data, images, libraries, ...) related with XML and even working into the core of XML technology? It may not have met their needs precisely but its web enablement and pervasiveness has enabled its success.
Fin
As usual, another interesting XML Europe conference. So what caught my eye? I learned that XForms is widely used, and about the many ways of querying XML. What did I miss? The always lovely SVG talks, which all clashed with my chairing duties :( However, the KT track has worked as a mini semantic-web conference for developers in Europe and, as might be seen, is very relevant for SWAD-Europe. Of course I learned again that Amsterdam is a great city to enjoy.
See also the XML Europe 2004 Proceedings
Many applications have a need to render RDF in human-readable form.
In some cases the requirement is to present a complete and accurate visualization of the RDF for people familiar with the RDF model. There are many tools which support this, both graphical (e.g. IsaViz, visualizer) and textual (e.g. brownsauce).
In other cases the need is to present an application specific UI which contains some data extracted from the RDF. In that case some form of template-driven rendering approach seems appropriate. There are some tools for this (e.g. RDFStyles) as well as general XSLT-for-RDF proposals (e.g. treehugger, RDF Template, RDF Twig) which could be used for rendering directly to XHTML.
One concern is how robust a template-driven approach is. It seems that a direct application of XSLT style templating might tend to produce monolithic templates which just insert strings extracted from the RDF graph into holes in the template. This fill-in-the-blanks approach could be fragile. We want to be able to write templates that can handle missing data and would not need substantial rewrites if the data were extended with extra properties or substructures. More specifically we identified the following requirements to enable robust templates:
This approach seems to be working well for us. It does have a somewhat imperative (as opposed to declarative) flavour, but it does seem reasonably simple and flexible. The key was the use of a rendering manager to enable a template to invoke other templates indirectly via a template registry. Just another example of the old adage - there's no problem in computer science that can't be solved with one more level of indirection :-)
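A minimal sketch of that indirection, under stated assumptions: the `RenderingManager` class, the template function signature and the dictionary-based graph (node -> property -> single value) are all hypothetical and are not the implementation described in this post. Templates never call each other directly; they ask the manager to render a value, and the manager looks up a template by the node's rdf:type, falling back to a generic one. The sketch does no cycle detection, so it assumes acyclic data.

```python
from html import escape


class RenderingManager:
    """Template registry: templates invoke each other only through render()."""

    def __init__(self):
        self._templates = {}  # rdf:type -> template function

    def register(self, rdf_type, template):
        self._templates[rdf_type] = template

    def render(self, graph, node):
        # Literals, and resources with no description, become escaped text.
        if node not in graph:
            return escape(str(node))
        props = graph[node]
        template = self._templates.get(props.get("rdf:type"), default_template)
        return template(self, graph, node, props)


def default_template(manager, graph, node, props):
    """Fallback: list every property, so extra or unknown data still shows."""
    items = "".join(
        f"<li>{escape(p)}: {manager.render(graph, v)}</li>"
        for p, v in sorted(props.items())
        if p != "rdf:type"
    )
    return f"<ul>{items}</ul>"


def person_template(manager, graph, node, props):
    name = escape(props.get("foaf:name", "(unnamed)"))  # tolerate missing data
    parts = [f"<h2>{name}</h2>"]
    if "ex:address" in props:
        # Indirect call: the manager decides which template renders the address.
        parts.append(manager.render(graph, props["ex:address"]))
    return "".join(parts)


graph = {
    "ex:alice": {"rdf:type": "foaf:Person", "foaf:name": "Alice", "ex:address": "_:addr"},
    "_:addr": {"rdf:type": "ex:Address", "ex:city": "Bristol", "ex:country": "UK"},
}

manager = RenderingManager()
manager.register("foaf:Person", person_template)
page = manager.render(graph, "ex:alice")
```

Registering a dedicated template for `ex:Address` later would change how the address renders without touching `person_template`, and a person with no name or address still renders.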
The United Nations Environment Programme (UNEP) hosted the first Environmental Thesaurus and Terminology Workshop this week in Geneva. The Workshop is part of the broader ECOinformatics Initiative, which is working to facilitate co-operation between organisations and projects working in the area of environmental information.
SWAD-Europe was represented at the workshop by myself (Alistair Miles, CCLRC). I gave a presentation on recent developments in the SWAD-Europe Thesaurus Activity.
I was very impressed to find that, although this was the first meeting of this group, there was a strong coherence in the vision, goals and expertise of its members. Developing web services for accessing terminologies and thesauri via the internet was a major theme of the workshop, and received unanimous interest. In this context I presented the SKOS API, a generic application programming interface for a thesaurus web service which is being developed as part of the SWAD-Europe Thesaurus Activity. The API was well received, and we hope to involve members of this community in its further development and testing.
Another theme of the workshop was standards for machine-readable representations of terminological data. This community is keen to achieve tighter integration of its technological infrastructure, and adopting some common standard for data publication and exchange is recognised by all as a necessity. I presented the SKOS-Core RDF Schema for RDF encodings of thesaurus and terminological data as a solution for this requirement, and the proposal was well-received. The extensibility of the SKOS-Core framework allows thesauri to be ported to the semantic web in a way that preserves all their unique features without compromising interoperability, and this was recognised as a significant boon.
The Semantic Web was also represented at the workshop by Bernard Vatant (Mondeca), who gave an excellent presentation introducing the core Semantic Web technologies and illustrating their use and potential for building well organised information systems for the web. Both Bernard and I also represent the W3C Semantic Web Best Practices and Deployment Working Group, and there was an action taken at the end of the meeting to establish a link between SWBP-WG and ECOTERM, which was the name agreed upon for the working group that will be formed from those present at the meeting and the wider community interested in environmental terminologies and thesauri.
It was a pleasure to meet the members of this highly competent and forward-looking community. Thanks again to Gerry Cunningham from UNEP and Stefan Jensen from EEA.
Validation for RDF can mean a variety of different things, especially where RDF is using XML and several layers of technology are connected. This FAQ describes validation for RDF and answers how to do it for the different technologies.
Validation is a tricky word to consider, and often used with schema, which can also have several different interpretations. There is validation of syntax (XML validation, RDF/XML - RDF's XML syntax) as well as RDF schema validation.
That means you can do:
RDF schema validation bears expanding. An RDF schema is a description of the terms used in the RDF triples forming the RDF graph (which can be written in a document, in an RDF/XML syntax). Checking the terms match how the RDF schema describes them is what RDF schema validation typically means. RDF schemas allow description of classes, properties and the ranges, domains of properties and so on. This is explained further in the RDF Vocabulary Description Language 1.0: RDF Schema W3C Recommendation.
There are plenty of tools that do these kinds of things already. The XML validation needs an XML parser, a description of the XML (an XML schema in some XML schema language) and an XML schema validator. The RDF/XML validation needs an RDF parser (see the FAQ How Do I Parse RDF?). The RDF schema validation needs a system that does the checking, which is formally called handling RDFS entailment, defined in the RDF Semantics W3C Recommendation.
You can find RDF schema validators in free software RDF toolkits such as Jena and Sesame, Euler (all in Java) as well as in the Python Cwm. There are some on-line RDF schema validators such as Rosco - a non-judgemental RDF schema and document checker which does checking, but not all RDFS entailments.
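To give a feel for what such checking involves, here is a small sketch that is not taken from any of the toolkits named in this FAQ. It applies only the two RDFS entailment rules for rdfs:domain and rdfs:range to infer types, and reports properties that the data uses but the schema never describes. The `check` function, the prefixed-name strings and the example data are hypothetical, and it assumes at most one domain and one range per property; full RDFS entailment covers considerably more.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def check(schema, data):
    """Return (inferred_triples, undescribed_properties).

    `schema` and `data` are lists of (subject, predicate, object) triples.
    Only the rdfs:domain and rdfs:range rules are applied.
    """
    domains = {s: o for s, p, o in schema if p == DOMAIN}
    ranges = {s: o for s, p, o in schema if p == RANGE}
    described = {s for s, p, o in schema if p == RDF_TYPE and o == "rdf:Property"}
    described |= set(domains) | set(ranges)

    inferred, undescribed = set(), set()
    for s, p, o in data:
        if p == RDF_TYPE:
            continue
        if p not in described:
            undescribed.add(p)  # possibly a typo in a property name
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    # Report only the types that the data did not already state.
    return inferred - set(data), undescribed


schema = [
    ("ex:author", RDF_TYPE, "rdf:Property"),
    ("ex:author", DOMAIN, "ex:Document"),
    ("ex:author", RANGE, "ex:Person"),
]
data = [
    ("ex:doc1", "ex:author", "ex:dave"),
    ("ex:doc1", "ex:titel", "Notes"),  # misspelt property, absent from schema
    ("ex:dave", RDF_TYPE, "ex:Person"),
]

inferred, undescribed = check(schema, data)
```

Note that domain and range produce new type statements rather than errors, which is why a checker in this style reports what it found instead of rejecting the data outright.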
OWL systems which contain OWL reasoners can also typically validate RDF schema since it is a small subset of the more powerful things that OWL reasoners can do, and all RDF is OWL Full. OWL applications that only handle OWL-DL or OWL-Lite cannot check all RDFS entailments.
The W3C RDF validator is an RDF/XML validator, not an RDFS validator. It is based on the ARP2 RDF parser which is part of Jena. There are many more RDF parsers available that can perform this, as already described in the FAQ How Do I Parse RDF?
Finally, you can do XML validation of RDF/XML against an XML schema such as RELAX NG. It's in the RDF/XML specification in section A.1 RELAX NG Compact Schema. The W3C XML Schema language (WXS) is not as suitable as RELAX NG for XML-validating RDF/XML, since it enforces strong XML constraints and RDF/XML has a wide-open set of tags that may appear.
See also the FAQs: