April 28, 2004

XML Europe 2004

Last week I attended XML Europe in sunny Amsterdam at the RAI and made notes in some sessions I attended. So far I've not been able to get on with live blogging, rather I've been doing live talking to people during the conference.

This article reports on the presentations I saw along with some of the notes I took at the time, as well as a recording of the closing keynote by Edd Dumbill. (3,300 words).

Monday 19th April 2004

Jeff Barr of Amazon gave the opening keynote, Driving Innovation with Web Services at Amazon.com, despite the airline having lost all his luggage. He described how Amazon has been making a programmable web site with web services. I heard some of this last year at the O'Reilly Open Source Conference, but it is still good to know that it is working out well for them. In particular, 80% of their developers still use the REST API rather than the SOAP one, mostly because of its ease of use and the XSLT transforms it allows. Jeff mentioned several novel applications that have been built with the Amazon API, things that Amazon would never have got round to or thought of. Their goal from the start was to provide the API and think about a business model in which both parties, Amazon and the API users, can make money.

Steven Pemberton gave the second keynote, On The Design of Notations, which was an entertaining and thoughtful consideration of the silly things humans do to make life easy for computers. How backwards is that? His presentation went from the notation for US state codes (GA, FL, ME), which have no good relation to the state names, to other notations such as programming languages like ABC and Python (both started by people at CWI, Amsterdam, where Steven is based). XML was never designed for ease of authoring and, looking ahead, some things such as xhtml2 in 2010 may be very complex. The markup produced by authoring tools is still embarrassing to read so, as people are involved, he recommended making notations primarily easy for people, since parsing markup is a comparatively easy job (for coders).

Then into the conference itself where I went to Architecture Principles of the WWW, given by Chris Lilley of W3C, going over the new Architecture of the World Wide Web, First Edition (a W3C Working Draft). One goal of this is to write down formally what used to be formed from ideas in Tim Berners-Lee's head, some email messages and a few documents scattered around. This state was something that not everyone using or developing the web was aware of, could read or even access. Specifications can't conform to an underdocumented architecture. The document covered a lot of areas that Chris outlined but there were a few questions about revising the URI specification in RFC2396bis and IRIs, admonitions against adding URI schemes such as itunes:, subscribe: that just wrap HTTP, mostly caused by issues of accessing the MIME processing stack between browsers and helper applications. Somebody in the audience said the US government has endorsed the determinedly non-networked info: URI scheme. The document now discusses safe interactions such as using HTTP GET, which led to some changes to other W3C work such as XForms and XML Protocol work to provide GET when possible and/or record safe operations. Chris was pleased about the new xml:id draft as one early example of the TAG getting somebody else to solve a problem for them!

Immediately after that was Take REST: An Analysis of Two REST APIs by Paul Prescod of Blast Radius. Paul said that he was an XML guy and interested in data, not service oriented APIs. Most Web Services (WS) are about sharing information, not in a web of information, but in silos with their own different and incompatible WS APIs. Instead of WS, Paul promoted using a Resource Access Interface for REST. He was proud that his prediction that Amazon's REST service would be very popular turned out correct. Giving examples of Amazon's REST API he noted that the identifiers were ASINs and not URIs (so not resource-centric) but Amazon may have some good reasons for that:

"a URI that works just for you, is not as good as one that works for anybody" [i.e. a web resource]

The SOAP and REST APIs are rather different in their identifying model and it isn't clear why the XML URIs need to be so complex.

The second API he talked about was the Atom API, "a process with nobody in charge", which is still evolving and may become an exemplar of REST for 2004, as Amazon was 2 years ago. Both Atom and Amazon rejected the RPC model as too fragile and not easy to extend.

Later on I went to Michael Kay's XSLT and XPath optimisation with Saxon, which looked like impressive work, and to Jonathan Robie's talk SQL/XML, XQuery, and Native XML Programming Languages, covering SQL/XML, W3C XQuery and XQuery for Java (XQJ), of which I had previously been aware only of XQuery. The last talk I saw on Monday was by Liam Quinn on Lessons From an XML Query-Driven SVG+XHTML Web Site, about handling his antiquarian books collection, which has lots of really great pictures. He was using RDF(ish) metadata, and the acronyms in the title, to run the web site.

Tuesday 20th April 2004

I chaired a session with two talks, the first was XML Design Principles for Form and Function by Uche Ogbuji of Fourthought which (after some laptop death problems) went over some of his considered thoughts on designing XML as a very experienced XML consultant developer and writer. The paper goes into more detail, so here's a summary of the points I grabbed:

  • Choose human readable names - this is why WSDL is awful XML
  • Use dashes in XML names, despite some issues with converting to code
  • Put core data for the user in elements, application specific stuff in attributes
  • Put structured info in elements, not micro parsing
  • Discrete information in attributes (like "300km" can be used with care)
  • Names of people should be structured rather than left as 1 string
  • Things such as document titles look odd to him as attributes values
  • Do not put URLs in element content - these are mostly for machines
  • Do not use attributes to qualify other attributes
  • Take care with namespaces and prefixes
  • It is important to preserve conventional namespace prefixes throughout the tool chain
  • Use "namespace normal form" - all namespaces are defined on the root element, with no duplicate namespace names
  • In new formats, do not repeat the namespace prefix in the local name like foo:foo (except where historical such as html:html)
  • If an example document makes people's head hurt, go back to the drawing board
  • Use XML schemas for communication XML rather than validation
  • Always declare the <?xml?> with content encoding
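
One of the points above, "namespace normal form", is easy to check mechanically. The following is only a rough sketch of such a check in Python, based on my reading of the one-line summary rather than on Uche's paper: the function name is made up, and "no duplicate namespace names" is interpreted here as "no namespace name bound to more than one prefix".

```python
import io
import xml.etree.ElementTree as ET


def namespace_normal_form_problems(xml_text):
    """Return a list of complaints; an empty list means the document passes."""
    problems = []
    prefix_for_uri = {}
    depth = 0  # number of currently open elements
    # Assumes xml_text is a str without a conflicting encoding declaration.
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "end", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload
            # start-ns is reported before the start event of the declaring
            # element, so depth 0 means "declared on the root element".
            if depth > 0:
                problems.append(f"{uri} is declared below the root element")
            if prefix_for_uri.setdefault(uri, prefix) != prefix:
                problems.append(f"{uri} is bound to more than one prefix")
        elif event == "start":
            depth += 1
        else:  # "end"
            depth -= 1
    return problems
```

A document with every declaration on the root element returns an empty list; one that declares a namespace on a child element, or binds the same namespace name to two prefixes, returns one complaint per offence.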

The second talk in the session I chaired was by Brandon Jockman on Test-Driven XML Development: Building Rapid Change Management into XML Systems applying the ideas of unit tests and suites of tests to XML, so whenever you need to change things, you can refactor the XML (document, schemas, tool chain) and still ensure the system is working with the evolving test suite. He was using XSLTUnit and XMLUnit for some of this work inside, with some warnings: "XSLT is complex enough to allow you to do very bad things"

After lunch, I went to the Knowledge Technologies track to see Semantic blogging: Spreading the semantic web meme by Steve Cayzer of HP Labs, Bristol. This was reporting some work he did for the SWAD Europe project I also work for. I had seen some of this before at project meetings but this time Steve was wearing a lovely "Life Life, Don't Blog" T-shirt (see Edd's picture).

Next up was Crawling the Semantic Web by my friend Matt Biddulph (speaking for himself, although he works for the BBC). Crawling linked stuff in RDF and searching it is mostly done over the web of rdfs:seeAlso properties, the RDF equivalent of <a href="..."> in HTML. Crawling also needs to consider the distribution effort, merging items by URIs, and in particular tracking the provenance and trust found: who is behind the data and the source of the data. He found that there were two really useful bits of OWL worth implementing for this application: 1) a owl:sameAs b done using renaming and 2) a owl:inverseFunctionalProperty b. He also tried an experiment with the PlanetRDF blogroll in RDF, crawling from that to the webpages and found that 30% of the PlanetRDF authors have links to their FOAF in their webpage - linking the HTML web to the RDF web.
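
To make the two OWL features concrete, here is a small sketch of merging ("smushing") nodes by renaming. It is not Matt's crawler: triples are plain Python tuples of strings, the function name is invented, and the caller is assumed to pass in the set of properties known to be inverse functional (foaf:mbox is the usual example).

```python
def smush(triples, inverse_functional):
    """Rename nodes so that nodes known to be the same share one name."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            keep, drop = sorted((root_a, root_b))  # arbitrary but stable choice
            parent[drop] = keep

    owner = {}  # (property, value) -> first subject seen with that value
    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)
        elif p in inverse_functional:
            # Two subjects sharing a value of an inverse functional
            # property must be the same thing.
            if (p, o) in owner:
                union(owner[(p, o)], s)
            else:
                owner[(p, o)] = s
    # Single pass only: values that are themselves merged later are not revisited.
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}
```

Given two blank nodes with the same foaf:mbox and a third resource declared owl:sameAs one of them, all their properties end up on a single node, which is the effect needed when aggregating FOAF files from many sources.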

The next session in the KT track was where I was speaking but first up was another friend, Damian Steer, speaking about Treehugger: The RDF Model Meets XPath, which he said was designed for two audiences:

  1. XML people, who want to just use RDF with XML tools
  2. RDF people who know the graph

He saw the existing RDF/XML as more like a macro language for triples (after a quote by Patrick Stickler) but designed treehugger to produce an RDF/XML-like XML syntax, with "striping", implemented as extension functions for SAXON for XQuery and XSLT returning a document root (with versions that do interfacing). It generates an XML tree with some duplicate subtrees, a possibly infinite graph generated lazily. The basic idea was to turn RDF graph arcs into elements, with the predicate names sort of capturing the entire original RDF triple they came from (the parent element subject, the child element object). He claimed that if you design an XPath query that matches an RDF/XML document, it should work for any other equivalent one, as long as it doesn't break the striping. There were several syntax hacks including using rdf:Description as a wildcard, the use of the rdf:about and rdf:nodeID attributes, and having rdf:li iterate over the rdf:_n items automagically. But all of that was a compromise to get it pretty much working without much user thought needed.
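
The striping idea can be illustrated with a toy unfolding of a graph into alternating node and property elements. This is not how treehugger is implemented: the sketch is eager with a depth limit instead of lazy, uses unprefixed element names, and guesses that any object which also appears as a subject is a resource and everything else a literal. All names in it are made up for illustration.

```python
import xml.etree.ElementTree as ET


def striped_tree(triples, subject, max_depth=3):
    """Unfold the graph around `subject` into node/property stripes."""
    node = ET.Element("Description", {"about": subject})
    if max_depth == 0:
        return node  # stop unfolding: a cyclic graph would otherwise never end
    subjects = {s for s, _, _ in triples}
    for s, p, o in triples:
        if s != subject:
            continue
        arc = ET.SubElement(node, p)  # property stripe
        if o in subjects:
            arc.append(striped_tree(triples, o, max_depth - 1))  # node stripe
        else:
            arc.text = o  # treated as a literal
    return node
```

With triples saying that ex:damian knows ex:matt and ex:matt has the name Matt, the path knows/Description/name finds "Matt" from the tree rooted at ex:damian, which is the sort of XPath-over-RDF query the talk was about.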

I spoke next on Modernising Semantic Web Markup about designing a simple XML format for RDF triples, called RXR. I'll have to see if anyone else recorded what I said during the presentation.

Wednesday 21st April 2004

I chaired another session with two papers, both about XForms. The first was A New Methodology for Designing Electronic Forms Promoting Reuse of Information Carriers by Jeanine Lilleng and Even Thorbergsen, describing work on a large project using XForms in the Norwegian government to reduce the time spent filling in forms, with the aim of filling in a form once only. It seems to be pretty successful already and they have already planned an updated information model for the next version.

The second talk was by another SWAD Europe friend, Charles McCathieNevile (W3C), on RDF, XForms, and the Law - Staying Out of Gaol. He has been using RDF and XForms for a foaf-a-matic style replication to make RDF data, for EARL and later on for his own use managing a legal problem across 3 languages and 2 continents. He found the basics of XForms quite simple for somebody who can do HTML, but at the time he needed an XForms validator, which has now appeared.

The State of XML, Edd Dumbill

Finally I attended the closing plenary session: The State of XML by Edd Dumbill, which has appeared in a different form as an article, The State of XML, on XML.com. My notes here are from what I recorded he said, written before the article was published.

After his closing keynote at XML2000 in Paris, also called the State of XML, Edd closed this conference by describing how it seems more presumptuous than ever to sum up XML. He then went on to describe his more personal opinions. After 5 years of bedding down, XML has disrupted things as well as changing them, more for the better than not. Nowadays people more often ask "why aren't you using XML?" than "why?". He encouraged people to still challenge the status quo in XML and keep proposing alternatives.

The State of XML is pretty good.
Edd Dumbill

Edd saw XML as the spider in the center of a web of technologies and talked about some of the technologies closer to the spider than further out. Attention has been given to the lower layers of XML recently which has made him happy, after previous concentration on higher layers from the core. Maybe the energy had moved on to that, and the W3C had followed the industry needs there. Or maybe it was easier if the lower layers were stable for a while.

Two things in particular have recently pleased Edd:

  1. the XML processing model.
    You have a document, schema, transformation and stylesheet - how and when do you apply them?
  2. the xml:id draft
    which meets the core needs of several other XML activities

So the core of XML is not being neglected.

We've seen spec development move beyond the W3C, but successful ones outside the W3C are more rare. This has given freedom to challenge the status quo. One of the big successes has been RELAX NG, whose model and ease of authoring have been compelling. Even inside Microsoft, RELAX NG is used as the scribble format for schemas. The XML schema languages are more of a design by individuals versus a design by committee thing than a conflicting set of organisations.

The bad issue Edd saw outside the W3C is the loss of coordination with web architecture and the WG process. However, W3C specs are not guarantees for success. Some things in particular are not looking good in the pile of Web Services frameworks specifications; lots of divergence and maybe moving towards chaos.

A conundrum: some things are exciting, such as interoperability across applications, but some hateful things remain - overblown hype, poor specifications, escalating complexity. This is now mostly one step removed from XML. However, the desktop is less of an island, as Jeff Barr mentioned in the opening keynote. REST as a very webby API for scripting and the web environment is very pleasing and accessible - it feels more like how the web came about. With HTTP you can build services and compose them. The Web Services world has descended into some dull, but maybe necessary, complexity. Some of the components we had in CORBA are being reimplemented again in angle brackets. It's a bit disturbing reimplementing what we had before.

As a developer Edd feels unhappy with a monolithic pile of technologies that can only be used from either Java or .Net, rather than the simple model of HTTP.

On the bright side, he saw a lot more useful stuff to be done with document based web services, and in particular with XForms. Forms can be decentralised and passed around.

Changes have led to new problems to solve and new issues, and one of the hardest is metadata, which needs new processes. Systems developers are now seeing more of the need for this, rather than it being seen as just for librarians. Nowadays we all need to manage this, as we all have email, photos, word documents, electronic forms, PDFs. Even at the low level vendors are seeing the advantage of metadata storage and manipulation. He referred to Microsoft's upcoming WinFS - a filesystem enriched with metadata facilities. We have the tools for this now - RDF, OWL, Topic Maps, XML Schema - all great. But the biggest issue is "which schema should we use?" even for the simple things like names and addresses. There is more work to be done here in selling this. We can't always get everyone using the same set of terms.

Another challenge, more outside our normal experience, is the user interfaces to ensure that metadata enriches our content. As well as terms, we need techniques for translations to map between the metadata schemas - the XSLT for ontologies.

There are now new constituencies such as mobile. On the desktop, IE is the anchor but the mobile world is emerging from the disaster of their own creation - special versions of HTTP, HTML for WAP - similar, but not the same, and not clear on control. They are moving away from this redundancy and their needs are more being addressed, such as binary XML for their constrained environment.

The web user interface platform isn't getting better in terms of usability, so we are seeing technologies outside the browser. Microsoft's .Net XAML is a windows description for widgets in XML, which allows mixing code. What's so great about this?, asked Edd. Answer: It's easy and webby. Some people think that mostly graphical tools are used, but you know text editors/raw XML are still used. These interfaces can be trivially connected to the web and web services. A lot of people are concerned that this might replace the desktop browser, and stop an open interface for web applications - the browser.

On Intellectual Property (IP) issues, some of us have spent a long time in the XML space not looking outside as things changed. Edd is still somewhat surprised by seeing URLs on TVs. Web resources are nearly as real as houses, cars, the street, things in the real world. And these are getting laws and policy like the real world. Some people have realised this and are getting in on the law making, and not all for the common good. Copyright is good, but restrictions on a means of expression aren't so good. And what are schemas but methods of expression?

The great strengths of XML are its intimate relationship to the Internet and the web. It has unlocked a large number of possibilities. And its relationship to the URI has facilitated both elegant and practical applications. Another one is the human readable and editable form - and you cannot underestimate this. A conference theme. XML.com asked developers how they edited their XML and over 80% said they used a text editor (maybe also other things - 45% used an XML editor also). It fascinated Edd that generating markup that is easy to read and write, such as simple document types like HTML and RSS, leads to success. Microsoft gets this one also, as in their new XML formats such as XAML and WinFS technology, and after a sort of firestorm with their early XML work they have moved to better formats.

Edd briefly mentioned RDF and syntaxes, saying it was better to think of them as less of an XML thing and that it is OK to have a non-XML syntax. Thinking of this, RELAX NG has had a non-XML format for a long time.

Edd paid tribute to the dedicated bunch of XML people who have kept the flame of XML burning, and told people to feel inspired, showing an "Inspiration ... XML!" slogan over a glorious sunset picture.

Where else do you get a huge variety of technologies (docs, data, images, libraries, ...) related with XML and even working into the core of XML technology? It may not have met their needs precisely but its web enablement and pervasiveness have enabled its success.

Fin

Summary

As usual, another interesting XML Europe conference. So what caught my eye? I learned that XForms is widely used, and about the many ways of querying XML. What did I miss? The always lovely SVG talks, which all clashed with my chairing duties :( However, the KT track has worked as a mini semantic-web conference for developers in Europe and, as might be seen, was very relevant for SWAD-Europe. Of course I learned again that Amsterdam is a great city to enjoy.

See also the XML Europe 2004 Proceedings

Posted by dbeckett2 at 04:29 PM | Comments (0)

April 27, 2004

Some thoughts on RDF rendering

As part of the Semantic Portals work we've had to create a browsing utility that presents a web based UI for viewing and navigating a set of RDF descriptions (of environmental organizations). This blog note captures a few thoughts on how to go about this based on our experiences with the current prototype.

Many applications have a need to render RDF in human-readable form.

In some cases the requirement is to present a complete and accurate visualization of the RDF for people familiar with the RDF model. There are many tools which support this, both graphical (e.g. IsaViz, visualizer) and textual (e.g. brownsauce).

In other cases the need is to present an application specific UI which contains some data extracted from the RDF. In that case some form of template-driven rendering approach seems appropriate. There are some tools for this (e.g. RDFStyles) as well as general XSLT-for-RDF proposals (e.g. treehugger, RDF Template, RDF Twig) which could be used for rendering directly to XHTML.

One concern is how robust a template-driven approach is. It seems that a direct application of XSLT style templating might tend to produce monolithic templates which just insert strings extracted from the RDF graph into holes in the template. This fill-in-the-blanks approach could be fragile. We want to be able to write templates that can handle missing data and would not need substantial rewrites if the data were extended with extra properties or substructures. More specifically we identified the following requirements to enable robust templates:

  1. Open ended slots - rather than just put a specific property value in a slot in a template we wanted to be able to enumerate all properties of a particular type. For example we have several different relational links between organizations in the dataset; we want to be able to write a template which says "list all relational links here" in such a way that we can add new specific link types in the future without having to modify the templates.
  2. "Object" nesting - some of our data includes nested substructures (e.g. contact information) typically represented using bNodes. The top level template should be able to say "render the contact information here using an appropriate template" and then, when we later change the schema for the contact data, the top level template should remain valid, it should just find a different subtemplate to use for the contact information, dynamically.
  3. Conditionals - RDF is semi-structured, there are often missing values to cope with. A scripting language which can say "if this value is present do this bit of rendering" compactly and clearly would seem useful.
  4. Template extension. We aggregate data from different sources. We want the community to be able to add additional RDF in new schemas specific to specialist subgroups. Thus we'd like them to be able to publish appropriate rendering templates that the portal can use for rendering objects it doesn't know about.

Our current solution to all this is reasonably simple but proving quite effective for us. We use a general template language with scripting capability, Jakarta Velocity. We place wrapped up versions of the RDF data together with a rendering manager object into the Velocity context so that the Velocity scripts can access them through a simple API. The Velocity templates can call out to the embedded rendering manager asking it to render a structured resource using some appropriate template. The rendering manager can then discover the template to use based on the type of the resource being rendered and a weighted mapping of RDF classes to templates. We identify the templates using URIs, making it easy to add new templates published by other providers. The RDF model wrappers provide convenience functions for iterating over all properties of a given type (identified by namespace, marker properties or the subproperty hierarchy) to make it possible to write templates with open-ended slots.
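
To give a feel for the indirection involved, here is a much simplified sketch of the idea in Python rather than Velocity. It is not our portal code: templates are plain functions, resources are dictionaries with "types" and "props" keys, and every class, property and function name is invented for the illustration.

```python
class RenderingManager:
    """Picks a template for a resource from a weighted class-to-template registry."""

    def __init__(self):
        self._registry = []  # entries are (rdf_class, weight, template)

    def register(self, rdf_class, template, weight=1.0):
        self._registry.append((rdf_class, weight, template))

    def render(self, resource):
        matches = [(weight, template)
                   for rdf_class, weight, template in self._registry
                   if rdf_class in resource["types"]]
        if not matches:
            return "<p>(no template for this resource)</p>"
        _, template = max(matches, key=lambda match: match[0])
        return template(resource, self)


def contact_template(resource, manager):
    props = resource["props"]
    parts = []
    if "email" in props:  # conditional: cope with missing values
        parts.append(f"email: {props['email'][0]}")
    if "phone" in props:
        parts.append(f"phone: {props['phone'][0]}")
    return "<p>" + ", ".join(parts) + "</p>"


def organisation_template(resource, manager):
    props = resource["props"]
    html = [f"<h1>{props['name'][0]}</h1>"]
    # Open-ended slot: every property in the (made-up) "rel:" namespace.
    for prop in sorted(props):
        if prop.startswith("rel:"):
            for value in props[prop]:
                html.append(f"<li>{prop}: {value}</li>")
    # Object nesting: let the manager find a template for each nested resource.
    for contact in props.get("contact", []):
        html.append(manager.render(contact))
    return "\n".join(html)
```

The organisation template never names the contact template; it only asks the manager to render whatever is nested there, so changing the contact schema means registering a different subtemplate and leaving the top level template alone.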

This approach seems to be working well for us. It does have a somewhat imperative (as opposed to declarative) flavour, but it does seem reasonably simple and flexible. The key was the use of a rendering manager to enable a template to invoke other templates indirectly via a template registry. Just another example of the old adage - there's no problem in computer science that can't be solved with one more level of indirection :-)

Posted by dreynold2 at 04:51 PM | Comments (0)

April 16, 2004

SWAD-Europe and the ECOinformatics Initiative

The United Nations Environment Programme (UNEP) hosted the first Environmental Thesaurus and Terminology Workshop this week in Geneva. The Workshop is part of the broader ECOinformatics Initiative, which is working to facilitate co-operation between organisations and projects working in the area of environmental information.

SWAD-Europe was represented at the workshop by myself (Alistair Miles, CCLRC). I gave a presentation on recent developments in the SWAD-Europe Thesaurus Activity.

I was very impressed to find that, although this was the first meeting of this group, there was a strong coherence in the vision, goals and expertise of its members. Developing web services for accessing terminologies and thesauri via the internet was a major theme of the workshop, and received unanimous interest. In this context I presented the SKOS API, a generic application programming interface for a thesaurus web service which is being developed as part of the SWAD-Europe Thesaurus Activity. The API was well received, and we hope to involve members of this community in its further development and testing.

Another theme of the workshop was standards for machine-readable representations of terminological data. This community is keen to achieve tighter integration of its technological infrastructure, and adopting some common standard for data publication and exchange is recognised by all as a necessity. I presented the SKOS-Core RDF Schema for RDF encodings of thesaurus and terminological data as a solution for this requirement, and the proposal was well-received. The extensibility of the SKOS-Core framework allows thesauri to be ported to the semantic web in a way that preserves all their unique features without compromising interoperability, and this was recognised as a significant boon.

The Semantic Web was also represented at the workshop by Bernard Vatant (Mondeca), who gave an excellent presentation introducing the core Semantic Web technologies and illustrating their use and potential for building well organised information systems for the web. Both Bernard and I also represent the W3C Semantic Web Best Practices and Deployment Working Group, and there was an action taken at the end of the meeting to establish a link between SWBP-WG and ECOTERM, which was the name agreed upon for the working group that will be formed from those present at the meeting and the wider community interested in environmental terminologies and thesauri.

It was a pleasure to meet the members of this highly competent and forward-looking community. Thanks again to Gerry Cunningham from UNEP and Stefan Jensen from EEA.

Posted by ajmiles at 01:56 PM | Comments (0)

April 14, 2004

FAQ: How do I validate RDF?

Validation for RDF can mean a variety of different things, especially where RDF is using XML and several layers of technology are connected. This FAQ describes validation for RDF and answers how to do it for the different technologies.

Validation is a tricky word to consider, and often used with schema, which can also have several different interpretations. There is validation of syntax (XML validation, RDF/XML - RDF's XML syntax) as well as RDF schema validation.

That means you can do:

  1. XML validation against an XML schema, also called XML schema validation
  2. RDF/XML validation of the syntax that it matches the RDF/XML Syntax Specification (Revised) W3C Recommendation
  3. RDF schema validation

RDF schema validation bears expanding. An RDF schema is a description of the terms used in the RDF triples forming the RDF graph (which can be written in a document, in an RDF/XML syntax). Checking the terms match how the RDF schema describes them is what RDF schema validation typically means. RDF schemas allow description of classes, properties and the ranges, domains of properties and so on. This is explained further in the RDF Vocabulary Description Language 1.0: RDF Schema W3C Recommendation.
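
As a rough illustration of what such a check looks like, here is a small Python sketch that compares the declared types of subjects and objects with the rdfs:domain and rdfs:range of the properties used. It is a lint-style check written for this FAQ, not what any particular toolkit does: strictly, under RDFS semantics domain and range license inferring a type rather than raising an error, and this sketch ignores subclasses and only complains about resources that have at least one declared type. Triples are plain tuples of strings and the function name is made up.

```python
def check_domains_and_ranges(schema, data):
    """Report data triples whose typed subject or object disagrees with the schema."""
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}

    types = {}  # resource -> set of declared classes
    for s, p, o in data:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    problems = []
    for s, p, o in data:
        if p in domains and s in types and domains[p] not in types[s]:
            problems.append(f"{s} uses {p} but is not typed as {domains[p]}")
        if p in ranges and o in types and ranges[p] not in types[o]:
            problems.append(f"{o} is a value of {p} but is not typed as {ranges[p]}")
    return problems
```

For a schema saying ex:author has domain ex:Book and range ex:Person, data in which a resource typed only as ex:Person is the subject of ex:author gets one complaint, while correctly typed data gets none.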

There are plenty of tools that do these kinds of things already. XML validation needs an XML parser, a description of the XML (an XML schema in some XML schema language) and an XML schema validator. RDF/XML validation needs an RDF parser (see the FAQ How Do I Parse RDF?). RDF schema validation needs a system that does the checking, which is formally called handling RDFS entailment and is defined in the RDF Semantics W3C Recommendation.

You can find RDF schema validators in free software RDF toolkits such as Jena and Sesame, Euler (all in Java) as well as in the python Cwm. There are some on-line RDF schema validators such as Rosco - a non-judgemental RDF schema and document checker which does checking, but not all RDFS entailments.

OWL systems which contain OWL reasoners can also typically validate RDF schema since it is a small subset of the more powerful things that OWL reasoners can do, and all RDF is OWL Full. OWL applications that only handle OWL-DL or OWL-Lite cannot check all RDFS entailments.

The W3C RDF validator is an RDF/XML validator, not an RDFS validator. It is based on the ARP2 RDF parser which is part of Jena. There are many more RDF parsers available that can perform this, as already described in the FAQ How Do I Parse RDF?

Finally, you can do XML validation of RDF/XML against an XML schema such as RELAX NG. It's in the RDF/XML specification in section A.1 RELAX NG Compact Schema. The W3C XML Schema language (WXS) is not as suitable as RELAX NG for XML-validating RDF/XML, since it enforces strong XML constraints and RDF/XML has a wide-open set of tags that may appear.

See also the FAQs:

Posted by dbeckett2 at 11:45 AM | Comments (0)

April 01, 2004

Semantic Blogging update

It's been an interesting month for semantic blogging. I'm in the midst of writing papers and articles, some external, some internal (and more of which anon). We're also trying to deploy semantic blogging internally, a true 'eat your own dogfood' approach. I'm hoping to demonstrate an early prototype at XMLEurope 2004 in Amsterdam. If you can't make it, then the paper is available from my site.
Posted by scayzer2 at 04:03 PM | Comments (0)