April 28, 2004

XML Europe 2004

Last week I attended XML Europe at the RAI in sunny Amsterdam and took notes in some of the sessions. I still haven't managed to get on with live blogging; instead I did plenty of live talking to people during the conference.

This article reports on the presentations I saw along with some of the notes I took at the time, as well as a recording of the closing keynote by Edd Dumbill. (3,300 words).

Monday 19th April 2004

Jeff Barr of Amazon gave the opening keynote, Driving Innovation with Web Services at Amazon.com, despite the airline having lost all his luggage. He described how Amazon has been making a programmable web site with web services. I heard some of this last year at the O'Reilly Open Source Conference, but it is still good to know that it is working out well for them. In particular, 80% of their developers still use the REST API rather than the SOAP one, mostly because of its ease of use and the XSLT transforms it allows. Jeff mentioned several novel applications that have been built with the Amazon API, things that Amazon would never have thought of or got round to. Their goal from the start was to provide the API and think about a business model in which both parties, Amazon and the API users, can make money.

Steven Pemberton gave the second keynote, On The Design of Notations, an entertaining and thoughtful consideration of the silly things humans do to make life easy for computers. How backwards is that? His presentation went from the notation for US state codes (GA, FL, ME), which bear no good relation to the state names, to other notations such as the programming languages ABC and Python (both started by people at CWI, Amsterdam, where Steven is based). XML was never designed for ease of authoring and, looking ahead, some things such as xhtml2 in 2010 may be very complex. The markup produced by authoring tools is still embarrassing to read. Since people are involved, he recommended making markup easy primarily for people, as parsing it is a relatively much easier job (for coders).

Then into the conference itself, where I went to Architecture Principles of the WWW, given by Chris Lilley of W3C, going over the new Architecture of the World Wide Web, First Edition (a W3C Working Draft). One goal of this is to write down formally what used to consist of ideas in Tim Berners-Lee's head, some email messages and a few documents scattered around. Not everyone using or developing the web was aware of this material, or could read or even access it. Specifications can't conform to an underdocumented architecture. The document covers a lot of areas, which Chris outlined, but there were a few questions about revising the URI specification in RFC2396bis and IRIs, and admonitions against adding URI schemes such as itunes: and subscribe: that just wrap HTTP, mostly caused by issues of accessing the MIME processing stack between browsers and helper applications. Somebody in the audience said the US government has endorsed the determinedly non-networked info: URI scheme. The document now discusses safe interactions such as using HTTP GET, which led to some changes to other W3C work such as XForms and XML Protocol to provide GET when possible and/or record safe operations. Chris was pleased about the new xml:id draft as one early example of the TAG getting somebody else to solve a problem for them!

Immediately after that was Take REST: An Analysis of Two REST APIs by Paul Prescod of Blast Radius. Paul said that he was an XML guy and interested in data, not service oriented APIs. Most Web Services (WS) are about sharing information, not in a web of information but in silos, each with its own different and incompatible WS API. Instead of WS, Paul promoted using a Resource Access Interface for REST. He was proud that his prediction that Amazon's REST service would be very popular turned out to be correct. Giving examples of Amazon's REST API, he noted that the identifiers were ASINs and not URIs (so not resource-centric), but Amazon may have some good reasons for that:

"a URI that works just for you, is not as good as one that works for anybody" [i.e a web resource]

The SOAP and REST APIs are rather different in their identifying model and it isn't clear why the XML URIs need to be so complex.

The second API he talked about was the Atom API, "a process with nobody in charge", which is still evolving and may become an exemplar of REST for 2004, as Amazon was 2 years ago. Both Atom and Amazon rejected the RPC model as too fragile and not easy to extend.

Later on I went to Michael Kay's XSLT and XPath optimisation with Saxon, which looked like impressive work, and then Jonathan Robie's talk SQL/XML, XQuery, and Native XML Programming Languages, covering SQL/XML, W3C XQuery and XQuery for Java (XQJ), of which I was only aware of XQuery. The last talk I saw on Monday was by Liam Quin on Lessons From an XML Query-Driven SVG+XHTML Web Site, about dealing with his antiquarian books collection, which has lots of really great pictures. He was using RDF(ish) metadata, and the acronyms in the title, to run the web site.

Tuesday 20th April 2004

I chaired a session with two talks. The first was XML Design Principles for Form and Function by Uche Ogbuji of Fourthought, who (after some laptop death problems) went over some of his considered thoughts on designing XML as a very experienced XML consultant, developer and writer. The paper goes into more detail, so here's a summary of the points I grabbed:

  • Choose human readable names - this is why WSDL is awful XML
  • Use dashes in XML names, despite some issues with converting to code
  • Put core data for the user in elements, application specific stuff in attributes
  • Put structured info in elements, not micro parsing
  • Discrete information in attributes (like "300km" can be used with care)
  • Names of people should be structured rather than left as one string
  • Things such as document titles look odd to him as attribute values
  • Do not put URLs in element content - these are mostly for machines
  • Do not use attributes to qualify other attributes
  • Take care with namespaces and prefixes
  • It is important to preserve conventional namespace prefixes throughout the tool chain
  • Use "namespace normal form" - all namespaces are defined on the root element, with no duplicate namespace names
  • In new formats, do not repeat the namespace prefix in the local name like foo:foo (except where historical such as html:html)
  • If an example document makes people's head hurt, go back to the drawing board
  • Use XML schemas to communicate the XML format rather than for validation
  • Always include the <?xml?> declaration with the content encoding
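A few of these points can be illustrated with a small made-up document. This is only a sketch of how I read the advice: the element names, the namespace URI and the values are all invented here, not taken from Uche's paper. It uses dashed, human readable names, a structured person name, a single namespace declared on the root element, a unit kept apart from its number so nothing needs micro parsing, and an XML declaration with an encoding. Python's standard xml.etree.ElementTree is used to read it back.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every name and the namespace URI are invented
# for illustration only.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<trip-report xmlns="http://example.org/ns/trip">
  <author>
    <given-name>Ada</given-name>
    <family-name>Lovelace</family-name>
  </author>
  <title>Notes from a conference</title>
  <distance unit="km">300</distance>
</trip-report>
"""

# Prefix map used only for the queries below.
NS = {"t": "http://example.org/ns/trip"}


def summarise(xml_bytes):
    """Pull out the structured fields without any string splitting."""
    root = ET.fromstring(xml_bytes)
    given = root.findtext("t:author/t:given-name", namespaces=NS)
    family = root.findtext("t:author/t:family-name", namespaces=NS)
    distance = root.find("t:distance", NS)
    return {
        "author": (given, family),
        "title": root.findtext("t:title", namespaces=NS),
        "distance": (float(distance.text), distance.get("unit")),
    }


summary = summarise(DOC)
print(summary)
```

Had the name been one string, or the distance written as "300km", the reading code would need its own small parser, which is the micro parsing the list warns against.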

The second talk in the session I chaired was by Brandon Jockman on Test-Driven XML Development: Building Rapid Change Management into XML Systems. It applied the ideas of unit tests and suites of tests to XML, so that whenever you need to change things, you can refactor the XML (documents, schemas, tool chain) and still ensure the system is working by running the evolving test suite. He was using XSLTUnit and XMLUnit for some of this work, with some warnings: "XSLT is complex enough to allow you to do very bad things".
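To give a flavour of the idea, here is a minimal sketch using Python's standard unittest and xml.etree.ElementTree rather than the XSLTUnit and XMLUnit tools Brandon used. The rename_element function and the sample document are hypothetical stand-ins for a real refactoring step in a tool chain.

```python
import unittest
import xml.etree.ElementTree as ET


def rename_element(root, old, new):
    """Refactoring under test: rename every <old> element to <new>, in place."""
    # Take a list first so the tree is not modified while it is being walked.
    for elem in list(root.iter(old)):
        elem.tag = new
    return root


class RenameTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical sample document; a fresh copy is parsed for each test.
        self.root = ET.fromstring(
            "<book><name>A</name><chapter><name>B</name></chapter></book>"
        )

    def test_no_old_names_remain(self):
        rename_element(self.root, "name", "title")
        self.assertEqual(self.root.findall(".//name"), [])

    def test_content_is_preserved(self):
        rename_element(self.root, "name", "title")
        texts = [t.text for t in self.root.iter("title")]
        self.assertEqual(texts, ["A", "B"])


def run_suite():
    """Run the tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RenameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The point is that the tests describe what must stay true about the XML, so the transformation can be rewritten freely as long as the suite still passes.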

After lunch, I went to the Knowledge Technologies track to see Semantic blogging: Spreading the semantic web meme by Steve Cayzer of HP Labs, Bristol. This was reporting some work he did for the SWAD Europe project I also work for. I had seen some of this before at project meetings but this time Steve was wearing a lovely "Life Life, Don't Blog" T-shirt (see Edd's picture).

Next up was Crawling the Semantic Web by my friend Matt Biddulph (speaking for himself, although he works for the BBC). Crawling linked stuff in RDF and searching it is mostly done over the web of rdfs:seeAlso properties, the RDF equivalent of <a href="..."> in HTML. Crawling also needs to consider distributing the effort, merging items by URIs, and in particular tracking the provenance and trust found: who is behind the data and the source of the data. He found that there were two really useful bits of OWL worth implementing for this application: 1) a owl:sameAs b done using renaming and 2) a owl:inverseFunctionalProperty b. He also tried an experiment with the PlanetRDF blogroll in RDF, crawling from that to the web pages, and found that 30% of the PlanetRDF authors have links to their FOAF in their web page - linking the HTML web to the RDF web.
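The merging step can be sketched roughly as follows. This is not Matt's code: triples are plain Python tuples of strings, the data is invented, and the smush function name is my own. Nodes are treated as the same thing if an owl:sameAs arc says so, or if they share a value for a property that is treated as inverse functional (foaf:mbox_sha1sum here), and the merge is done by renaming each node to one representative.

```python
def smush(triples, ifps, same_as="owl:sameAs"):
    """Merge nodes identified by owl:sameAs or a shared IFP value (a sketch)."""
    parent = {}  # union-find forest: node -> parent node

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: the smaller name becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)

    seen = {}  # (property, value) -> first subject seen with that value
    for s, p, o in triples:
        if p == same_as:
            union(s, o)
        elif p in ifps:
            first = seen.setdefault((p, o), s)
            union(first, s)

    # Rename subjects and objects to their representative; drop sameAs arcs.
    return {(find(s), p, find(o)) for s, p, o in triples if p != same_as}


# Invented example data: three blank nodes that all describe one person.
triples = [
    ("_:a", "foaf:mbox_sha1sum", "abc123"),
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:mbox_sha1sum", "abc123"),
    ("_:b", "foaf:weblog", "http://example.org/blog"),
    ("_:c", "owl:sameAs", "_:b"),
    ("_:c", "foaf:nick", "al"),
]
merged = smush(triples, ifps={"foaf:mbox_sha1sum"})
for triple in sorted(merged):
    print(triple)
```

This single pass ignores provenance and does not re-check property values that are themselves merged later, both of which a real crawler would have to handle.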

The next session in the KT track was where I was speaking, but first up was another friend, Damian Steer, speaking about Treehugger: The RDF Model Meets XPath, which he said was designed for two audiences:

  1. XML people, who want to just use RDF with XML tools
  2. RDF people who know the graph

He saw the existing RDF/XML as more like a macro language for triples (after a quote by Patrick Stickler) but designed Treehugger to produce an RDF/XML-like XML syntax, with "striping", implemented as extension functions for Saxon for XQuery and XSLT returning a document root (with versions that do inferencing). It generates an XML tree with some duplicate subtrees, a possibly infinite tree generated lazily from the graph. The basic idea was to turn RDF graph arcs into elements, with the predicate names sort of capturing the entire original RDF triple they came from (the parent element is the subject, the child element the object). He claimed that if you design an XPath query that matches an RDF/XML document, it should work for any other equivalent one, as long as it doesn't break the striping. There were several syntax hacks, including using rdf:Description as a wildcard, the use of the rdf:about and rdf:nodeID attributes, and having rdf:li iterate over the rdf:_n items automagically. But all of that was a compromise to get it pretty much working without much user thought needed.
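As a rough illustration of the striping idea only (Treehugger itself was Saxon extension functions, and none of the names below come from it), this sketch unfolds a tiny invented graph into alternating node and arc elements with Python's xml.etree.ElementTree. A depth limit stands in for the lazy generation, since a cyclic graph would otherwise unfold forever.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
NS = {"rdf": RDF, "foaf": FOAF}  # prefix map for the path queries below


def node_element(graph, subject, depth):
    """Return a node stripe: an rdf:Description whose children are arc stripes."""
    node = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    if depth == 0:
        return node  # cut off here; a lazy tree would unfold on demand instead
    for s, p, o, is_resource in graph:
        if s != subject:
            continue
        arc = ET.SubElement(node, p)  # the predicate becomes the arc element
        if is_resource:
            arc.append(node_element(graph, o, depth - 1))
        else:
            arc.text = o
    return node


# Invented graph as (subject, predicate, object, object_is_resource) tuples.
# Alice and Bob know each other, so the graph has a cycle.
ALICE = "http://example.org/alice"
BOB = "http://example.org/bob"
graph = [
    (ALICE, f"{{{FOAF}}}name", "Alice", False),
    (ALICE, f"{{{FOAF}}}knows", BOB, True),
    (BOB, f"{{{FOAF}}}name", "Bob", False),
    (BOB, f"{{{FOAF}}}knows", ALICE, True),
]

tree = node_element(graph, ALICE, depth=3)

# A striped path: arc / node / arc, starting from the Alice node.
names = [e.text for e in tree.findall("foaf:knows/rdf:Description/foaf:name", NS)]
print(names)
```

The Alice subtree appears again underneath Bob, which is the duplication mentioned above; raising the depth just repeats it further down.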

I spoke next on Modernising Semantic Web Markup, about designing a simple XML format for RDF triples called RXR. I'll have to see if anyone else recorded what I said during the presentation.

Wednesday 21st April 2004

I chaired another session with two papers, both about XForms. The first was A New Methodology for Designing Electronic Forms Promoting Reuse of Information Carriers by Jeanine Lilleng and Even Thorbergsen, describing work on a large project using XForms in the Norwegian government to reduce the time spent filling in forms, with the aim of filling in a form once only. It seems to be pretty successful already and they have planned an updated information model for the next version.

The second talk was by another SWAD Europe friend, Charles McCathieNevile (W3C), on RDF, XForms, and the Law - Staying Out of Gaol. He has been using RDF and XForms for a foaf-a-matic style application to make RDF data, for EARL and later on for his own use managing a legal problem across 3 languages and 2 continents. He found the basics of XForms quite simple for somebody who can do HTML, but at the time needed an XForms validator, which has now appeared.

The State of XML, Edd Dumbill

Finally I attended the closing plenary session, The State of XML by Edd Dumbill, which has appeared in a different form as an article, The State of XML, on XML.com. My notes here are from what I recorded he said, written before the article was published.

After his closing keynote at XML2000 in Paris, called the State of XML, Edd closed this conference by describing how it seems more presumptuous than ever to sum up XML. He then went on to give his more personal opinions. After 5 years of bedding down, XML has been disruptive but has changed things more for the better than not. Nowadays people say more "why aren't you using XML?" than "why?". He encouraged people to still challenge the status quo in XML and keep proposing alternatives.

The State of XML is pretty good.
Edd Dumbill

Edd saw XML as the spider in the center of a web of technologies and talked about some of the technologies closer to the spider than further out. Attention has been given to the lower layers of XML recently which has made him happy, after previous concentration on higher layers from the core. Maybe the energy had moved on to that, and the W3C had followed the industry needs there. Or maybe it was easier if the lower layers were stable for a while.

Two things in particular have recently pleased Edd:

  1. the XML processing model.
    You have a document, schema, transformation and stylesheet - how and when do you apply them?
  2. the xml:id draft
    which meets the core needs of several other XML activities

So the core of XML is not being neglected.

We've seen spec development move beyond the W3C, but successful specs from outside the W3C are rarer. This has given freedom to challenge the status quo. One of the big successes has been RELAX NG, whose model and ease of authoring have been compelling. Even inside Microsoft, RELAX NG is used as the scribble format for schemas. The difference between the XML schema languages is more a matter of design by individuals versus design by committee than of conflicting organisations.

The bad issue Edd saw outside the W3C is the loss of coordination with web architecture and the WG process. However, W3C specs are not guarantees for success. Some things in particular are not looking good in the pile of Web Services frameworks specifications; lots of divergence and maybe moving towards chaos.

A conundrum: some things are exciting, such as interoperability across applications, but some hateful things remain, such as overblown hype, poor specifications and escalating complexity. This is now mostly one step removed from XML. However, the desktop is less of an island, as Jeff Barr mentioned in the opening keynote. REST, as a very webby API for scripting and the web environment, is very pleasing and accessible, and feels more like how the web came about. With HTTP you can build services and compose them. The Web Services world has descended into some dull, but maybe necessary, complexity. Some of the components we had in CORBA are being reimplemented in angle brackets. It's a bit disturbing reimplementing what we had before.

As a developer, Edd feels unhappy with a monolithic pile of technologies that can only be used from either Java or .Net, rather than the simple model of HTTP.

On the bright side, he saw a lot more useful stuff to be done with document based web services, and in particular with XForms. Forms can be decentralised and passed around.

Changes have led to new problems to solve and new issues. One of the hardest is metadata, which needs new processes. Systems developers are now seeing more of the need for this, rather than it being seen as just for librarians. Nowadays we all need to manage this, as we all have email, photos, word documents, electronic forms and PDFs. Even at the low level, vendors are seeing the advantage of metadata storage and manipulation. He referred to Microsoft's upcoming WinFS - a filesystem enriched with metadata facilities. We have the tools for this now - RDF, OWL, Topic Maps, XML Schema - all great. But the biggest issue is "which schema should we use?", even for simple things like names and addresses. There is more work to be done here in selling this. We can't always get everyone using the same set of terms.

Another challenge, more outside our normal experience, is the user interfaces to ensure that metadata enriches our content. As well as terms, we need techniques for translations to map between the metadata schemas - the XSLT for ontologies.

There are now new constituencies such as mobile. On the desktop, IE is the anchor but the mobile world is emerging from the disaster of their own creation - special versions of HTTP, HTML for WAP - similar, but not the same, and not clear on control. They are moving away from this redundancy and their needs are more being addressed, such as binary XML for their constrained environment.

The web user interface platform isn't getting better in terms of usability, so we are seeing technologies outside the browser. Microsoft's .Net XAML is a windows description for widgets in XML, which allows mixing in code. What's so great about this? asked Edd. Answer: it's easy and webby. Some people think that mostly graphical tools are used, but text editors and raw XML are still used. These interfaces can be trivially connected to the web and web services. A lot of people are concerned that this might replace the desktop browser, and so put an end to the browser as an open interface for web applications.

On Intellectual Property (IP) issues, some of us have spent a long time in the XML space not looking outside as things changed. Edd is still somewhat surprised by seeing URLs on TVs. Web resources are nearly as real as houses, cars, the street, things in the real world. And these are getting laws and policy like the real world. Some people have realised this and are getting in on the law making, and not all for the common good. Copyright is good, but restrictions on a means of expression aren't so good. And what are schemas but methods of expression?

One of the great strengths of XML is its intimate relationship to the Internet and the web. It has unlocked a large number of possibilities, and its relationship to the URI has facilitated both elegant and practical applications. Another strength is the human readable and editable form, which should not be underestimated and was a conference theme. XML.com asked developers how they edited their XML and over 80% said they used a text editor (maybe alongside other things - 45% also used an XML editor). It fascinated Edd that markup that is easy to read and write, such as simple document types like HTML and RSS, leads to success. Microsoft gets this too: after a sort of firestorm over their early XML work, they have moved to better formats in their new XML technologies such as XAML and WinFS.

Edd briefly mentioned RDF and its syntaxes, saying it was better to think of RDF as less of an XML thing and that it is OK to have a non-XML syntax. On that note, RELAX NG has had a non-XML format for a long time.

Edd paid tribute to the dedicated bunch of XML people who have kept the flame of XML burning and told people to feel inspired, showing an "Inspiration ... XML!" slogan over a glorious sunset picture.

Where else do you get a huge variety of technologies (docs, data, images, libraries, ...) related to XML and even working their way into the core of XML technology? It may not have met their needs precisely, but its web enablement and pervasiveness have enabled its success.

Fin

Summary

As usual, another interesting XML Europe conference. So what caught my eye? I learned that XForms is widely used, and about the many ways of querying XML. What did I miss? The always lovely SVG talks, which all clashed with my chairing duties :( However, the KT track worked as a mini semantic web conference for developers in Europe and, as might be seen, was very relevant for SWAD-Europe. Of course I learned again that Amsterdam is a great city to enjoy.

See also the XML Europe 2004 Proceedings


Categories trip report
Posted by dbeckett2 at April 28, 2004 04:29 PM