SkosDev/GlobalReference

From W3C Wiki

Using Local Identifiers for Global Reference

This wiki page discusses ways in which you could use local identifiers for thesaurus concepts to unambiguously reference those concepts in a global context.

Let's say I have created a thesaurus, and in that thesaurus is a concept with the preferred label 'bananas' and the alternative label 'plantains'. Within my own applications I use the identifier 'A001' to refer to this concept.

Now I would like to publish an RDF description of my thesaurus, and start using my thesaurus within semantic web applications. Is there any way I can keep on using the 'A001' identifier to refer to the 'bananas' concept in this new global context, given that other people have probably used the identifier 'A001' to refer to a whole host of other things?

Option 1: Use a URI Namespace

If I happen to own the URI http://www.example.com then I can use this URI as part of a URI namespace. I can then append the local identifier of each concept to this namespace to achieve a global identifier. For example, I could publish an RDF description of my bananas concept:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:my="http://www.example.com/mythesaurus#">
  
  <skos:Concept rdf:about="http://www.example.com/mythesaurus/A001">
    <skos:prefLabel>bananas</skos:prefLabel>
    <skos:altLabel>plantains</skos:altLabel>
  </skos:Concept>

</rdf:RDF>


This is known as 'directly allocating' a URI to a concept.
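The namespace-plus-identifier construction can be sketched in a few lines of Python. This is only an illustration: the function name `mint_uri` is hypothetical, and the trailing slash on the base is an assumption chosen to match the concept URI used in the example above.

```python
from urllib.parse import quote

# Base of the URI namespace; assumed to be under the publisher's control.
BASE = "http://www.example.com/mythesaurus/"


def mint_uri(local_id: str, base: str = BASE) -> str:
    """Return a global identifier by appending a local identifier to a base URI."""
    if not local_id:
        raise ValueError("local identifier must not be empty")
    # Percent-encode any characters that are not safe inside a URI path segment.
    return base + quote(local_id, safe="")


print(mint_uri("A001"))  # http://www.example.com/mythesaurus/A001
```

Because every concept URI is derived from one base, only that base has to be kept stable over time.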

Other people can then use this URI to refer to my concept from other contexts, for example in a statement about the subject of a web page, e.g.:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  
  <rdf:Description rdf:about="http://www.bananalink.org.uk/">
    <skos:subject rdf:resource="http://www.example.com/mythesaurus/A001"/>
  </rdf:Description>

</rdf:RDF>


This option is the simplest way of doing things in RDF. However, if you do not for some reason want to directly allocate a URI to each one of your concepts, there are alternatives.

Option 2: Define Your Own IFP

An inverse-functional property (IFP) allows you to identify something indirectly. If you state that some property p is an IFP, then two things with the same value of p are necessarily the same thing.

For example, I could define the property <http://www.example.com/mythesaurus#identifier> as an identifier for a concept in my thesaurus, and as an IFP. Then I could publish an RDF description of my bananas concept, e.g.:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:my="http://www.example.com/mythesaurus#">
  
  <skos:Concept>
    <skos:prefLabel>bananas</skos:prefLabel>
    <skos:altLabel>plantains</skos:altLabel>
    <my:identifier>A001</my:identifier>
  </skos:Concept>

</rdf:RDF>


This would allow other people to refer to my concept from other contexts, for example in a statement about the subject of a web page, e.g.:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:my="http://www.example.com/mythesaurus#">
  
  <rdf:Description rdf:about="http://www.bananalink.org.uk/">
    <skos:subject>
      <rdf:Description>
        <my:identifier>A001</my:identifier>
      </rdf:Description>
    </skos:subject>
  </rdf:Description>

</rdf:RDF>


As a short note about RDF/XML, you can use a shorthand form when describing blank nodes (i.e. nodes without URIs). The following piece of RDF/XML describes exactly the same graph as the piece directly above:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:my="http://www.example.com/mythesaurus#">
  
  <rdf:Description rdf:about="http://www.bananalink.org.uk/">
    <skos:subject rdf:parseType="Resource">
      <my:identifier>A001</my:identifier>
    </skos:subject>
  </rdf:Description>

</rdf:RDF>


N.B. Any solution that uses IFPs to establish identity carries a computational overhead, possibly a significant one, because a reasoning engine must perform some inference to establish that two nodes with the same value for an IFP are in fact the same node.
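To give a feel for the inference involved, here is a minimal Python sketch of merging nodes that share an IFP value (sometimes called 'smushing'). It is not how any particular reasoner works. Triples are modelled as plain tuples, blank nodes as strings starting with "_:", and the function name `smush` and the blank node labels are assumptions made for this illustration.

```python
from collections import defaultdict

IDENTIFIER = "http://www.example.com/mythesaurus#identifier"
SUBJECT = "http://www.w3.org/2004/02/skos/core#subject"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def smush(triples, ifp):
    """Merge subjects that share a value for the inverse-functional property `ifp`.

    Single pass only: a node carrying several values for `ifp` is not
    chained into one larger group, as a full implementation would do.
    """
    triples = list(triples)
    # Group nodes by the value they carry for the IFP.
    nodes_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == ifp:
            nodes_by_value[o].add(s)

    # Pick one representative per group (smallest name, for determinism).
    canonical = {}
    for nodes in nodes_by_value.values():
        representative = min(nodes)
        for node in nodes:
            canonical[node] = representative

    def rename(term):
        return canonical.get(term, term)

    # Rewrite every triple so merged nodes use their representative.
    return {(rename(s), p, rename(o)) for s, p, o in triples}


published = [
    ("_:b1", PREF_LABEL, "bananas"),
    ("_:b1", IDENTIFIER, "A001"),
]
third_party = [
    ("http://www.bananalink.org.uk/", SUBJECT, "_:b2"),
    ("_:b2", IDENTIFIER, "A001"),
]
merged = smush(published + third_party, IDENTIFIER)
```

After merging, the web page's subject points at the same node that carries the 'bananas' label. Every incoming graph has to go through a step like this before the reference resolves, which is where the overhead comes from.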

N.B. In this solution, a URI was allocated to the identifier property, and to ensure stability of this identification mechanism this URI must be maintained.

Option 3: Use a Generic IFP with a Datatype

Let's just say for a minute that SKOS Core had a property called skos:localID (it doesn't, but let's just imagine it does for the moment). skos:localID is an IFP, but to make it useful as an IFP you are instructed to use it with a user-defined datatype that is specific to your thesaurus.

So, for example, I could define the datatype <http://www.example.com/mythesaurus#IDdatatype>. Then I could publish an RDF description of my 'bananas' concept as follows:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  
  <skos:Concept>
    <skos:prefLabel>bananas</skos:prefLabel>
    <skos:altLabel>plantains</skos:altLabel>
    <skos:localID rdf:datatype="http://www.example.com/mythesaurus#IDdatatype">A001</skos:localID>
  </skos:Concept>

</rdf:RDF>


Other people could refer to this concept from other contexts, as in:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  
  <rdf:Description rdf:about="http://www.bananalink.org.uk/">
    <skos:subject rdf:parseType="Resource">
      <skos:localID rdf:datatype="http://www.example.com/mythesaurus#IDdatatype">A001</skos:localID>
    </skos:subject>
  </rdf:Description>

</rdf:RDF>


N.B. 'skos:localID' is not a real property in the SKOS Core vocabulary. Adding such a property was discussed, to be used as an IFP in conjunction with a datatype as described in this section. At the time, however, I felt that for using local identifiers without allocating URIs, option 2 here was just as good if not better, and that there was no strong need for such a property. If you have any views on this, post a message to <public-esw-thes@w3.org>.

Again, it should be noted that I have had to use a URI to identify my datatype, i.e. I have not avoided URIs completely, and might just as easily have allocated URIs directly to my concepts.
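One way to picture why the datatype makes a generic IFP usable: if a typed literal is compared as the pair (lexical form, datatype URI), then 'A001' under my datatype is a different value from 'A001' under someone else's. The Python sketch below assumes that comparison. The `TypedLiteral` class and the second thesaurus's datatype URI are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple


class TypedLiteral(NamedTuple):
    """A literal compared by both its lexical form and its datatype URI."""
    lexical: str
    datatype: str


MY_TYPE = "http://www.example.com/mythesaurus#IDdatatype"
# Hypothetical datatype belonging to some other thesaurus.
OTHER_TYPE = "http://www.example.org/otherthesaurus#IDdatatype"

# Index from localID value to the concept it identifies
# (preferred labels stand in for the concept nodes here).
concepts_by_local_id = {
    TypedLiteral("A001", MY_TYPE): "bananas",
    TypedLiteral("A001", OTHER_TYPE): "some unrelated concept",
}

print(concepts_by_local_id[TypedLiteral("A001", MY_TYPE)])  # bananas
```

The two 'A001' values never collide, but only because each carries a datatype URI, so a URI is still doing the disambiguation.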

Option 4: Use a Generic Property in Conjunction with an Identity Rule

Dublin Core has a property dc:identifier. This property is not an IFP, but I could use this property in combination with another property such as skos:inScheme to establish identity. For example, if I published:


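<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">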
  
  <skos:Concept>
    <skos:prefLabel>bananas</skos:prefLabel>
    <skos:altLabel>plantains</skos:altLabel>
    <dc:identifier>A001</dc:identifier>
    <skos:inScheme rdf:resource="http://www.example.com/mythesaurus"/>
  </skos:Concept>

</rdf:RDF>


... and I also published an identity rule which said that:


(?x dc:identifier ?id)
(?x skos:inScheme ?scheme)
(?y dc:identifier ?id)
(?y skos:inScheme ?scheme)
->
(?x owl:sameAs ?y)


... then other people could refer to this concept from other contexts, as in:


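<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">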
  
  <rdf:Description rdf:about="http://www.bananalink.org.uk/">
    <skos:subject rdf:parseType="Resource">
      <dc:identifier>A001</dc:identifier>
      <skos:inScheme rdf:resource="http://www.example.com/mythesaurus"/>
    </skos:subject>
  </rdf:Description>

</rdf:RDF>


... as long as they had implemented the identity rule in their own RDF systems. Because there is no standard rule syntax as yet, relying on rules like this is probably a bad idea at the moment, although that may change in the future.

Also, as with IFPs, there is a computational overhead associated with establishing identity via some sort of inference. It may not be significant, but it should be considered.
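For readers who want to see what implementing the identity rule might involve, here is a minimal Python sketch. It is an assumption-laden illustration, not a standard rule engine. Triples are plain tuples, blank nodes are strings starting with "_:", and the function name `apply_identity_rule` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"
SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def apply_identity_rule(triples):
    """Return owl:sameAs triples for nodes sharing both identifier and scheme."""
    identifiers = defaultdict(set)  # node -> its dc:identifier values
    schemes = defaultdict(set)      # node -> its skos:inScheme values
    for s, p, o in triples:
        if p == DC_IDENTIFIER:
            identifiers[s].add(o)
        elif p == SKOS_IN_SCHEME:
            schemes[s].add(o)

    # Group nodes by every (identifier, scheme) pair they carry.
    nodes_by_key = defaultdict(set)
    for node, ids in identifiers.items():
        for ident in ids:
            for scheme in schemes.get(node, ()):
                nodes_by_key[(ident, scheme)].add(node)

    # Emit one owl:sameAs triple per unordered pair of distinct nodes;
    # owl:sameAs is symmetric, so the reverse direction is implied.
    inferred = set()
    for nodes in nodes_by_key.values():
        for x, y in combinations(sorted(nodes), 2):
            inferred.add((x, OWL_SAME_AS, y))
    return inferred


graph = [
    ("_:b1", DC_IDENTIFIER, "A001"),
    ("_:b1", SKOS_IN_SCHEME, "http://www.example.com/mythesaurus"),
    ("_:b2", DC_IDENTIFIER, "A001"),
    ("_:b2", SKOS_IN_SCHEME, "http://www.example.com/mythesaurus"),
]
same = apply_identity_rule(graph)
```

Here the only inferred statement is that "_:b1" and "_:b2" are the same resource. A node with the same dc:identifier but a different skos:inScheme value would not be merged.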

And again, as with the previous two options, I have had to use a URI somewhere along the line to achieve globally unique reference. As long as I am using at least one URI, I have to worry about URI persistence and maintenance, and those worries don't necessarily increase if I allocate URIs to all my concepts. If I allocate URIs to all my concepts using a common URI base, then I really only have to consider the persistence and maintenance of that URI base, i.e. I am really only maintaining a single URI.

Discussion

Each of the three alternatives above (options 2, 3 and 4) is essentially equivalent to option 1 in terms of the cost of allocating and maintaining URIs, i.e. in every case I have to somehow combine [someURI] with [localIdentifier] to achieve globally unique identity. Which of the above options to choose is then a matter of style.

My opinion is that, given that options 2, 3 and 4 are essentially equivalent to directly allocating URIs (option 1), carry the same overheads with regard to URI maintenance, and carry a (possibly significant) computational cost, you really might as well directly assign URIs.

All comments on this discussion are most welcome, and should be sent to <public-esw-thes@w3.org>.

By Al Miles.