OwlPuzzle

From W3C Wiki

When exploring the relationship between OWL and RDF in detail, and the OWL profiles (Full/DL/Lite), some things can be puzzling.

Here be tests, examples, puzzles for discussion, commentary.

Using FunctionalProperty and InverseFunctionalProperty in RDF/OWL - how does OWL Full differ?

Only the OWL Full version of OWL allows an IFP to take literal values; in DL and Lite each IFP is an ObjectProperty (@@refs).

Here is a quick set of examples for discussion, exploring how this affects resource description in OWL. This discussion highlights some interactions between technical features of RDF/RDFS/OWL, as well as the relationship of those issues to the expressiveness, power and utility of the various OWL profiles (Full/DL/Lite). This may help illustrate the design choices faced by (a) vocab creators ("should I use an ObjectProperty here?") and (b) data consumers ("What conclusions can I draw here?").

In OWL, there are two kinds of properties. Each property is either a DatatypeProperty or an ObjectProperty. ObjectProperty: values are always resources (represented in RDF either as bNodes or with a URI name). DatatypeProperty: values are literals, ie. strings, optionally with datatype info, XML language tagging etc. In OWL DL and OWL Lite, these two kinds are disjoint. This breaks with deployed practice for some RDF vocabularies (eg. Dublin Core's dc:creator property), where the same property is used with both kinds of value. OWL Full allows such scruffiness to continue.

In OWL DL (and OWL Lite) you can't use textual IFPs. This is because in OWL, an inverse functional property is just that: a property whose inverse is functional. However, datatype properties (such as eg:age) can't have inverses, due to RDF's restriction that the subject of each RDF statement must be a resource, never a literal. The Description Logic semantics of OWL DL exploit this when formalising the meaning of FP and IFP, which is why they exclude the (otherwise seemingly reasonable) possibility of inverse functional properties taking string values.
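
As a rough illustration of the restriction just described, here is a minimal Python sketch that flags inverse functional properties used with literal values, ie. data that (per the claim above) only fits OWL Full. The Literal class, the function name and the bNode labels are hypothetical helpers invented for this sketch; only the two property URIs come from the schema on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal; anything that is not a Literal is treated as a resource."""
    value: str


FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
NASDAQ_CODE = "http://xmlns.com/foaf/corp/nasdaqCode"


def literal_valued_ifps(triples, ifps):
    """Return the declared IFPs that appear with at least one literal value."""
    return {p for (_s, p, o) in triples if p in ifps and isinstance(o, Literal)}


# Two statements: one resource-valued, one string-valued (bNode labels are made up).
triples = {
    ("_:c1", FOAF_HOMEPAGE, "http://testcorp.example.com/"),
    ("_:c2", NASDAQ_CODE, Literal("XYZA")),
}
ifps = {FOAF_HOMEPAGE, NASDAQ_CODE}

offenders = literal_valued_ifps(triples, ifps)
print(sorted(offenders))
```

This only inspects the data; it does no reasoning. A real OWL DL species checker would look at many other conditions too.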

(@@todo: make this full RDF/XML syntax with real namespace URIs)

First, we declare a couple of properties in a schema/ontology:

<!-- schema info, todo: check urls -->
<owl:InverseFunctionalProperty rdf:about="http://xmlns.com/foaf/0.1/homepage"/>
<owl:ObjectProperty rdf:about="http://xmlns.com/foaf/0.1/homepage"/>
<owl:InverseFunctionalProperty rdf:about="http://xmlns.com/foaf/corp/nasdaqCode"/>
<owl:FunctionalProperty rdf:about="http://xmlns.com/foaf/corp/nasdaqCode"/>


Here we try using one of these to describe the homepage of a company. In this example, there are two very slightly different URIs denoting the same document, so we will need a little more help to draw the conclusion that these descriptions can be merged.

Scenario 1: We are using a resource-valued IFP (ie. ObjectProperty) to individuate companies. Note that here, the use of homepage as an identification strategy breaks down. We have two company descriptions, the first of which relates a company to the resource with URI http://testcorp.example.com/ and the second to the resource with URI http://testcorp.example.com:80/. Only later do we learn that these two URIs denote the same resource.

<!-- a.rdf -->
<fc:Company>
  <foaf:homepage rdf:resource="http://testcorp.example.com/"/>
  <fc:name>TestCorp, Inc.</fc:name>
  <fc:test>can we merge two descriptions?</fc:test>
</fc:Company>


A second hypothetical document contains further markup, in reality about the same company. But how do we get an RDF/OWL processor to realise this?


<!-- b.rdf -->
<fc:Company>
  <foaf:homepage rdf:resource="http://testcorp.example.com:80/"/>
  <fc:name>TestCorp, Inc.</fc:name>
  <fc:test>a 2nd description</fc:test>
</fc:Company>


The answer is to realise later in the day that the two URIs denote the same thing:

<!-- c.rdf -->
<rdf:Description rdf:about="http://testcorp.example.com:80/">
  <owl:sameAs rdf:resource="http://testcorp.example.com/"/>
</rdf:Description>
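
To make the effect of c.rdf concrete, here is a minimal Python sketch of the inference involved, assuming a toy triple representation: first honour the owl:sameAs statements, then apply the IFP rule (two subjects sharing a value for an IFP are the same thing). The Identity class, the smush function and the bNode labels _:a and _:b are hypothetical names for this sketch, and the prefixed strings merely abbreviate the real property URIs.

```python
OWL_SAMEAS = "owl:sameAs"
FOAF_HOMEPAGE = "foaf:homepage"


class Identity:
    """Union-find over node names: tracks which names are known to co-denote."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the groups of a and b; return True if anything changed."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def smush(triples, ifps):
    """Apply owl:sameAs, then the IFP rule, until nothing new is learned."""
    ident = Identity()
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            ident.union(s, o)
    changed = True
    while changed:
        changed = False
        owner = {}  # (property, canonical value) -> a subject having that value
        for s, p, o in triples:
            if p not in ifps:
                continue
            key = (p, ident.find(o))
            if key in owner:
                changed |= ident.union(owner[key], s)
            else:
                owner[key] = s
    return ident


a_rdf = {("_:a", FOAF_HOMEPAGE, "http://testcorp.example.com/")}
b_rdf = {("_:b", FOAF_HOMEPAGE, "http://testcorp.example.com:80/")}
c_rdf = {("http://testcorp.example.com:80/", OWL_SAMEAS, "http://testcorp.example.com/")}

before = smush(a_rdf | b_rdf, {FOAF_HOMEPAGE})
after = smush(a_rdf | b_rdf | c_rdf, {FOAF_HOMEPAGE})
print(before.find("_:a") == before.find("_:b"))  # False: nothing links the two URIs
print(after.find("_:a") == after.find("_:b"))    # True: sameAs plus the IFP rule
```

Without c.rdf the sketch learns nothing about the two companies; it does not conclude they are different, only that it cannot merge them.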


Scenario 2: a loosely similar situation, but here we have two company descriptions with fc:nasdaqCode properties that have the same value. Unlike in the URI-based scenario, here we know immediately whether the company descriptions describe the same company or not. This also puts us in the OWL Full profile of OWL. Note that the situation isn't entirely the same, since we make use of the fact that this fc:nasdaqCode property happens also to be a FunctionalProperty. We can't/don't assume the same of foaf:homepage (since multiple documents can be the homepage of the same entity).


<!-- d.rdf -->
<fc:Company>
  <fc:name>AlphaCorp, Inc.</fc:name>
  <fc:nasdaqCode>XYZA</fc:nasdaqCode>
  <fc:test>a 1st description</fc:test>
</fc:Company>
<fc:Company>
  <fc:name>AlphaCorp, Inc.</fc:name>
  <fc:nasdaqCode>XYZA</fc:nasdaqCode>
  <fc:test>a 2nd description</fc:test>
</fc:Company>


This is enough for any system that works with the OWL Full IFP semantics to conclude the following, ie. to merge these two descriptions:

<!-- merged1.rdf -->
<fc:Company>
  <fc:name>AlphaCorp, Inc.</fc:name>
  <fc:nasdaqCode>XYZA</fc:nasdaqCode>
  <fc:test>a 1st description</fc:test>
  <fc:name>AlphaCorp, Inc.</fc:name>          <!-- todo: hmm, would this be tidied away? -->
  <fc:test>a 2nd description</fc:test>
</fc:Company>
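
A minimal Python sketch of this merge, assuming the two descriptions in d.rdf are bNodes (labelled _:d1 and _:d2 here, names invented for the sketch) and that every object is a plain string literal. The function name is hypothetical. Because an RDF graph is a set of triples, the repeated fc:name statement collapses once both descriptions share a subject, which is one way to read the todo in merged1.rdf.

```python
FC_NAME, FC_CODE, FC_TEST = "fc:name", "fc:nasdaqCode", "fc:test"

# d.rdf as (subject, property, literal value) triples.
d_rdf = [
    ("_:d1", FC_NAME, "AlphaCorp, Inc."),
    ("_:d1", FC_CODE, "XYZA"),
    ("_:d1", FC_TEST, "a 1st description"),
    ("_:d2", FC_NAME, "AlphaCorp, Inc."),
    ("_:d2", FC_CODE, "XYZA"),
    ("_:d2", FC_TEST, "a 2nd description"),
]


def merge_by_literal_ifp(triples, ifp):
    """Rename every subject sharing an ifp value to one representative node.

    Simplification: assumes each subject has at most one value for ifp,
    which holds here because fc:nasdaqCode is also a FunctionalProperty.
    """
    rep_for_value = {}  # literal value -> first subject seen with it
    rename = {}         # subject -> representative subject
    for s, p, o in triples:
        if p == ifp:
            rename[s] = rep_for_value.setdefault(o, s)
    # Building a set drops triples that become identical after renaming.
    return {(rename.get(s, s), p, o) for s, p, o in triples}


merged = merge_by_literal_ifp(d_rdf, FC_CODE)
for triple in sorted(merged):
    print(triple)
print(len(merged))  # 4: one name, one code, two test notes
```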


Scenario 3: what if the descriptions were actually of different companies? How would we know? This is where the difference really hits us...

OK, here I should explain a potential confusion/complication. The RDF/OWL vocabulary we're using declares both foaf:homepage and fc:nasdaqCode as an InverseFunctionalProperty. That means that there is at most one thing that has any given value for said property. We also, please notice, declare fc:nasdaqCode to be a FunctionalProperty: there is at most one value for that property for any thing. We don't do this for homepage, which breaks the symmetry and prettiness of the argument sketched here.


<!-- e.rdf -->
<fc:Company>
  <fc:name>BetaCorp, Inc.</fc:name>
  <fc:nasdaqCode>XYZB</fc:nasdaqCode>
  <fc:test>a 1st description</fc:test>
</fc:Company>
<fc:Company>
  <fc:name>ZetaCorp, Inc.</fc:name>
  <fc:nasdaqCode>XYZZ</fc:nasdaqCode>
  <fc:test>a 2nd description</fc:test>
</fc:Company>


Using OWL Full, we can conclude that two distinct companies are being described here. Since the fc:nasdaqCode values differ, and fc:nasdaqCode is a FunctionalProperty, we know that these couldn't be true descriptions of a single company. OWL DL/Lite could conclude that too. But OWL DL/Lite would have trouble with the case where the values were the same, since they don't acknowledge that text-valued properties can be uniquely identifying. Note that this differs from the situation with URI-valued properties, due to the absence of a unique names assumption in RDF. We can see that by considering a similar example, in which two companies are described solely with reference to their homepages.


<!-- f.rdf -->
<fc:Company>
  <fc:name>BetaCorp, Inc.</fc:name>
  <foaf:homepage rdf:resource="http://betacorp.example.com/"/>
  <fc:test>a 1st description</fc:test>
</fc:Company>
<fc:Company>
  <fc:name>ZetaCorp, Inc.</fc:name>
  <foaf:homepage rdf:resource="http://zetacorp.example.com/"/>
  <fc:test>a 2nd description</fc:test>
</fc:Company>


Now is this enough to allow us to know for sure that two companies are described here? Unfortunately not (regardless of OWL Full vs DL vs Lite). The problem is that this situation is, for machines, indistinguishable from scenario 1 (ie. a.rdf through b.rdf) shown above, in which two different URIs are deployed for the selfsame resource. The machine has no way to tell that http://betacorp.example.com/ and http://zetacorp.example.com/ denote different resources, unless we tell it. They could (like http://testcorp.example.com:80/ and http://testcorp.example.com/) simply be two names for the same thing. With a literal-valued property (such as fc:nasdaqCode) this doesn't arise, since different textual values immediately tell us the resources they're attached to are distinct individuals.

Hmmm: (arguing with myself here) this is a bad argument, needs fixing. The point is well motivated, ie. that the lack of a unique name assumption in RDF/OWL interacts with the utility of the semantics of FP and IFP. But even if these were string-valued properties, we'd need to have declared foaf:homepage as a functional property to conclude distinctness on the basis of distinct values for that property. @@fixme

However, with literal-valued properties it is not possible to state that two individuals with different values are actually one individual, since literals cannot be the subject of an RDF statement, and thus owl:sameAs cannot be used.
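
The distinctness argument, including the correction that the property must be functional, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Literal and provably_distinct are hypothetical helpers, the bNode labels are invented, and literals are compared by plain equality, which is a simplification (two differently written typed literals can denote the same value). A False result means "not provable from this data", not "same".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal; plain strings below stand for URIs."""
    value: str


def provably_distinct(x, y, triples, functional_props):
    """True only if some functional property gives x and y different literal values."""
    for p in functional_props:
        xs = {o for (s, q, o) in triples if s == x and q == p}
        ys = {o for (s, q, o) in triples if s == y and q == p}
        for vx in xs:
            for vy in ys:
                # Different URIs may still name one resource (no unique names
                # assumption), so only differing literals count as evidence.
                if isinstance(vx, Literal) and isinstance(vy, Literal) and vx != vy:
                    return True
    return False


e_rdf = {
    ("_:e1", "fc:nasdaqCode", Literal("XYZB")),
    ("_:e2", "fc:nasdaqCode", Literal("XYZZ")),
}
f_rdf = {
    ("_:f1", "foaf:homepage", "http://betacorp.example.com/"),
    ("_:f2", "foaf:homepage", "http://zetacorp.example.com/"),
}

print(provably_distinct("_:e1", "_:e2", e_rdf, {"fc:nasdaqCode"}))  # True
# Even if we pretended foaf:homepage were functional, URIs prove nothing:
print(provably_distinct("_:f1", "_:f2", f_rdf, {"foaf:homepage"}))  # False
# And without the FunctionalProperty declaration, differing literals prove nothing:
print(provably_distinct("_:e1", "_:e2", e_rdf, set()))              # False
```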

This example (@@correct if wrong please!) outlines one reason why RDF datasets and vocabularies might want to draw upon the facilities of OWL Full, as well as suggesting a family of test cases for OWL DL/Lite reasoners. We should be able to check that DL-based systems don't draw conclusions that are unwarranted by the data. (One approach here might be for these systems to have a 'make unique names assumption' facility available explicitly within API or query interfaces.)

Scenario 4: Finally, to make things more directly comparable, imagine that we instead use an ObjectProperty to record the nasdaq company code. So instead of writing <fc:nasdaqCode>XYZB</fc:nasdaqCode> we write <fc:nasdaqCode2 rdf:resource="http://nasdaq.example.org/codes/company#XYZB"/>. Does this change things? The new fc:nasdaqCode2 property is, like our fc:nasdaqCode property, both a FunctionalProperty and an InverseFunctionalProperty.


<!-- g.rdf -->
<fc:Company>
  <fc:name>BetaCorp, Inc.</fc:name>
  <fc:nasdaqCode2 rdf:resource="http://nasdaq.example.org/codes/company#XYZB"/>
  <fc:test>a 1st description</fc:test>
</fc:Company>
<fc:Company>
  <fc:name>ZetaCorp, Inc.</fc:name>
  <fc:nasdaqCode2 rdf:resource="http://nasdaq.example.org/codes/company#XYZZ"/>
  <fc:test>a 2nd description</fc:test>
</fc:Company>


Is this a solution? If RDF vocabulary designers adopt this approach, and use an ObjectProperty instead of string-valued properties, will we get the behaviour we want from OWL DL/Lite systems? Sadly not, because the data shown here is consistent with a situation in which http://nasdaq.example.org/codes/company#XYZB and http://nasdaq.example.org/codes/company#XYZZ are two names for the same thing, ie.:


<!-- h.rdf -->
<fc:Company>
  <fc:name>BetaCorp, Inc.</fc:name>
  <fc:nasdaqCode2 rdf:resource="http://nasdaq.example.org/codes/company#XYZB"/>
  <fc:test>a 1st description</fc:test>
</fc:Company>
<fc:Company>
  <fc:name>ZetaCorp, Inc.</fc:name>
  <fc:nasdaqCode2 rdf:resource="http://nasdaq.example.org/codes/company#XYZZ"/>
  <fc:test>a 2nd description</fc:test>
</fc:Company>
<rdf:Description rdf:about="http://nasdaq.example.org/codes/company#XYZB">
  <owl:sameAs rdf:resource="http://nasdaq.example.org/codes/company#XYZZ"/>
</rdf:Description>


This possibility means that these two descriptions could be descriptions of the self-same company. As long as we use textual IFPs (ie. a DatatypeProperty) we are on firmer ground.

NOTE: this discussion may well be in error; it is posted here for discussion/review/correction (see related irc chat...).