Rdb2RdfXG/RdfsRdbVocabulary

RDFS RDB Vocabulary

This page describes a proposed RDFS vocabulary for the basic relational model.

Work in progress by Paul Tyson for presentation to the XG on 2008-06-27. Comments welcome!


Requirements

  1. Provide the simplest possible way of representing relational data in RDF.
  2. Separate the problem of applying (or implying) meaning from the problem of transforming data structures.

Design

Define a metamodel of the relational model, expressed in RDF. Use the metamodel vocabulary for late-bound instance graphs of any relational data.

The design is limited and specific in order to support a variety of different pipelines that could be implemented to render data between relational and semantic representations.

No semantic information is added by putting relational data into the metamodel vocabulary. However, once the relational data is put into RDF, it can be ontologically enriched using familiar techniques (e.g., rule-based transformation). So, instead of compounding the extraction and enrichment problems, they can be treated separately. The extraction process can be entirely automated.

Vocabulary definition

A straw man proposal.


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdb: <http://www.w3.org/xg-rdb2rdf/rdb#> .

rdb:Relation a rdfs:Class . # a database table
rdb:relationName a rdf:Property ; # the table name
       rdfs:domain rdb:Relation .
rdb:header a rdf:Property ;
       rdfs:domain rdb:Relation ;
       rdfs:range rdb:RelationHeader .
rdb:body a rdf:Property ;
       rdfs:domain rdb:Relation ;
       rdfs:range rdb:RelationBody .
rdb:RelationHeader rdfs:subClassOf rdf:Bag . # of TypeDefinitions
rdb:TypeDefinition a rdfs:Class .
rdb:typeName a rdf:Property ; # of a TypeDefinition (TypedValues refer to it through rdb:type)
       rdfs:domain rdb:TypeDefinition .
rdb:underlyingType a rdf:Property ; # of a TypeDefinition
       rdfs:domain rdb:TypeDefinition .
rdb:primaryKey a rdf:Property ; # boolean
       rdfs:domain rdb:TypeDefinition .
rdb:foreignKey a rdf:Property ; # reference to foreign typeName
       rdfs:domain rdb:TypeDefinition .
rdb:RelationBody rdfs:subClassOf rdf:Bag . # of Tuples
rdb:Tuple rdfs:subClassOf rdf:Bag . # of TypedValues
rdb:TypedValue a rdfs:Class .
rdb:value a rdf:Property ; # of a TypedValue
       rdfs:domain rdb:TypedValue .
rdb:type a rdf:Property ; # of a TypedValue
       rdfs:domain rdb:TypedValue ;
       rdfs:range rdb:TypeDefinition .


Several different formulations of the metamodel can be imagined, with varying degrees of coverage and expressiveness.

For example, here is a rough approximation of the "neutral database model" used as a default in the RDBToOnto system:


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbm: <http://www.tao-project.eu/RDBToOnto/dbm#> .

dbm:DBSchema a rdfs:Class .
dbm:schemaTableDefs a rdf:Property ;
	rdfs:domain dbm:DBSchema ;
	rdfs:range dbm:TableDef .
dbm:TableDef a rdfs:Class .
dbm:Key a rdfs:Class .
dbm:PrimaryKey rdfs:subClassOf dbm:Key .
dbm:ForeignKey rdfs:subClassOf dbm:Key .
dbm:primaryKey a rdf:Property ;
	rdfs:domain dbm:TableDef ;
	rdfs:range dbm:PrimaryKey .
dbm:foreignKey a rdf:Property ;
	rdfs:domain dbm:TableDef ;
	rdfs:range dbm:ForeignKey .
dbm:Database a rdfs:Class .
dbm:Attribute a rdfs:Class .
dbm:Table a rdfs:Class .
dbm:Column a rdfs:Class .
dbm:hasAttribute a rdf:Property ;
	rdfs:domain dbm:Column ;
	rdfs:domain dbm:Key ;
	rdfs:range dbm:Attribute .
dbm:tableDefinition a rdf:Property ;
	rdfs:domain dbm:Table ;
	rdfs:range dbm:TableDef .


Another example, addressing different concerns, from http://www.cs.utexas.edu/~jsequeda/pub/sql2sw.pdf (again, this is a rough approximation; any misinterpretations are my fault. PaulTyson, 2008-06-26T02:13:32Z):


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix jsrdb: <http://www.cs.utexas.edu/~jsequeda/rdb#> .

jsrdb:Relation a rdfs:Class .
jsrdb:Attribute a rdfs:Class .
jsrdb:attr a rdf:Property ;
	rdfs:range jsrdb:Relation ;
	rdfs:domain jsrdb:Attribute .
jsrdb:nn a rdf:Property . # non-null attribute in a relation
jsrdb:unq a rdf:Property . # unique attribute value in a relation
jsrdb:chk a rdf:Property . # attribute in relation has CHECK IN constraint
jsrdb:pk a rdf:Property ; # primary key
	rdfs:range jsrdb:Relation ;
	rdfs:domain jsrdb:Attribute .
jsrdb:fk a rdf:Property ; # foreign key
	rdfs:range jsrdb:Relation .
jsrdb:nonFk a rdf:Property ; #non-foreign key
	rdfs:range jsrdb:Relation ;
	rdfs:domain jsrdb:Attribute .


These vocabularies illustrate overlapping but different approaches to the problem of describing the relational model in RDFS. All of the different concerns could be addressed with a standardized RDFS vocabulary.

From this brief survey, several questions arise which would have to be worked out in committee. These questions might include:

  1. Do you use rdf:Bag as the supertype for classes such as rdb:RelationBody and rdb:Tuple? Formally they are sets. But they could easily be represented using multiple predicate statements without specifying any collection class.
  2. Typing will be an interesting problem. Strictly speaking, every database column is a distinct "type", and it seems that these distinctions should be preserved in RDF even if they are not always honored by RDBMS implementations. So rdb:TypeDefinition is an SQL column specification. The SQL typing mechanism is here represented by rdb:underlyingType, which could in practice be an XML Schema datatype as in the example below, or an SQL or application-specific type.
  3. The concept of rdb:Tuple is implicit in SQL, probably because it is not grammatically necessary. Structurally, however, it seems to be a necessary component of the metamodel.
  4. How much of SQL do you include? Very little is necessary to adequately represent the data itself, but if you want to duplicate SQL functionality in the semantic web then it would be good to have more of it. (Entirely apart from data representation, I can see the possibility of applying semantic tools to RDF/OWL representations of SQL DDL.)
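
As an illustration of question 2, an extractor might choose the rdb:underlyingType for a column along the following lines. This is only a sketch: the SQL-to-XML-Schema pairs in the table are illustrative choices rather than a proposed standard mapping, and the `urn:example:sqltype:` fallback namespace is a hypothetical placeholder for "an SQL or application-specific type".

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespace for SQL types that have no XML Schema counterpart here.
SQL_FALLBACK = "urn:example:sqltype:"

# Illustrative mapping only; a real profile would have to be agreed on.
SQL_TO_XSD = {
    "INTEGER": "integer",
    "INT": "integer",
    "CHAR": "string",
    "VARCHAR": "string",
    "BOOLEAN": "boolean",
    "DATE": "date",
    "DECIMAL": "decimal",
}


def underlying_type(sql_type):
    """Return an IRI usable as the object of rdb:underlyingType."""
    # Drop length or precision arguments such as "(20)" or "(10,2)".
    name = re.sub(r"\(.*\)", "", sql_type).strip().upper()
    local = SQL_TO_XSD.get(name)
    if local is not None:
        return XSD + local
    # Unknown types are kept as SQL-specific types instead of being guessed.
    return SQL_FALLBACK + name
```

Under these assumptions, `underlying_type("varchar(20)")` yields the XML Schema string datatype, while an unmapped type keeps its SQL name under the placeholder namespace.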

Example

A database table (from Date, Database in Depth):

---------------------------------
| SNO | SNAME | STATUS |  CITY  |
|-----|-------|--------|--------|
| S1  | Smith | 20     | London |
| S2  | Jones | 10     | Paris  |
| S3  | Blake | 30     | Paris  |
| S4  | Clark | 20     | London |
| S5  | Adams | 30     | Athens |
---------------------------------


Put into rdb RDF/XML, the header and first two records would look like:


<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:rdb="http://www.w3.org/xg-rdb2rdf/rdb#"
 xmlns:ex="http://sample/db#">
  <rdb:Relation>
    <rdb:relationName>Sample_DB</rdb:relationName>
    <rdb:header>
      <rdb:RelationHeader>
         <rdfs:member>
            <rdb:TypeDefinition rdf:about="http://sample/db#SNO">
               <rdb:typeName>SNO</rdb:typeName>
               <rdb:underlyingType
                  rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
            </rdb:TypeDefinition>
         </rdfs:member>
         <rdfs:member>
            <rdb:TypeDefinition  rdf:about="http://sample/db#SNAME">
               <rdb:typeName>SNAME</rdb:typeName>
               <rdb:underlyingType 
                  rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
            </rdb:TypeDefinition>
         </rdfs:member>
         <rdfs:member>
            <rdb:TypeDefinition  rdf:about="http://sample/db#CITY">
               <rdb:typeName>CITY</rdb:typeName>
               <rdb:underlyingType 
                  rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
            </rdb:TypeDefinition>
         </rdfs:member>
         <rdfs:member>
            <rdb:TypeDefinition  rdf:about="http://sample/db#STATUS">
               <rdb:typeName>STATUS</rdb:typeName>
               <rdb:underlyingType 
                  rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
            </rdb:TypeDefinition>
         </rdfs:member>
      </rdb:RelationHeader>
    </rdb:header>
    <rdb:body>
      <rdb:RelationBody>
        <rdfs:member>
          <rdb:Tuple>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#SNO"/>
                <rdb:value>S1</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#SNAME"/>
                <rdb:value>Smith</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#STATUS"/>
                <rdb:value>20</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#CITY"/>
                <rdb:value>London</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
          </rdb:Tuple>
        </rdfs:member>
        <rdfs:member>
          <rdb:Tuple>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#SNO"/>
                <rdb:value>S2</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#SNAME"/>
                <rdb:value>Jones</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#STATUS"/>
                <rdb:value>10</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
            <rdfs:member>
              <rdb:TypedValue>
                <rdb:type rdf:resource="http://sample/db#CITY"/>
                <rdb:value>Paris</rdb:value>
              </rdb:TypedValue>
            </rdfs:member>
          </rdb:Tuple>
        </rdfs:member>
      </rdb:RelationBody>
    </rdb:body>
  </rdb:Relation>
</rdf:RDF>
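
A graph of this shape can be produced mechanically from any table. The following is a minimal sketch of such an extractor, not a reference implementation: it represents triples as plain Python tuples, writes blank nodes as `_:bN` strings, and does not distinguish literals from IRIs. The function name `table_to_triples` and its parameters are assumptions made for this illustration.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"
RDB = "http://www.w3.org/xg-rdb2rdf/rdb#"


def table_to_triples(name, columns, rows, base="http://sample/db#"):
    """Return (subject, predicate, object) triples for one table.

    columns: list of (column_name, underlying_type_iri) pairs.
    rows:    list of value tuples, given in column order.
    """
    ids = count()
    triples = []

    def bnode():
        # Fresh blank node label for each anonymous resource.
        return f"_:b{next(ids)}"

    def add(s, p, o):
        triples.append((s, p, o))

    relation, header, body = bnode(), bnode(), bnode()
    add(relation, RDF_TYPE, RDB + "Relation")
    add(relation, RDB + "relationName", name)
    add(relation, RDB + "header", header)
    add(relation, RDB + "body", body)
    add(header, RDF_TYPE, RDB + "RelationHeader")
    add(body, RDF_TYPE, RDB + "RelationBody")

    # One TypeDefinition per column, identified by base + column name.
    for col, utype in columns:
        add(header, RDFS_MEMBER, base + col)
        add(base + col, RDF_TYPE, RDB + "TypeDefinition")
        add(base + col, RDB + "typeName", col)
        add(base + col, RDB + "underlyingType", utype)

    # One Tuple per row, one TypedValue per cell.
    for row in rows:
        tup = bnode()
        add(body, RDFS_MEMBER, tup)
        add(tup, RDF_TYPE, RDB + "Tuple")
        for (col, _), value in zip(columns, row):
            tv = bnode()
            add(tup, RDFS_MEMBER, tv)
            add(tv, RDF_TYPE, RDB + "TypedValue")
            add(tv, RDB + "type", base + col)
            add(tv, RDB + "value", str(value))
    return triples
```

Nothing in the function depends on what the table means, which is the point of the design: only names, types and values are carried over.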


Problems

  1. Which relational model should be used?
  2. There is no easy way, within RDFS itself, to enforce well-formedness constraints such as Tuples always having the same number and types of members as the rdb:RelationHeader.
  3. Could we do without a Relation class by equating tables with named graphs?
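
Regarding problem 2, such constraints could still be checked outside RDFS by whatever tool produces or consumes the graph. A minimal sketch, assuming the header has already been read into a set of type names and each Tuple into a list of (type name, value) pairs; the function name and the problem labels are hypothetical:

```python
def check_tuples(header, tuples):
    """Report tuples whose members do not match the relation header.

    header: iterable of type names declared in the RelationHeader.
    tuples: list of tuples, each a list of (type_name, value) pairs.
    Returns a list of (tuple_index, problem, type_names) entries.
    """
    expected = set(header)
    problems = []
    for i, tup in enumerate(tuples):
        seen = [type_name for type_name, _ in tup]
        missing = expected - set(seen)
        extra = set(seen) - expected
        dupes = {t for t in seen if seen.count(t) > 1}
        if missing:
            problems.append((i, "missing", sorted(missing)))
        if extra:
            problems.append((i, "extra", sorted(extra)))
        if dupes:
            problems.append((i, "duplicate", sorted(dupes)))
    return problems
```

An empty result means every tuple has exactly one value for each type in the header; it says nothing about whether the values conform to their underlying types.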

Extensions

It might be useful to have a standardized mapping from the late-bound metamodel vocabulary to an equivalent early-bound vocabulary that uses the table and column names given in the relational source. This still doesn't add any information, but it might make it easier to specify graph transformations for ontological enrichment.
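
One way such a mapping could work is sketched below, under the assumption that each row becomes a resource typed by the table name and each column becomes a property named after its TypeDefinition. The function name, the `_:rowN` labels and the input shape are hypothetical choices for this illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def early_bound(table, tuples, base="http://sample/db#"):
    """Rewrite late-bound tuples as early-bound triples.

    table:  the relation name, used as the class of every row.
    tuples: list of tuples, each a list of (type_iri, value) pairs,
            i.e. the rdb:type and rdb:value of each TypedValue.
    """
    triples = []
    for i, tup in enumerate(tuples):
        row = f"_:row{i}"
        triples.append((row, RDF_TYPE, base + table))
        for type_iri, value in tup:
            # The TypeDefinition IRI is reused directly as the property.
            triples.append((row, type_iri, value))
    return triples
```

The output carries exactly the same names and values as the input, so no information is added; the row simply becomes a subject with one property per column.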

I conjecture that by using this type of standardized RDFS vocabulary, and some conventions for writing table and column names, you could automatically and bidirectionally translate a large class of useful SPARQL and SQL queries.
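
To make the conjecture concrete for the simplest case, the sketch below turns a single-subject ("star") pattern over an early-bound vocabulary into an SQL SELECT. It is an illustration under strong assumptions, not evidence for the general claim: column names double as property names, every constant is treated as a string, and joins, repeated variables, OPTIONAL and FILTER are not handled. The function name and input shape are hypothetical.

```python
def star_pattern_to_sql(table, pattern):
    """Translate one star-shaped pattern into an SQL query string.

    table:   table name, assumed to correspond to the row class.
    pattern: list of (column, term) pairs; a term starting with "?"
             is a variable, anything else is a string constant.
    """
    select, where = [], []
    for column, term in pattern:
        if term.startswith("?"):
            # Variables become projected columns named after the variable.
            select.append(f"{column} AS {term[1:]}")
        else:
            # Constants become equality conditions; quotes are doubled.
            escaped = term.replace("'", "''")
            where.append(f"{column} = '{escaped}'")
    sql = f"SELECT {', '.join(select) or '*'} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql
```

For instance, asking for the names of suppliers in Paris could be written as `star_pattern_to_sql("S", [("SNAME", "?name"), ("CITY", "Paris")])`.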

Alternatives

The Common Logic metamodel might be a useful alternative, since the relational model is built on common logic. This idea should be developed on a page of its own.