Browser-based SQL2OWL Mapping Tool

This page is intended to record the design issues of a browser-based SQL-OWL mapping tool. The tool will be based on the DartGrid mapping system and virtuoso's meta schema language, and take use of AJAX-based toolkits such as OpenLink Ajax Toolkit. This new mapping tool will cover more complex mapping cases and will be tested against by the life science demo use cases and the THALIA Testbed.

This is a collaborative effort between DartGrid and Virtuoso. People who have contributed to or are interested include:

... Please feel free to add your name if you are interested or want to contribute.

Introduction

Motivation

With the ever growing need of semantically linking data across different domain, organizational, cultual, displinary boundaries using RDF and OWL, one of the major obstacles impeding the advancement of the Semantic Web is the data availability. Where should the linked data come from? How could we obtain semantically well-defined structured data? Three major sources or methods are under taken commonly. The third one is more promising as most of those hot websites have structured data hidden in their backend (often referred as deep web data also), but tightly bound to and inextricably curated by specific client applications.

This page is intended to identify and record the SQL2OWL mapping issues, integration difficulties commonly occuring in real-life applications. A set of mapping cases will be identified as best practices based on two applications: the life science demo use cases and the THALIA Testbed.

Test Dataset

THALIA

The tested relational databases are taken from the THALIA Testbed benchmark. It provides a collection of 40 relational database tables representing university course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities.

The test ontology can be found at THALIA ontology. It has resued several public ontologies such as FOAF, Dublin Core, SKOS, TIME ontology, etc..

Life science

The tested data will be taken from typical life science data sources, which have public available SQL versions. The test ontology will draw on the W3C Banff life scienece demo.

About the representation and notation

We utilize three types of representations from different perspectives.

   R(?a,?b,?c,?d, ..., ...)-: (uri<?a>, rdf:type, thalia:Course),
                              (uri<?a>, foaf:name,?b),
                              (uri<?a>, thalia:hasInstructor, uri<?2>),
                              (uri<?2>, rdf:type, thalia:Instructor),
                              .......

Mapping Cases

THALIA mapping cases

Case 1 Arizona State University

The simplest situation, as illustrated in Fig.1, is the one when a single table exactly maps to one single class in the ontology. The asu table stores CS course information of the Arizona State University.

Description as a rule:

asu(?Title,?Description,?MoreInfoURL) :- (uri<?>, rdf:type, thalia:Course), 
                                        (uri<?>, dc:title, ?Title), 
                                        (uri<?>, dc:description, ?Description), 
                                        (uri<?>, rdfs:seeAlso, ?MoreInfoURL),
                                        (uri<?>, thalia:forUniversity, "http://purl.org/thalia/university/asu"),
                                        (uri<?>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").

Description in virtuoso meta-schema language:


Remained issue: Instance-level mapping, how to mapping to course subject information?

Case 2 Brown University

Fig.2 illustrates the semantic mappings from brown table to the THALIA testbed Ontology. The brown table stores CS course information of the Brown University. This use case illustrates the mapping from one table to four ontological classes: Course, Instructor, Event, Point.

Rule:

brown(?Code,?Title,?Instructor,?Room) :- 
                          (uri<?Code>, rdf:type, thalia:Course),
                          (uri<?Code>, dc:title, ?Title),
                          (uri<?Code>, thalia:hasInstructor uri<?1>),
                          (uri<?1>, dc:homepage, ?Instructor),
                          (uri<?Code>, thalia:hasLecture, uri<?2>),
                          (uri<?2>, rdf:type, event:Event),
                          (uri<?2>, event:place, uri<?3>),
                          (uri<?3>, rdf:type, geo:Point),
                          (uri<?3>, dc:title, ?Room),
                          (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/brown"),
                          (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").

MDL

Case 3 CMU University

Fig.3 illustrates the semantic mappings from cmu table to the THALIA testbed Ontology. The cum table stores CS course information of the CMU University.

cmu(?Code,?Sec,?CourseXListed,?CourseTitle,?Lecturer,?Room,?Day,?Time,?Units):-
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?CourseTitle),
                                       (uri<?Code>, dc:hasInstructor, uri<?1>),
                                       (uri<?1>, foaf:name, ?Lecturer),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Day),
                                       (uri<?4>, time:hour, ?Time),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?Room).

MDL Description.

Unresolved issues:

Case 5 Georgia Tech University.

gatech(?Department,?Code,?Section,?Mode_,?CRN,?Title,?Hours,?In_,?Max_,?Days,?Time_,?Instructor,?Room,?Building,?Description) -: 
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?Title),
                                       (uri<?Code>, dc:hasInstructor, uri<?1>),
                                       (uri<?Code>, dc:description, ?Description), 
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?Instructor),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Days),
                                       (uri<?4>, time:hour, ?Time_),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?Room+?Building),
                                       (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/gatech"),
                                       (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").

MDL Description

Remained issues: how to map ?room + ?Building to a geo place? Combing them?

==== Case 6 Toronto University.====

toronto (?No_,?level_,?offeredTerm,?title,?instructorEmail,?instructorName,?location,?coursewebsite,?prereq,?text_):-
                                       (uri<?No_>, rdf:type, thalia:Course),
                                       (uri<?No_>, dc:title, ?title),
                                       (uri<?No_>, dc:description, ?Description), 
                                       (uri<?No_>, dc:hasInstructor, uri<?1>),
                                       (uri<?No_>, rdfs:seeAlso,?coursewebsite),
                                       (uri<?No_>, thalia:hasPrerequisite, ?prereq),
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?instructorName),
                                       (uri<?1>, foaf:mbox, ,?instructorEmail),
                                       (uri<?No_>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?location),
                                       (uri<?No_>, thalia:forUniversity, "http://purl.org/thalia/university/ucsd"),
                                       (uri<?No_>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").
     

MDL Description.

Case 7 UCSD

ucsd(?Number,?Title,?Fall2003,?Winter2004,?Spring2004):-
                                       (uri<?Number>, rdf:type, thalia:Course),
                                       (uri<?Number>, dc:title, ?Title),
                                       (uri<?Number>, dc:hasInstructor, uri<?1>),
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?Fall2003),
                                       (uri<?Number>, dc:hasInstructor, uri<?2>),
                                       (uri<?2>, rdf:type, thalia:Instructor),
                                       (uri<?2>, foaf:name, ?Winter2004),
                                       (uri<?Number>, dc:hasInstructor, uri<?3>),
                                       (uri<?3>, rdf:type, thalia:Instructor),
                                       (uri<?3>, foaf:name, ?Spring2004),
                                       (uri<?Number>, thalia:forUniversity, "http://purl.org/thalia/university/gatech"),
                                       (uri<?Number>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").

Remained issues: how to represent some teacher teaches one couse in one specific semester.

Case 8

Using the ?Code as the input to generate URI is important for this use case, because it help merge duplicate records in the databases.

 umd(?Code,?CourseName,?Credits,?GradeMethod,?SectionTitle,?SectionTime):-
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?CourseName),
                                       (uri<?Code>, thalia:hasSection, ?SectionTitle),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Days),
                                       (uri<?4>, time:hour, ?SectionTime_),
                                       (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/umd"),
                                       (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").
                             

Case 9

student(?id,?name,?cid)-: 
            (uri<?id>, rdf:type, thalia:Student),
            (uri<?id>, foaf:name, ?name),
            (uri<?id>, thalia:select, ?uri<cid>).

course(?id,?name)-: 
            (uri<?id>, rdf:type, thalia:Course),
            (uri<?id>, dc:title, ?name).      
    

Case 10 : Self-join

student(?id,?name,...):- 
                 (uri<?id>, rdf:type, thalia:Student),
                 (uri<?id>, foaf:name, ?name).

friends(?id1,?id2):- (uri<?id1>, rdf:type, thalia:Student),
                     (uri<?id1>, foaf:friend, uri<?id2>),
                     (uri<?id2>, rdf:type, thalia:Student).
                 

Life Science Mapping Case

Case 1

The normal mapping systems often separately consider the schema-level mapping and instance-level mapping. However, in practical application, it is not always straiforward to distinguish these two types of data. Take the following mapping case as an example. Inhibit should be considered as a property (schema) in OWL, but is stored in a relational tuple(instance). OWL has more advance features for modeling semantics and usually requires more carefully and precisely distinguish concept data (class and properties) from individual data (instance). However, in relational database, many data that are actually schema-level information are stored in relational records instead of in the data dictionaries.

Fig. 1 The Senselab mapping case. Courtesy by Kei Cheung from Yale.

2. Complex Test Case 2

Design Issues

See Also

==== Some log about the mapping issue =====

SQL2OWL (last edited 2007-07-06 20:09:54 by HuajunChen)