A Survey of Current Approaches for Mapping of Relational Databases to RDF

W3C RDB2RDF Incubator Group

Editors

Contributors


Abstract

This document surveys the current state-of-the-art techniques for converting relational databases (RDB) to RDF. It also covers the different approaches for mapping SPARQL queries to SQL. Readers are assumed to have some knowledge of RDF and relational database technologies. The survey is intended to enable the members of the RDB2RDF XG to:

  1. Identify common as well as distinct characteristics of transformation approaches
  2. Identify any link between the conversion technique and the query mapping approach

Status of This Document

This document is a living document and a work in progress; major revisions are to be expected. It is being developed by the W3C RDB2RDF Incubator Group, part of the W3C Incubator Activity.



1. Introduction

In "Relational Databases on the Semantic Web" (Berners-Lee, 1998), the modeling of relationships as first-class objects (in RDF) is listed as the significant difference between the entity-relationship (ER) and RDF data models. The vast majority of data underpinning the internet is stored in relational databases (RDB), which have a proven track record for scalability, efficient storage, optimized query execution, and maintenance. RDF, on the other hand, is a more expressive data model, and data expressed in RDF can be interpreted, processed and reasoned over by software agents.

In this document, we survey multiple approaches used in different domains to map data in RDB to RDF. One of the primary objectives of our survey is to analyze the “information gain” of an RDB2RDF transformation approach through explicit modeling of relationships between entities that are either implicit or non-existent in the relational data model. The incorporation of domain semantics in a knowledge repository based on the RDF data model is a critical aspect of transforming RDB data to RDF.

Another important aspect that we have evaluated in this survey is the use of RDF for data integration from multiple heterogeneous sources. The representation of data in RDF also enables use of reasoning tools to derive additional knowledge from existing data.

1.1. Related resources

Complementary and related resources can be found at the following pages:

2. Summary of Surveyed Literature

The following summaries of the surveyed literature are categorized according to the approach used, in order to give an overview of their high-level characteristics. A more in-depth analysis covering further aspects will also be provided.

2.1. Transformation RDB -> RDF

2.1.1. Table to class

2.1.2. Domain Semantics-driven

2.1.3. Automatic ontology-based mapping discovery

2.1.4. Other

2.2. Querying

2.2.1. SPARQL -> SQL

2.3. Other

3. Survey Framework

A reference framework will enable the effective categorization, comprehension and evaluation of the different approaches used to convert and/or map RDB data to RDF. We have used a set of six broad metrics as constituents of our survey framework.

3.1. Components of Survey Framework

In this section we describe each of the six metrics (and their sub-metrics) used in our survey:

  1. Mapping Approach: We can classify the approach used to map relational data to RDF as either automatic conversion of the components of an ER diagram to RDF components, or customized mapping using complex rules (often reflecting domain semantics). (a) The first approach takes advantage of the ER diagram semantics and (in most cases) maps a table name to a class and a column name to a predicate. An example of this approach is the Virtuoso RDF View (Blakeley, 2007), which uses the unique identifier of a record (row key) as the RDF subject, the column of a table as the RDF predicate and the column value as the RDF object. (b) The second approach often makes use of a domain ontology as the reference knowledge model and defines transformation rules to map the relational data to RDF. This approach is an ontology population technique in which the transformed data are instances of the ontology schema concepts. The Ordnance Survey (Green et al., 2008) approach uses their hydrology ontology [Ref] as the reference knowledge model to define mapping rules.

  2. Mapping Representation and Access: The mapping algorithm used for conversion of RDB to RDF may be represented in an XSLT stylesheet using XPath rules, or in an XML-based declarative language such as R2O. The mappings created may have wider applicability; hence, to encourage reuse, they should be accessible to community members in a modular fashion. This is especially true if the mappings incorporate rich domain semantics.

  3. Mapping Implementation: The approaches to convert RDB data to RDF can be broadly classified as either a static Extract-Transform-Load (ETL) implementation or a query-driven dynamic implementation. The ETL implementation, also called “RDF dump”, uses a batch process to create the RDF repository from the RDB. The query-driven approach implements the conversion dynamically in response to a query. Each approach has advantages and disadvantages: for example, the ETL approach may not reflect the most current data, while the query-driven approach may incur a performance penalty due to the on-demand conversion.

  4. Query Implementation: The query implementation can be either direct execution of a SPARQL query over an RDF repository, or mapping of the SPARQL query to an SQL query that is subsequently executed over an RDB.

  5. Application Domain: As discussed under “Mapping Approach”, an important aspect of RDB to RDF mapping is the incorporation of domain semantics. Hence, by identifying the application domain, we may be able to identify unique domain-specific as well as common cross-domain mapping characteristics.

  6. Data Integration: A primary objective of using the RDF data model is to enable integration of data from disparate, heterogeneous data sources. Hence, this metric evaluates whether a given approach leads to data integration.
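
To make the "Mapping Approach" and "Mapping Implementation" metrics more concrete, the following is a minimal Python sketch of the table-to-class approach, in which a table name becomes a class, a row key becomes the subject, a column becomes the predicate and a column value becomes the object. It is not the implementation of any surveyed tool. The namespace http://example.org/, the function names table_to_triples and answer_pattern, and the URI layout are assumptions made for illustration only. The first function corresponds to a static (ETL) dump; the second illustrates the query-driven alternative by issuing SQL only when a single triple pattern is asked for.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not taken from any surveyed tool


def table_to_triples(conn, table, key):
    """Static (ETL-style) dump of one table into (subject, predicate, object) triples.

    Table and column names are interpolated into SQL, so they must come from a
    trusted schema, never from user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"                  # row key -> subject
        triples.append((subject, "rdf:type", f"{BASE}{table}"))   # table name -> class
        for column, value in record.items():
            if column == key or value is None:                    # NULL yields no triple
                continue
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


def answer_pattern(conn, table, key, column):
    """Query-driven variant: answer the pattern (?s, <table#column>, ?o) with SQL on demand."""
    sql = f'SELECT "{key}", "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
    predicate = f"{BASE}{table}#{column}"
    return [(f"{BASE}{table}/{k}", predicate, v) for k, v in conn.execute(sql)]


# Usage with a made-up, in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, pathway TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'CHRNA4', NULL)")
for triple in table_to_triples(conn, "gene", "id"):
    print(triple)
```

The static function reads the whole table once, so its output can become stale, whereas the query-driven function always reflects the current table contents at the cost of running SQL for every pattern. A domain semantics-driven mapping would replace the generated class and predicate URIs with terms from a domain ontology.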

3.2. Comparative View

[Please fill in]

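Columns (sub-columns and allowed values follow the colon):

  Group/Reference
  Mapping Approach: ER_to_RDF (Table to Class) or Domain Semantics-driven
  Mapping Representation and Access: Representation Language; Mapping Storage/Access
  Mapping Implementation: Static (ETL) or On-Demand
  Query Implementation: SPARQL -> RDF or SPARQL->SQL->RDB
  Application Domain
  Data Integration: Yes/No; Number of Datasets

Fields with no recorded value are omitted.

  1. Virtuoso RDF View (Blakeley, 2007)
     Mapping Approach: Both (user-specified)
     Representation Language: SPASQL-based Meta Schema Language
     Mapping Storage/Access: Quad Storage
     Mapping Implementation: Both
     Query Implementation: Both
     Application Domain: Horizontal (Agnostic)
     Data Integration: Yes
     Number of Datasets: Theoretically unlimited

  2. DB2OWL (Cullot et al., 2007)
     Mapping Approach: ER_to_RDF (Table to Class)
     Representation Language: R2O language
     Mapping Storage/Access: R2O mapping document
     Query Implementation: SPARQL->SQL->RDB

  3. D2RQ (Bizer et al., 2007)
     Mapping Approach: Both (user-specified)
     Representation Language: D2RQ language
     Mapping Storage/Access: D2RQ mapping file
     Mapping Implementation: Both
     Query Implementation: Both

  4. R2O (Barrasa et al., 2006)
     Mapping Approach: Both (user-specified)
     Representation Language: R2O language
     Mapping Storage/Access: R2O mapping document
     Mapping Implementation: Both
     Query Implementation: Both

  5. (Hu et al., 2007)

  6. (Chebotko, 2006)
     Query Implementation: SPARQL->SQL->RDB (graph pattern translation)

  7. (Kashyap et al., 2007)
     Mapping Approach: Domain Semantics-driven
     Mapping Representation and Access: mapping mediator
     Mapping Implementation: On-Demand
     Query Implementation: SPARQL->SQL->RDB
     Application Domain: Life Sciences
     Data Integration: Yes

  8. (Sahoo et al., 2008)
     Mapping Approach: Domain Semantics-driven
     Representation Language: XPath
     Mapping Storage/Access: XSLT document
     Mapping Implementation: Static
     Query Implementation: SPARQL
     Application Domain: Life Sciences
     Data Integration: Yes
     Number of Datasets: Test included five (Gene, Biological Pathway)

  9. Dartgrid (Wu et al., 2006)
     Mapping Approach: ER_to_RDF (Table to Class)
     Representation Language: XML File
     Mapping Storage/Access: Visualized Mapping tool
     Mapping Implementation: On-Demand
     Query Implementation: SPARQL->SQL->RDB (Provide search and query interface)
     Application Domain: Life Science (TCM)
     Data Integration: Yes
     Number of Datasets: Test included databases for herb, compound formulas, disease, drug, TCM treatment.

  10. (Li et al., 2005)
     Mapping Approach: ER_to_RDF (Table to Class)
     Mapping Implementation: Static

  11. Asio Tools
     Mapping Approach: ER_to_RDF (Table to Class)
     Representation Language: OWL Full based language
     Mapping Storage/Access: File based
     Mapping Implementation: Both
     Query Implementation: SPARQL->SQL->RDB
     Application Domain: Domain agnostic
     Data Integration: Yes
     Number of Datasets: Theoretically unlimited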

3.3. RDB2RDF Transformation Approaches

Rdb2RdfXG/StateOfTheArt/Table

3.4. List of Available Tools

New list (many already exist in the cluster above):

  1. D2RQ

  2. Triplify

  3. SquirrelRDF

  4. CODE (no link found, mentioned in 7. Man Li, "A Semi-automatic"...)
  5. DartGrid

  6. Relational.OWL

  7. Protege plugins
  8. Virtuoso SPASQL-based Meta Schema Language: uses SQL & SPARQL language hybridization to declare RDF Views over SQL Tables

  9. The Semantic Discovery System: Provides functionality to rapidly build solutions that let non-technical users create and execute ad hoc queries using the network graph user interface (the SPARQL to SQL translation is generated automatically). It integrates and interconnects all data silo types, providing a virtual Semantic Web interface to RDBMSs, Web Services, Excel spreadsheets, and hybrid file systems.

  10. lgraps, mentioned on the mailing list (out of scope)

  11. Protege Excel_Import (out of scope)

  12. R2D2

  13. dbview

  14. Amara

4. References

4.1. Reviewed

(Berners-Lee, 1998)

Relational Databases on the Semantic Web T. Berners-Lee, 1998.

  1. [Barrasa et al., 2006] Barrasa J. Upgrading Legacy Data to the Semantic Web

(Barrasa et al., 2006)

Upgrading relational legacy data to the semantic web (slides) J. Barrasa and A. Gómez-Pérez. In Proc. of 15th international conference on World Wide Web Conference (WWW 2006), pages 1069-1070, Edinburgh, United Kingdom, 23-26 May 2006.

(Bizer et al., 2007)

D2RQ — Lessons Learned C. Bizer and R. Cyganiak. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.

(Blakeley, 2007)

RDF Views of SQL Data (Declarative SQL Schema to RDF Mapping) C. Blakeley, OpenLink Software, 2007.

(Chebotko et al., 2006)

Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns A. Chebotko, S. Lu, H. Jamil and F. Fotouhi. Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, Detroit, USA, May 2006.

(Chen et al., 2006)

Towards a Semantic Web of Relational Databases: A Practical Semantic Toolkit and an In-Use Case from Traditional Chinese Medicine H. Chen and Y. Wang. In Proc. of 5th International Semantic Web Conference (ISWC 2006), pages 750-763, Athens, USA, 5-9 November 2006.

(Cerbah, 2008)

Learning Highly Structured Semantic Repositories from Relational Databases - The RDBToOnto Tool F. Cerbah. In Proc. of 5th European Semantic Web Conference (ESWC 2008), Tenerife, Spain, accepted, to be published in June 2008.

(Cullot et al., 2007)

DB2OWL: A Tool for Automatic Database-to-Ontology Mapping N. Cullot, R. Ghawi and K. Yetongnon. In Proc. of 15th Italian Symposium on Advanced Database Systems (SEBD 2007), pages 491-494, Torre Canne, Italy, 17-20 June 2007.

(Green et al., 2008)

Linking Ontologies to Spatial Databases J. Green and C. Dolbear, RDB2RDF XG presentation, 2008.

(Hu et al., 2007)

Discovering Simple Mappings Between Relational Database Schemas and Ontologies W. Hu and Y. Qu. In Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225-238, Busan, Korea, 11-15 November 2007.

(Kashyap et al., 2007)

From Web 1.0 -> 3.0: Is RDF access to RDB enough? V. Kashyap and M. Flanagan. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.

(Sahoo et al., 2008)

An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence S. Sahoo, O. Bodenreider, J. Rutter, K. Skinner and A. Sheth. Journal of Biomedical Informatics (Special Issue: Semantic Biomedical Mashups), (in press), 2008.

(Wu et al., 2006)

Dartgrid: a Semantic Web Toolkit for Integrating Heterogeneous Relational Databases Z. Wu, H. Chen, H. Wang, Y. Wang, Y. Mao, J. Tang and C. Zhou. Semantic Web Challenge at 4th International Semantic Web Conference (ISWC 2006), Athens, USA, 5-9 November 2006.

(Li et al., 2005)

A Semi-automatic Ontology Acquisition Method for the Semantic Web M. Li, X. Du and S. Wang, 2005.

4.2. To be reviewed

Please review and add more

  1. François Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, Jean Morissette, Bio2RDF: Towards a mashup to build bioinformatics knowledge systems http://dx.doi.org/10.1016/j.jbi.2008.03.004

  2. Yuan An, Alex Borgida, and John Mylopoulos, Inferring Complex Semantic Mappings between Relational Tables and Ontologies from Simple Correspondences http://www.cs.toronto.edu/~yuana/research/publications/odbase05.paper13.pdf

  3. Justas Trinkunas and Olegas Vasilecas, Building ontologies from relational databases using reverse engineering methods http://ecet.ecs.ru.acad.bg/cst07/Docs/cp/SII/II.6.pdf

  4. Benjamin Habegger, Learning Data-Consistent Mappings from a Relational Database to an Ontology http://www.ceur-ws.org/Vol-200/06.pdf

  5. Martin J. O'Connor, et al., Efficiently Querying Relational Databases Using OWL and SWRL http://bmir.stanford.edu/file_asset/index.php/1163/SMI-2007-1244.pdf

  6. E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth, “OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies,” http://knoesis.wright.edu/library/download/MKSI96.pdf

  7. C. Pérez de Laborda, S. Conrad, "Database to Semantic Web Mapping using RDF Query Languages", http://dbs.cs.uni-duesseldorf.de/~perezdel/pdf/06PeCob.pdf

  8. There are several more Virtuoso whitepapers and presentations relevant to this topic, http://virtuoso.openlinksw.com/Whitepapers/ ... More than just Table-to-Class, Virtuoso maps Tables, Views, SQL Stored Procedures, and other Data Sources to Classes (which may not be made clear by these whitepapers without reference to or familiarity with the documentation). It seems that Virtuoso should be mentioned not only in 2.1.1 "Transformation RDB -> RDF" "Table to class" but also in 2.1.2 "Domain Semantics-driven", 2.1.3 "Automatic ontology-based mapping discovery", 2.2.1 "Querying" "SPARQL -> SQL", and 2.3 "Other", above. Especially note --

    1. Deploying RDF Linked Data via Virtuoso Universal Server

    2. Virtuoso's SQL to RDF Technology Presentation (W3C RDF & DBMS Integration Workshop 10-25-2007)

  9. M. Dean. Use of SWRL for Ontology Translation. 2008 Semantic Technology Conference, San Jose, CA, May 2008.
  10. M. Fisher, M. Dean, and G. Joiner. A Tool for Semantic Relational Database Translation using OWL and SWRL. OWL Experiences and Directions 2008 DC, Gaithersburg, MD, April 2008. http://www.webont.org/owled/2008dc/papers/owled2008dc_paper_13.pdf

  11. http://researchweb.watson.ibm.com/journal/sj/402/davidson.pdf

  12. http://db.cis.upenn.edu/K2/K2.doc

4.3. Related Topics

Rdb2RdfXG/StateOfTheArt (last edited 2008-07-14 10:54:55 by WolfgangHalb)