A Survey of Current Approaches for Mapping of Relational Databases to RDF
Editors
Contributors
Abstract
This document surveys the current state-of-the-art techniques used to convert relational databases to RDF. The different approaches to mapping SPARQL queries to SQL are also covered. Some knowledge of RDF and relational database technologies is assumed for readers of this document. The survey is intended to enable the members of the RDB2RDF XG to:
- Identify common as well as distinct characteristics of transformation approaches
- Identify any link between the conversion technique and the query mapping approach
Status of This Document
This document is a living document and a work in progress. Major revisions are to be expected. This document is being developed by the W3C RDB2RDF Incubator Group, part of the W3C Incubator Activity.
1. Introduction
In "Relational Databases on the Semantic Web" (Berners-Lee, 1998), the modeling of relationships as first-class objects in RDF is listed as the significant difference between the entity-relationship (ER) and RDF data models. A vast majority of the data underpinning the internet is stored in relational databases (RDB), which have a proven track record for scalability, efficient storage, optimized query execution, and maintenance. RDF, on the other hand, is a more expressive data model, and data expressed in RDF can be interpreted, processed and reasoned over by software agents.
In this document, we survey multiple approaches used in different domains to map data in RDB to RDF. One of the primary objectives of our survey is to analyze the “information gain” of an RDB2RDF transformation approach, that is, the explicit modeling of relationships between entities that are either implicit or non-existent in the relational data model. The incorporation of domain semantics in a knowledge repository based on the RDF data model is a critical aspect of transforming RDB data to RDF.
Another important aspect that we have evaluated in this survey is the use of RDF for data integration from multiple heterogeneous sources. The representation of data in RDF also enables use of reasoning tools to derive additional knowledge from existing data.
1.1. Related resources
Complementary and related resources can be found at the following pages:
W3C Workshop on RDF Access to Relational Databases, 25-26 October 2007, Cambridge, MA, USA.
Accepted papers at the workshop
2. Summary of Surveyed Literature
The following summaries of the surveyed literature are categorized by approach to give an overview of their high-level characteristics. A more in-depth analysis covering further aspects will also be provided.
2.1. Transformation RDB -> RDF
2.1.1. Table to class
Virtuoso RDF Views (Blakeley, 2007): The Virtuoso RDF View uses the Table-to-Class (RDFS class) and Column-as-Predicate approach, and takes into consideration special cases such as whether a column is part of a primary key or a foreign key. The foreign key relationship between tables is made explicit between the relevant classes representing the tables. The RDB data are represented as virtual RDF graphs without physical creation of RDF datasets. The Virtuoso RDF views are composed of “quad map patterns” that define the mapping from a set of RDB columns to triples. The quad map pattern is represented in Virtuoso's SPASQL-based meta schema language, which also supports SPARQL-style notations.
DB2OWL (Cullot et al., 2007): DB2OWL uses the Table to Class and Column to Predicate approach but uses specific relational database schema characteristics, that is, how tables relate to each other, to assert subclass relationships and other object properties. The object properties represent many-to-many relationships and referential integrity. The mappings are stored in an R2O document.
RDBToOnto (Cerbah, 2008): RDBToOnto is a highly configurable tool that eases the design and implementation of methods for ontology acquisition from relational databases. It is also a user-oriented tool that supports the complete transitioning process, from access to the input databases to generation of populated ontologies. The settings of the learning parameters and control of the process are performed through a dedicated interface.
A Semi-automatic Ontology Acquisition... (Li et al., 2005): This work uses the Table to Class and Column to Predicate approach to create an initial ontology schema, which is then refined by referring to a dictionary or thesaurus (for example, WordNet). Constraints in the relational model are mapped to constraints in the ontology schema. For example, "NOT NULL" and "UNIQUE" are mapped to cardinality constraints on the relevant properties. If a given set of relations is mapped to an ontology concept, the corresponding tuples of the relations are transformed into instances of the ontology concept.
Asio Semantic Bridge for Relational Databases and Automapper: Asio Semantic Bridge for Relational Databases (SBRD) and Automapper use the ER_to_RDF (Table to Class) approach. Automapper generates an OWL Full ontology from a relational database. In the generated ontology, each class corresponds to a table in the relational database and columns are represented as properties of the relevant class. A primary key column has cardinality set to 1. A nullable column has max cardinality set to 1. For a foreign key, an object property is created and its range is set to the corresponding class. The generated ontology includes SWRL rules to equate individuals based on multiple primary key columns. Semantic Bridge for Relational Databases provides an RDF view of data in the relational database. SPARQL queries can be written in terms of the Automapper-generated data source ontology, and relational data is returned as RDF. SBRD rewrites the SPARQL query to SQL, executes the SQL and converts the SQL rows to RDF conforming to the data source ontology.
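The Table-to-Class and Column-to-Predicate idea shared by the tools in this section can be illustrated with a minimal sketch. It is not the implementation of any surveyed tool: the `http://example.org/` namespace, the IRI layout, the sample `dept` and `emp` tables and the function name `table_to_triples` are all assumptions made for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not from any surveyed tool
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(conn, table, primary_key, foreign_keys=None):
    """Yield (subject, predicate, object) triples for every row of `table`.

    `foreign_keys` maps a column name to the name of the table it references.
    """
    foreign_keys = foreign_keys or {}
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # The row key identifies the subject resource.
        subject = f"{BASE}{table}/{record[primary_key]}"
        # The table name becomes the class of the row resource.
        yield (subject, RDF_TYPE, f"{BASE}{table}")
        for column, value in record.items():
            if value is None or column == primary_key:
                continue  # NULLs yield no triple; the key is already in the IRI
            predicate = f"{BASE}{table}#{column}"
            if column in foreign_keys:
                # Foreign key: point at the resource of the referenced row.
                yield (subject, predicate, f"{BASE}{foreign_keys[column]}/{value}")
            else:
                # Plain column: emit the value as a quoted literal.
                yield (subject, predicate, f'"{value}"')


# Made-up sample schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                      dept_id INTEGER REFERENCES dept(id));
    INSERT INTO dept VALUES (10, 'Research');
    INSERT INTO emp VALUES (1, 'Alice', 10);
""")
emp_triples = list(table_to_triples(conn, "emp", "id", {"dept_id": "dept"}))
for triple in emp_triples:
    print(triple)
```

In this sketch the foreign key column `dept_id` produces a link between two resources rather than a literal, which is the sense in which a foreign key relationship is "made explicit" in the generated RDF.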
2.1.2. Domain Semantics-driven
D2RQ (Bizer et al., 2007): D2RQ provides an integrated environment with multiple options for accessing relational data, including “RDF dumps”, Jena and Sesame API based access (API calls are rewritten to SQL), and SPARQL endpoints on D2RQ Server. The mappings may be defined by the user, thereby allowing the incorporation of domain semantics in the mapping process, though there are some limitations to this, as described in the Ordnance Survey presentation (Green et al., 2008). The mappings are expressed in a “declarative mapping language”. Performance varies depending on the access method: it is reported to be reasonable for basic triple patterns but to suffer when SPARQL language features such as FILTER and LIMIT are used.
Asio Semantic Query Decomposition: Semantic Query Decomposition (SQD) takes a SPARQL query in a domain ontology, uses SWRL mappings from the data source ontologies to the domain ontology to generate SPARQL queries for each applicable data source, executes those queries through the Asio Semantic Bridges (Relational Database, Web Service (SOAP and REST) and SPARQL Endpoint), and translates the results into the domain ontology.
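A user-defined, domain-driven mapping of the kind described in this section can be pictured roughly as follows. This is a sketch only and does not use the D2RQ mapping language or SWRL: the `MAPPING` structure, the `http://example.org/hydrology#` vocabulary, the sample rows and the function `apply_mapping` are hypothetical.

```python
# Hypothetical domain vocabulary; these IRIs do not come from any surveyed tool.
ONT = "http://example.org/hydrology#"

# A declarative, user-written mapping: which class a table populates, how the
# subject IRI is built, which columns map to which ontology properties, and a
# rule that encodes domain knowledge about which rows qualify.
MAPPING = {
    "table": "river_seg",
    "class": ONT + "River",
    "subject": "http://example.org/river/{seg_id}",
    "properties": {
        "seg_name": ONT + "hasName",
        "flows_to": ONT + "flowsInto",
    },
    # Domain rule (assumed): only segments flagged as natural are rivers.
    "condition": lambda row: row["kind"] == "natural",
}


def apply_mapping(mapping, rows):
    """Turn relational rows into triples according to a user-defined mapping."""
    triples = []
    for row in rows:
        if not mapping["condition"](row):
            continue  # the row does not denote an instance of the class
        subject = mapping["subject"].format(**row)
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, predicate in mapping["properties"].items():
            if row.get(column) is not None:
                triples.append((subject, predicate, row[column]))
    return triples


# Made-up sample rows standing in for the result of a SQL query.
rows = [
    {"seg_id": 1, "seg_name": "Avon", "flows_to": "Severn", "kind": "natural"},
    {"seg_id": 2, "seg_name": "Canal 7", "flows_to": None, "kind": "artificial"},
]
domain_triples = apply_mapping(MAPPING, rows)
for triple in domain_triples:
    print(triple)
```

Unlike a purely schema-driven conversion, the predicates and class here come from the domain vocabulary rather than from the table and column names, and the condition filters out rows that the domain model does not regard as instances.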
2.1.3. Automatic ontology-based mapping discovery
Discovering Simple Mappings Between Relational Database Schemas and Ontologies (Hu et al., 2007): This approach aims at the automatic creation of simple mappings between relational database schemas and ontologies. Based on the relational schema and an ontology, initial simple mappings are derived and then checked for consistency. Based on sample input (mappings and instances for both the relational schema and the ontology), more contextual mappings are constructed. Experimental results in a limited domain showed the feasibility of the approach.
2.1.4. Other
R2O (Barrasa et al., 2006): R2O is an XML-based declarative language for expressing the mappings between RDB elements and an ontology. R2O mappings can be used to “detect inconsistencies and ambiguities” in mapping definitions. The ODEMapster engine uses an R2O document either to execute the transformation in response to a query or, in batch mode, to create an RDF dump.
2.2. Querying
2.2.1. SPARQL -> SQL
Semantics Preserving SPARQL-to-SQL (Chebotko et al., 2006): This work discusses a graph pattern translation approach called BGPtoSQL, which transforms a SPARQL query into a SQL query while preserving the semantics of the SPARQL query. The algorithm treats the SPARQL query as a directed graph and replaces each blank node with a unique variable.
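The general idea of translating a basic graph pattern into SQL can be sketched as below. This is a simplified illustration and not the BGPtoSQL algorithm of (Chebotko et al., 2006): it assumes the RDF data sits in a single table `triples(s, p, o)`, handles neither OPTIONAL nor FILTER, and requires at least one variable in the pattern. The function name `bgp_to_sql` and the sample data are hypothetical.

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate a list of (s, p, o) triple patterns into one SQL query.

    Terms starting with '?' are variables; all other terms are constants.
    Each pattern becomes one alias of the triples table, and a variable
    shared by two patterns becomes a join condition.
    """
    bindings, where, params = {}, [], []
    for i, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    where.append(f"{ref} = {bindings[term]}")  # join on shared variable
                else:
                    bindings[term] = ref  # first occurrence defines the variable
            else:
                where.append(f"{ref} = ?")  # constant term, passed as a parameter
                params.append(term)
    columns = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in bindings.items())
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Made-up sample data in a single triples table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("emp1", "name", "Alice"), ("emp1", "worksFor", "dept10"),
    ("emp2", "name", "Bob"), ("emp2", "worksFor", "dept20"),
    ("dept10", "name", "Research"), ("dept20", "name", "Sales"),
])

# Roughly: SELECT ?e ?d ?n WHERE { ?e worksFor ?d . ?d name "Research" . ?e name ?n }
sql, params = bgp_to_sql([
    ("?e", "worksFor", "?d"),
    ("?d", "name", "Research"),
    ("?e", "name", "?n"),
])
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```

Following the description above, a blank node in the query could be handled by first renaming it to a fresh variable (for example `?b0`) that is simply left out of the final projection.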
2.3. Other
From Web 1.0 -> Web 3.0... (Kashyap et al., 2007): This work proposes a mediator based approach to represent mappings from ontological concepts to disparate data sources as part of a general framework for RDF based access to heterogeneous data sources. The heterogeneous data sources, illustrated using a life sciences domain scenario, include RDB, Web services and Excel sheets (using MS Office API). The SPARQL queries are automatically translated to the appropriate query language using the mappings represented by the mediator classes.
An ontology-driven semantic mash-up... (Sahoo et al., 2008): This is a life sciences application that incorporates domain semantics (from multiple, integrated ontologies) to create the mappings from RDB to RDF, which are represented as XPath rules in an XSLT stylesheet. An RDF dump is created using a batch approach and stored in Oracle 11g. The SPARQL query language is used to query the RDF repository.
Dartgrid: a Semantic Web Toolkit... (Wu et al., 2006) and Towards a Semantic Web of Relational Databases... (Chen et al., 2006): Dartgrid is a Semantic Web toolkit that offers tools for the mapping and querying of RDF generated from RDB. The mapping is basically a manual table-to-class mapping, where the user is provided with a visual tool to define the mappings. The mappings are then stored and used for the conversion. The construction of SPARQL queries is assisted by the visual tool, and the queries are translated to SQL queries based on the previously defined mappings. A full-text search is also provided.
3. Survey Framework
A reference framework will enable the effective categorization, comprehension and evaluation of the different approaches used to convert and/or map RDB data to RDF. We have used a set of six broad metrics as constituents of our survey framework.
3.1. Components of Survey Framework
In this section we describe each of the six metrics (and their sub-metrics) used in our survey:
Mapping Approach: The approach used to map relational data to RDF can be classified as either automatic conversion of the components of an ER diagram to RDF components, or customized mapping using complex rules (often reflecting domain semantics). (a) The first approach takes advantage of the ER diagram semantics and, in most cases, maps a table name to a class and a column name to a predicate. An example of this approach is the Virtuoso RDF View (Blakeley, 2007), which uses the unique identifier of a record (row key) as the RDF subject, the column of a table as the RDF predicate and the column value as the RDF object. (b) The second approach often makes use of a domain ontology as the reference knowledge model and defines transformation rules to map the relational data to RDF. This approach is an ontology population technique in which the transformed data are instances of the ontology schema concepts. The Ordnance Survey (Green et al., 2008) approach uses their hydrology ontology [Ref] as the reference knowledge model to define mapping rules.
Mapping Representation and Access: The mapping used for conversion of RDB to RDF may be represented in an XSLT stylesheet using XPath rules or in an XML-based declarative language such as R2O. The mappings created may have wider applicability; hence, to encourage reuse, they should be accessible in a modular fashion to community members. This is especially true if the mappings incorporate rich domain semantics.
Mapping Implementation: The approaches to converting RDB data to RDF can be broadly classified as either a static Extract Transform Load (ETL) implementation or a query-driven dynamic implementation. The ETL implementation, also called “RDF dump”, uses a batch process to create the RDF repository from RDB. The query-driven approach implements the conversion dynamically in response to a query. Each approach has advantages and disadvantages: for example, the ETL approach may not reflect the most current data, while the query-driven approach may incur a performance penalty due to the on-demand conversion.
Query Implementation: The query implementation can be either a direct execution of a SPARQL query over an RDF repository, or the SPARQL query may be mapped to a SQL query, which is subsequently executed over an RDB.
Application Domain: As discussed under “Mapping Approach”, an important aspect of RDB to RDF mapping is the incorporation of domain semantics. Hence, by identifying the application domain, we may be able to identify unique domain-specific mapping characteristics as well as some common cross-domain ones.
Data Integration: A primary objective of using the RDF data model is to enable the integration of data from disparate, heterogeneous data sources. Hence, this metric evaluates whether a given approach leads to data integration.
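The trade-off between the static (ETL) and query-driven implementations described under “Mapping Implementation” can be made concrete with a small sketch. The `gene` table, its placeholder values, the `gene:` and `ex:` prefixes and all function names are assumptions for illustration and are not taken from any surveyed system.

```python
import sqlite3

# Made-up sample database with placeholder values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'G1')")


def row_to_triple(row):
    """Convert one (id, symbol) row into a single triple."""
    gene_id, symbol = row
    return (f"gene:{gene_id}", "ex:symbol", symbol)


def etl_dump(conn):
    """Static approach: convert every row once, in batch, and keep the result."""
    return [row_to_triple(r) for r in conn.execute("SELECT id, symbol FROM gene")]


def query_driven(conn, symbol):
    """Dynamic approach: convert only the rows a query asks for, at query time."""
    rows = conn.execute("SELECT id, symbol FROM gene WHERE symbol = ?", (symbol,))
    return [row_to_triple(r) for r in rows]


dump = etl_dump(conn)  # snapshot of the database taken now
conn.execute("INSERT INTO gene VALUES (2, 'G2')")  # the database changes afterwards

stale = [t for t in dump if t[2] == "G2"]  # the dump does not see the new row
fresh = query_driven(conn, "G2")           # the on-demand conversion does
print(stale)
print(fresh)
```

The dump answers queries without touching the database again but has to be regenerated to pick up changes, while the query-driven function pays the conversion cost on every request.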
3.2. Comparative View
[Please fill in]
| Group/Reference | Mapping Approach | Mapping Representation and Access | | Mapping Implementation | Query Implementation | Application Domain | Data Integration | |
| | ER_to_RDF | Representation Language | Mapping Storage/Access | Static (ETL) | SPARQL -> RDF | | Yes/No | Number of Datasets |
| 1. Virtuoso RDF View (Blakeley, 2007) | Both (user-specified) | SPASQL-based Meta Schema Language | Quad Storage | Both | Both | Horizontal (Agnostic) | Yes | Theoretically unlimited |
| 2. DB2OWL (Cullot et al., 2007) | ER_to_RDF | R2O language | R2O mapping document | | SPARQL->SQL->RDB | | | |
| 3. D2RQ (Bizer et al., 2007) | Both (user-specified) | D2RQ language | D2RQ mapping file | Both | Both | | | |
| 4. R2O (Barrasa et al., 2006) | Both (user-specified) | R2O language | R2O mapping document | Both | Both | | | |
| 5. (Hu et al., 2007) | | | | | | | | |
| 6. (Chebotko, 2006) | | | | | SPARQL->SQL->RDB | | | |
| 7. (Kashyap et al., 2007) | Domain Semantics-driven | | mapping mediator | On-Demand | SPARQL->SQL->RDB | Life Sciences | Yes | |
| 8. (Sahoo et al., 2008) | Domain Semantics-driven | XPath | XSLT document | Static | SPARQL | Life Sciences | Yes | Test included five (Gene, Biological Pathway) |
| 9. Dartgrid (Wu et al., 2006) | ER_to_RDF | XML File | Visualized Mapping tool | On-Demand | SPARQL->SQL->RDB | Life Science (TCM) | Yes | Test included databases for herb, compound formulas, disease, drug, TCM treatment. |
| 10. (Li et al., 2005) | ER_to_RDF | | | Static | | | | |
| 11. Asio Tools | ER_to_RDF | OWL Full based language | File based | Both | SPARQL->SQL->RDB | Domain agnostic | Yes | Theoretically unlimited |
3.3. RDB2RDF Transformation Approaches
3.4. List of Available Tools
New list (many already exist in the cluster above):
- CODE (no link found, mentioned in 7. Man Li, "A Semi-automatic"...)
- Protege plugins
- Virtuoso SPASQL-based Meta Schema Language: uses SQL and SPARQL language hybridization to declare RDF Views over SQL tables
- The Semantic Discovery System: provides functionality for building solutions that let non-technical users create and execute ad hoc queries using a network graph user interface (SPARQL to SQL is auto-generated). Integrates and interconnects all data silo types, providing a virtual Semantic Web interface to RDBMSs, Web services, Excel spreadsheets, and hybrid file systems.
- lgraps, mentioned on the mailing list (out of scope)
- Protege Excel_Import (out of scope)
4. References
4.1. Reviewed
- (Berners-Lee, 1998)
Relational Databases on the Semantic Web T. Berners-Lee, 1998.
[Barrasa et al., 2006] Barrasa J. Upgrading Legacy Data to the Semantic Web
- (Barrasa et al., 2006)
Upgrading relational legacy data to the semantic web (slides) J. Barrasa and A. Gómez-Pérez. In Proc. of 15th international conference on World Wide Web Conference (WWW 2006), pages 1069-1070, Edinburgh, United Kingdom, 23-26 May 2006.
- (Bizer et al., 2007)
D2RQ — Lessons Learned C. Bizer and R. Cyganiak. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.
- (Blakeley, 2007)
RDF Views of SQL Data (Declarative SQL Schema to RDF Mapping) C. Blakeley, OpenLink Software, 2007.
- (Chebotko et al., 2006)
Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns A. Chebotko, S. Lu, H. Jamil and F. Fotouhi. Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, Detroit, USA, May 2006.
- (Chen et al., 2006)
Towards a Semantic Web of Relational Databases: A Practical Semantic Toolkit and an In-Use Case from Traditional Chinese Medicine H. Chen and Y. Wang. In Proc. of 5th International Semantic Web Conference (ISWC 2006), pages 750-763, Athens, USA, 5-9 November 2006.
- (Cerbah, 2008)
Learning Highly Structured Semantic Repositories from Relational Databases - The RDBToOnto Tool F. Cerbah. In Proc. of 5th European Semantic Web Conference (ESWC 2008), Tenerife, Spain, accepted, to be published in June 2008.
- (Cullot et al., 2007)
DB2OWL: A Tool for Automatic Database-to-Ontology Mapping N. Cullot, R. Ghawi and K. Yetongnon. In Proc. of 15th Italian Symposium on Advanced Database Systems (SEBD 2007), pages 491-494, Torre Canne, Italy, 17-20 June 2007.
- (Green et al., 2008)
Linking Ontologies to Spatial Databases J. Green and C. Dolbear, RDB2RDF XG presentation, 2008.
- (Hu et al., 2007)
Discovering Simple Mappings Between Relational Database Schemas and Ontologies W. Hu and Y. Qu. In Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225-238, Busan, Korea, 11-15 November 2007.
- (Kashyap et al., 2007)
From Web 1.0 -> 3.0: Is RDF access to RDB enough? V. Kashyap and M. Flanagan. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.
- (Sahoo et al., 2008)
An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence S. Sahoo, O. Bodenreider, J. Rutter, K. Skinner and A. Sheth. Journal of Biomedical Informatics (Special Issue: Semantic Biomedical Mashups), (in press), 2008.
- (Wu et al., 2006)
Dartgrid: a Semantic Web Toolkit for Integrating Heterogeneous Relational Databases Z. Wu, H. Chen, H. Wang, Y. Wang, Y. Mao, J. Tang and C. Zhou. Semantic Web Challenge at 4th International Semantic Web Conference (ISWC 2006), Athens, USA, 5-9 November 2006.
- (Li et al., 2005)
A Semi-automatic Ontology Acquisition Method for the Semantic Web M. Li, X. Du and S. Wang, 2005.
4.2. To be reviewed
Please review and add more
François Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, Jean Morissette, Bio2RDF: Towards a mashup to build bioinformatics knowledge systems http://dx.doi.org/10.1016/j.jbi.2008.03.004
Yuan An, Alex Borgida, and John Mylopoulos, Inferring Complex Semantic Mappings between Relational Tables and Ontologies from Simple Correspondences http://www.cs.toronto.edu/~yuana/research/publications/odbase05.paper13.pdf
Justas Trinkunas and Olegas Vasilecas, Building ontologies from relational databases using reverse engineering methods http://ecet.ecs.ru.acad.bg/cst07/Docs/cp/SII/II.6.pdf
Benjamin Habegger, Learning Data-Consistent Mappings from a Relational Database to an Ontology http://www.ceur-ws.org/Vol-200/06.pdf
Martin J. O'Connor, et al., Efficiently Querying Relational Databases Using OWL and SWRL http://bmir.stanford.edu/file_asset/index.php/1163/SMI-2007-1244.pdf
E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth, “OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies,” http://knoesis.wright.edu/library/download/MKSI96.pdf
C. Pérez de Laborda, S. Conrad, "Database to Semantic Web Mapping using RDF Query Languages", http://dbs.cs.uni-duesseldorf.de/~perezdel/pdf/06PeCob.pdf
There are several more Virtuoso whitepapers and presentations relevant to this topic, http://virtuoso.openlinksw.com/Whitepapers/ ... More than just Table-to-Class, Virtuoso maps Tables, Views, SQL Stored Procedures, and other Data Sources to Classes (which may not be made clear by these whitepapers without reference to or familiarity with the documentation). It seems that Virtuoso should be mentioned not only in 2.1.1 "Transformation RDB -> RDF" "Table to class" but also in 2.1.2 "Domain Semantics-driven", 2.1.3 "Automatic ontology-based mapping discovery", 2.2.1 "Querying" "SPARQL -> SQL", and 2.3 "Other", above. Especially note --
Virtuoso's SQL to RDF Technology Presentation (W3C RDF & DBMS Integration Workshop, 10-25-2007)
M. Dean. Use of SWRL for Ontology Translation. 2008 Semantic Technology Conference, San Jose, CA, May 2008.
M. Fisher, M. Dean, and G. Joiner. A Tool for Semantic Relational Database Translation using OWL and SWRL. OWL Experiences and Directions 2008 DC, Gaithersburg, MD, April 2008. http://www.webont.org/owled/2008dc/papers/owled2008dc_paper_13.pdf
http://researchweb.watson.ibm.com/journal/sj/402/davidson.pdf
4.3. Related Topics
Olivier Curé, Raphael Squelbut, Semantic mapping to synchronize data and knowledge bases at the instance level http://www.eswc2006.org/poster-papers/FP34-Cure.pdf