News
25/04/2008: Linked Data: Principles and State of the Art talk at the W3C Track at WWW2008.
24/03/2008: Please contribute to the Linking Open Data Triplification Challenge at I-Semantics 2008, which will award a MacBook Air and other nice prizes to the most promising approaches to exposing the content of existing Web applications as Linked Data on the Web, as well as to projects that publish high-impact open datasets as Linked Data.
22/03/2008: Triplify - a small plugin for database backed Web applications for exposing Linked Data and RDF/JSON released.
27/02/2008: Tim Berners-Lee Talks with Talis about the Semantic Web and Linked Data - Podcast and Transcript available
03/02/2008: Linked Data Planet - Conference & Expo (June 17 - 18, 2008, Roosevelt Hotel, New York City, USA) announced.
31/01/2008: riese (the RDFizing and Interlinking the EuroStat Data Set Effort) has been launched - the goal is to serve the entire Eurostat dataset (some 3 billion triples). This is the first linked-data set deployed in XHTML+RDFa; riese offers links to other LOD datasets and introduces a new interlinking method, called User Contributed Interlinking (UCI).
18/12/2007: Open Data Commons Public Domain Dedication & Licence (PDDL) and Community Norms released for review.
28/11/2007: WWW2008 workshop about Linked Data on the Web (LDOW2008) announced.
15/11/2007: Linking Open Data project member Revyu.com wins the 2007 Semantic Web Challenge at ISWC+ASWC 2007.
06/11/2007: Two new Semantic Web Search Engines released: 1. Sindice developed by DERI Ireland, currently indexes 11 million RDF documents; 2. Falcons developed by IWS China, currently indexes 17 million "objects" from 2 million RDF documents.
08/10/2007: DBpedia now links to OpenCyc concepts.
09/12/2007: A Linked Data version of Wikicompany, the worldwide business directory that anyone can edit, is made available by OpenLink Software. It has been extracted using the DBpedia software. Example entries: Northwest Airlines, Apple Computer.
09/11/2007: flickr wrappr provides photos of DBpedia resources from flickr.
08/31/2007: Joshua Tauberer's GovTrack.us has followed the U.S. Congress since 2004. All members, bills and votes are now available via dereferenceable URIs, in addition to the existing RDF dumps and SPARQL endpoint.
08/24/2007: Michael Smethurst from the BBC has published data about the TV programmes "Top of the Pops" and "Later with Jools Holland" as linked data at http://bbc-hackday.dyndns.org:2825/. These data sets are interlinked with Musicbrainz and DBpedia.
07/19/2007: Tutorial on How to publish Linked Data on the Web published. The tutorial gives an overview of the ideas behind Linked Data and provides several practical recipes for publishing Linked Data on the Web.
07/04/2007: The first draft of the Semantic Crawling sitemap extension is published. The extension targets crawling issues associated with sites offering a large number of URIs/URLs as linked data.
06/01/2007: There is a poster presentation about the Linking Open Data project on Monday at ESWC2007. There is also a Linked Data dinner on Wednesday.
06/01/2007: There are some interesting papers about Linked Data at the ESWC Scripting for the Semantic Web workshop including papers about the RDF Book Mashup, Semantic Radar and PTSW, and Sindice
05/21/2007: Video covering Paul Miller's Opening the Silos - sustainable models for open data talk from XTech 2007 - Open Data track.
05/11/2007: Paul Miller from Talis published two blog posts about the Linked Data sessions at WWW2007 on Wednesday and Friday: Linked Data BOF and Linked Data once again.
Project Description
The Open Data Movement aims at making data freely available to everyone. There are already various interesting open data sets available on the Web. Examples include Wikipedia, Wikibooks, Geonames, MusicBrainz, WordNet, the DBLP bibliography and many more which are published under Creative Commons or Talis licenses.
The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources.
RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.
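To make the idea of following RDF links more concrete, here is a minimal sketch of how a crawler might walk from one data source to the next. It does not fetch anything from the Web: the WEB dictionary is a made-up stand-in for dereferencing URIs, and all example.org URIs, the vocabulary terms and the crawl function are assumptions for illustration only, not part of any project tool.

```python
from collections import deque

# Hypothetical in-memory "web": each URI maps to the triples that
# dereferencing it would return. All URIs and values are made up.
WEB = {
    "http://example.org/a/Berlin": [
        ("http://example.org/a/Berlin",
         "http://www.w3.org/2002/07/owl#sameAs",
         "http://example.org/b/city-1"),
        ("http://example.org/a/Berlin",
         "http://example.org/vocab/country",
         "http://example.org/a/Germany"),
    ],
    "http://example.org/b/city-1": [
        ("http://example.org/b/city-1",
         "http://example.org/vocab/note",
         "some literal value"),
    ],
    "http://example.org/a/Germany": [
        ("http://example.org/a/Germany",
         "http://example.org/vocab/capital",
         "http://example.org/a/Berlin"),
    ],
}


def crawl(seed, max_docs=10):
    """Breadth-first traversal that follows RDF links starting at `seed`.

    Returns the set of visited URIs and the list of collected triples.
    """
    seen = set()
    queue = deque([seed])
    triples = []
    while queue and len(seen) < max_docs:
        uri = queue.popleft()
        if uri in seen or uri not in WEB:
            continue  # already visited, or nothing to dereference
        seen.add(uri)
        for s, p, o in WEB[uri]:
            triples.append((s, p, o))
            # An object that looks like a URI is a link we can follow.
            if o.startswith("http") and o not in seen:
                queue.append(o)
    return seen, triples


if __name__ == "__main__":
    visited, data = crawl("http://example.org/a/Berlin")
    for uri in sorted(visited):
        print("visited", uri)
    print(len(data), "triples collected")
```

Starting from a single URI in source "a", the sketch reaches the description held by source "b" only because an RDF link points there. A real crawler would replace the dictionary lookup with HTTP requests and an RDF parser.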
The figure below shows the datasets that have been published and interlinked by the project so far. Collectively, the datasets consist of over two billion RDF triples, which are interlinked by around 3 million RDF links (October 2007).
Clickable version of this diagram. Please visit the Datasets page for an up-to-date list of all published datasets.
Project Pages
The project collects relevant material on several wiki pages. Please feel free to add additional material, so that we get an overview of what is already there and what is currently happening.
Meetings & Gatherings
New York City (June 2008) Linked Data Planet (type: Conference & Expo)
Tenerife (ESWC08) Gathering (type: Face-Face Meeting)
Linked Data on the Web (LDOW2008) Workshop (type: Workshop)
BOF on Semantic Web Search Engines at WWW2008 (type: Face-Face Meeting)
Beijing (WWW2008) Gathering (type: Face-Face Meeting/Birthday Celebration)
Busan (ISWC2007) Gathering (type: Face-Face Meeting)
London (August 2007) Gathering (type: Face-Face Meeting)
Innsbruck (ESWC2007) Gathering (type: Face-Face Meeting)
Banff (WWW2007) Gathering (type: Face-Face Meeting)
See Also
Tim Berners-Lee: Linked Data (architecture note outlining the basic ideas of Linked Data)
Chris Bizer, Tom Heath, Tim Berners-Lee: Linked Data: Principles and State of the Art talk at the W3C Track at WWW2008.
Christian Bizer et al.: Interlinking Open Data on the Web (Two page document giving an overview about the Linking Open Data project)
Tim Berners-Lee: Browsable Data
Christian Bizer et al.: How to publish Linked Data on the Web (Tutorial)
Sauermann et al.: Cool URIs for the Semantic Web (tutorial on URI dereferencing and content-negotiation)
Alistair Miles et al.: Best Practice Recipes for Publishing RDF Vocabularies (W3C draft on serving RDF vocabularies according to the Linked Data principles)
Richard Cyganiak: Debugging Semantic Web sites with cURL (tutorial on how to test Semantic Web sites)
Ding, Finin: Characterizing the Semantic Web on the Web (kind of outdated but still interesting paper on RDF data on the Web)
Michael K. Bergman: More Structure, More Terminology and (hopefully) More Clarity (contextualization of Linked Data in the general development of the Web)
NetworkedData (old page on related topic)
RdfLite ("no bnodes" etc)
OpenLink Software: Deploying Linked Data using the Virtuoso Universal Server
OpenLink Software: Generating Linked Data from non RDF Data Sources via the Virtuoso Sponger
Following your Nose to the Web of Data (draft version of an Information Standards Quarterly article about Linked Data and the LOD project)
Mike Bergman: Linked Data Comes of Age - LinkedData Planet and LDOW Set the Pace for 2008
Uche Ogbuji: Real Web 2.0: Linking open data - Discover the community that sees Web 2.0 as a way to revolutionize information on the Web, IBM Developerworks, Feb. 2008.
Tom Heath: 'No more toy examples' - the Linking Open Data project, one year on. Talis Platform News, Feb. 2008.
Mailing List
As there are lots of interesting mail conversations around the project, we decided that we need a mailing list.
The list is hosted by W3C and you can subscribe to it at http://lists.w3.org/Archives/Public/public-lod/.
The project's mailing list was hosted by the MIT SIMILE project until Spring 2008; MIT also maintains an archive of the list until that point.
FAQ
1. Please provide a brief description of your proposed project.
The Open Data Movement aims at making data freely available to everyone. There are already various interesting open data sources available on the Web. Examples include Wikipedia, Wikibooks, Geonames, MusicBrainz, WordNet, the DBLP bibliography and many more which are published under Creative Commons or Talis licenses.
The goal of the Linking Open Data project is to build a data commons by making various open data sources available on the Web as RDF and by setting RDF links between data items from different data sources.
RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.
There are already some data publishing efforts. Examples include the DBpedia.org project, the Geonames Ontology, the D2R Server publishing the DBLP bibliography and the dbtune music server. There are also initial efforts to interlink these data sources. For instance, the DBpedia RDF descriptions of cities include owl:sameAs links to the Geonames data about the city (1). Another example is the RDF Book Mashup which links book authors to paper authors within the DBLP bibliography (2).
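As a rough illustration of what an owl:sameAs link buys you, the following sketch merges the descriptions of one city coming from two sources into a single description. The URIs, property names and values are invented (they are not the real DBpedia or Geonames identifiers), and the functions canonical_map and merge are hypothetical helpers written for this example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def canonical_map(triples):
    """Map every URI that takes part in an owl:sameAs link to one
    representative URI, using a small union-find structure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smaller URI as representative
                # so that the result is deterministic.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep
    return {uri: find(uri) for uri in list(parent)}


def merge(triples):
    """Rewrite subjects and objects to their representative URI and
    drop the owl:sameAs triples themselves."""
    mapping = canonical_map(triples)
    return {
        (mapping.get(s, s), p, mapping.get(o, o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    }


if __name__ == "__main__":
    # Made-up data standing in for two independently published sources.
    data = [
        ("http://example.org/dbpedia/Berlin", OWL_SAME_AS,
         "http://example.org/geonames/123"),
        ("http://example.org/dbpedia/Berlin",
         "http://example.org/vocab/label", "Berlin"),
        ("http://example.org/geonames/123",
         "http://example.org/vocab/lat", "52.5"),
    ]
    for triple in sorted(merge(data)):
        print(triple)
```

After merging, the label from the first source and the latitude from the second hang off the same URI. Whether an application should physically merge data like this or simply follow the link at query time is a design choice; the sketch only shows the former.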
2. Why did you select this particular project?
For demonstrating the value of the Semantic Web it is essential to have more real-world data online. RDF is also the obvious technology to inter-link data from various sources.
3. Why do you think this project will have a wide impact?
A huge inter-linked data set would be beneficial for various Semantic Web development areas, including Semantic Web browsers and other user interfaces, Semantic Web crawlers, RDF repositories and reasoning engines.
Having a variety of useful data online would encourage people to link to it and could help bootstrap the Semantic Web as a whole.
4. Can your project be easily integrated with other wide-spread systems? If so, which and how?
The data can instantly be browsed with Semantic Web browsers like Tabulator, Disco or the OpenLink RDF Browser. Existing Semantic Web crawlers like SWSE and Swoogle can provide an integrated view on the data and sophisticated search interfaces.
5. Why is it that this project should be done right now, i.e. why should people prioritise this ahead of other projects?
It is getting boring to play around with toy examples as most Semantic Web projects do.
6. What can you contribute to the project?
We will keep on working on dbpedia and start serving RDF for all 1.6 million concepts in Wikipedia in a couple of weeks. As Wikipedia contains information about various domains, we think dbpedia URIs could function as a valuable linking-hub for interconnecting various data sources. We could link to related data from dbpedia as we already did with the links to geonames.
7. What contribution would you need from others?
- Propose additional open data sources that could be mapped to RDF.
- Convert a data source to RDF and serve it as linked data or SPARQL endpoint on the Web.
- Invent heuristics to auto-generate links between data items from different sources.
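One very simple heuristic of the kind asked for in the last item could look like the sketch below: it proposes an owl:sameAs link whenever a normalized label from one source matches exactly one label in the other source. This is only an assumption about what such a heuristic might look like, not a method used by the project; the function names and the example data are hypothetical, and label matching alone will produce wrong links in practice (think of cities sharing a name), which is why ambiguous matches are skipped here.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case a label, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def propose_links(source_a, source_b):
    """Return candidate owl:sameAs triples between two {uri: label} dicts.

    A link is proposed only when a label from source_a matches exactly
    one label in source_b after normalization.
    """
    index = {}
    for uri, label in source_b.items():
        index.setdefault(normalize(label), []).append(uri)

    links = []
    for uri_a, label in source_a.items():
        matches = index.get(normalize(label), [])
        if len(matches) == 1:  # skip missing and ambiguous matches
            links.append((uri_a, OWL_SAME_AS, matches[0]))
    return links


if __name__ == "__main__":
    # Made-up labels standing in for two independent data sources.
    a = {
        "http://example.org/a/1": "São Paulo",
        "http://example.org/a/2": "Springfield",
    }
    b = {
        "http://example.org/b/9": "sao  paulo",
        "http://example.org/b/10": "Springfield",
        "http://example.org/b/11": "springfield",
    }
    for link in propose_links(a, b):
        print(link)
```

A more serious heuristic would combine several signals (type, location, shared identifiers) before asserting identity, since owl:sameAs is a strong statement.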
8. What standardisation should the Semantic Web community at large undertake to support the project?
There is already standardisation going on within the SWBP working group: Best Practice Recipes for Publishing RDF Vocabularies. It would also be useful to propagate Tim's linked data ideas.
9. How does your project encourage others not currently involved with Semantic Web technologies to get involved (by providing data or make a coding commitment)?
Having useful data online might initialize network effects. The project could raise awareness within the Open Data community about the benefits that RDF as a shared data model offers them. Having richly inter-linked data online might inspire people to create interesting mashups and other RDF-aware applications.
10. What would be the main benefit of using Semantic Web technologies to achieve the goals of the project, compared to other technologies?
RDF provides a flexible data model for integrating information from different sources. Its linking capabilities, in particular, are not provided by any other data model.
Commitments
If you like this project, please write your name below and indicate what contribution you can make to the project. Possible forms of commitment are:
- I think this project is a good idea and its realization would be useful.
- I would like to propose further data sources for being published as RDF
- I would like to convert a data source to RDF
- I could serve some data from my server (if somebody would give it to me)
- I would like to work on heuristics to auto-generate links between data items from different sources
- I could talk with other people that might want to contribute to the project
Chris Bizer and Richard Cyganiak proposed the project to the W3C SWEO. We maintain several Linked Data sources, including DBpedia, DBLP Berlin, CIA Factbook, Book Mashup and Eurostat, and do outreach and coordination work for the project.
Sören Auer - I try to contribute with regard to converting, serving data-sources and talking to people
Bernard Vatant - Already involved in Geonames ontology, and linking Geonames data and concepts to other sources such as INSEE data. Projects to do more, linking to GEMET concepts, Wikipedia categories etc.
Josh Tauberer - 700 million triples of U.S. Census data coming very soon now.... (Just having some free disk space issues loading it into MySQL.) Tying this to GeoNames will be an interesting/useful project for someone looking for a project.
Tom Heath - Great idea. I can contribute the involvement of Revyu.com, (AFAIK) the only RDF-based reviewing and rating site in the wild. The site exposes data using FOAF, the Review Vocab and Richard Newman's Tag Ontology, and everything gets dereferenceable URIs. The data set is modest, but growing. I'm really interested in developing heuristics to auto-generate sameAs links between URIs from Revyu and elsewhere, and ways to infer locations of things from reviews and tags and hook this into Geonames.
Kingsley Idehen, Orri Erling, and Frederick Giasson - Additional and complementary RDF Data Sources in line with Linked Data principles via projects such as PingTheSemanticWeb, SearchTheSemanticWeb, and OpenLink Data Spaces. In addition we will be making Virtuoso available as an RDF Data Store for scalability experimentation, exploration activities, etc.
Marc Wick - Implementation of Geonames RDF web services
Felix Van de Maele - A very interesting project. I developed an ontology-focused crawler and am currently working on the community-driven ontology matcher and mediator which might be handy to interlink rdf data sources.
Stefano Mazzocchi - As part of the MIT Simile Project, I've been RDFizing large datasets for years (unfortunately, most of these are not data I can make publicly available). We provide a way to export RDF data from all of our RDF browsing tools, but we haven't focused on providing URI dereferencing for such data and I agree that it might be important to start doing so. The juiciest dataset we have to offer is a 50Mt dump of the MIT Libraries catalog covering about a million books. I'm also currently working on an owl:sameAs-based RDF smoosher (which is already functional from the command line) and I'm planning on working on equivalence mining next. Also worth noting how the SIMILE Project has a large collection of RDFizing programs that can be used to generate large quantities of RDF from existing data.
DannyAyers - intend to look at ConverterFromRdf possibilities, linking the output semweb systems to "legacy" data consumers, also wondering about low-cost linkage/heuristics for describing datasets
Ed Summers - I'm a software developer at the Library of Congress interested in making bibliographic and authority datasets available to the semantic web.
Yves Raimond - I am a PhD student in the Centre for Digital Music, Queen Mary, University of London, and I am interested in linking music-related open data (Musicbrainz, Magnatune, Jamendo, Dogmazic, Mutopia, among others...). I am also part of two projects, in which I am trying to promote such an approach: EASAIER (Enabling Access to Sound Archives through Enrichment and Retrieval) and OMRAS2 (Online Music Recognition and Searching).
Vangelis Vassiliadis - I could work on heuristics to auto-generate links between data items from different sources and on adding domain knowledge to different datasets by means of OWL.
Dmitry Ulanov - I'm a developer of THALIA testbed. It can be used for benchmarking relational database to RDF mapping tools.
Georgi Kobilarov - I maintain the DBpedia extraction framework. I'm interested in developing tools to help data publishers interlink their databases and in building UIs for end-users.
Huajun Chen - developer of DartGrid which is a relational data integration toolkit using semantic web technologies. Two major components of DartGrid are a visualized semantic mapping tool and a view-based (or more generally rule-based) SPARQL-SQL query rewriting component. ISWC2006 Paper introduces the details.
Chris Wilper - I'm a developer of Fedora and also work with the National Science Digital Library group at Cornell. I'm interested in the collaborative development of OLTP benchmarks for triplestores.
Giovanni Tummarello - I created Sindice, together with Eyal Oren. Sindice is a linked data search engine which returns ranked lists of "SeeAlso" URLs which contain information about a given URI. In a sense it overcomes the need for mandatory "SeeAlso" statements by looking anywhere on the web (via people pinging either us or ptsw directly, and via our array of swse bots). "SeeAlso" statements remain useful, however, for ranking purposes. The service has a simple HTTP API; see for example all the links Sindice knows which talk about Tim Berners-Lee here.
Sherman D. Monroe - I'm the author of Cypher, which is a transcoder which generates the RDF and SeRQL (working on SPARQL port) representation of natural language phrases and sentences. The project page can be found here. The Cypher project aims to collect and unify the various sources of data used for NLP tasks, such as WordNet, FrameNet, PropBank, as well as annotated corpora, and also to provide standard ontologies for things like part-of-speech tagging. We wish to provide a single resource which NLP applications can use to leverage this data. I'm also working on a Semantic Web web service called overdogg (currently in alpha) which is a new type of marketplace based on reverse auction for services, and another soon to be announced service which builds FOAF databases of users.
Adam Sobieski - Interested in event ontology and also wikitology. I'm making a website that allows users to select or create predicates and drag and drop nouns (noun phrases) from sentences into predicate slots. The interface will capture pronoun resolution and semantics from visitors reading. The sentences will be viewed in order from articles to obtain context information. The downloadable resource will be both a corpus (hopefully as useful as Penn treebank and Redwoods) and a collaborative ontology relating nouns from real-world encyclopedic articles.
Fabian M. Suchanek and Gjergji Kasneci - We provide YAGO, a large ontology. YAGO is available for querying and for download in different formats (RDFS, XML, database, text).
Troy Self - I maintain SemWebCentral, which is a development site for Open Source Semantic Web tools. I also maintain the RDF browser, ObjectViewer, the ontology summarizer, Ocelot, and was one of the primary developers of the Semantic Web Development Environment, SWeDE.
Bernhard Haslhofer - I am working on Semantic Web topics in the Digital Library and Archives domain. An easy way to link hundreds of data sources would be to build a wrapper for the Open Archives Protocol for Metadata Harvesting (OAI-PMH). I intend to get some work done in this direction.
David Peterson - I am working on getting large Australian science datasets converted to RDF and accessible via SPARQL. We work with over 13 large and diverse science organisations so I believe this will be a valuable contribution.
Joerg Diederich - I am working on Semantic Web topics and Digital Libraries and I am the maintainer of FacetedDBLP. I am planning to contribute my local DBLP data (updated weekly) by means of the D2R technology from FUBerlin very soon.
Danny Gagne -- I think this is a great idea. I'm going to work on building some tools, trying to create a small dataset, who knows what else
MichaelHausenblas -- Currently trying to RDFize the Eurostat data ...
Jonathan Gray -- I'm from the Open Knowledge Foundation and I'm interested in open data licensing and locating and listing open datasets.
Bernhard Schandl -- I am working on integration of semantic data into user desktops and file systems and am interested in the possibilities of publishing such data on the web.
David Huynh -- I'm interested in building UIs for browsing and viewing the collected data. I don't think that a SPARQL interface appeals to the general public, and a pure search text box a la Google does not take sufficient advantage of the graph nature of the data.
Daniel Lewis -- I am a Technology Evangelist for OpenLink Software, my interests are in making the Social Web more Semantic, making the Semantic Web more User Friendly and making the web more intelligent. The only way to advance web applications is to expose and link data between domains - Semantic Web technology can do that. My blog is available here and I tend to talk about various subjects (not just about the Semantic Web).
Andreas Langegger -- I'm currently developing a middleware for virtual data integration based on SW technologies. It will be used inside the Austrian Grid for sharing structured scientific data but because of the relevance for the SW community, it will be released as SemWIQ (Semantic Web Integrator and Query Engine) in mid-2008. Among other goals, I try to keep setup / configuration as simple as possible. Stay tuned (currently optimization is on the top of the agenda)...
Sergey Chernyshev -- I'm running TechPresentations.org project based on Semantic MediaWiki (SMW) technologies and participating in a development community around SMW. Among my goals is to connect TechPresentations data to LOD dataset and to make use of SMW-related technologies to help crowd-sourced data get interconnected with other LOD datasets.
Rob Cakebread -- I run Doapspace.org and am mainly interested in DOAP and linking it with FOAF, BEATLe, SIOC. I'm a Gentoo Linux developer and I'm working on tools for users and package maintainers to benefit from the metadata provided by DOAP and related ontologies.
Francois Scharffe -- I'm a researcher at STI Innsbruck in the area of ontology alignment. I'm involved in the EASAIER project where we publish music related datasets using the music ontology. I've worked on SPARQL++, a SPARQL extension that allows transferring RDF data from one ontology to another. I'm interested in data fusion techniques. I'm also interested in creating a classical music reference knowledge base that could be used as an anchor to publish classical music datasets.
Sign-up HOWTO
When you sign up to the mailing list, please follow this little sign-up guide:
Sign up to the mailing list
- Send a little self introduction to the mailing list (include an intro to your project and associated RDF Data Sets where such exist or are planned)
Get yourself a Linked Data Web URI (an ID for you, the Person Entity, e.g. the URI for Kingsley Idehen, an Entity of Type: Person)
Register at http://community.linkeddata.org/ods/index.html and use the Profile page to Link to your other URIs (if such exist) via the "Synonyms" input field (note: this system gets you a Person URI and an OpenID if you don't already have these)
Create a Wikiword (Topic) for yourself, e.g. Kingsley Idehen, and expose one of your Person Entity URIs there
- Add your Wikiword to this page
- Add Project References to section above.
Add Data Set References to the RDF Data Sets page
