TaskForces/CommunityProjects/LinkingOpenData/DataSets
SWEO Community Project: Linking Open Data on the Semantic Web
Datasets
This page collects open datasets.
This is a first draft; I will add more datasets in the coming days. ChrisB
The page is part of the community project [[SweoIG/TaskForces/CommunityProjects/LinkingOpenData|Linking Open Data]].
Datasets available with dereferenceable URIs
- WordNet: WordNet is a large lexical database of English. Currently being RDFized by a Best Practices Task Force. Detail ...
- DBLP Bibliography: Provides bibliographic information about scientific papers. Size of the dataset: 800,000 articles and 400,000 authors, approx. 15 million triples.
- RDF Book Mashup: Provides bibliographic information, reviews and sales offers for most books that have an ISBN. Maps data from Amazon and Google Base to RDF. Size of the dataset: Unknown, billions of triples.
- dbpedia: Dataset containing data extracted from Wikipedia. About 19 million triples. Please don't use it for linking yet, as the URIs will change in the coming weeks.
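For readers unfamiliar with the term, "dereferenceable" means that an HTTP GET on a resource URI returns a description of that resource, usually selected through content negotiation. The following is a minimal sketch of that idea using only the Python standard library. The URI `http://example.org/resource/Thing` and the function names are hypothetical placeholders, not taken from any dataset listed above, and the choice of `application/rdf+xml` as the requested format is an assumption; a given server may offer other RDF serializations.

```python
import urllib.request
from typing import Tuple

# Assumption: we ask for RDF/XML; servers may support other RDF formats.
RDF_ACCEPT = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request whose Accept header asks for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> Tuple[str, bytes]:
    """Fetch the URI and return (final URL after redirects, raw body).

    urllib follows HTTP redirects automatically, so the final URL may
    differ from the URI that was requested.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.read()


if __name__ == "__main__":
    # Hypothetical URI; replace it with a real resource URI from a dataset.
    final_url, body = dereference("http://example.org/resource/Thing")
    print(final_url, len(body), "bytes")
```

Comparing the requested URI with the returned final URL is a quick way to see whether a server redirects from the resource URI to a separate document that describes it.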
Datasets available as RDF Dumps
- Lots. Please feel free to add plenty :-)
Datasets available via SPARQL Endpoints
See [[SparqlEndpoints]]
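As a rough illustration of how such an endpoint is typically queried, the SPARQL Protocol lets a client send the query text in a `query` parameter of an HTTP GET request. The sketch below only builds that request URL; the endpoint address `http://example.org/sparql` and the helper names are hypothetical and do not refer to any endpoint listed on the linked page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    # Append with '&' if the endpoint URL already has a query string.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Recover the SPARQL query text from a URL built above."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    # Hypothetical endpoint; substitute a real one before fetching.
    sample = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
    url = build_sparql_url("http://example.org/sparql", sample)
    print(url)
    print(extract_query(url))
```

The resulting URL can be fetched with any HTTP client. Which result formats an endpoint returns depends on the server, so check its documentation before relying on a particular one.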
Datasets currently being RDFized
- MusicBrainz. Please ask Frederick Giasson for details.
- US Census Data. Please ask Josh Tauberer for details.
- GEMET. GEMET is the GEneral Multilingual Environmental Thesaurus of the European Environment Agency. Please ask Bernard Vatant for details.
Datasets that would be nice to have on the Semantic Web
- Lots. Please feel free to add plenty :-)