DatasetDynamics

From W3C Wiki

Dataset Dynamics

The Issue

Linked datasets change over time: resource representations and links between resources are created, updated and removed; entire graphs can change or disappear. The frequency and extent of such changes depend on the nature of a linked data source. Sensor data are likely to change more frequently than archival data, and updates to individual resources are minor compared to a complete reorganization of a data source's infrastructure, such as a change of domain name. In many scenarios, applications that consume linked data need to deal with these kinds of changes in order to keep their local data dependencies consistent. Dataset dynamics denotes a research activity that investigates how to deal with this problem.

Use Cases

The Dataset Dynamics interest group identified three representative use cases in which applications need to be informed about changes in remote linked datasets.

  • UC1 Link Maintenance: An application hosts resources that are linked with remote resources and uses remote data in its local application context. It needs to be informed when representations of these remote resources change or become unavailable under a given URI in order to keep these links valid.
  • UC2 Dataset Synchronization: A dataset consumer wants to mirror or replicate (parts of) a linked dataset. The periodically running synchronization process needs to know which triples have changed at what time in order to perform efficient updates in the local dataset.
  • UC3 Data Caching: An application that consumes data from one or more remote datasets uses an HTTP-level cache that stores local copies of remote data. These caches need to be invalidated when the remote data changes.
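UC1 and UC3 both react to the same signal: a notice that a remote resource has changed or disappeared. The sketch below assumes a hypothetical notification format (a URI plus an event kind of "updated" or "removed") and keeps an in-memory cache and link table. `ChangeEvent` and `LinkingApp` are illustrative names, not part of any protocol listed on this page. Any change invalidates the cached copy; a removal additionally reports which local resources now hold broken links.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change notification for one remote resource."""
    uri: str
    kind: str  # "updated" or "removed"


@dataclass
class LinkingApp:
    # UC3: local copies of remote representations, keyed by remote URI.
    cache: Dict[str, str] = field(default_factory=dict)
    # UC1: local resource URI -> remote URIs it links to.
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def handle(self, event: ChangeEvent) -> List[str]:
        """Apply one notification; return local resources whose links broke."""
        if event.kind not in ("updated", "removed"):
            raise ValueError(f"unknown event kind: {event.kind!r}")
        # Any change makes the cached copy stale, so drop it (UC3).
        self.cache.pop(event.uri, None)
        if event.kind == "updated":
            return []  # the links still resolve; only the cache was affected
        # The target is gone: report every local resource linking to it (UC1).
        return sorted(src for src, targets in self.links.items() if event.uri in targets)
```

In practice the events would come from one of the notification mechanisms compared under Related Work; the sketch deliberately leaves open whether broken links are then repaired or dropped.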

Technical Infrastructure

These use cases require a technical infrastructure comprising the following components:

  • A Dataset Dynamics Vocabulary that can express meta-information about the dynamics of a dataset (e.g., change frequency, extent of changes, last update) and provide a link to the URI of the update notification source.
  • Applications for detecting and dealing with changes.
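To illustrate how a consumer might act on such meta-information, here is a minimal sketch that assumes the description has already been parsed into a Python object. The field names (`last_update`, `change_frequency`, `notification_uri`) are hypothetical stand-ins, not the terms of dady or any other published vocabulary. The policy shown is one simple choice among many: subscribe when a notification source is advertised, otherwise poll when the next change is expected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class DynamicsMetadata:
    """Hypothetical parsed dataset-dynamics description (field names are illustrative)."""
    dataset_uri: str
    last_update: datetime
    change_frequency: timedelta             # expected interval between changes
    notification_uri: Optional[str] = None  # update notification source, if any


def plan_update_check(meta: DynamicsMetadata, now: datetime) -> Tuple[str, Union[str, datetime]]:
    """Decide how a consumer should learn about the next change.

    Returns ("subscribe", notification_uri) when a notification source is
    advertised, otherwise ("poll", time_of_next_poll).
    """
    if meta.notification_uri is not None:
        return ("subscribe", meta.notification_uri)
    expected_change = meta.last_update + meta.change_frequency
    # If the expected change is already overdue, poll immediately.
    return ("poll", max(expected_change, now))
```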

Examples and Demos

  • sparqlPuSH, SPARQL + pubsubhubbub, Alexandre Passant
  • GUO Graph Diff, a prototype script for performing "diffs" on RDF Graphs, Nathan
  • Linked Data Camp Vienna 09 demo, using voiD+dady and Atom.
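The idea behind a graph diff can be sketched by treating each RDF graph as a set of (subject, predicate, object) triples, so that a diff reduces to two set differences. This is a naive illustration, not how GUO Graph Diff is implemented: it assumes ground graphs, because blank node labels are not stable across snapshots and comparing graphs that contain them requires an isomorphism check. Terms are plain strings here for brevity.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), terms as plain strings


class GraphDiff(NamedTuple):
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


def diff_graphs(old: Set[Triple], new: Set[Triple]) -> GraphDiff:
    """Naive diff by triple equality; assumes no blank nodes in either graph."""
    return GraphDiff(added=frozenset(new - old), removed=frozenset(old - new))


def apply_diff(graph: Set[Triple], diff: GraphDiff) -> Set[Triple]:
    """Return a new graph with the diff applied; the input is left unchanged."""
    return (set(graph) - diff.removed) | diff.added
```

Applying the diff of two snapshots to the older one reproduces the newer one, which is the property a synchronizing consumer relies on.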


Related Work

Name | Type | Discovery | Notification | Change representation
DSNotify | protocol, RDF schema | no | yes | yes
Web of Data Link Maintenance Protocol | protocol, XML schema | ? | yes | yes
Ping the Semantic Web | centralised service | ? | ? | yes
SemanticPingBack | Pingback extension | ? | ? | ?
Memento: Time Travel for the Web | HTTP extension | yes | no | ?
RFC4287 - Atom Syndication Format | XML schema | ? | yes | no
RFC5023 - Atom Publishing Protocol | protocol | yes | yes | no
Talis' Changesets | RDF vocabulary | ? | ? | yes
Triplify's Updates | RDF vocabulary | ? | ? | yes
Graph Update Ontology (GUO) | RDF vocabulary | ? | ? | yes
Guaranteed RDF Update Format (GRUF) | format | ? | ? | yes
Web Subscription (WebSub) | protocol | ? | yes | ?
dady (data source dynamic) | RDF vocabulary | yes | no | no
sparqlpush - pubsubhubbub (PuSH) interface for SPARQL endpoints | protocol | no | yes | yes*
PubSubHubbub (PSHB) - an open, simple, web-scale pubsub protocol | protocol | yes | yes | yes* (via Atom or RSS)
Simple Update Protocol (SUP) - a simple and compact "ping feed" | protocol | yes | yes | no
Delta - an ontology for the distribution of differences between RDF graphs | N3 vocabulary | no | no | yes
SPARQL Inferencing Notation (SPIN) - SPIN SPARQL Syntax | RDF vocabulary | no | no | yes
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) | protocol / XML Schema | no | no | yes
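Several of the entries above represent a change as a set of removed triples plus a set of added triples. For UC2, a mirror could consume such changesets roughly as sketched below. The `Changeset` type, its `created` timestamp and the rule of applying removals before additions are assumptions of this sketch, not the serialization or semantics of Talis' Changesets, GUO or any other listed format. Only changesets newer than the last synchronization point are applied, so rerunning with the returned watermark changes nothing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Changeset:
    """Hypothetical timestamped change: triples to remove and triples to add."""
    created: datetime
    removals: FrozenSet[Triple]
    additions: FrozenSet[Triple]


def synchronize(mirror: Set[Triple], changesets: Iterable[Changeset], last_sync: datetime) -> datetime:
    """Apply, in creation order, every changeset newer than last_sync to the mirror in place.

    Returns the new synchronization watermark (the newest applied timestamp).
    """
    watermark = last_sync
    for cs in sorted(changesets, key=lambda c: c.created):
        if cs.created <= last_sync:
            continue  # already applied in an earlier run
        # Removals first, so one changeset can replace a triple's value.
        mirror -= cs.removals
        mirror |= cs.additions
        watermark = cs.created
    return watermark
```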

Discussion

Meetings

See http://www.w3.org/wiki/DatasetDynamics/Meetings

Related