NamedGraphs in Life Sciences

The motivation for these pages came out of discussions at the Amsterdam F2F and was nicely summarized in Chimezie's follow-up email. NamedGraphs, as defined by Carroll et al. [1], are a means to group a collection of RDF triples and give that collection a unique URI. This is a form of reification, but at a larger scale than a single triple, which makes it a useful grouping mechanism in the life sciences. The origin of a formal logic for Named Graphs can probably be traced back to Ramanathan Guha's PhD thesis [11]. It is no coincidence that Ramanathan's early work on MCF [12] was the immediate predecessor of (and primary motivation for) RDF.

Another important point to consider is that in life sciences, most of what is stated is not fact. It is either interpretation based on limited evidence and knowledge, or hypothesis that can never be proven true, only proven false. Both kinds of statements call for the modal logic KD45 rather than S5 (see the Amsterdam F2F slide, [8]). The fundamental difference between the two is that S5 requires every statement that is 'known' to also be 'true' ( know(phi) => phi ), whereas KD45 drops this axiom. In its place, KD45 has the axiom that statements can be 'believed' and possibly false, but that a statement and its negation cannot both be 'believed' ( believe(phi) => ¬ believe(¬phi), where ¬ means NOT ). Because different sources may hold conflicting beliefs, this requires that statements and the logic applied to them be compartmentalized, which NamedGraphs could support.
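For reference, the two axioms contrasted above can be written in standard modal notation. This is only a restatement under the usual textbook conventions, with K read as 'know' and B as 'believe'; the symbols K, B and φ are notation assumed here, not taken from the slide [8].

```latex
\begin{align*}
\text{S5, axiom T (knowledge implies truth):} \quad & K\varphi \rightarrow \varphi \\
\text{KD45, axiom D (beliefs are consistent):} \quad & B\varphi \rightarrow \neg B\neg\varphi
\end{align*}
```

Under KD45, B φ can hold while φ is false, which is the situation of a hypothesis that is later refuted.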

What follows are some useful examples that make the case for NamedGraphs, or for a comparable mechanism for compartmentalizing and tracking RDF statements.

Life Science usages

Some explicit NamedGraph examples

  1. Joanne believes { Alan states { LSIDs are insufficient } . Alan in Amsterdam_F2F}
  2. <doi:10.1038/ni1006-1021> describes { WNT_Pathway is_involved_in Hematopoiesis_Stem_Cell_Differentiation }
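The nesting in example 1 can be sketched as a flat list of quads, where the fourth element names the graph a triple belongs to. This is a minimal illustration in plain Python, not a particular RDF library's API; the graph names (urn:example:...) and the helper functions are hypothetical.

```python
# Each statement is a quad: (subject, predicate, object, graph).
# All graph names below are hypothetical placeholders.
G_ROOT = "urn:example:root"        # top-level context
G_JOANNE = "urn:example:g-joanne"  # what Joanne believes
G_ALAN = "urn:example:g-alan"      # what Alan states

QUADS = [
    ("Joanne", "believes", G_JOANNE, G_ROOT),
    ("Alan", "states", G_ALAN, G_JOANNE),
    ("Alan", "in", "Amsterdam_F2F", G_JOANNE),
    ("LSIDs", "are", "insufficient", G_ALAN),
]


def triples_in(graph):
    """Return the triples asserted directly inside one named graph."""
    return [(s, p, o) for s, p, o, g in QUADS if g == graph]


def attribution_chain(graph):
    """Walk outward from a graph, collecting (agent, verb) pairs that refer to it."""
    chain = []
    seen = set()
    current = graph
    while current not in seen:  # guard against cyclic references
        seen.add(current)
        referrers = [(s, p, g) for s, p, o, g in QUADS if o == current]
        if not referrers:
            break
        agent, verb, outer_graph = referrers[0]
        chain.append((agent, verb))
        current = outer_graph
    return chain
```

Calling attribution_chain(G_ALAN) walks from Alan's statement out to Joanne's belief, so the claim about LSIDs is never asserted as a bare fact.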

SPARQL and Named Graphs

The SPARQL specification's section on the RDF Dataset [9] represents the current broad consensus on how the RDF data model can be logically composed as named graphs. It defines an RDF Dataset as having a 'default' graph (which doesn't have a name), as well as other 'named' graphs, each associated with an IRI.
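The shape of an RDF Dataset described here can be sketched as a small data structure: one unnamed default graph plus a mapping from IRI to named graph. This is an illustration in plain Python, not the SPARQL specification's formal definition or any library's API; the Dataset class, its methods, and the graph IRI urn:example:wnt-claims are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A default graph (no name) plus named graphs keyed by IRI."""
    default_graph: set = field(default_factory=set)
    named_graphs: dict = field(default_factory=dict)

    def add(self, triple, graph_iri=None):
        """Add a triple to a named graph, or to the default graph if no IRI is given."""
        if graph_iri is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph_iri, set()).add(triple)

    def graphs_containing(self, triple):
        """Return the IRIs of the named graphs that contain the triple."""
        return sorted(iri for iri, graph in self.named_graphs.items() if triple in graph)


# Example 2 from the list above, with a hypothetical IRI for the graph of claims.
WNT_GRAPH = "urn:example:wnt-claims"
claim = ("WNT_Pathway", "is_involved_in", "Hematopoiesis_Stem_Cell_Differentiation")

ds = Dataset()
ds.add(claim, WNT_GRAPH)
# The default graph records which article describes that graph.
ds.add(("doi:10.1038/ni1006-1021", "describes", WNT_GRAPH))
```

The graphs_containing lookup plays roughly the role of a SPARQL GRAPH ?g { ... } pattern: it reports where a statement was made rather than whether it is true.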

Existential Contexts?

A recent thread [10] on public-sparql-dev identified an additional use case for named graphs: as a context for triples of 'unknown' origin. The conversation suggests the possible use of Blank Nodes to identify such contexts. This can be used to express assertions such as: 'There exists a collection of statements about relationships of cardiovascular anatomy'
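One way to picture a blank-node context is as a graph label that is only meaningful locally, so that all one can assert is that some such graph exists. The sketch below is plain Python and makes several assumptions: the label scheme (_:ctxN), the helper names, and the anatomy triples are illustrative only and do not come from the thread [10].

```python
import itertools

_counter = itertools.count(1)


def fresh_blank_label():
    """Mint a label that identifies a context only within this dataset."""
    return f"_:ctx{next(_counter)}"


contexts = {}  # graph label -> set of triples

# A collection of statements whose origin is unknown (illustrative data).
anatomy = fresh_blank_label()
contexts[anatomy] = {
    ("Left_Ventricle", "part_of", "Heart"),
    ("Aorta", "connected_to", "Left_Ventricle"),
}


def exists_context_about(terms):
    """True if some context holds a triple whose subject or object is in terms."""
    return any(
        any(s in terms or o in terms for s, _p, o in triples)
        for triples in contexts.values()
    )
```

The query answers only whether such a collection exists; the label itself is never exposed as a stable identifier, in contrast to the IRI-named graphs of the RDF Dataset [9].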

Papers and References

  1. Carroll et al, 2004

  2. Design Issues

  3. NamedGraphs using JENA

  4. Scoping assertions by context

  5. Relationships between contexts

  6. W3C NamedGraphs Activity

  7. RDFS NG extension

  8. Modal Epistemic Knowledge

  9. RDF Dataset

  10. SPARQL, named graphs and default graph

  11. Contexts: A Formalization and Some Applications

  12. Meta Content Framework Using XML

HCLSIG/NamedGraphs in Life Sciences (last edited 2006-10-18 18:10:13 by 67-135-38-162)