HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20100510

From W3C Wiki

Agenda meeting May 10

1. Overall goal of the group

(from recent email exchange with Tim, Tudor, Alex) with current status:

a) Make a schema for a coarse-grained paper (i.e. indicate section headers) - creating a link to help out the entity recognition algorithms and to ground medium-grained (level of claims) and fine-grained (level of entity) models in current practice (NEED TO DO NOW)

b) Integrate bibliography: part of the SWAN / myExperiment / MGED / OBI integration subtask - integrates SWAN-SIOC with myExperiment ontology of computational workflows, MGED and OBI ontologies of biomedical experiments (DONE)

c) Discourse ontology SWAN-SIOC: combined work of Harvard/MGH and DERI / NUIG - allows scientific claims, hypotheses and evidence to be represented (DONE)

d) Discourse relationships: TO BE DONE LATER

2. Different places to start with a):

Paolo/Tim: AO: Annotation Ontology [1] (slides)


Alex: In a linked data world one does not need to access the full content of the paper. Instead, one should be able to access the RDF representation of the paper as linked data, so that specific sections, sentences, images, etc. in the paper can be related to resources on the web, or to parts of those resources. (RDF circulated.) The RDF corresponds to http://www.ncbi.nlm.nih.gov/pubmed/16630449

Tudor: Do we want to create an RDF schema by transforming, for example, NLM into RDFS (with a focus on sections), and then create some instance examples? CORAAL example: pure RDF (previously transformed from the original XML article, using the SALT ontologies). It contains quite a lot of information, including section titles, paragraphs, citations and citation contexts (basically you can search in CORAAL for almost all of these things). [2]

Tony: Basic stand-off annotation against XML documents, containing both the region and the description. I see those as separate pieces, but internally I keep them together for the current work. It’s a very basic XML structure, shown below. An example OWL representation of the same information, with classes etc., is at [3].

Anita: NLM Journal Publishing DTD v3 - tag library is here: [4] Sample: [5]
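
As a rough illustration of the "NLM into RDF, with a focus on sections" idea, here is a minimal Python sketch. It reads `<sec>` and `<title>` elements from a heavily trimmed, made-up article in the style of the NLM Journal Publishing DTD and prints N-Triples. The `http://example.org/coarse#` vocabulary, the `hasSection` and `title` property names, and the `#sec…` fragment identifiers are placeholders invented for this sketch, not part of AO, SALT, SWAN-SIOC or PubMed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed article in the style of the NLM Journal
# Publishing DTD (only <body>, <sec>, <title> and <p> are used here).
NLM_SAMPLE = """
<article>
  <body>
    <sec><title>Introduction</title><p>Some text.</p></sec>
    <sec><title>Methods</title><p>More text.</p>
      <sec><title>Data collection</title><p>Details.</p></sec>
    </sec>
    <sec><title>Results</title><p>Findings.</p></sec>
  </body>
</article>
"""

PAPER_URI = "http://www.ncbi.nlm.nih.gov/pubmed/16630449"
# Placeholder vocabulary for illustration only, NOT an existing ontology.
EX = "http://example.org/coarse#"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def section_triples(xml_text, paper_uri):
    """Return N-Triples lines describing the (nested) sections of a paper."""
    root = ET.fromstring(xml_text)
    triples = []

    def walk(parent_elem, parent_uri, prefix):
        # Number the direct <sec> children of this element: sec1, sec2, sec2.1 ...
        for i, sec in enumerate(parent_elem.findall("sec"), start=1):
            sec_uri = f"{paper_uri}#{prefix}{i}"  # hypothetical fragment scheme
            title = escape_literal((sec.findtext("title") or "").strip())
            triples.append(f"<{parent_uri}> <{EX}hasSection> <{sec_uri}> .")
            triples.append(f'<{sec_uri}> <{EX}title> "{title}" .')
            walk(sec, sec_uri, f"{prefix}{i}.")

    body = root.find("body")
    if body is not None:
        walk(body, paper_uri, "sec")
    return triples


if __name__ == "__main__":
    for line in section_triples(NLM_SAMPLE, PAPER_URI):
        print(line)
```

Each section gets its own URI, so medium-grained (claim) and fine-grained (entity) annotations could later point at a section rather than at the whole paper. Which vocabulary to use for the section links is exactly the open question under a).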


Tony's Example:

<AnnotationSet id="[docid]">
  <Lookup>
    [description properties attributes as additional XML elements this can be per app, for BrainLink its just an ontology based identifier]
    <StartPointer startsAt="[start character offset within text node]" endsAt="[end character offset within text node]">[Full XPath to text node in document containing the starting point of the region being annotated]</StartPointer>
    <EndPointer endsAt="[end character offset within text node]">[Full XPath to text node in document containing the ending point of the region being annotated]</EndPointer>
  </Lookup>
  .....
</AnnotationSet>

Notes: This is just a core format as output from text mining so the regions and the descriptions are linked closely at this point. What is said about the region could be anything.

docid is my local document identifier, not a URI, as I don't need it to be dereferenceable, nor do I need RDF for the work I'm doing. I have an export to RDF which takes care of that bit. I have multiple sets for the same documents, which in my XML repository also have distinct application-based URIs.

EndPointer is only required if the start and end points of the region are in different text nodes. If it is used, the endsAt attribute is not present in the StartPointer.

The aim is to mark the region within the XML being discussed as precisely as possible. If you are annotating a specific element then you would use something like an ElementPointer which just has the XPath to the element (that covers the use case for existing structure that is already marked up in the XML). For document level annotation the region would simply be the document identifier.
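
To make the pointer rules concrete, here is a minimal Python sketch that resolves StartPointer / EndPointer regions against a small made-up document. It rests on several assumptions that the notes above do not settle: offsets are taken to be zero-based with an exclusive end, and every pointer path is assumed to end in `/text()[1]`, so it can be mapped to ElementTree's `.text`. The document, the `doc-001` identifier and the helper names `text_node` and `resolve` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document and annotation set, for illustration only.
DOC = "<doc><p>The hippocampus is studied.</p><p>It projects to cortex.</p></doc>"

ANNOTATIONS = """
<AnnotationSet id="doc-001">
  <Lookup>
    <StartPointer startsAt="4" endsAt="15">/doc/p[1]/text()[1]</StartPointer>
  </Lookup>
  <Lookup>
    <StartPointer startsAt="19">/doc/p[1]/text()[1]</StartPointer>
    <EndPointer endsAt="2">/doc/p[2]/text()[1]</EndPointer>
  </Lookup>
</AnnotationSet>
"""

TEXT_SUFFIX = "/text()[1]"


def text_node(root, xpath):
    """Resolve a simplified 'full XPath to text node' to its string.

    Assumption: the path looks like /root/child[n]/.../text()[1], i.e. it
    selects the first text node of an element (ElementTree's .text).
    """
    if not xpath.endswith(TEXT_SUFFIX):
        raise ValueError(f"unsupported pointer path: {xpath}")
    steps = xpath[: -len(TEXT_SUFFIX)].strip("/").split("/")
    if steps[0] != root.tag:
        raise ValueError(f"path does not start at <{root.tag}>: {xpath}")
    elem = root if len(steps) == 1 else root.find("./" + "/".join(steps[1:]))
    if elem is None or elem.text is None:
        raise ValueError(f"no text node at: {xpath}")
    return elem.text


def resolve(lookup, root):
    """Return the annotated text as a list of fragments, one per text node."""
    start = lookup.find("StartPointer")
    end = lookup.find("EndPointer")
    start_text = text_node(root, start.text.strip())
    begin = int(start.get("startsAt"))
    if end is None:
        # Region lies inside one text node: StartPointer carries endsAt.
        return [start_text[begin:int(start.get("endsAt"))]]
    # Region spans two text nodes: endsAt lives on the EndPointer instead.
    end_text = text_node(root, end.text.strip())
    return [start_text[begin:], end_text[: int(end.get("endsAt"))]]


if __name__ == "__main__":
    doc_root = ET.fromstring(DOC)
    for lookup in ET.fromstring(ANNOTATIONS).findall("Lookup"):
        print(resolve(lookup, doc_root))
```

The sketch ignores any text nodes lying between the start and end nodes and does not cover ElementPointer or document-level annotation; a fuller resolver would need a real XPath engine to address arbitrary text nodes.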

3. Next steps: whom, what, when.