MatthiasSamwald/DOLCE bio-zen patient-record proposal

From W3C Wiki

bio-zen / POMR proposal

The first stable version of the bio-zen ontology has recently been released; an extension (‘bio-zen plus’) will follow soon. This proposal describes how bio-zen and the DOLCE foundational ontology could be used for the HCLSIG Semantic Web demo.


Ontologies used:

  • bio-zen
  • bio-zen plus
  • Problem-Oriented Medical Record Ontology

Download: bio-zen, bio-zen plus download page, POMR download page

bio-zen is based on the DOLCE foundational ontology, the Simple Knowledge Organisation System (SKOS) and Dublin Core. The current version focuses on the description of molecular interaction pathways, but it can also be used to describe many other biological phenomena in a more generic fashion. bio-zen also incorporates ontological constructs for making statements about digital information resources, based on my URI proposal.

bio-zen plus is based on bio-zen and adds selected constructs from the SIOC (Semantically Interlinked Online Communities) ontology, the FOAF ontology and the Creative Commons Metadata schema. It can be used to add descriptions of persons (e.g. researchers), publications, organisations and online communities. SIOC is basically a representation of bulletin boards, blogs and mailing lists in OWL. Both FOAF and SIOC are backed by a relatively large (and growing) community of users and developers. In the context of research, SIOC is an excellent tool for describing scientific discourse in a practical, web-centric manner. bio-zen plus also adds some constructs to represent the basics of scientific discourse (e.g. one can state that a certain posting / document / dataset is supported by or in conflict with some other posting / document / dataset).

The Problem-Oriented Medical Record Ontology is described at [1]. It is also based on DOLCE and includes mappings to bio-zen, FOAF and Galen.

All of the bio-zen ontologies and datasets are valid OWL DL (FOAF, SIOC and DC are not originally valid OWL DL, so some minor modifications were made to these).



Datasources and how they are mapped to the ontology:

  • Pubmed

    • Articles are represented as foaf:Document (a subclass of bio-zen:data)
    • Metadata of articles (e.g. title, authors) is represented with Dublin Core
    • Links to fulltext / abstracts use the constructs of bio-zen digital resource management
    • Annotations with MeSH concepts are represented as SKOS (already available)

  • Entrez Protein
  • Entrez Gene
  • possibly other Entrez databases
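The PubMed mapping described above could be sketched as follows. This is only an illustration under stated assumptions: the FOAF, Dublin Core and RDF namespaces are the published ones, but `ARTICLE_NS` and `MESH_NS` are hypothetical placeholder namespaces, and the use of dc:subject to attach a MeSH concept is an assumption rather than something bio-zen prescribes. The output is plain N-Triples, built with the standard library only.

```python
# Published namespaces for FOAF, Dublin Core and rdf:type.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical placeholder namespaces (assumptions, not from the proposal).
ARTICLE_NS = "http://example.org/pubmed/"
MESH_NS = "http://example.org/mesh/"


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def article_triples(pmid, title, authors, mesh_ids):
    """Return N-Triples lines describing one article as a foaf:Document."""
    subject = uri(ARTICLE_NS + pmid)
    lines = [
        f"{subject} {uri(RDF_TYPE)} {uri(FOAF + 'Document')} .",
        f"{subject} {uri(DC + 'title')} {literal(title)} .",
    ]
    for author in authors:
        lines.append(f"{subject} {uri(DC + 'creator')} {literal(author)} .")
    for mesh_id in mesh_ids:
        # Assumption: dc:subject links the article to a SKOS concept for the MeSH term.
        lines.append(f"{subject} {uri(DC + 'subject')} {uri(MESH_NS + mesh_id)} .")
    return lines


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    for line in article_triples("0000000", "An example title", ["A. Author"], ["D000000"]):
        print(line)
```

Links to fulltext and abstracts are left out of the sketch, because they depend on the bio-zen digital resource management constructs, which are not spelled out here.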

The following databases can be converted with the pax2zen converter from available BioPAX data:

Work to be done:

  • Contact and negotiate with the data providers (to make sure that they agree)
  • Use the pax2zen converter to convert datasets where applicable, and debug the resulting OWL files (check that they are valid OWL DL and, if not, fix them).
  • Use XQuery or a scripting language to convert Entrez datasets.
  • Try to map annotations in data to correct OBO concepts (e.g. INOH has ChEBI annotations). This might result in most of the manual work, but it would be worthwhile, as the concept annotations can act as a ‘glue’ between datasets from different data providers.
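As a rough idea of what the scripting-language route for the Entrez conversion might look like, here is a minimal sketch. The XML layout in `SAMPLE_XML` is a deliberately simplified, hypothetical record format and not the actual Entrez schema; `GENE_NS`, `TAXON_NS` and `IN_TAXON` are placeholder names invented for the example. Only rdfs:label is a real, published property.

```python
import xml.etree.ElementTree as ET

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical placeholders (assumptions, not part of bio-zen or Entrez).
GENE_NS = "http://example.org/gene/"
TAXON_NS = "http://example.org/taxon/"
IN_TAXON = "http://example.org/vocab#inTaxon"

# Simplified, made-up record layout standing in for a real Entrez export.
SAMPLE_XML = """
<GeneSet>
  <Gene id="g1"><Symbol>GENE1</Symbol><Taxon>t1</Taxon></Gene>
  <Gene id="g2"><Symbol>GENE2</Symbol></Gene>
</GeneSet>
"""


def gene_triples(xml_text):
    """Turn each <Gene> record into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for gene in root.iter("Gene"):
        subject = GENE_NS + gene.attrib["id"]
        symbol = gene.findtext("Symbol")
        if symbol:
            triples.append((subject, RDFS_LABEL, symbol))
        taxon = gene.findtext("Taxon")
        if taxon:
            # The shared taxon identifier is the kind of 'glue' that lets
            # records from different providers be connected later.
            triples.append((subject, IN_TAXON, TAXON_NS + taxon))
    return triples


if __name__ == "__main__":
    for triple in gene_triples(SAMPLE_XML):
        print(triple)
```

A real converter would additionally have to perform the semantic reinterpretation towards bio-zen classes; the sketch only covers the syntactic step.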

Taxonomies available as SKOS concepts (from OBO):

Evidence Codes Ontology

MeSH

Gene Ontology

celltype ontology

ChEBI (Chemical entities of biological interest)

NCBI Taxonomy

Sequence Ontology

INOH Molecule role/type ontology

Disease Ontology

The newly created “Synapse Ontology” might also be added as a SKOS concept hierarchy: http://syndb.cbi.pku.edu.cn:8080/graphy/so?level=3

Neuronames would also be a good candidate, but we need to contact the administrator of neuronames (it is not clear if we could offer a SKOS version of neuronames, as it is part of UMLS, which poses some legal restrictions).

If legally possible, other parts of UMLS could also be converted to SKOS. (Olivier?)
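To make the idea of a taxonomy as a SKOS concept hierarchy concrete, the following sketch turns a small parent/child term table into skos:Concept, skos:prefLabel and skos:broader statements. The term identifiers, labels and `CONCEPT_NS` are made up for illustration, and each term has at most one parent here, which is a simplification (OBO ontologies may give a term several parents).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical placeholder namespace for the converted concepts.
CONCEPT_NS = "http://example.org/concept/"

# Made-up terms: identifier -> (label, parent identifier or None).
TERMS = {
    "T1": ("root term", None),
    "T2": ("middle term", "T1"),
    "T3": ("leaf term", "T2"),
}


def skos_triples(terms):
    """Emit (subject, predicate, object) tuples for every term."""
    triples = []
    for term_id, (label, parent_id) in terms.items():
        subject = CONCEPT_NS + term_id
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        triples.append((subject, SKOS + "prefLabel", label))
        if parent_id is not None:
            triples.append((subject, SKOS + "broader", CONCEPT_NS + parent_id))
    return triples


def broader_chain(terms, term_id):
    """Follow the parent links upwards; returns ancestors, nearest first."""
    ancestors = []
    seen = {term_id}
    parent = terms[term_id][1]
    while parent is not None and parent not in seen:  # guard against cycles
        ancestors.append(parent)
        seen.add(parent)
        parent = terms[parent][1]
    return ancestors


if __name__ == "__main__":
    for triple in skos_triples(TERMS):
        print(triple)
    print(broader_chain(TERMS, "T3"))
```

Walking the broader links is what allows a query for a general concept to also find data annotated with its narrower concepts.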


The connection 'from bench to bedside': Mapping to the Problem Oriented Medical Record Ontology

See [2]


Implementation

The data will be made accessible through SPARQL endpoints. Each datasource (e.g. Pubmed, Entrez Protein, BIND…) should be represented by one SPARQL endpoint, possibly on different domains. Some of the smaller datasources should be accessible for viewing and editing through OntoWiki. A demo of an OntoWiki prototype can be found at http://3ba.se/

pOWL / OntoWiki is based on PHP and MySQL and can be downloaded from http://sourceforge.net/forum/forum.php?forum_id=631806

Federated queries can be demonstrated with client-side SPARQL query federation software (e.g. DARQ).
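The principle behind client-side federation can be sketched in a few lines: send a sub-query to each endpoint, then join the result sets on a shared variable. The sketch below is not how DARQ works internally; it is a plain hash join over results in the standard SPARQL JSON results format, with made-up endpoint URLs, variable names and canned data instead of live network calls.

```python
from urllib.parse import urlencode


def query_url(endpoint, sparql):
    """Build a SPARQL Protocol GET URL (assumes the endpoint URL has no query string yet)."""
    return endpoint + "?" + urlencode({"query": sparql})


def bindings(result):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in result["results"]["bindings"]
    ]


def join(left, right, var):
    """Hash-join two lists of solutions on `var`, keeping only compatible rows."""
    index = {}
    for row in right:
        if var in row:
            index.setdefault(row[var], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row.get(var), []):
            # Rows must agree on every variable they share, not only on `var`.
            if all(row[k] == match[k] for k in row.keys() & match.keys()):
                joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    # Canned results standing in for two endpoints (hypothetical data).
    articles = {"head": {"vars": ["article", "concept"]}, "results": {"bindings": [
        {"article": {"type": "uri", "value": "http://example.org/pubmed/1"},
         "concept": {"type": "uri", "value": "http://example.org/concept/T2"}},
    ]}}
    proteins = {"head": {"vars": ["protein", "concept"]}, "results": {"bindings": [
        {"protein": {"type": "uri", "value": "http://example.org/protein/p1"},
         "concept": {"type": "uri", "value": "http://example.org/concept/T2"}},
    ]}}
    print(query_url("http://example.org/sparql", "SELECT ?article ?concept WHERE { ?article ?p ?concept }"))
    print(join(bindings(articles), bindings(proteins), "concept"))
```

The join variable in the example is a shared concept annotation, which is exactly the role the SKOS concept hierarchies are meant to play between datasources.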


Relevance for Neuroscience (and Parkinson’s, Alzheimer’s and Huntington’s):

While none of the information resources used specialize in these diseases or in neuroscience in general, they cover broad aspects of these fields of investigation. It should be easy to demonstrate their usability for neuroscientific research.


Some positive aspects of the proposal:

  • It is not a mere ‘demo’, but a fully functional system that represents data that is of known use to researchers (including neuroscientists).
  • The bio-zen ontologies are ontologically consistent and valid OWL DL. The conversion from existing datasources will not be a mere syntactic translation, but will also encompass a semantic reinterpretation (from a database record towards a description of exemplary entities in the real world).
  • Based on a foundational ontology (DOLCE). Better interoperability with ontologies / information outside our domain.
  • Based on existing Semantic Web ontologies and metadata standards. Good practice, demonstrates the power and the underlying philosophy of Semantic Web standards. Allows the usage of existing tools for these ontologies (e.g. FOAF visualisation, SIOC-enabled bulletin boards).
  • The ontologies used will be maintained beyond the scope of the HCLSIG.
  • The proposal spans the major working groups and existing proposals of the HCLSIG with a coherent and consistent system (BioRDF / conversion of existing data, ontologies, knowledge ecosystems / scientific self publishing, information resource resolution using ontologies)

Negative aspects of the proposal:

  • It overlaps with other activities; the resulting redundancy might make the overall effort somewhat more complicated.

For a list of other proposals see HCLSIG Demo Proposals.