HCLSIG BioRDF Subgroup/Meetings/2006-02-27 Conference Call

Date of call: February 27, 2006

Convener: Susie Stephens
Scribe: Kei Cheung

Meeting Minutes

Attendees: Davide Zaccagnini, Roger Cutler, John Barkley, Kei Cheung, Scott Marshall, Alan Ruttenberg, Susie Stephens, Joanne Luciano, Robin McEntire, Christophe Poulain

The meeting started with the discussion of the following two aspects of the BioRDF group.

1. Identification of different tasks.
2. Identification of datasets that can be converted into RDF format.

Someone pointed out that the scope of BioRDF needed to be identified. What is the purpose of this group? Should the group focus on giving a demo that goes from bench to bedside? Or should the group first focus on converting data (in different formats, including XML, Excel, and tabular structures) using existing tools, without first identifying use cases? It might be useful to learn how to use existing tools to convert data to RDF and figure out the use cases later.

The tasks of this group include identifying its scope as well as its goals. One of the two big missions of this subgroup is to learn how to use existing tools to convert data into RDF. The other is to build a demo that helps answer an important scientific question. To do this, we need to understand a scientific domain first and then identify commonly used datasets (brain data?). We need to know what queries or questions will be asked in neuroscience. There might be a need to break the group up into two subgroups: one to identify commonly used bio datasets (brain data?) and the other to convert such datasets into RDF.

Scott pointed out that translation of data into RDF should take data type definitions into account. Currently, semantic mapping is lacking in some translation approaches. We need to work with the ontology working group on this aspect. Davide mentioned that there is an ongoing effort (involving NLM and W3C) to translate free text into RDF. Susie said that this group should also coordinate with the unstructured data working group. She also said that the tasks of this group include a demo task and an RDF data conversion task.

Alan said that the group should work within a narrow scope rather than a broad one. It is important to get data into RDF form and to sketch out a few data sources, such as the brain atlas (XML only) and OMIM (disease descriptions in free text), which maps genes to diseases. Susie also pointed out that the group should probably focus on only a few datasets (e.g., five different datasets) for now. Someone also said that reusability is important (e.g., some XSLT programs already exist). Kei asked whether metadata describing the datasets should also be captured in RDF format (e.g., RSS). The group agreed to focus on syntactic translation first (without addressing semantic mapping).

Use cases can be useful in the sense that RDF conversion might be framed by the questions being asked, so identifying neuroscience use cases becomes important. The group can work with John Wilbanks by asking him which datasets are commonly used in neuroscience, how best to format the data, and how to do the demo (e.g., relating to specific diseases).

Susie suggested that the group focus on assigning tasks to members. Susie would help define the contents of the gene/protein task. We also need members to upload their code and conversion examples to the wiki site so that experiences and ideas can be shared. For example, Alan has Entrez Gene data in tabular format. John may be able to share some targeted ontologies for describing algorithms and data structures in CS. Scott may be able to share an example of converting tabular data into RDF. Scott pointed out that there are multiple approaches to RDF conversion, from which we can choose the one that best fits our needs. We might be able to gain some experience with these approaches.
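
For illustration only, a purely syntactic translation of tabular data into RDF (the kind of conversion the group agreed to start with) might look roughly like the Python sketch below. It is not code shared by any attendee. The namespaces, column names, and sample row are hypothetical placeholders, and the output is plain N-Triples with every value emitted as a string literal, i.e., with no data typing or semantic mapping.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, invented for this sketch (not from the minutes).
BASE = "http://example.org/gene/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(tsv_text, id_column):
    """Emit one triple per non-empty, non-ID cell of a tab-separated table.

    The ID column supplies the subject, each other column header becomes
    a predicate, and each cell value becomes a plain string literal.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the ID itself, empty cells, and cells without a header.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Placeholder table; the identifiers and values are made up.
sample = "gene_id\tsymbol\tdescription\nG1\tSYM1\tplaceholder description\n"
for triple in rows_to_ntriples(sample, "gene_id"):
    print(triple)
```

A sketch like this leaves open the questions raised in the call: column headers are turned directly into predicates, so data type definitions and mapping to shared ontologies would still need to be handled in a later step.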