HCLSIG BioRDF Subgroup/QueryFederation2 old

From W3C Wiki

Microarray experiment use case (old version)

This use case extends the query federation task to include the federation of microarray-related data.

Below are several examples drawn from the NIH Neuroscience Microarray Consortium.

The data (annotations) are available in MAGE-ML (XML) format. The following papers are related to Ex1.

Is the MAGE-TAB format relevant to our goal (for example, can it be used to represent gene lists)? Also, is there a way to convert MAGE-ML into MAGE-TAB?

Provenance: how should we represent the experimental protocol and the data protocol?

aTags examples (Matthias)

| "We hypothesize that the neuroprotective effect of nAChRs is mediated by modulation of the changes in gene expressions elicited by NMDA." aTags: Nicotinic acetylcholine receptor, Neuroprotection, NMDA receptor (Source) |

| "Stress triggers alterations in gene expression of many proteins, especially transcription factors. The target genes of repeated stress partially overlap, but differ from those altered by single stress." aTags: Stress, Repeated stress, Gene expression (Source) |

Gene Expression Atlas at EBI (Scott)

http://www.ebi.ac.uk/gxa/

Gene Expression Omnibus at NCBI (Kei)

http://www.ncbi.nlm.nih.gov/geo/

How can we represent such data in RDF/OWL? What is the best practice? How do we link to other ontologies (e.g., NIFSTD, GO, SWAN) as part of the experiment, sample, and gene annotations?

To help address these questions, we have begun working with Helen Parkinson at EBI/ArrayExpress and Maryann Martone at NIF/NeuroLex. Helen is curating some example neuroscience microarray experiments in ArrayExpress with help from the NeuroLex group. The BioRDF group will explore how to expose the gene expression data in semantic web format and how to incorporate such data into its query federation use case.
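To make the representation question more concrete, here is a minimal sketch of how one microarray sample annotation might be exposed as RDF, serialized as N-Triples with only the Python standard library. Everything under `http://example.org/biordf/`, including the properties `fromExperiment`, `hasOrganismPart` and `hasSpecies`, is a hypothetical placeholder and not an agreed BioRDF vocabulary; only the `rdf:type` and `rdfs:label` IRIs are standard. In a real mapping the brain region would be a NIFSTD or EFO term rather than an `example.org` IRI.

```python
"""Sketch: one microarray sample annotation as N-Triples.

All http://example.org/biordf/ IRIs are hypothetical placeholders; a real
mapping would reuse ontology terms (e.g. from NIFSTD or EFO) instead.
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/biordf/"  # hypothetical namespace


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sample_triples(sample_id, experiment_id, region_iri, region_label, species):
    """Return (subject, predicate, object) triples describing one sample."""
    sample = iri(EX + "sample/" + sample_id)
    region = iri(region_iri)
    return [
        (sample, iri(RDF_TYPE), iri(EX + "Sample")),
        (sample, iri(EX + "fromExperiment"), iri(EX + "experiment/" + experiment_id)),
        (sample, iri(EX + "hasOrganismPart"), region),  # link to a brain-region term
        (sample, iri(EX + "hasSpecies"), literal(species)),
        (region, iri(RDFS_LABEL), literal(region_label)),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples: one statement per line, ending in ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    triples = sample_triples(
        sample_id="s1",
        experiment_id="example-1",
        region_iri=EX + "term/hippocampus",  # stand-in for a NIFSTD/EFO term
        region_label="hippocampus",
        species="Mus musculus",
    )
    print(to_ntriples(triples))
```

Keeping the region as an IRI rather than a plain string is what would let a federated query follow the link into the ontology (for example, to find all samples from any part of the hippocampal formation).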

See the data curation notes by Helen.

Below is a list of terms/concepts that are of potential interest to the use case (this list is still evolving).

  • brain regions
  • neuron types
  • neuronal properties such as neurotransmitters, receptors and channels
  • drugs
  • species
  • experimental conditions
  • disease conditions
  • data analysis methods
  • experimental methods (e.g., what array platforms are used, sample extraction method, ...)

Ontologies that may be relevant to the concepts and terms above include NIFSTD and EFO, among others. Helen's group has recently added the following experiments:

  • E-GEOD-3305 Transcription profiling of rat spinal cord and oculomotor nucleus samples from animals aged 6, 18 and 30 months
  • E-GEOD-3296 Transcription profiling of primary mouse embryonic fibroblasts (MEFs) from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300
  • E-GEOD-3327 Transcription profiling of different regions of mouse brain to study adult mouse gene expression patterns in common strains
  • E-GEOD-358 Transcription profiling of rat whole brain samples from animals with repeated exposure to the anaesthetic isoflurane
  • E-GEOD-3343 Transcription profiling of spinal cord and oculomotor neurons from control and SOD1 mutant mice to test the hypothesis that the oculomotor neurons are intrinsically protected in amyotrophic lateral sclerosis
  • E-GEOD-3621 Transcription profiling of HD R6/1 transgenic mouse line brain hemispheres, time series
  • E-GEOD-3489 Transcription profiling of human brain samples from HIV infected individuals with HIV encephalitis vs controls
  • E-GEOD-3963 Transcription profiling of hippocampus and amygdala from naive, conditioned and fear-stimulus-exposed mice
  • E-GEOD-4757 Transcription profiling of human neurons with and without neurofibrillary tangles from Alzheimer's patients
  • E-GEOD-3202 Transcription profiling of human H720 cell line - MK886 treatment of non-small cell lung cancer cell line
  • E-GEOD-9770 Transcription profiling of human neurons from different brain regions derived from individuals with mild cognitive impairment
  • E-GEOD-4034 Transcription profiling of two mouse lines displaying different phenotypes on fear conditioning to identify differences in gene expression in two key brain regions: amygdala and hippocampus (related set E-GEOD-4035)
  • E-GEOD-4035 Transcription profiling of two mouse lines displaying different phenotypes on fear conditioning to identify differences in gene expression in two key brain regions: amygdala and hippocampus
  • E-GEOD-4036 Transcription profiling of human cerebellum from schizophrenia patients and normal subjects
  • E-GEOD-6511 Transcription profiling of mouse brain from animals treated for 4 weeks with the antipsychotics Clozapine and Haloperidol
  • E-GEOD-4130 Transcription profiling of Rattus norvegicus hypothalamoneurohypophyseal system from euhydrated and dehydrated animals
  • E-GEOD-4174 Transcription profiling of whole Drosophila brains from animals exposed to sleep deprivation
  • E-GEOD-995 Transcription profiling of human acute myeloid leukemia cells to identify compounds inducing the differentiation of acute myeloid leukemia cells
  • E-GEOD-12667 Discovery of somatic mutations in lung adenocarcinomas
  • E-GEOD-4192 Transcription profiling time series of Rattus norvegicus SCN2.2 cell line and in vivo suprachiasmatic nucleus samples to compare oscillatory properties of each transcriptome
  • E-GEOD-4600 Transcription profiling of human differentiated vs. undifferentiated SH-SY5Y cells transfected with MeCP2, a transcriptional repressor elevated in mature neurons, to investigate the neurodevelopmental disorder Rett syndrome
  • E-GEOD-4734 Transcription profiling of mouse brain regions
  • E-GEOD-3790 Transcription profiling of human cerebellum, frontal cortex [BA4, BA9] and caudate nucleus HD tissue experiment
  • E-GEOD-4773 Transcription profiling of human SK-N-MC cell line model of Parkinson's disease
  • E-GEOD-8150 Transcription profiling of mouse brain to identify age-related transcriptional changes and the effect of dietary supplementation of vitamin E
  • E-GEOD-10748 Transcription profiling of rat brain treated with D-serine
  • E-GEOD-13793 Transcription profiling of rat cerebellum and hippocampus following exposure to neurotoxicant Aroclor 1254
  • E-GEOD-6285 Transcription profiling of brains of mice fed four different diets for a 2-week duration
  • E-GEOD-6614 Transcription profiling of mouse brains following nicotine-induced seizures
  • E-GEOD-13524 Transcription profiling of rat nucleus accumbens of alcohol-preferring animals following chronic ethanol consumption

We may start with the following activities:

  • Explore, on a pilot basis, how to use the Gene Expression Atlas to semantically link neuroscience microarray experiments and integrate their gene expression data across various experiment and sample conditions
  • Federate microarray gene expression data with other types of data, including Gene Ontology, pathways, images, TCM data, drug data, other gene expression atlases, and so on.
  • Some example questions include:
    • which genes are over- or under-expressed in affected subjects (compared to normal subjects) in a given brain region for a particular disease
    • which genes are expressed or not-expressed for a given type of neuron (e.g., CA1 pyramidal neuron, stellate neuron) with or without pathological conditions (e.g., neurofibrillary tangles).
    • gene profiling data may be combined with different types of data (e.g., pathways, gene functions, drugs, and herbs)
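Once expression values are linked to sample annotations, the first example question reduces to a filter plus a comparison. The sketch below runs it over an in-memory list of records; the `Measurement` schema, the gene names `GENE_A` to `GENE_C`, the values, and the `|log2 fold change| >= 1` cutoff are all hypothetical and invented for illustration. A real federated version would issue a query (e.g. SPARQL) against the linked data and would need a proper statistical test instead of a bare fold-change threshold.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Measurement:
    """One expression value for one gene in one annotated sample (hypothetical schema)."""
    gene: str
    region: str    # brain-region label; would be an ontology term in practice
    disease: str   # disease label, or "control" for normal subjects
    value: float   # normalized, non-log expression level (assumed > 0)


def differential_genes(data, region, disease, min_abs_log2fc=1.0):
    """Return {gene: log2 fold change} for genes in `region` whose mean
    expression in affected subjects differs from controls by at least
    `min_abs_log2fc` on the log2 scale (positive = over-expressed)."""
    affected = defaultdict(list)
    control = defaultdict(list)
    for m in data:
        if m.region != region:
            continue
        if m.disease == disease:
            affected[m.gene].append(m.value)
        elif m.disease == "control":
            control[m.gene].append(m.value)

    result = {}
    for gene in affected.keys() & control.keys():  # need both groups to compare
        log2fc = math.log2(mean(affected[gene]) / mean(control[gene]))
        if abs(log2fc) >= min_abs_log2fc:
            result[gene] = log2fc
    return result


# Hypothetical toy data: gene names and values are placeholders only.
DATA = [
    Measurement("GENE_A", "hippocampus", "Alzheimer's disease", 8.0),
    Measurement("GENE_A", "hippocampus", "control", 2.0),
    Measurement("GENE_B", "hippocampus", "Alzheimer's disease", 1.0),
    Measurement("GENE_B", "hippocampus", "control", 4.0),
    Measurement("GENE_C", "hippocampus", "Alzheimer's disease", 3.0),
    Measurement("GENE_C", "hippocampus", "control", 3.0),
    Measurement("GENE_A", "cerebellum", "Alzheimer's disease", 2.0),
    Measurement("GENE_A", "cerebellum", "control", 2.0),
]

if __name__ == "__main__":
    hits = differential_genes(DATA, "hippocampus", "Alzheimer's disease")
    for gene, fc in sorted(hits.items()):
        direction = "over" if fc > 0 else "under"
        print(f"{gene}: {direction}-expressed (log2FC = {fc:+.2f})")
```

With this toy data the script prints `GENE_A: over-expressed (log2FC = +2.00)` and `GENE_B: under-expressed (log2FC = -2.00)`; `GENE_C` is unchanged and the cerebellum records are filtered out by region. The neuron-type question has the same shape, with a cell-type annotation taking the place of the brain region.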

Presentations: