Use Case DC5 - Agricultural Scientist
- Goal
- How can crops help or hinder soil microbes?
- Summary
-
Jean has two current projects. In the first, he studies the interactions between soil microbes and plants. The project is funded by the USDA to investigate the communities of soil microbes associated with heirloom corn crops. His work consists of setting up experimental plots at an agricultural field station and using 454 pyrosequencing to investigate the soil microbe diversity associated with different species of corn. This work produces very large volumes of molecular data that require substantial technical support. Right now his data are stored within his institution, and he has obtained funding for a web developer to build a visualization interface for his data. Jean has no problem sharing his data after publication, but does not want to be scooped in publications or proposals. He uses the MIENS guidelines for metadata.
-
His second project uses the tomato as a model system to study sympodial growth. This project involves two types of data: phenotypic and genomic. The phenotypic data are collected with pencil and paper after the seeds are germinated. Jean uses pedigree numbers to connect genotype, phenotype and generation, and all data are stored in a pedigree book. He doesn’t normally share this book because it wouldn’t make sense to others, but he freely distributes seeds to colleagues who ask for them. He considers these seeds to be data. When he receives seeds from others, he “vets” the data by germinating the seeds and confirming the phenotype. Jean is wary of going completely digital with his phenotype data because of stories he’s heard from colleagues who have lost a great deal of work. However, he does transcribe data from paper to an Excel spreadsheet. He keeps the paper copy and sometimes refers back to it to jog his memory. The genomic data come off a sequencing machine and are assembled by a computer. Jean has two types of sequence data: whole genome and transcriptome. He is willing to share the assembled genomes immediately and thinks others should do the same. The transcriptome data are used to answer a specific biological question and are therefore more sensitive. He would be willing to share the raw transcriptome data after publication. Repositories exist for genome data, but not for raw phenotypic data or raw sequence reads. Jean uses standard gene nomenclature to describe mutants, but feels unqualified to handle metadata. There are no metadata standards for his discipline, probably because researchers are still trying to figure out how to handle and analyze the genomic data. He knows the Plant Ontology exists, but doesn’t use it because it is too general to serve his needs.
- Queries
- Find all available data about gene expression in Arabidopsis and serve it up in a usable format
- Operations/Tasks
- Reanalyze soil microbe metagenome data that have been collected for other crop species
- Aggregate data about 500 tomato plant mutations from multiple laboratories
- Find all existing data sets that share the experimental design of his own work so he can compare their results with his data
- Data sets and associated metadata
- Metrics of completion/success
- Ontologies
- MIENS