Use Case 4 - Citizen Science - Biodiversity
Predicting species occurrence based on point data (e.g., bird counts)
- Goal
- Enable researchers to discover, access, and integrate a number of potentially useful covariate data sets to assist with the occurrence modeling effort.
- Still need to elaborate, via ontologies, on useful linkages with genomics and other ecological/behavioral/trait observational data (e.g., nesting location and timing, dietary characteristics).
- Summary
- Use Case 1: Predicting species occurrence based on point data (e.g., bird counts) and other biotic and abiotic data. The aim is to estimate population densities (presence/absence; abundance) of birds (just one example; others include specimen location information, road-kill sightings, etc.) based on a growing database of organismal occurrence data (species/location/date).
Assist researchers in locating and assembling data sets of covariates (predictor variables) for further analysis, to inform models of occurrence and abundance. Scope expansion: given certain species, what behavioral/life-history characteristics are available for them?
- Queries
- 1) What variables are available as covariates for my occurrence data? E.g., is there information about mean daily temperature at a given lat/lon over an xx time span?
- Thematic search: weather/climate, soil, land use, human demographics, hydrology, habitat type
- Return the number of matching data sets, which can be searched by facets: sector, realm, institution, author, grain/resolution (in space and time). The semantics of these facets can be elaborated and referenced from an ontology.
- How do we determine the catalog of measurements that are available, and the provenance of their values (modeled surface, nearest gage reading, etc.)? E.g., there are many ways to estimate the temperature at a given lat/lon, and many data sets do so for any given lat/lon. Which one should the analyst use? How would semantics guide the analyst in resolving this?
- What is the nearest freshwater source (stream/pond/lake) to lat/lon? What is the growth rate of the human population over the last XX years at lat/lon? What is the land-use pattern at lat/lon? Which areas have XX diversity of birds at YY time? What studies of [nesting, feeding] in bird species X have been done in region Y?
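The faceted search described in the queries above could be prototyped roughly as follows. This is a minimal sketch, assuming an in-memory catalog; `DatasetRecord`, `faceted_search`, `facet_counts`, and the placeholder records are hypothetical names and values, not an existing catalog API or real data sets. Recording provenance as an ordinary facet lets the analyst see, for example, how many temperature data sets are modeled surfaces versus nearest gage readings before choosing one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; fields mirror the facets listed above."""
    title: str
    theme: str            # e.g. "weather/climate", "soil", "land-use"
    sector: str
    realm: str
    institution: str
    author: str
    resolution_m: float   # spatial grain in metres
    provenance: str       # e.g. "modeled surface", "nearest gage reading"


def faceted_search(catalog: List[DatasetRecord], theme: str,
                   max_resolution_m: Optional[float] = None,
                   **facets: str) -> List[DatasetRecord]:
    """Keep records in `theme` that match every exact-value facet and,
    optionally, whose grain is at least as fine as `max_resolution_m`."""
    hits = [r for r in catalog if r.theme == theme]
    for name, value in facets.items():
        hits = [r for r in hits if getattr(r, name) == value]
    if max_resolution_m is not None:
        hits = [r for r in hits if r.resolution_m <= max_resolution_m]
    return hits


def facet_counts(records: List[DatasetRecord], facet: str) -> Counter:
    """Number of matching data sets per value of one facet."""
    return Counter(getattr(r, facet) for r in records)


# Placeholder records for illustration only (not real data sets).
catalog = [
    DatasetRecord("Temperature grid A", "weather/climate", "government",
                  "terrestrial", "Agency X", "Author 1", 4000.0,
                  "modeled surface"),
    DatasetRecord("Station temperature B", "weather/climate", "academic",
                  "terrestrial", "University Y", "Author 2", 1000.0,
                  "nearest gage reading"),
    DatasetRecord("Soil survey C", "soil", "government",
                  "terrestrial", "Agency X", "Author 3", 30.0,
                  "field survey"),
]

climate = faceted_search(catalog, "weather/climate")
print(len(climate), facet_counts(climate, "provenance"))
```

In a fuller system the facet values would be ontology terms, so that a query on a broader term could also match narrower ones; the sketch uses plain strings for brevity.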
- Operations/Tasks
- Must have an easy way to discover and acquire useful abiotic and biotic data "coverages" to associate with point data, and to express model outputs as a standard set of coverages that can inform other analyses.
- Enable flexible querying of a variety of geospatial coverage data relative to those point data, in order to better understand the underlying ecological drivers of those densities. Coverages should be expressed in a common projection, with access to a "catalog" of well-defined attributes that reveals the capacity to drill down or roll up those measurements and includes details of their provenance (owner, methodology for collection, etc.). Enable extraction of coverage data as single-value supplements to a table (e.g., landuse=agric; drill down to soybean farm).
- Must have flexible ways to define 'co-location': the interest might be in the radius within which features matter to breeding birds, such as proximity to food, water, and shelter.
- Requires extracting coverage data values for a variety of abiotic and biotic data (land use, human census info, meteorological data) with flexible radii of relevance. E.g., for proximity to freshwater, the relevant criterion might be 1 km from 3 major bodies of water or 100 m from a small stream. Temporal aspects are important as well.
- Need to incorporate specific data sets and associated metadata: NLCD, MODIS, "individual researcher" observational data sets, and the interpretation of attributes from these.
- CONCRETE: Integrate bird point data with MODIS data and meteorological data. Determine programmatic access to these sources. (Possibly take lat/lon and semantically map it to the Obs model, unifying lat/lon wherever it appears, e.g., in both the met data and the bird point data.)
eBird mapped to ObsOnt; Weather Underground or IRI Data Library mapped to ObsOnt, keying on lat/lon in both.
Phylogenetic community analysis and trait-community analysis use cases to be developed?
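The flexible 'co-location' requirement can be illustrated with feature-specific radii of relevance. This is a minimal sketch, assuming point features (each water body reduced to a single lat/lon) and great-circle distance on a spherical Earth rather than a common projected coordinate system; `Feature`, `RADIUS_OF_RELEVANCE_M`, `co_located`, `add_freshwater_column`, and all coordinates are hypothetical. The radii reuse the 1 km / 100 m example above, and the result is appended to each observation row as a single-value supplement keyed on lat/lon.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption

# Hypothetical radii of relevance per feature class (from the 1 km / 100 m example).
RADIUS_OF_RELEVANCE_M: Dict[str, float] = {
    "major_water_body": 1000.0,
    "small_stream": 100.0,
}


@dataclass(frozen=True)
class Feature:
    """Hypothetical point feature (e.g. a water body reduced to one lat/lon)."""
    name: str
    feature_class: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def co_located(lat: float, lon: float,
               features: List[Feature]) -> List[Tuple[str, int]]:
    """(name, rounded distance in m) for every feature lying within its own
    class's radius of relevance, nearest first."""
    hits = []
    for f in features:
        d = haversine_m(lat, lon, f.lat, f.lon)
        if d <= RADIUS_OF_RELEVANCE_M[f.feature_class]:
            hits.append((f.name, round(d)))
    return sorted(hits, key=lambda h: h[1])


def add_freshwater_column(observations: List[dict],
                          features: List[Feature]) -> List[dict]:
    """Copy each observation row and add a single-value covariate."""
    return [dict(obs, near_freshwater=bool(co_located(obs["lat"], obs["lon"], features)))
            for obs in observations]


# Placeholder features and observations for illustration only.
features = [
    Feature("Lake A", "major_water_body", 40.005, -88.0),   # ~556 m north
    Feature("Stream B", "small_stream", 40.002, -88.0),     # ~222 m, beyond 100 m
    Feature("Stream C", "small_stream", 40.0005, -88.0),    # ~56 m
]
observations = [
    {"species": "species X", "lat": 40.0, "lon": -88.0},
    {"species": "species X", "lat": 41.0, "lon": -88.0},
]
print(co_located(40.0, -88.0, features))
print(add_freshwater_column(observations, features))
```

Real water bodies are polygons or lines, so a production version would more likely measure distance to the nearest edge with a GIS library than to a single representative point.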
- Data sets and associated metadata
- TBD
- Metrics of completion/success
- We refine some of the above points and predetermine a subset of results. The challenge is expected to automate more comprehensive results (a larger number of coverages discovered, selected, rescaled, and integrated).
- Ontologies TBD
OTHER MATERIAL
Participants
- Mark*, Shawn, Steve, Damian, Benno, Flip
- General Integration Challenges:
- Resolve scale issues: everything has to be at the same resolution, so the system must be able to aggregate data before commensurate measures can be unioned.
- Measurement scale alignment: a simple example is Fahrenheit/Celsius.
- Rebin/rescale grids to integrate them on a common scale: could the operation be specified semantically across data sets?
- Ecological genetics: need to engage some practitioners to help flesh this out.
- Trait-based community analysis: need practitioners to help define it.
- Thematic alignment: attributes are referred to using synonyms, similar-to relations, or coarser terms; resolve these with ontologies (need a specific example).
- Tagging and retrieving data based on epiphenomena, e.g., the northern migrational front/pattern. This involves tracking deltas/changes (when are birds in the Mississippi River Delta; when are birds reaching the Great Lakes), generally indicating trends. ** Feedback from analysis/models into metadata/data. As in the genomics community, include both highly structured tagging and idiosyncratic tagging.
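Two of the integration challenges above, measurement scale alignment and rebinning grids to a common resolution, can be sketched in a few lines. This assumes a regular grid whose dimensions are exact multiples of the coarsening factor; `fahrenheit_to_celsius`, `aggregate_grid`, and the sample grid are hypothetical. The `how` parameter hints at why the operation may need a semantic specification: intensive quantities such as temperature are averaged when coarsened, while extensive ones such as bird counts per cell are summed.

```python
from typing import List

Grid = List[List[float]]


def fahrenheit_to_celsius(value_f: float) -> float:
    """Measurement-scale alignment: degrees Fahrenheit to degrees Celsius."""
    return (value_f - 32.0) * 5.0 / 9.0


def aggregate_grid(grid: Grid, factor: int, how: str = "mean") -> Grid:
    """Coarsen a regular grid by combining non-overlapping factor x factor blocks.

    how="mean" suits intensive quantities (e.g. temperature);
    how="sum" suits extensive ones (e.g. counts per cell).
    """
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            total = sum(block)
            row.append(total / len(block) if how == "mean" else total)
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 temperature grid in degrees F, aligned to degrees C
# and then coarsened to 2 x 2.
fine_f = [
    [32, 50, 68, 86],
    [50, 50, 86, 86],
    [212, 212, 32, 32],
    [212, 212, 32, 32],
]
fine_c = [[fahrenheit_to_celsius(v) for v in row] for row in fine_f]
print(aggregate_grid(fine_c, 2))  # [[7.5, 27.5], [100.0, 0.0]]
```

Ragged edges, missing values, and grids in different projections would all need explicit rules that this sketch leaves out.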