Scientific Observations Data Interoperability Challenge
Overview
Effectively sharing and integrating environmental data poses significant challenges due to the broad range of data types, structures, and semantic concepts used to represent and describe data sets, and the relatively few standardized methodologies for constraining how data are collected. A number of recent efforts have adopted approaches for enhancing the interpretation and interoperability of environmental data using models explicitly structured around scientific observations. These models generally include a number of core concepts for describing data such as measurements, units, measured traits, measured values, observed entities, and measurement context.
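To make these shared concepts concrete, the sketch below shows one minimal, hypothetical way they might be expressed; it is not drawn from any particular team's model, and every name and structure in it is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    characteristic: str   # the measured trait, e.g. "water temperature"
    value: float          # the measured value
    unit: str             # e.g. "degC", "PSU"


@dataclass
class Observation:
    entity: str                                                  # the observed entity, e.g. "sea water"
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)   # e.g. the site or sampling event


# Example: a water-temperature measurement taken in the context of a mooring site.
site = Observation(entity="mooring site",
                   measurements=[Measurement("latitude", 36.75, "deg"),
                                 Measurement("longitude", -122.03, "deg")])
obs = Observation(entity="sea water",
                  measurements=[Measurement("water temperature", 14.2, "degC")],
                  context=[site])
```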
Goals
The goals of the Scientific Observations Data Interoperability Challenge are to gain an understanding of the capabilities of current observation models and supporting systems, and to identify the issues that must be addressed before interoperability among these approaches can be achieved. The specific aims of the challenge are to:
- Understand the similarities, differences, and scope of the existing models for describing scientific observations
- Understand the main modeling concepts and relationships used by the different approaches
- Understand the services offered by systems supporting each approach, e.g., for data discovery, integration, etc.
- Identify strategies for enabling interoperability among the different models and systems
- Bring together a community to further develop interoperability solutions for sharing and integrating environmental data
- Further define and evaluate a core observation model and set of services to enable improved interoperability among systems
To help achieve these goals, each team participating in the challenge is asked to represent, within its scientific observations model, a set of use cases drawn from a range of scientific disciplines (oceanographic, atmospheric, ecological, hydrological, genomic, and socioeconomic). These use cases (described below) will provide a standardized basis for comparing the different models and approaches.
We will create a page on the SONet website for each participating team to post their results. Participants will discuss and refine the parameters for the Data Interoperability Challenge during MEETING, DATE at LOCATION. A further goal of the workshop will be to identify a core observations model and services for enabling interoperability among existing observation-based systems. We also anticipate organizing a special journal issue in which challenge participants will have the opportunity to contribute articles describing their results and approaches in addition to articles on broader issues concerning the sharing and integration of environmental data stemming from the workshop.
Use Cases
The use cases cover several representative domains of science (oceanographic, atmospheric, ecological, genomic, socioeconomic, hydrological) and several types of common data products (tables, rasters, vectors, time series), as well as a range of tasks that start simple and grow increasingly complex. Each use case consists of one or more data sets and a set of tasks to be performed (e.g., metadata/discovery queries, queries for specific observations, data integration tasks). Each use case will provide a metadata description of the data set as well as the data set itself.
Use Case 1: Ecology
- Description:
- Data set and associated metadata:
- Queries:
- Integration tasks:
Use Case 2: Oceanography
- Description: A user would like to retrieve data from different platforms in order to compare measurements over an area and time period of interest, for example to cross-calibrate instruments.
- Queries: The user requests data constrained by parameter (e.g., sea temperature, salinity, dissolved oxygen), time (start, end), geographic location (latitude/longitude bounding box), depth (min, max), procedure, and event of interest. Procedures can be platforms, such as a fixed platform (e.g., a mooring), a mobile platform (e.g., an autonomous underwater vehicle), or a remote platform (e.g., a satellite). An event of interest is defined in terms of parameter values, for example favorable upwelling conditions (winds at the mooring from the SE at more than 5 m/s). A sketch of this kind of constrained query is given after this list.
- Data sets and associated metadata:
- Mooring data from Monterey Bay for water temperature, via an SOS getObservation response (includes metadata and data). More examples are available at the NDBC web site.
- Autonomous underwater vehicle (AUV) 2009 mission data from MBARI in NetCDF, including values for temperature, depth, oxygen, nitrate, etc.
- Sea Surface Temperature (SST) from NOAA satellites, available at Coastwatch.
- Integration tasks: Display data from these different platforms at the same time. Find where and when they overlap, using a radius of 5 km, a time difference of less than 4 hours, and a depth difference of less than 10 m (see the matching sketch after this list).
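As one illustration of the queries described above, the following sketch filters observation records by parameter, time window, bounding box, and depth range. It assumes the observations have already been loaded into simple records; all field and function names are illustrative and are not part of any participating system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Record:
    parameter: str      # e.g. "sea_water_temperature"
    time: datetime
    lat: float
    lon: float
    depth_m: float
    value: float
    platform: str       # e.g. "mooring", "AUV", "satellite"


def query(records: List[Record], parameter: str,
          start: datetime, end: datetime,
          bbox: tuple, depth_min: float, depth_max: float) -> List[Record]:
    """Return records matching the parameter, time window, lat/lon bounding box, and depth range."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [r for r in records
            if r.parameter == parameter
            and start <= r.time <= end
            and lat_min <= r.lat <= lat_max
            and lon_min <= r.lon <= lon_max
            and depth_min <= r.depth_m <= depth_max]
```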
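A similar sketch for the integration task pairs records from two platforms that coincide within a 5 km radius, less than 4 hours apart, and within 10 m in depth. It assumes records with the same illustrative lat, lon, time, and depth_m fields used in the previous sketch.

```python
import math
from datetime import timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def overlaps(records_a, records_b, radius_km=5.0,
             max_dt=timedelta(hours=4), max_ddepth_m=10.0):
    """Return pairs of records, one from each platform, that coincide within the
    given radius, time difference, and depth difference."""
    return [(a, b) for a in records_a for b in records_b
            if haversine_km(a.lat, a.lon, b.lat, b.lon) <= radius_km
            and abs(a.time - b.time) <= max_dt
            and abs(a.depth_m - b.depth_m) <= max_ddepth_m]
```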
Use Case 3: Geology
- Description:
- Data set and associated metadata:
- Queries:
- Integration tasks:
Use Case 4: Social Science
- Description:
- Data set and associated metadata:
- Queries:
- Integration tasks:
Challenge Instructions
Each participating team should carry out the following steps for each use case to complete the challenge.
- Determine how the observations within each use case data set would be described or represented in your model, and show how they are expressed on your team page.
- Use your system to represent the observations contained within the data set directly. Describe how the observations were converted from the data set into your system (e.g., manually, via a script, or through a mapping language); a scripted conversion of this kind is sketched after these steps. If your approach includes a serialized view of the representation, please attach it to your page.
- Use your system to perform the tasks defined for the use case. Show how the tasks were performed (e.g., the query expression used) and the results on your team page. Note any strengths or weaknesses of your system in performing each task, such as particularly fast or slow query execution.
- If some of the observations could not be captured in your model, or if the tasks could not be performed, add a brief explanation of why this was not possible using your approach.
- Provide links to background material (e.g., papers) needed for others to understand your results.
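As a hypothetical illustration of the conversion step above (done via a script rather than manually or with a mapping language), the sketch below maps rows of a tabular data set into generic observation records. The column names, units, and record structure are invented for the example and do not reflect any particular team's model.

```python
import csv
from datetime import datetime


def rows_to_observations(path):
    """Yield one generic observation record per CSV row; each measured column becomes a measurement."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "entity": "sea water",
                "context": {
                    "time": datetime.fromisoformat(row["timestamp"]),
                    "latitude": float(row["lat"]),
                    "longitude": float(row["lon"]),
                },
                "measurements": [
                    {"characteristic": "water temperature", "value": float(row["temp_c"]), "unit": "degC"},
                    {"characteristic": "salinity", "value": float(row["sal_psu"]), "unit": "PSU"},
                ],
            }
```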