\documentclass{article}

\usepackage{multirow}
\usepackage{graphicx} 

\setlength{\topmargin}{0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textheight}{9.0in}
\setlength{\textwidth}{7.0in}
\setlength{\oddsidemargin}{-0.2in}
\setlength{\evensidemargin}{-0.2in}
%\setlength{\parindent}{0.25in}
%\setlength{\parskip}{0.25in}

\title{Conceptual Model for Ecological Data Management}
\author{}
\date{}
\begin{document}
\maketitle

This draft compares OBOE with several other conceptual models (CMs), namely the OGC O\&M
model and EQ, in how they capture the semantics of observations and
measurements of ecological data. The comparison shows that OBOE is highly compatible with
the other conceptual models and can also accommodate many important domain
ontologies that are widely used in ecology.

\section{Introduction}

{\bf How to evaluate the quality of a conceptual model (CM)}\\
This question arises from the need to decide which conceptual model
(e.g., OBOE, O\&M, etc.) is a good one, or is better than another.
Unfortunately, the literature \cite{DBLP:journals/dke/Moody05} shows that
there is no well-established standard way to evaluate the quality of a conceptual model.
Instead, most conceptual models are evaluated in an {\em ad hoc} manner.
However, that article highlights a major principle for evaluating a conceptual model:
a conceptual model is valuable only if it is used in practice.

Following this principle, we consider several factors in building our final model
(either OBOE or an extended OBOE) in the SONET project. These
factors include:
\begin{enumerate} 
\item How compatible is a CM with other conceptual models? 
\item How easy is it for a CM to accommodate other domain knowledge?
\item How easy is it to perform operations on the model?
\end{enumerate}

In what follows, we analyze OBOE from these three
perspectives.

\section{Terminology analysis of CMs}
We first analyze the ``things'' or ``objects''
that each model represents. Then, we show the correspondences between
their terms. Finally, we sketch the algorithms that make the models
compatible.

\subsection{OBOE}
In OBOE, an {\em observation} represents any {\em measurement} of some {\em characteristic} 
(attribute) of some real-world entity or phenomenon.
A {\em measurement} consists of a realized value of some characteristic of an {\em entity},
expressed in some well-specified units (drawn from a measurement standard).
One observation can provide {\em context} for other observations.
For example, observations of spatial or temporal information often provide
context for other observations.
The OBOE concepts can describe relationships at either the type level or the instance
level.
For example, an entity type has characteristic types,
and an entity instance has characteristic instances.

{\bf HP: more to come.}

\subsection{OBOE and EQ}

In the EQ (Entity Quality) system \cite{EQwiki}, the key terms are as
follows.

\begin{itemize}
\item Entity: describes some object in the real world (e.g., dorsal fin).
\item Quality: describes an attribute of an entity together with its
  value (e.g., shape = round means that the dorsal fin's shape is round).
\item Character: combines an Entity and a Quality attribute to
  identify which attribute of which entity is meant (e.g., the
  dorsal fin's shape).
\item Character State: the Quality value (e.g., round, meaning that the dorsal
  fin's shape is round).
\end{itemize}

\subsection{OBOE and O\&M}
To compare OBOE and O\&M, we focus on several key terms. 

First, O\&M also uses the terms {\em Observation} and {\em
  Measurement}, but it defines them differently.
In OBOE, these two terms
refer to things, whereas in O\&M they denote
actions.
\begin{itemize}
\item An {\bf Observation} is an \underline{action} with a result which has a value describing
some phenomenon. (p1 \cite{OM}) or {\em an act of observing a
  property}.  (\cite{OMISO} Clause 4.10)
\item {\em Measurement is a set of operations having the object of
  determining the value of quantity.} (\cite{OMISO} Clause 4.9).
\end{itemize}

Second, consider the terms used to describe real-world objects (OBOE) or phenomena
(O\&M). In OBOE, an {\em Entity} can have many {\em Characteristics}.
In parallel, in O\&M, a {\em feature type} can have many common
{\em characteristics} (i.e., {\em property types}) shared by its feature
instances.
The correspondences at this level are therefore as follows.
{\em Entity} in OBOE corresponds to {\em feature type} (or, more
specifically, {\em feature of interest}) in O\&M.
{\em Characteristic} in OBOE corresponds to {\em property type} (or,
more specifically, {\em observed property}) in
O\&M.

\begin{itemize}
\item The {\bf featureOfInterest} is a feature of any type (ISO 19109, ISO 19101), which is a
representation of the observation target, being the real-world object regarding which the
observation is made. (p12 \cite{OM})

\item The {\bf observedProperty} identifies or describes the phenomenon for which the observation
result provides an estimate of its value. It must be a property associated with the type of
the feature of interest. (p12 \cite{OM})
\end{itemize}
 
Third, consider the terms used to describe the observations and measurements that are
made on objects or phenomena.
OBOE uses the terms {\em observation} and {\em measurement} to denote
these. O\&M uses {\em observation result} to denote the observation
made.
Thus %$observation_{OBOE}$and
$measurement_{OBOE}$ corresponds to $observation~result_{O\&M}$.
In OBOE, a measurement is of a characteristic;
in O\&M, a result is of a property, which corresponds to an OBOE
characteristic.
The process used to obtain the measurement (or result) is called
$protocol_{OBOE}$ in OBOE and $observation~procedure_{O\&M}$ in O\&M.

\begin{itemize}
\item The {\bf procedure} is the description of a process used to generate the result. It must be
suitable for the observed property. (p12 \cite{OM}) Or, {\em method,
  algorithm or instrument, or system of these which may be used in
  making an observation.} (\cite{OMISO} Clause 4.11)

\item The {\bf result} contains the value generated by the procedure. The type of
the observation result must be consistent with the observed property,
and the scale or scope for the value must be consistent with the
quantity or category type.  (p13 \cite{OM}). Or, an {\em observation
  result} is an estimate of the value of a {\em property} determined
through a known procedure. (\cite{OMISO} Clause 4.13)
\end{itemize}

Fourth, terms used to describe the measurement (result) values. 
The values can be categorical or numerical. 

$ObservationContext_{O\&M}$ (\cite{OMISO} Fig 2, Clause 6.2.4 ) is equivalent to $context_{OBOE}$. 

%\begin{itemize}
%\item $OM\_observation$ represents a feature type. 
  %It has 5 attributes.
  %\begin{itemize}
  % \item phenomenonTime({\em om:phenomenonTime} in schema)
  % \item resultTime({\em om:resultTime} in schema)
  % \item validTime
   % \end{itemize}
%\end{itemize}
The term correspondences between the two models are summarized in
Table \ref{tb:cmcomp}.

\subsection{Comparison of the techniques used in OBOE and O\&M}

OBOE uses OWL DL (Web Ontology Language Description Logic) to describe
its model, while O\&M uses UML (Unified Modeling Language) to
represent its conceptual schemas.
In what follows, we briefly compare these two techniques.


UML defines a common core set of language
constructs in a shared package. These constructs focus on the {\bf
    representation} of {\bf static} structural information, i.e., the
  ``class diagram'' \cite{comp_uml_owl}.  UML is generally used together with the Object
Constraint Language (OCL \cite{ocl}), which complements its static
modeling features. Both are used in O\&M. In what follows,
the term UML alone denotes UML without OCL.
Some studies use the term UML Full to denote UML+OCL.

There are many similarities \cite{DBLP:conf/semweb/BrockmansVEL04,DBLP:conf/ijcai/CranefieldP99,DBLP:conf/ismis/CaliCGL02} between UML (+OCL) and OWL DL.
UML uses a diagrammatic representation and an XML representation
called XML Metadata Interchange (XMI) \cite{xmi}.
OWL uses an XML-based syntax for RDF and for the OWL language itself.
Both UML and OWL define an object-centered, intention-based
representation of knowledge about a system \cite{comp_uml_owl}.
Both languages have two layers of knowledge representation: the instance
level and the type level.
Some similarities are
\begin{itemize}
\item Representing individuals: {\em owl:individual} in OWL and
{\em InstanceSpecification} in UML.
To define class membership, OWL uses the RDF {\em type} relation or
the name of the classifier; UML uses {\em instanceOf} or the
colon-based naming convention.
\item Defining classes: in both languages, we can define the sub-class relationship ({\em
  rdfs:subClassOf} and {\em A subClassOf B}), the equivalent-class
relationship ({\em owl:equivalentClass} and {\em A equivalentClass
  B}), and disjoint classes ({\em owl:disjointWith} and {\em A disjointWith
  B}). Class union ({\em owl:unionOf}) is also supported.
\item Property values are defined using the name of a property or
relating it to an object or data value.
\begin{itemize}
\item {\em owl:ObjectProperty, owl:DatatypeProperty} are mapped to binary,
unidirectional UML associations. 
\item {\em owl:inverseOf,  owl:FunctionalProperty} can be represented in UML using bidirectional
    associations to combine two inverse, binary and unidirectional
    associations. 
\end{itemize}
\end{itemize}

However, the two techniques also differ considerably. The basic difference lies in the
underlying {\bf assumption}: OWL adopts the open-world assumption, while
UML adopts the closed-world assumption. This difference affects the
interpretation of many constructs.

In addition, OWL DL supports {\bf synonyms} while UML does not.
OWL allows the definition of synonyms for
classes, properties, and individual descriptions. It has explicit
constructs to define equivalent classes, properties, and
individuals. In this way, the same real-world element may be
referred to by different names. OWL thus provides the flexibility to describe the same
real-world system using different terminologies, according to
people's preferences.
In addition, in OWL the identifiers for classes, properties, and
individuals are distinct, i.e., the same name always refers to the same
real-world element.
To summarize, in OWL, the function from class, property, and individual
names to real-world elements is non-injective, because different names
can refer to the same real-world element.
UML, on the other hand, does not support synonyms, i.e., two
different names always represent two different interpretations.
Consequently, UML makes the Unique Name Assumption (UNA),
so the function from names to real-world elements is injective:
the same name refers to the same element, and different names have
different interpretations.
Since OWL does not make the Unique Name Assumption (UNA), it provides
{\em owl:sameAs}, {\em owl:differentFrom}, and {\em owl:AllDifferent} to state
the identity of individuals.
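
The synonym and identity constructs named above can be illustrated with a small RDF/XML fragment. This is only a sketch: the identifiers {\tt Trait}, {\tt dorsalFin1}, {\tt fin\_A}, and {\tt fin\_B} are hypothetical names introduced for illustration and are not taken from OBOE.

\begin{verbatim}
<!-- Two class names declared as synonyms (Trait is hypothetical) -->
<owl:Class rdf:about="#Characteristic">
  <owl:equivalentClass rdf:resource="#Trait"/>
</owl:Class>

<!-- Two names for the same individual (hypothetical individuals) -->
<owl:Thing rdf:about="#dorsalFin1">
  <owl:sameAs rdf:resource="#fin_A"/>
</owl:Thing>

<!-- Distinctness is not assumed, so it is stated explicitly -->
<owl:Thing rdf:about="#fin_A">
  <owl:differentFrom rdf:resource="#fin_B"/>
</owl:Thing>
\end{verbatim}

Without the last statement, an OWL reasoner cannot conclude that {\tt fin\_A} and {\tt fin\_B} denote different individuals, whereas under a unique-name reading the two different names would already imply it.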


Beyond the above modeling differences,
studies comparing UML (+OCL) and OWL \cite{comp_uml_owl} show
the following.
\begin{itemize}
\item UML itself is generally used as a representation-oriented
  paradigm. This is similar to the Entity-Relationship modeling
  paradigm in database development. Since UML provides intuitive
  diagrams to describe concepts, at a higher analysis level
  UML is more appropriate than OWL DL for people to discuss design ideas.
\item OWL DL provides many constructs to represent knowledge through a
  vocabulary and logical definitions.
\item Due to its static nature, UML itself cannot represent many of the
  constraints that are needed to describe semantic web services.
  A rough list, drawn from \cite{comp_uml_owl}, is as follows.
\begin{itemize}
\item {\em owl:oneOf, owl:intersectionOf, owl:complementOf}. OWL allows a class to be defined by constraining its instances to an
enumerated list using {\em owl:oneOf}, whereas UML only allows the enumeration of data types.
OWL also allows a class to be defined directly as the intersection of other
classes using {\em owl:intersectionOf}. In UML, we cannot directly
define an intersection class, but we can define a class to be a
sub-class of several classes, so that an instance of this class is implicitly an instance of each of its
super-classes.
\item  {\em owl:allValuesFrom, owl:someValuesFrom,
  owl:hasValue,owl:maxCardinality, owl:minCardinality,
  owl:cardinality}
\item {\em rdfs:domain, rdfs:range, owl:TransitiveProperty, owl:SymmetricProperty}
\end{itemize}
\item OCL can represent the equivalent semantics: 
     \begin{itemize}
     \item {\em owl:allValuesFrom, owl:someValuesFrom, owl:maxCardinality,
    owl:minCardinality, owl:cardinality}.
     \item {\em rdfs:domain, rdfs:range, owl:InverseFunctionalProperty, owl:TransitiveProperty, owl:SymmetricProperty}
    \end{itemize}
\item Even though OCL enriches the representation of UML a lot, it
  still cannot fully represent some OWL DL constructs: 
{\em owl:hasValue, owl:subPropertyOf, owl:equivalentProperty}
\end{itemize}
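
As a concrete illustration of the first group of constructs in the list above, the following RDF/XML fragment sketches an enumerated class and an intersection class. The class and individual names ({\tt IntensityLevel}, {\tt mild}, {\tt moderate}, {\tt severe}, {\tt AquaticPlant}, {\tt Plant}, {\tt AquaticOrganism}) are hypothetical and serve only as an example; they are not taken from OBOE or O\&M.

\begin{verbatim}
<!-- A class defined by enumerating its instances (owl:oneOf) -->
<owl:Class rdf:about="#IntensityLevel">
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#mild"/>
    <owl:Thing rdf:about="#moderate"/>
    <owl:Thing rdf:about="#severe"/>
  </owl:oneOf>
</owl:Class>

<!-- A class defined as the intersection of two other classes -->
<owl:Class rdf:about="#AquaticPlant">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Plant"/>
    <owl:Class rdf:about="#AquaticOrganism"/>
  </owl:intersectionOf>
</owl:Class>
\end{verbatim}

In UML alone, the first definition has no direct equivalent for classes, and the second can only be approximated by making {\tt AquaticPlant} a sub-class of both {\tt Plant} and {\tt AquaticOrganism}, which states a necessary but not a sufficient condition for membership.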

The list above shows how far UML (+OCL) can represent the semantics that
can be expressed using the OWL DL constructs.
To summarize, if UML is not used together with OCL, it serves best
as a graphical representation of the model, but it cannot support most of
the more precise constraints that can be defined using OWL DL.
On the other hand, OWL has no counterpart for {\em
  UML's aggregation and composition relationships}.

With OWL DL, reasoning can be supported using some tools such as Pe
UML together with OCL also supports reasoning, and
tools exist to support this, e.g., OCLE \cite{ocle} and Oclarity, a plugin
for Rational Rose \cite{oclarity}. (A more complete list can be
found at
\verb|http://www.jordicabot.com/research/OCLSurvey/index.html|.)

{\bf NOTE from Huiping: not sure yet how the OCL reasoning can be
  done. What is the capability difference between OCL reasoning and
  OWL DL reasoning?}

{\bf NOTE: from discussion with Mark\\
(1) Add the content of the annotation with observation type,
measurement type, etc. \\
(2) OBOE + annotation language $\Rightarrow$ OBOE. This needs to be discussed
with Shawn: what the OBOE specification should include. \\
(3) What is the difference in expressive power between the implementation
techniques for O\&M and OBOE?

Need to add: \\
(1) Annotation specification? 
Relationships that can be caught by OBOE, can they be expressed using
O\&M? 
E.g., one observation can have multiple measurements? 
}


\subsection{Term correspondences}
Table \ref{tb:cmcomp} summarizes the above analysis with term
correspondences. 

\begin{table}[htb]
\begin{tabular}{|l|l|p{1.8in}|p{2.5in}|}
\hline ER & OBOE & O\&M &EQ\\\hline
                & Entity &Observation::featureOfInterest& Entity \\\hline
Entity       & Observation & & \\\hline
Characteristic & Measurement & OM\_Observation& Quality
value or Character State \\\hline
               & Standard    & Result type & \\\hline
               & Characteristic & Observation::observedProperty&
               Quality attribute or Character State\\\hline
Relationship   & Context &  ObservationContext& \\\hline
Value             & Characteristic Value &  Result& \\\hline
                  & Protocol & Observation::procedure& \\\hline
& & Observation::phenomenonTime&\\\hline
& & Observation::resultTime&\\\hline
\end{tabular}
\caption{CM comparison}\label{tb:cmcomp}
\end{table}

\begin{figure}[htb]
\includegraphics{oboe}
\caption{UML representation of OBOE with major classes}
\end{figure}

Relationships between  the concepts: 
\begin{itemize}
\item An {\em OWL functional property} is converted to a many-to-one
  cardinality in UML. E.g., OBOE {\em characteristicOf} is a functional property, i.e., it
  implies that a $Characteristic_{OBOE}$ is for at most one $Entity_{OBOE}$.
So, it is a many-to-one relationship. \\
O\&M uses the class {\em OM\_Observation} to connect {\em
  featureOfInterest} and {\em observedProperty}. Since the
relationship from {\em  featureOfInterest} to {\em OM\_Observation}
\item {\em OWL transitive property}. 

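\item SOME restriction ({\em owl:someValuesFrom}). If a class $A$ has
  a {\em someValuesFrom} restriction on a property $p$ with class $B$,
  then every instance of $A$ must have at least one value of $p$ that is an instance of $B$.
  For example: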

{\tt
\verb|    <rdfs:label>Characteristic</rdfs:label>|\\
\verb|        <owl:equivalentClass>|\\
\verb|            <owl:Restriction>|\\
\verb|                <owl:onProperty rdf:resource="#hasCharacteristicValue"/>|\\
\verb|                <owl:someValuesFrom rdf:resource="#CharacteristicValue"/>|\\
\verb|            </owl:Restriction>|\\
\verb|        </owl:equivalentClass>|\\
\verb|...|
}
This means that every {\em Characteristic} must have at least one
{\em hasCharacteristicValue} property value that is
a {\em CharacteristicValue}.
 
\item ONLY restriction {\em owl:allValuesFrom}

\end{itemize}
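
For comparison with the SOME restriction above, an ONLY restriction on the same property could be written as follows. This is a sketch, not a quotation from the OBOE ontology: it reuses the names {\em hasCharacteristicValue} and {\em CharacteristicValue} from the fragment above, and it is an assumption that OBOE states such a restriction on {\em Characteristic}. The second declaration sketches how a property such as {\em characteristicOf} is marked as functional.

\begin{verbatim}
<!-- ONLY restriction: every value of hasCharacteristicValue,
     if there is one, must be a CharacteristicValue -->
<owl:Class rdf:about="#Characteristic">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasCharacteristicValue"/>
      <owl:allValuesFrom rdf:resource="#CharacteristicValue"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

<!-- Functional property: at most one value per individual -->
<owl:ObjectProperty rdf:about="#characteristicOf">
  <rdf:type
    rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:ObjectProperty>
\end{verbatim}

Unlike {\em owl:someValuesFrom}, {\em owl:allValuesFrom} does not require that any value exists; it only constrains the values that do exist. The two restrictions are therefore often stated together on the same property.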

{\bf HP: \\
Q1: SOME and ONLY for hasCharacteristic, etc. ObjectProperty.\\
Q2: ER paper: observation and measurement should be one to many
relationship, but not many to many relationship. 
}

\section{Model compatibility}

In this section, we show algorithms for converting data that comply
with different models and illustrate them with several examples.

\subsection{Conversion between OBOE and O\&M}
In what follows, we describe how to convert an OBOE-compliant file to an O\&M-compliant
file, and also the conversion in the opposite direction.

A brief algorithm to convert an O\&M-compliant document to the OBOE
model is as follows.
\begin{itemize}
\item For each Observation in O\&M, denoted by {\em om:observation},
 generate an {\em OBOE:Observation} instance $o_i$.
\item For each {\em om:featureOfInterest}, generate an {\em OBOE:Entity}
  instance $e_i$ such that $o_i$ has the property {\em ofEntity} $e_i$.
\item For each {\em om:result}, generate an {\em OBOE:Measurement} instance
  $m_i$ such that $m_i$ has the property {\em measurementFor} $o_i$.
\item For each {\em om:procedure}, generate an {\em OBOE:Protocol} instance
  $p_i$ such that $m_i$ has the property {\em usesProtocol} $p_i$.
\item For each {\em om:observedProperty}, generate an
   {\em OBOE:Characteristic} instance $ch_i$ such that $m_i$ has the property
  {\em ofCharacteristic} $ch_i$.
\end{itemize}
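
As an illustration of the steps above, the fragment below sketches the OBOE instances that could be generated from a single O\&M observation with one result. The {\tt oboe:} prefix, the instance identifiers ({\tt o1}, {\tt e1}, {\tt m1}, {\tt p1}, {\tt ch1}), and the exact spelling of the class and property names are assumptions made for illustration; they follow the names used in the steps above, with the protocol property written as {\em usesProtocol}.

\begin{verbatim}
<!-- om:observation -> Observation o1, linked to its entity -->
<oboe:Observation rdf:about="#o1">
  <oboe:ofEntity rdf:resource="#e1"/>
</oboe:Observation>

<!-- om:featureOfInterest -> Entity e1 -->
<oboe:Entity rdf:about="#e1"/>

<!-- om:result -> Measurement m1, linked to observation,
     protocol, and characteristic -->
<oboe:Measurement rdf:about="#m1">
  <oboe:measurementFor rdf:resource="#o1"/>
  <oboe:usesProtocol rdf:resource="#p1"/>
  <oboe:ofCharacteristic rdf:resource="#ch1"/>
</oboe:Measurement>

<!-- om:procedure -> Protocol p1 -->
<oboe:Protocol rdf:about="#p1"/>

<!-- om:observedProperty -> Characteristic ch1 -->
<oboe:Characteristic rdf:about="#ch1"/>
\end{verbatim}

The observation instance has to exist before the entity and measurement links are written, because both the {\em ofEntity} and the {\em measurementFor} statements refer to it.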

More to come on the sampling and specialized observations. 

\subsection{Mappings in other domains}

We generated an example for mappings in other domains using O\&M and
OBOE. 
Table \ref{tb:cmeo} shows the mappings in the Earth observation
domain.
 
\begin{table}[htb]
\begin{tabular}{|p{1.5in}|p{1.0in}|p{2.0in}|p{2.0in}|}
\hline EO & example & O\&M &OBOE\\\hline
 observation value, measurement value, observation & 
35$\mu g/m^3$& 
Observation:: result&
measurement: {\em hasValue} 35 with unit $\mu g/m^3$ \\\hline
method, sensor&
ASTER, U.S.EPA Federal Reference Method for $PM_{2.5}$&
Observation:: procedure&
protocol: measurement {\em usesProtocol}\\\hline

parameter, variable&
Reflectance, Particulate Matter 2.5&
Observation:: observedProperty&
Characteristic: measurement is {\em ofCharacteristic}\\\hline

2-D swath or scene&
Sampling grid&
Observation:: featureOfInterest: Sampling Surface&
Entity: measurement is for some observation ({\em forObservation}),
which is of some entity ({\em ofEntity}). \\\hline

Earth surface&
&
SamplingSurface: sampledFeature&
Entity: sub class of an entity\\\hline

3-D sampling space&
Sampling grid&
Observation:: featureOfInterest: SamplingSolid&
Entity: a sub class of an entity\\\hline

media (air, water, $\cdots$), Global Change Master Directory ``Topic''
&
troposphere&
SamplingSolid:: sampledFeature&
Entity: a sub class of an entity\\\hline

\end{tabular}
\caption{CM comparison with Earth Observations (EO)}\label{tb:cmeo}
\end{table}



\section{Use cases: compatibility of OBOE with domain ontologies}

We have several use cases with which we can test the different data models.
In what follows, we show how compatible PATO is with OBOE.

\subsection{PATO}
Phenotype And Trait Ontology (PATO) \cite{patodownload, patowiki} is a
phenotype quality ontology proposed by Ashburner and Lewis.
It is designed to capture qualitative
and quantitative information about phenotypes in a species-neutral
way.

PATO is recommended by Open Biomedical Ontologies (OBO) \cite{obo}
and is used by several research groups.
The entity, attribute, value (EAV) model relies on PATO.
The Nottingham Arabidopsis Stock Center (NASC) database uses the EAV model
to describe mutant phenotypes and natural variants in Arabidopsis \cite{PSO07}.
Other EAV model organism databases, e.g.,
ZFIN \cite{zfin, zfin2} and FlyBase \cite{flybase}, also use PATO.
In addition, Phenoscape \cite{phenoscape} uses PATO (and other ontologies) to link natural
phenotypic diversity to zebrafish mutants,
and \cite{zookeys09} uses PATO as its controlled vocabulary.
Because PATO is so widely used, it is important to verify that
PATO is compatible with the OBOE model. In what follows, we discuss how to
convert PATO to OBOE.

PATO does not itself provide the detailed entity information that OBOE represents.
In practice, domain scientists generally use PATO together with other
ontologies to annotate datasets.
We do not focus on this combined use in this section.

PATO has six different slims, which denote the different views that people
can use to categorize its classes.
Four of these slims (cell\_quality, abnormal\_slim,
absent\_slim, relational\_slim) are for domain-specific use.
The other two, attribute\_slim and value\_slim, are more general:
they denote whether a class (concept) is an attribute or an
attribute value.

Accordingly, the classes tagged with attribute\_slim can be
mapped to {\em characteristics} in OBOE,
while the classes tagged with value\_slim can be mapped to {\em
  Standard Characteristic Value} in OBOE, which is a sub-class of
`Characteristic Value'.
These standard characteristic values belong to a measurement standard.
The measurement standard has the object property `forCharacteristic'
restricted to only the newly created characteristic, and the property `hasStandardValue' restricted to only
the newly created standard characteristic value.

For example, the class {\em intensity} in PATO, tagged with
`attribute\_slim', can be mapped to a characteristic {\em
  PATO:intensity}.  The sub-classes of intensity in PATO, \{mild,
moderate, severe, `increased intensity', `decreased intensity',
remittent\}, are tagged with `value\_slim'. They can be mapped to sub-classes of a
Standard Characteristic Value {\em PATO:intensity value}.
To impose the constraint that the characteristic {\em
  PATO:intensity} can only take values in {\em PATO:intensity value},
we can create a measurement standard
{\em PATO:intensityStandard} with two object-property restrictions:
`forCharacteristic'
only the characteristic {\em PATO:intensity}, and
`hasStandardValue' only {\em PATO:intensity value}.

Formally, this can be done in a bottom-up manner. 
\begin{itemize}
\item Create a sub-class $CH_i$ of {\em Characteristic} for each PATO class $C_i$
  tagged with `attribute\_slim'.
If $C_i$ has a sub-class $C_j$ tagged with `attribute\_slim' in PATO,
then in OBOE we create a corresponding characteristic $CH_j$ for
class $C_j$ and make $CH_j$ a sub-class of $CH_i$.
\item Create a sub-class $ChSV_i$ of {\em Standard Characteristic Value}
 and map all the direct children of $C_i$ tagged with
  `value\_slim' to sub-classes of $ChSV_i$.
Here $\langle CH_i, ChSV_i \rangle$ is a corresponding characteristic-value pair.
   If $C_i$ has a sub-class $C_j$ tagged with `attribute\_slim' in
   PATO, then we create a sub-class $ChSV_j$ of $ChSV_i$ and map all the direct children of $C_j$ tagged with `value\_slim' to
     sub-classes of $ChSV_j$.
\item For each corresponding characteristic-value pair $\langle CH_i, ChSV_i\rangle$,
  create a measurement standard $MS_i$ with two object-property restrictions:
  {\em `forCharacteristic' only} $CH_i$ and {\em `hasStandardValue'
  only} $ChSV_i$.
\end{itemize}
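
The three steps can be illustrated for the {\em intensity} example. The fragment below is a sketch in RDF/XML; the local names ({\tt PATO\_intensity}, {\tt PATO\_intensity\_value}, {\tt PATO\_mild}, {\tt PATO\_intensityStandard}) and the OBOE class names {\tt StandardCharacteristicValue} and {\tt MeasurementStandard} are assumed spellings chosen for illustration, not identifiers taken from PATO or OBOE.

\begin{verbatim}
<!-- Step 1: attribute_slim class -> sub-class of Characteristic -->
<owl:Class rdf:about="#PATO_intensity">
  <rdfs:subClassOf rdf:resource="#Characteristic"/>
</owl:Class>

<!-- Step 2: value class for the characteristic, with one
     value_slim child shown (the others are analogous) -->
<owl:Class rdf:about="#PATO_intensity_value">
  <rdfs:subClassOf rdf:resource="#StandardCharacteristicValue"/>
</owl:Class>
<owl:Class rdf:about="#PATO_mild">
  <rdfs:subClassOf rdf:resource="#PATO_intensity_value"/>
</owl:Class>

<!-- Step 3: measurement standard with two ONLY restrictions -->
<owl:Class rdf:about="#PATO_intensityStandard">
  <rdfs:subClassOf rdf:resource="#MeasurementStandard"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#forCharacteristic"/>
      <owl:allValuesFrom rdf:resource="#PATO_intensity"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasStandardValue"/>
      <owl:allValuesFrom rdf:resource="#PATO_intensity_value"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
\end{verbatim}

The two {\em owl:allValuesFrom} restrictions encode the word ``only'' in the third step: a measurement that uses this standard can refer to no characteristic other than intensity and to no value outside the intensity value class.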

\subsection{EnvO}

\subsection{Trait Ontology}
This will be provided by Marie-Angelique. 

\bibliographystyle{abbrv}
\bibliography{model}

\end{document}
