
Enriching EO ontology services using Product Trees (PRODTREES)

15/06/2017
 Programme:  TRP Workplan  Achieved TRL:  4
 Reference:  T126-301GT  Closure:  2016
 Contractor(s):  SAS (BE), CNR-IIA (IT), NKUA (GR)

 
The demand for aerial and space-borne imagery, and for derived products, has been increasing over the years, in parallel with technological advances that allow a greater variety of data to be produced with increasing quality and accuracy. As a consequence of these advances and of the growing number of deployed sensors, the amount of Earth Observation (EO) data collected and stored has exploded: Envisat transmitted 250 gigabytes of data every day until communication was lost in April 2012, while the first three satellites of the Sentinel constellation are expected to beam down around 10,000 gigabytes per day together.
Such huge amounts of EO data cannot be annotated and catalogued manually. Existing catalogues and search engines rely on controlled vocabularies, mapped ontologies and reasoning rules to let users search for EO products in an intuitive manner. The solutions developed until now are based on project-specific mechanisms and data structures: EO products are annotated with keywords that make them searchable in a given system but that are not compatible with other systems. To overcome this problem, a standard is needed that provides a compatibility layer across products and software systems.

Cross-Ontology Browser

Objectives
The objectives of the Prod-Trees project are to specify a new convention for the netCDF standard, compatible with the CF-netCDF conventions, with descriptive metadata compliant with state-of-the-art ontologies, develop and extend supporting libraries and tools, and demonstrate the outcome in a semantically-enabled EO products search platform.

The key elements are further explained hereafter:

  1. netCDF (network Common Data Form) is a data model for scientific data, a freely distributed collection of access libraries implementing support for that data model, and a machine independent format. Together, the interfaces, libraries and format support the creation, access and sharing of multi-dimensional scientific data. netCDF supports the encoding of geospatial data, that is, digital geospatial information representing space and time-varying phenomena. The new netCDF convention specified during the project is named EO-netCDF.
  2. CF-netCDF (Climate and Forecast netCDF) is a set of netCDF conventions intended for use with climate and forecast data for the atmosphere, surface and ocean, and was designed with model-generated data particularly in mind. Its main purpose is therefore to propose a clear, adequate and flexible definition of the metadata needed for climate and forecast data. The CF conventions define metadata that provide a definitive description of what the data in each variable represents, as well as the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, re-gridding, and display capabilities. This standard is one of the most widely used sets of conventions for the netCDF format, in particular in the meteo-ocean communities.
  3. Descriptive metadata are used to annotate the EO-netCDF records with information allowing end-users to easily analyze and figure out the nature and characteristics of the data products. With these data, the users will know in which time range a given EO product is valid, its origins and how it has been generated (e.g. algorithms, processing chains), its accuracy and uncertainty.
  4. State-of-the-art ontologies have been analyzed and assessed for their usefulness and applicability. Mappings have been created between the ontologies and between ontology concepts and the EO-netCDF vocabulary entries. This allows developing tools that permit users to search for products by using concepts they are familiar with, thus helping more accurate retrieval of the products.
  5. Supporting tools and libraries are readily available from various sources. These allow creating, manipulating and visualizing information encoded in [CF-]netCDF in various manners. The most popular open-source software products have been tested against EO-netCDF records.

A semantically-enabled EO products search platform has been used to demonstrate the new features brought by the EO-netCDF convention. The platform re-uses components from the RARE platform (previously developed by Space Applications Services) and integrates a number of new components specifically developed in the Prod-Trees project.

Search Results List Showing Thumbnails and Footprints

Achievements and status
Prod-Trees has addressed this compatibility issue in two ways: (1) providing data producers/providers with a standardized way to describe and encode datasets (EO-netCDF) with full information about the sensor characteristics, and to search them with proper queryables (OpenSearch extensions); (2) providing data users with the capability to search datasets based on application terms instead of data characteristics.

The following sections describe the main technical achievements that have been reached in the project.

EO-netCDF Convention
The initial objective of Prod-Trees concerning netCDF enhancements was the extension of the CF conventions to support EO metadata. Defining a new convention raises the question of how it relates to other existing (and future) conventions. The EO convention is designed to be compliant with CF-netCDF: it does not cover what the CF conventions already implement (e.g. coordinate reference system support) and it avoids overlaps (e.g. of standard names). When needed, the CF convention can therefore be used along with the EO convention to obtain a netCDF dataset compliant with both. The same approach has been adopted for compliance with the draft uncertainty conventions (netCDF-U). As a design choice, the EO conventions cover only remote sensing data, not in-situ data, and support EO concepts from multiple ontologies. Based on the requirements analysis, the current EO conventions support the EOP profile of O&M. Because the EO conventions are based on a profile of the general-purpose Observations & Measurements data model, extending them to other data, such as in-situ data, is feasible in principle. The final version (v1.2) of the EO conventions for netCDF has been released. It provides the full mapping of the latest EOP specification available ("Earth Observation Metadata profile of Observations & Measurements", OGC 10-157r4) and supports encoding in both v4 and v3. The EO conventions documents are available online. The standard names of the EO convention have also been collected in an EO vocabulary stored in RDF format.
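
As an illustration of how one dataset can declare compliance with two conventions at once, the following minimal sketch renders the global attributes of a netCDF file in CDL text notation. It relies on the general netCDF practice of listing several convention names in a single "Conventions" attribute, separated by blanks (or by commas when a name contains a blank). The identifier "EO-1.2" and the dataset name are assumptions made for this example; the actual identifier is defined in the EO conventions documents.

```python
from typing import Dict, List


def conventions_attribute(names: List[str]) -> str:
    """Join convention names with blanks, or with commas if a name contains a blank."""
    separator = ", " if any(" " in name for name in names) else " "
    return separator.join(names)


def global_attributes_cdl(dataset: str, attributes: Dict[str, str]) -> str:
    """Render global text attributes as a CDL fragment.

    Values are assumed not to contain double quotes (no escaping is done).
    """
    lines = [f"netcdf {dataset} {{", "// global attributes:"]
    for name, value in attributes.items():
        lines.append(f'    :{name} = "{value}" ;')
    lines.append("}")
    return "\n".join(lines)


# "EO-1.2" is a hypothetical identifier for the EO convention.
CONVENTIONS = conventions_attribute(["CF-1.6", "EO-1.2"])
CDL = global_attributes_cdl("eo_product", {"Conventions": CONVENTIONS})

if __name__ == "__main__":
    print(CDL)
```

A reader that understands only CF can ignore the second name, which is what allows the two conventions to coexist in the same file.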

Conversion Software and Conversion of the Test Data
In order to prepare the test data, specific Java software modules have been developed. They carry out:

a) the data transformation from the original format (e.g. JPEG, GeoTIFF) to netCDF;

b) the metadata mapping from the original format (e.g. SAFE annotations) to EO-netCDF. The mapping rules have been documented in dedicated reports;

c) the generation, during the data transformation, of quick-look portrayals that are shown during the discovery phase for evaluation purposes.

Semantic Extensions to the Discovery and Access Broker (DAB)
The Discovery and Access Broker is part of the GI-suite Brokering Framework, a suite of technologies developed by CNR-IIA to implement an information Brokering Framework that allows for uniform semantically enriched discovery and access to heterogeneous geospatial data sources. In particular the following components are used in Prod-Trees:

a) The Discovery broker (GI-cat) is a component which is able to connect disparate (distributed and heterogeneous) metadata sources, exposing them through a set of standard catalogue interfaces. By means of metadata harmonization and protocol adaptation, it is able to search metadata from different sources and transform query results to a uniform and consistent metadata model.

b) The Semantic Enhancement Module (GI-sem) is a component which implements semantic query expansion. If the semantic query is enabled (by configuration), when a query includes a keyword, it is passed as a parameter of a semantic query to a set of connected knowledge bases to search for "related" terms. Each of the resulting terms is then used as a keyword in a separate geospatial query. The results are then assembled to provide the complete response to the user.
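
The query expansion performed by GI-sem can be sketched as follows. This is not the GI-sem implementation: the knowledge bases and the catalogue are in-memory stand-ins invented for the example, and the sketch assumes that the original keyword is queried alongside its related terms, which the description above does not state.

```python
from typing import Dict, List, Set

# Hypothetical knowledge bases: each maps a keyword to "related" terms.
KNOWLEDGE_BASES: List[Dict[str, List[str]]] = [
    {"flood": ["inundation", "water extent"]},
    {"flood": ["soil moisture", "inundation"]},
]

# Hypothetical catalogue: product identifier -> keywords annotated on it.
CATALOGUE: Dict[str, Set[str]] = {
    "product-A": {"flood"},
    "product-B": {"inundation"},
    "product-C": {"soil moisture", "vegetation"},
    "product-D": {"sea ice"},
}


def related_terms(keyword: str) -> List[str]:
    """Collect related terms from every knowledge base, without duplicates."""
    terms: List[str] = []
    for knowledge_base in KNOWLEDGE_BASES:
        for term in knowledge_base.get(keyword, []):
            if term not in terms:
                terms.append(term)
    return terms


def keyword_query(keyword: str) -> List[str]:
    """Stand-in for one catalogue query on a single keyword."""
    return sorted(pid for pid, kws in CATALOGUE.items() if keyword in kws)


def expanded_search(keyword: str, semantic: bool = True) -> List[str]:
    """Run one query per term and assemble the results in order of discovery."""
    terms = [keyword] + (related_terms(keyword) if semantic else [])
    results: List[str] = []
    for term in terms:
        for pid in keyword_query(term):
            if pid not in results:
                results.append(pid)
    return results


if __name__ == "__main__":
    print(expanded_search("flood"))
    print(expanded_search("flood", semantic=False))
```

With semantic expansion switched off, only products annotated with the literal keyword are returned; with it switched on, products annotated with related terms are returned as well.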

Ontologies and Ontology Mapping
In order to facilitate the search of EO products by end-users, several ontologies in use in EO domains have been mapped to the vocabulary used in the new EO-netCDF convention. More precisely, a number of ontologies have been mapped to a main ontology, and this main ontology has been mapped to the EO-netCDF vocabulary. This second mapping is implemented as reasoning rules. As a result, users of a semantically enabled interface can initiate searches for EO products using terms they are familiar with.
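
The two-level mapping can be pictured as a small graph of links that is followed from a user's concept to the vocabulary entries. In the sketch below the concept names, the "mapsTo" predicate and the "eo:" prefix are all invented for illustration; the actual mapping links are stored in RDF, and the second level is implemented as reasoning rules rather than plain links.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping links (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("ontoA:OilSpill", "mapsTo", "main:OilSlick"),
    ("ontoB:MarinePollution", "mapsTo", "main:OilSlick"),
    ("main:OilSlick", "mapsTo", "eo:sensorType"),
    ("main:OilSlick", "mapsTo", "eo:polarisationMode"),
]


def follow(subject: str) -> List[str]:
    """Return the direct mapping targets of a concept."""
    return [o for s, p, o in TRIPLES if s == subject and p == "mapsTo"]


def vocabulary_entries(concept: str) -> List[str]:
    """Follow mapping links until entries with the eo: prefix are reached."""
    found: List[str] = []
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cyclic mappings
        seen.add(current)
        for target in follow(current):
            if target.startswith("eo:"):
                if target not in found:
                    found.append(target)
            else:
                stack.append(target)
    return found


if __name__ == "__main__":
    print(vocabulary_entries("ontoA:OilSpill"))
    print(vocabulary_entries("ontoB:MarinePollution"))
```

Two concepts from different ontologies resolve to the same vocabulary entries because both pass through the same concept of the main ontology, which is the point of mapping everything to one main ontology first.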

The Ontology Service
The ontology concepts and the mapping relationships displayed in the Cross-Ontology Browser are fetched from the server using HTTP requests. The Prod-Trees ontology browser communicates with GI-sem, which in turn interacts with a Strabon (RDF datastore) endpoint that contains the definition of the ontologies and of the mapping links. The fetched data is passed to GI-sem and then back to the client, where it is displayed. GI-sem exposes an OpenSearch interface which supports the OGC "OpenSearch Extension for Earth Observation" (OGC 13-026) in addition to the standard geographical (geo) and temporal (time) extensions. GI-sem operations allow searching for concepts, navigating through related concepts, and obtaining concept properties.
When the user selects a concept in the Cross-Ontology Browser, all the available information about this concept (labels, description, notes, mappings, etc.) is fetched and displayed. Multiple requests are sent to GI-sem to retrieve the concept properties.
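
A client of an OpenSearch interface typically builds its request by filling in a URL template published by the service. The sketch below shows that substitution step for the search terms, the geo extension's bounding box and the time extension's start and end. The endpoint URL and the query parameter names (q, bbox, start, end) are assumptions; a real client would read the template from the service's OpenSearch description document.

```python
import re
from typing import Dict
from urllib.parse import quote

# Hypothetical URL template; a trailing "?" inside the braces marks an optional parameter.
TEMPLATE = (
    "https://example.org/gi-sem/opensearch?"
    "q={searchTerms}&bbox={geo:box?}&start={time:start?}&end={time:end?}"
)


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace each {name} or {name?} placeholder with a URL-encoded value."""

    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""  # unused optional parameters become empty strings
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


if __name__ == "__main__":
    # geo:box is given as west,south,east,north.
    print(fill_template(TEMPLATE, {"searchTerms": "oil spill", "geo:box": "-10,35,5,45"}))
```

Leaving optional placeholders empty keeps the request valid when the user has not specified an area or a time range.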

The Semantic Search Form
The Semantic Search Page is the landing page of the Prod-Trees platform. It provides the most straightforward and convenient way of searching for products: the user enters a single free-text search string, and the results are displayed on the same page.

The Cross-Ontology Browser
The Cross-Ontology Browser allows the user to navigate within and across the ontologies supported by the platform. Its role is to support the user in the query creation phase, as a disambiguation and discovery tool. In particular, it is designed to help the user exploit the knowledge contained in the ontologies, by providing relevant information for each concept and by highlighting the connections between different (but related) concepts belonging to the same or the other ontologies.
The Cross-Ontology Browser is a JavaScript application that fetches the ontology data from the GI-sem component and provides a graphical and interactive representation to the users.

The EO Resources Reasoner and the EO-netCDF Reasoning Rules
The Prod-Trees platform uses the EO-netCDF Resources Reasoner for applying reasoning rules. The EO-netCDF Resources Reasoner is developed as a SOAP-based Web service which accepts requests on an HTTP interface. The reasoning process is performed in two steps. In the first step, the application term of the ontology is mapped to a number of parameters, called Application Requirements Parameters, and each parameter is given a specific value; this mapping is modeled as a number of rules expressed in XML format. In the second step, a translation table (two "columns" encoded in an XML document) is used to map the parameters to EO-netCDF queryable attributes.
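
The two reasoning steps can be sketched as two XML lookups applied in sequence. The XML element and attribute names, the application term, the parameter names and the queryable names below are all invented for illustration; the actual rule and translation-table formats used by the Reasoner are not described here, and the SOAP interface is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Step 1 input (hypothetical format): application term -> parameters with values.
RULES_XML = """
<rules>
  <rule term="oil spill detection">
    <parameter name="sensorClass" value="RADAR"/>
    <parameter name="maxResolutionMetres" value="50"/>
  </rule>
</rules>
"""

# Step 2 input (hypothetical format): a two-column translation table.
TABLE_XML = """
<translationTable>
  <entry parameter="sensorClass" queryable="eo:sensorType"/>
  <entry parameter="maxResolutionMetres" queryable="eo:resolution"/>
</translationTable>
"""


def requirements_for(term: str) -> Dict[str, str]:
    """Step 1: map an application term to its parameters and their values."""
    for rule in ET.fromstring(RULES_XML).findall("rule"):
        if rule.get("term") == term:
            return {p.get("name"): p.get("value") for p in rule.findall("parameter")}
    return {}


def to_queryables(parameters: Dict[str, str]) -> Dict[str, str]:
    """Step 2: rename each parameter to its queryable attribute via the table."""
    entries = ET.fromstring(TABLE_XML).findall("entry")
    table = {e.get("parameter"): e.get("queryable") for e in entries}
    return {table[name]: value for name, value in parameters.items() if name in table}


def reason(term: str) -> Dict[str, str]:
    """Apply both steps in sequence."""
    return to_queryables(requirements_for(term))


if __name__ == "__main__":
    print(reason("oil spill detection"))
```

Keeping the translation table separate from the rules means the rules can stay expressed in application-level parameters even if the names of the queryable attributes change.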

The Multi-Criteria Search Form and the EO-netCDF Model Browser (WebUI)
The Multi-Criteria Search page provides a secondary, more traditional means of searching for EO products; the primary (semantically enabled) means is the Semantic Search Page. The Multi-Criteria Search page supports the traditional "What", "Where" and "When" fields and also allows specifying a variable number of EO-product-specific filters.
The EO-netCDF Model Browser contains the following parts: the model tree, which lists the queryable attributes, and the information about the selected attribute.

Benefits
Prod-Trees enables advanced search through a specific knowledge base that maps application terms to relevant sensor characteristics. This approach decouples data description from data usage, greatly enhancing the potential exploitation of EO data.

Next steps
Future EO-netCDF extensions, e.g. based on EOP standard revisions or including other relevant ontologies, may support new sensors (potentially including in-situ observations, crowdsourcing, etc.).