ESA joins European effort to create Digital Libraries for science
Spacecraft constantly scan the Earth, creating hundreds of gigabytes of new data products daily. Working with this ever-growing mass of information has made ESA's Earth Observation Directorate a pioneer user of powerful Grid computing.
Now ESA is participating in an ambitious project that applies this same technique to information distribution and retrieval, with the aim of creating 'Digital Libraries' for global scientific collaboration.
The Grid concept is named after the electricity grid, and works in the same way: a geographically dispersed (or 'distributed') network of computers provides users with access to advanced computing services, processing power and memory, enabling the solution of complex tasks beyond the capabilities of a single machine or local network.
Within the field of Earth Observation, a Grid enables researchers to access large amounts of spatial and temporal data and to carry out high-level processing and complex applications such as data fusion and modelling, all within timescales that were previously impossible. With a Grid, tasks that might otherwise take days can be reduced to a few hours, minutes or even seconds.
New mosaics of the world can be put together as soon as fresh satellite imagery becomes available, as can updated atmospheric maps, vegetation indexes or ocean chlorophyll products.
And rather than downloading vast amounts of data to evaluate new processing algorithms locally, researchers can run their algorithms against the data held within the virtual Grid system. They can then test and tweak them based on outputs returned in near-real time, with much larger geographical or temporal coverage than was previously feasible.
With Grid computing for data processing already well established, the next step is to evaluate how Grids can support the predicted 21st-century shift to 'e-science'. Whether in medicine, particle physics or environmental monitoring, the trend is towards large-scale science efforts in which diverse scientists and institutions join together to pursue a shared research goal through networked global collaboration.
The 'virtual research organisations' set up for such temporary enterprises will need common access to very large data collections, large-scale computing resources and a secure, shared context within which to collaborate – the sort of knowledge architecture often termed a 'Digital Library'.
As part of a variety of Grid-related research activities within its Sixth Framework Programme, the European Commission (EC) has begun a project called DILIGENT – short for a DIgital Library Infrastructure on Grid ENabled Technology.
Following on from successful collaboration with the EC during its previous DataGrid project, ESA has become a user partner in DILIGENT. The project aims to integrate Digital Library and Grid technology to create a test-bed system for e-Science, built on the infrastructure of Enabling Grids for E-science in Europe (EGEE), the EC's currently running follow-up to DataGrid.
The concept of a Digital Library is an old one, but the few examples that exist so far serve single, large and stable organisations and are based largely on text documents, since the computing power needed to handle multimedia documents and data is rarely available to a single institution.
"Up until now, creating a Digital Library has been very expensive, because of the need to set up all the one-off infrastructure and gather the content required," said Donatella Castelli of the Institute of Information Science and Technologies of Italy's National Research Council (CNR-ISTI) in Pisa, the project's scientific co-ordinator.
"The objective of DILIGENT is to create an infrastructure for the creation of Digital Libraries on demand, so a new one could be created each time a virtual organisation needs it as a supporting instrument for its activity. Each Digital Library would exist as long as it is useful for the virtual organisation.
"DILIGENT will be built by integrating Grid and Digital Library technologies. The merging of these two different technologies will result in an innovative level of functionality providing the foundations of next generation collaboration environments able to serve many different research and industrial applications.
"A large number of Digital Libraries could be hosted on the same, shared Grid resources at the same time, although the work of individual communities would not be affected by this fact – their own Digital Library would be available to them whenever they need it. The composition of a Digital Library will be dynamic since it will depend on the currently available resources and many other parameters such as usage workload, connectivity and so on.
"This development model will make it possible to avoid heavy investments, long delays and radical changes in the organisations setting up these applications, and so foster the broader use of Digital Libraries as a means of communication and collaboration."
The high computing power available from the Grid will make possible enhanced Digital Library functionality, such as intelligent retrieval for imagery, video and sound as well as word content.
"The Grid framework will also enable the provision of a number of new functions whose implementation has until now been limited by the high cost in terms of computational, storage and data transfer capacity," Castelli explained. "Examples include multimedia document and geographical information processing, 3D handling and spatial data manipulation."
As an organisation that handles large amounts of data and collaborates with many other research institutions, ESA is taking part in DILIGENT as a user. To begin with, the Agency is submitting user requirements. Beyond that, ESA will work with the other partners to validate DILIGENT's Digital Library designs in a working context.
"The plan is to create a Digital Library to support activities in the environmental sector, supporting environmental assessment and statutory reporting requirements," added Castelli. "It could also be used for planning responses to environmental accidents."
One example of a possible user community within this sector is the group formed by the 21 state signatories to the 1976 Barcelona Convention, an agreement dedicated to protecting the Mediterranean marine and coastal environment, together with other actors such as selected coastguard offices, the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific and Cultural Organisation (UNESCO-IOC) and the Regional Marine Pollution Emergency Response Centre for the Mediterranean Sea (REMPEC).
Other users might include non-governmental organisations active in the Mediterranean, such as the World Wildlife Fund, the International Tanker Owners Pollution Federation Limited (ITOPF) and the Mediterranean Oil Industry Group (MOIG), as well as scientific bodies such as Italy's Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare.
Just as current Grids let researchers send their algorithms to the data, rather than having masses of data sent to them, this Digital Library would come to its users rather than the other way round, thanks to its decentralised, service-oriented design.
The vast Earth Observation catalogue stored at ESA's European Space Research Institute (ESRIN) in Italy will be just one source of data within this Digital Library. Others will include relevant databases from its users and archives of past Mediterranean-oriented scientific activities, such as the now-completed Regional earth observation Application for Mediterranean Sea Emergency Surveillance (RAMSES) project, which evaluated satellite-based oil spill detection in the region.
Within the virtual Digital Library system, complex data acquisition and fusion procedures could be carried out routinely. To picture how this would work in practice, imagine a user first acquiring radar imagery of oil spills, then overlaying it with complementary optical imagery.
This information could then be overlaid with the tracks of major tanker routes to highlight any correlations, then checked against a WWF coastal map highlighting the coastlines of greatest biodiversity, or against the latest mosaic of Mediterranean chlorophyll concentrations. Finally, wave and wind meteorological data could be used to model the behaviour and impact of these spills, either retrospectively or in near-real time.
DILIGENT formally began in September 2004 and has a planned duration of three years. The project is led by the EC; scientific co-ordination is being carried out by CNR-ISTI, while administrative co-ordination is the task of the European Research Consortium for Informatics and Mathematics (ERCIM).
The other scientific partners of DILIGENT are the National and Kapodistrian University of Athens, the Swiss Federal Institute of Technology, German firm Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eV, the University for Health Informatics and Technology Tyrol, the European Organisation for Nuclear Research (CERN) and the University of Strathclyde.
The industrial partners are Italian firm Ingegneria Informatica S.p.A., Norwegian company Fast Search and Transfer ASA and Hungary-based 4D Soft Számítástechnikai Kft.
Besides ESA, the DILIGENT user community also includes the Centre for the Data Processing of Texts and Images in the Literary Tradition at Pisa's Scuola Normale Superiore. In addition, Italy's national broadcaster RAI is making its educational audio-video archives accessible for use by the project.