ESSI – Earth & Space Science Informatics
ESSI1.1 – Informatics in Oceanography and Ocean Science
EGU2020-873 | Displays | ESSI1.1
The new online Black Sea Oceanographic Database
Elena Zhuk, Maxim Vecalo, and Andrey Ingerov
The new online Black Sea Oceanographic Database
Elena V. Zhuk
Marine Hydrophysical Institute, Russian Academy of Sciences, Russia
alenixx@gmail.com
We present recent improvements to the Black Sea Oceanographic Database (BSOD) dedicated to online access to hydrological and hydro-chemical data, taking into account user priorities, data types, and the methods and timing of data access.
Based on an analysis of free DBMSs, the PostgreSQL object-relational DBMS was selected for archiving the data in the BSOD. PostgreSQL provides high performance, reliability and the ability to work with big data. Moreover, PostgreSQL can handle GIS objects through the PostGIS extension and has built-in support for semi-structured data in JSON format. The development also had to provide the capability to select large data sets according to criteria specified through metadata selection. Taking these requirements into account, the part of the database responsible for metadata access was designed for interactive transaction processing (the OLTP access pattern), while the part responsible for in-situ data archiving was developed according to a star schema, which is typical of the OLAP access pattern.
After analyzing the oceanographic in-situ observations, the following main entities were identified: Cruise, Ship, Station, Measurements and Measured Parameters, together with the relationships between them. A set of attributes was compiled for each entity and the tables were designed. The BSOD includes the following (a simplified schema sketch follows the list):
- Metadata tables: cruises, ships, stations, stations_parameters.
- Data tables: measurements.
- Vocabularies: constructed using the SeaDataCloud BODC parameter vocabularies.
- Reference data tables: GEBCO, EDMO, p01_vocabuary, p02_vocabuary, p06_vocabuary, l05_vocabuary.
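Below is a minimal sketch, in Python with psycopg2, of how the table groups listed above could be laid out as a star schema with the measurements fact table at its centre. The column names and data types are illustrative assumptions, not the actual BSOD schema, and station positions could equally be stored as PostGIS geometries.

```python
# Hypothetical, simplified sketch of the BSOD star schema described above.
# Column names and types are assumptions for illustration only.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS cruises (
    cruise_id   SERIAL PRIMARY KEY,
    ship_id     INTEGER,
    cruise_name TEXT,
    start_date  DATE,
    end_date    DATE
);

CREATE TABLE IF NOT EXISTS stations (
    station_id  SERIAL PRIMARY KEY,
    cruise_id   INTEGER REFERENCES cruises (cruise_id),
    latitude    DOUBLE PRECISION,   -- could be a PostGIS geometry(Point, 4326) instead
    longitude   DOUBLE PRECISION,
    obs_time    TIMESTAMP
);

-- Fact table at the centre of the "star": one row per measured value
CREATE TABLE IF NOT EXISTS measurements (
    station_id   INTEGER REFERENCES stations (station_id),
    parameter_id INTEGER,            -- points into the parameter vocabulary tables
    depth_m      REAL,
    value        REAL
);
"""

def create_schema(dsn: str) -> None:
    """Create the simplified tables; dsn is a libpq connection string."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)

if __name__ == "__main__":
    create_schema("dbname=bsod user=postgres")  # hypothetical connection settings
```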
To provide online access to the Black Sea Oceanographic Database, a user interface (UI) was implemented. It was developed using the jQuery and Mapbox GL JavaScript libraries and provides visual data selection by date period, cruise, parameters such as temperature, salinity, oxygen, nitrates, nitrites and phosphates, and other metadata.
Acknowledgements: this work was carried out in the framework of Marine Hydrophysical Institute of the Russian Academy of Sciences task No. 0827-2018-0002.
Keywords: Black Sea, oceanographic database, PostgreSQL, online data access, Geo-information system.
How to cite: Zhuk, E., Vecalo, M., and Ingerov, A.: The new online Black Sea Oceanographic Database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-873, https://doi.org/10.5194/egusphere-egu2020-873, 2020.
EGU2020-2004 | Displays | ESSI1.1
VM-ADCP backscatter data management using QGIS
Paola Picco, Roberto Nardini, Sara Pensieri, Roberto Bozzano, Luca Repetti, and Maurizio Demarte
VM-ADCPs (Vessel Mounted Acoustic Doppler Current Profilers) operate regularly on board several research vessels with the aim of providing 3-D ocean current fields. Along with ocean currents, these instruments also measure the acoustic backscatter profile at a known frequency, which can be of great advantage for other environmental investigations such as zooplankton migration. The presence of zooplankton can be detected as a variation of acoustic backscatter with depth showing a periodic (diurnal or semidiurnal) variability related to the vertical migration of these organisms. GIS has proven to be a powerful tool to manage the huge amount of VM-ADCP backscatter data obtained during oceanographic campaigns. Moreover, it allows relevant information on zooplankton distribution and abundance to be extracted, even when the monitoring strategy of the experiment does not completely meet the temporal and spatial resolution required for these studies. The application described here has been developed in QGIS and tested in the Ligurian Sea (Mediterranean Sea). To make data from instruments operating at different frequencies and sampling set-ups comparable, echo intensity data are converted into volume backscatter strength and corrected for the slant range. High-resolution bathymetry rasters acquired and processed by the Italian Hydrographic Institute allow anomalously high backscatter values due to the presence of the bottom to be discarded. Another advantage of the GIS is the possibility to easily separate night-collected data from daytime data and examine their spatial distribution, as well as to distinguish the surface layer from the deeper one. All possible combinations can then be visualised and analysed.
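A minimal Python sketch of the two processing steps named above (conversion of echo intensity to volume backscatter strength, and masking of bins below the bathymetric depth) is given below. This is not the authors' QGIS workflow: the simplified sonar-equation form and the constants kc, er, alpha and c are instrument- and frequency-dependent assumptions used only for illustration.

```python
# Illustrative sketch (not the QGIS application described above): convert ADCP
# echo intensity to volume backscatter strength Sv with a simplified
# sonar-equation form, then blank bins that fall below the bathymetric depth.
import numpy as np

def echo_to_sv(echo_counts, slant_range_m, kc=0.45, er=40.0, alpha=0.048, c=-140.0):
    """Echo intensity (counts) to Sv (dB); kc, er, alpha and c are assumed constants."""
    spreading = 20.0 * np.log10(slant_range_m)   # two-way spherical spreading
    absorption = 2.0 * alpha * slant_range_m     # two-way absorption in water
    return c + kc * (echo_counts - er) + spreading + absorption

def mask_bottom(sv, bin_depth_m, water_depth_m, margin=0.9):
    """Discard bins deeper than a fraction of the local water depth (from the bathymetry raster)."""
    sv = sv.copy()
    sv[bin_depth_m > margin * water_depth_m] = np.nan
    return sv

# Example profile: 40 bins of 8 m, beams tilted 20 degrees from the vertical
beam_angle = np.deg2rad(20.0)
bin_depth = 8.0 * (np.arange(40) + 1)            # vertical depth of each bin (m)
slant_range = bin_depth / np.cos(beam_angle)     # distance travelled along the beam (m)
echo = np.random.randint(60, 180, size=40).astype(float)

sv = mask_bottom(echo_to_sv(echo, slant_range), bin_depth, water_depth_m=250.0)
```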
How to cite: Picco, P., Nardini, R., Pensieri, S., Bozzano, R., Repetti, L., and Demarte, M.: VM-ADCP backscatter data management using QGIS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2004, https://doi.org/10.5194/egusphere-egu2020-2004, 2020.
EGU2020-8073 | Displays | ESSI1.1
Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service
Peter Thijsse, Dick Schaap, and Michele Fichaut
SeaDataNet is an operational pan-European infrastructure for managing marine and ocean data; its core partners are National Oceanographic Data Centres (NODCs) and oceanographic data focal points from 34 coastal states in Europe. Currently SeaDataNet provides discovery of and access to more than 2.3 million data sets for physical oceanography, chemistry, geology, geophysics, bathymetry and biology from more than 650 data originators. The population has increased considerably through cooperation with, and involvement in, many associated EU projects and initiatives such as EMODnet. The SeaDataNet infrastructure has been set up in a series of projects over the last two decades. Currently the SeaDataNet core services and marine data management standards are being upgraded in the EU Horizon 2020 ‘SeaDataCloud’ project, which runs for four years from 1 November 2016. The upgraded services include a move “to the cloud” via a strategic and technical cooperation of the SeaDataNet consortium with the EUDAT consortium of e-infrastructure service providers. This is an important step into the EOSC domain.
One of the main components of SeaDataNet is the CDI Data Discovery and Access service, which provides users with access to marine data from 100 connected data centres. The previous version of the CDI service was appreciated for its harmonised datasets, but had some flaws in interface usability and performance. Under SeaDataCloud, the CDI Data Discovery and Access service has now been upgraded by introducing a central data buffer in the cloud that is continuously synchronised by replication from the data centres. The “datacache” itself is hosted and horizontally synchronised between five EUDAT data centres. During implementation of the replication process, additional quality control mechanisms have been included for the central metadata and associated data in the buffer.
In October 2019 the operational production version of the upgraded CDI Data Discovery and Access service was publicly launched. The user interface has been completely redeveloped, upgraded, reviewed and optimised, offering a very efficient query and shopping experience with strong performance. The import process for new and updated CDI metadata and associated data sets has also been redesigned, successfully introducing cloud technology.
The upgraded user interface has been developed and tested in close cooperation with users. It now also includes the “MySeaDataCloud” concept, in which various services are offered to meet the latest user demands: e.g. saving searches, sharing data searches and eventually even pushing data into the SDC VRE. The user interface and machine-to-machine interfaces have improved the overall quality, performance and ease of use of the CDI service for both human users and machine processes.
The presentation will provide more technical background on the upgrading of the CDI Data Discovery and Access service, and adopting the cloud. It will report on the current release (https://cdi.seadatanet.org), demonstrate the wealth of data, present the experiences of developing services in the cloud, and demonstrate the advantages of this system for the scientific community.
How to cite: Thijsse, P., Schaap, D., and Fichaut, M.: Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8073, https://doi.org/10.5194/egusphere-egu2020-8073, 2020.
EGU2020-10292 | Displays | ESSI1.1
New BioGeoChemical products provided by the Copernicus Marine Service
Virginie Racapé, Vidar Lien, Nilsen Jan Even Øie, Havard Vindenes, Leonidas Perivoliotis, and Seppo Kaitala
The Copernicus Marine service is a “one-stop-shop” providing freely available operational data on the state of the marine environment for use by marine managers, advisors, and scientists, as well as intermediate and end users in marine businesses and operations. The Copernicus Marine service offers operationally updated and state-of-the-art products that are well documented and transparent. The European Commission’s long-term commitment to the Copernicus program offers long-term visibility and stability of the Copernicus Marine products. Furthermore, Copernicus Marine offers a dedicated service desk, in addition to training sessions and workshops.
Here, we present the in situ biogeochemical data products distributed by the Copernicus Marine Service since 2018. They offer chlorophyll-a, oxygen and nutrient data collected across the globe. These products integrate observations aggregated from the regional EuroGOOS consortia (Arctic-ROOS, BOOS, NOOS, IBI-ROOS, MONGOOS) and Black Sea GOOS, as well as from SeaDataNet2 National Oceanographic Data Centres (NODCs), the JCOMM global systems (Argo, GOSUD, OceanSITES, GTSPP, DBCP) and the Global Telecommunication System (GTS) used by the met offices.
The in situ Near Real Time biogeochemical product is updated every month, whereas the reprocessed product is updated twice per year. Products are delivered in NetCDF4 format compliant with the CF 1.7 standard, with well-documented quality control procedures.
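As an illustration of how such a NetCDF4/CF product can be used, the short Python sketch below reads one hypothetical in situ file with xarray; the file name and the variable names (DOX1 for dissolved oxygen and its DOX1_QC flag) are assumptions that vary between platforms and should be checked against the product documentation.

```python
# Minimal sketch, assuming a locally downloaded NetCDF4/CF file from the
# in situ BGC product; file and variable names are assumptions.
import xarray as xr

ds = xr.open_dataset("GL_PR_BO_example.nc")        # hypothetical file name
print(ds.attrs.get("Conventions"))                  # expected to include CF-1.7

oxygen = ds["DOX1"]                                 # dissolved oxygen, if present
good = oxygen.where(ds["DOX1_QC"].isin([1, 2]))     # keep good / probably good values
print(float(good.mean()))
```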
How to cite: Racapé, V., Lien, V., Jan Even Øie, N., Vindenes, H., Perivoliotis, L., and Kaitala, S.: New BioGeoChemical products provided by the Copernicus Marine Service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10292, https://doi.org/10.5194/egusphere-egu2020-10292, 2020.
EGU2020-10296 | Displays | ESSI1.1
EMODnet Bathymetry – further developing a high resolution digital bathymetry for European seas
Dick M. A. Schaap and Thierry Schmitt
Access to marine data is a key issue for the EU Marine Strategy Framework Directive and the EU Marine Knowledge 2020 agenda, which includes the European Marine Observation and Data Network (EMODnet) initiative. EMODnet aims at assembling European marine data, data products and metadata from diverse sources in a uniform way.
The EMODnet Bathymetry project has been active since 2008 and has developed Digital Terrain Models (DTMs) for the European seas, which are published at regular intervals, each time improving quality and precision and expanding functionalities for viewing, using and downloading. The DTMs are produced from survey and aggregated data sets that are referenced with metadata adopting the SeaDataNet catalogue services. SeaDataNet is a network of major oceanographic data centres around the European seas that manage, operate and further develop a pan-European infrastructure for marine and ocean data management. The latest EMODnet Bathymetry DTM release also includes Satellite Derived Bathymetry (SDB) and has a grid resolution of 1/16 arc minute (circa 125 metres), covering all European sea regions. Use has been made of circa 9400 gathered survey datasets, composite DTMs and SDB bathymetry. Catalogues and the EMODnet DTM are published at the dedicated EMODnet Bathymetry portal, including a versatile DTM viewing and downloading service.
As part of the expansion and innovation, more focus has been directed towards bathymetry for near-coastal waters and coastal zones, and Satellite Derived Bathymetry data have been produced and included to fill gaps in coverage of the coastal zones. The Bathymetry Viewing and Download service has been upgraded to provide a multi-resolution map and to include versatile 3D viewing. Moreover, best estimates of the European coastline have been determined for a range of tidal levels (HAT, MHW, MSL, Chart Datum, LAT), making use of a tidal model for Europe. In addition, a Quality Index layer has been formulated with indicators derived from the source data, which can be queried in the Bathymetry Viewing and Download service. Finally, extra functionality has been added to the mechanism for downloading DTM tiles in various formats and special high-resolution DTMs for areas of interest.
As a result, many users visit the portal, browse the DTM viewer, download DTM tiles and make use of the OGC web services to use the EMODnet Bathymetry in their applications.
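As a small illustration of the OGC web services mentioned above, the Python sketch below requests a map image of the EMODnet Bathymetry DTM through WMS with OWSLib; the service URL and layer name are assumptions and should be verified against the EMODnet Bathymetry portal documentation.

```python
# Hypothetical example of requesting an EMODnet Bathymetry map image via WMS.
from owslib.wms import WebMapService

WMS_URL = "https://ows.emodnet-bathymetry.eu/wms"   # assumed service endpoint
wms = WebMapService(WMS_URL, version="1.3.0")

img = wms.getmap(
    layers=["emodnet:mean"],                        # assumed DTM layer name
    srs="EPSG:4326",
    bbox=(-10.0, 45.0, 5.0, 55.0),                  # lon/lat box, NW European shelf
    size=(800, 600),
    format="image/png",
)
with open("emodnet_bathymetry.png", "wb") as out:
    out.write(img.read())
```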
The presentation will highlight key details of the EMODnet Bathymetry DTM production process and the Bathymetry portal with its extensive functionality.
How to cite: Schaap, D. M. A. and Schmitt, T.: EMODnet Bathymetry – further developing a high resolution digital bathymetry for European seas, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10296, https://doi.org/10.5194/egusphere-egu2020-10296, 2020.
EGU2020-13614 | Displays | ESSI1.1
Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment
Charles Troupin, Alexander Barth, Merret Buurman, Sebastian Mieruch, Léo Bruvry Lagadec, Themis Zamani, and Peter Thijsse
A typical hurdle faced by scientists when it comes to processing data is the installation and maintenance of software tools: installation procedures are sometimes poorly documented, and there are often several dependencies that may create incompatibility issues. To make the lives of scientists and experts easier, a Virtual Research Environment (VRE) is being developed in the frame of the SeaDataCloud project.
The goal is to provide them with a computing environment where the tools are already deployed and datasets are available for direct processing. In the context of SeaDataCloud, the tools are:
- WebODV, able to perform data reading, quality checking and subsetting, among many other possibilities.
- DIVAnd, for the spatial interpolation of in situ measurements.
- A visualisation toolbox for both the input data and the output gridded fields.
DIVAnd
DIVAnd (Data-Interpolating Variational Analysis in n dimensions) is a software tool designed to generate a set of gridded fields from in situ observations. The code is written in Julia, a high-performance programming language (https://julialang.org/) particularly suitable for the processing of large matrices.
The code, developed and improved on a regular basis, is distributed via the hosting platform GitHub: https://github.com/gher-ulg/DIVAnd.jl. It has supported Julia 1.0 since version 2.1.0 (September 2018).
Notebooks
Along with the source code, a set of Jupyter notebooks describing the different steps for the production of a climatology is provided, with an increasing level of complexity: https://github.com/gher-ulg/Diva-Workshops/tree/master/notebooks.
Deployment in the VRE
JupyterHub (https://jupyter.org/hub) is a multi-user instance of Jupyter notebooks. It has proven an adequate solution for allowing several users to work simultaneously with the DIVAnd tool, and it offers different ways to isolate users. The approach selected in the frame of this project is Docker containers, in which the software tools, as well as their dependencies, are stored. This solution allows multiple copies of a container to be run efficiently on a system and also makes it easier to perform the deployment in the VRE. The authentication step is also managed by JupyterHub.
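A minimal jupyterhub_config.py sketch of this pattern (JupyterHub spawning each user's server in a Docker container) is shown below; the image name matches the Docker Hub repository cited in the next section, while the volume and network settings are illustrative assumptions rather than the actual VRE configuration.

```python
# jupyterhub_config.py -- minimal sketch of JupyterHub + DockerSpawner.
c = get_config()  # noqa: F821  (injected by JupyterHub when the file is loaded)

# Spawn each single-user notebook server inside a Docker container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "abarth/divand-jupyterhub:latest"

# Persist each user's work on a named volume across container restarts
c.DockerSpawner.notebook_dir = "/home/jovyan/work"
c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}

# Hub and containers share a Docker network so they can reach each other
c.DockerSpawner.network_name = "jupyterhub-net"
c.JupyterHub.hub_ip = "jupyterhub"
```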
Docker container
The Docker container is distributed via Docker Hub (https://hub.docker.com/r/abarth/divand-jupyterhub) and includes the installation of:
- The Julia language (currently version 1.3.1);
- Libraries and tools such as netCDF, unzip, git;
- Various Julia packages such as PyPlot (plotting library), NCDatasets (manipulation of netCDF files) and DIVAnd.jl;
- The most recent version of the DIVAnd notebooks.
All in all, Docker makes it possible to provide a standardized computing environment to all users and has significantly helped the development of the VRE.
How to cite: Troupin, C., Barth, A., Buurman, M., Mieruch, S., Bruvry Lagadec, L., Zamani, T., and Thijsse, P.: Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13614, https://doi.org/10.5194/egusphere-egu2020-13614, 2020.
EGU2020-16449 | Displays | ESSI1.1
Blue-Cloud: Developing a marine thematic EOSC cloud to explore and demonstrate the potential of cloud based open science in the domain of ocean sustainability
Sara Garavelli and Dick M. A. Schaap
The European Open Science Cloud (EOSC) is an initiative launched by the European Commission in 2016, as part of the European Cloud Initiative. EOSC aims to provide a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines, leveraging and federating the existing data infrastructures.
Following its launch, several calls have been published and several projects have been granted for developing (parts of) the EOSC, for example ENVRI-FAIR. For the marine domain, a dedicated call was launched as part of ‘The Future of Seas and Oceans Flagship Initiative’, combining the interests of developing a thematic marine EOSC cloud and serving the Blue Economy, Marine Environment and Marine Knowledge agendas.
The winning H2020 Blue-Cloud project is dedicated to marine data management and is coordinated by Trust-IT, with MARIS as technical coordinator. Its aims are:
- To build and demonstrate a Pilot Blue Cloud by combining distributed marine data resources, computing platforms, and analytical services;
- To develop services for supporting research to better understand and manage the many aspects of ocean sustainability;
- To develop and validate a number of demonstrators of relevance for marine societal challenges;
- To formulate a roadmap for expansion and sustainability of the Blue Cloud infrastructure and services.
The project will federate leading European marine data management infrastructures (SeaDataNet, EurOBIS, Euro-Argo, Argo GDAC, EMODnet, ELIXIR-ENA, EuroBioImaging, CMEMS, C3S, and ICOS-Marine) and horizontal e-infrastructures (EUDAT, DIAS, D4Science) to capitalise on what already exists and to develop and deploy the Blue Cloud. The federation will be at the level of data resources, computing resources and analytical service resources. A Blue Cloud data discovery and access service will be developed to facilitate the sharing of multi-disciplinary datasets with users. A Blue Cloud Virtual Research Environment (VRE) will be established so that computing and analytical services can be shared and combined for specific applications.
This innovation potential will be explored and unlocked by developing five dedicated Demonstrators as Virtual Labs together with excellent marine researchers. There is already a large portfolio of existing services managed by the Blue Cloud founders which will be activated and integrated to serve the Blue-Cloud.
The modular architecture of the VRE will allow scalability and sustainability for near-future expansions, such as connecting additional infrastructures, implementing more and advanced blue analytical services, configuring more dedicated Virtual Labs, and targeting more (groups of) users.
The presentation will describe the vision of the Blue-Cloud framework, the Blue-Cloud data discovery and access service (to find and retrieve data sets from a diversified array of key marine data infrastructures dealing with physics, biology, biodiversity, chemistry, and bio genomics), the Blue-Cloud VRE (to facilitate collaborative research using a variety of data sets and analytical tools, complemented by generic services such as sub-setting, pre-processing, harmonizing, publishing and visualization). The technical architecture of Blue-Cloud will be presented via 5 real-life use-cases to demonstrate the impact that such innovation can have on science and society.
How to cite: Garavelli, S. and Schaap, D. M. A.: Blue-Cloud: Developing a marine thematic EOSC cloud to explore and demonstrate the potential of cloud based open science in the domain of ocean sustainability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16449, https://doi.org/10.5194/egusphere-egu2020-16449, 2020.
EGU2020-20215 | Displays | ESSI1.1
Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data
Benjamin Pfeil, Steve Jones, Maren Karlsen, Camilla Stegen Landa, Rocio Castano Primo, Alex Vermeulen, and Oleg Mirzov
Essential Ocean Variable Inorganic Carbon observations collected from instruments at sea are typically processed by individual PIs before being submitted to data centres and other data archives. Often this work is done on an ad hoc basis using unpublished, self-built software, and published in unique formats. This conflicts with the Interoperability and Reusability aspects of the FAIR data principles: such data require significant reformatting efforts by data centres and/or end users, and reproducibility is impossible without a full record of the processing performed and QC decisions made by PIs. The manual nature of this process also implies an additional workload for PIs who need to submit their data to multiple archives/data products. There is a clear need to standardise the data workflow from measurement to publication using common, open-source, documented tools whose algorithms are fully accessible and whose processing is fully recorded for transparency.
The Ocean Thematic Centre of the European Research Infrastructure ICOS (Integrated Carbon Observation System) is developing QuinCe, a browser-based tool for uploading, processing, automatic and manual quality control, and publication of data from underway pCO₂ systems on ships and moorings. Data can be uploaded directly from instruments in any text format, where they are standardised and processed using algorithms approved by the scientific community. Automatic QC algorithms can detect many obvious data errors; afterwards, PIs can perform full quality control of the data following Standard Operating Procedures and best practices. All QC decisions, with enforced explanatory notes, are recorded by the software to enable full traceability and reproducibility. The final quality-controlled dataset can be downloaded by the PI and is sent to the ICOS Carbon Portal and the SOCAT project for publication. The ICOS Carbon Portal integrates marine data with ICOS data from the ecosystem and atmosphere domains on a regional scale, and the data are integrated via SOCAT in the annual Global Carbon Budgets of the Global Carbon Project, where they inform policy and decision makers, the scientific community and the general public.
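The sketch below illustrates, in Python, the kind of simple automatic checks mentioned above (a range check and a spike check on underway pCO₂ values). The thresholds and the good=2 / bad=4 flag convention are assumptions for the example only, not QuinCe's actual algorithms.

```python
# Illustrative automatic QC checks; thresholds and flags are assumptions.
import numpy as np

def range_check(values, lower=100.0, upper=800.0):
    """Flag values outside a plausible pCO2 range (µatm): 2 = good, 4 = bad."""
    flags = np.full(values.shape, 2, dtype=int)
    flags[(values < lower) | (values > upper)] = 4
    return flags

def spike_check(values, threshold=25.0):
    """Flag points that jump by more than `threshold` µatm relative to both neighbours."""
    flags = np.full(values.shape, 2, dtype=int)
    d_prev = np.abs(np.diff(values, prepend=values[0]))
    d_next = np.abs(np.diff(values, append=values[-1]))
    flags[(d_prev > threshold) & (d_next > threshold)] = 4
    return flags

pco2 = np.array([380.2, 381.0, 379.8, 640.5, 380.4, 382.1])
flags = np.maximum(range_check(pco2), spike_check(pco2))
print(flags)   # the isolated 640.5 value is flagged as bad
```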
For platforms with operational data flows, the data are transmitted directly from ship to shore; QuinCe processes, quality controls and publishes Near Real Time data to the ICOS Carbon Portal and to the Copernicus Marine Environment Monitoring Service In Situ TAC as soon as they are received, with no human intervention, greatly reducing the time from measurement to data availability.
Full metadata records for instruments are kept and maintained at the ICOS Carbon Portal, utilising existing standardised vocabularies and version control to maintain a complete history. The correct metadata for any given dataset are available at any time and can be converted to any required format, allowing compliance with the United Nations Sustainable Development Goal 14.3.1 methodology (‘average marine acidity (pH) measured at agreed suite of representative sampling stations’), and ICOS data relevant to SDG 14.3 are distributed to IOC UNESCO’s IODE. While much of this work is currently performed manually, international efforts are underway to develop fully automated systems, and these will be integrated as they become available.
How to cite: Pfeil, B., Jones, S., Karlsen, M., Stegen Landa, C., Castano Primo, R., Vermeulen, A., and Mirzov, O.: Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20215, https://doi.org/10.5194/egusphere-egu2020-20215, 2020.
EGU2020-20337 | Displays | ESSI1.1
Enhancing SeaDataNet e-infrastructure for ocean and marine data, new opportunities and challenges to foster data re-use
Leda Pecci, Michele Fichaut, and Dick Schaap
Marine and ocean data represent a significant resource that can be used to improve global knowledge of the seas. A huge amount of data is produced every day by ocean observations all around Europe. The ability to leverage this valuable potential depends on the capacity of the already established European ocean data infrastructures to support new needs in the field of ocean data management and to adopt emerging technologies.
The SeaDataNet e-infrastructure (https://www.seadatanet.org), built up in the early 2000s, plays an important role for marine scientists and other ocean stakeholder communities, giving access to more than 2.2 million multidisciplinary harmonised marine and ocean data sets, coming mainly from the European seas and collected by more than 110 data centres, and offering data products and metadata services. Thanks to the 4-year SeaDataCloud Horizon 2020 project, started on 1 November 2016, a more efficient infrastructure has been developed, kept up to date and offering new services based on cloud and High Performance Computing (HPC) technologies. It has renewed the original SeaDataNet Information Technology (IT) architecture. The collaboration with the EUDAT consortium, composed of a number of research communities and large European computer and data centres, enabled the migration of data storage and services into the cloud environment; data from new instruments, such as High-Frequency Radar (HFR), flow cytometers and gliders, have been standardised in agreement with the respective user communities. Furthermore, a Virtual Research Environment will support research collaboration.
The SDN infrastructure focuses on historical digital ocean data and also supports the management of data streams from sensors based on the Sensor Web Enablement (SWE) standards of the Open Geospatial Consortium (OGC).
Harmonisation of ocean data allows more countries to use the data for scientific research and for decision-making purposes, but data re-use also depends on the trust that the ocean scientific community places in the data. The latter issue involves a well-defined process of data quality checks. In SDN, data producers have to label each individual measurement with a value according to the SDN Quality Check (QC) flags, following the specific procedures presented in the SDN QC guideline (https://www.seadatanet.org/Standards/Data-Quality-Control). Furthermore, a range of checks is carried out on the data as part of the data products generation process, to improve the overall quality.
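As a small illustration of this flagging step, the Python sketch below labels individual temperature values with SDN-style QC flags; the flag meanings listed are commonly cited L20 values (the authoritative definitions are in the SDN QC guideline linked above), and the validity range is an assumption for the example.

```python
# Illustrative labelling of measurements with SDN-style QC flag values.
SDN_FLAGS = {
    "0": "no quality control",
    "1": "good value",
    "2": "probably good value",
    "3": "probably bad value",
    "4": "bad value",
    "9": "missing value",
}

def flag_temperature(value, valid=(-2.5, 35.0)):
    """Return an SDN-style flag for a sea temperature in degrees C (range is assumed)."""
    if value is None:
        return "9"
    return "1" if valid[0] <= value <= valid[1] else "4"

samples = [7.8, 24.1, None, 48.0]
print([(v, flag_temperature(v), SDN_FLAGS[flag_temperature(v)]) for v in samples])
```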
A relevant issue that limits data re-use is that some researchers are reluctant to share their own data. The way to encourage them is to give them proper acknowledgement for the work done, by means of data citation; for this reason, a Digital Object Identifier (DOI) minting service is freely available from the SDN portal for every data producer who shares their data. In addition, data versioning is available on the cloud platform for reproducible analysis.
The SeaDataCloud project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement Nº 730960.
How to cite: Pecci, L., Fichaut, M., and Schaap, D.: Enhancing SeaDataNet e-infrastructure for ocean and marine data, new opportunities and challenges to foster data re-use, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20337, https://doi.org/10.5194/egusphere-egu2020-20337, 2020.
EGU2020-21387 | Displays | ESSI1.1
Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations
Andrew Conway, Adam Leadbetter, and Tara Keena
Integration of data management systems is a persistent problem in European projects that span multiple agencies. Months, if not years, of project time are often expended on the integration of disparate database structures, data types, methodologies and outputs. Moreover, this work is usually confined to a single effort, meaning it is needlessly repeated in subsequent projects. The legacy effect of removing these barriers could therefore yield monetary and time savings for all involved, far beyond a single cross-jurisdictional project.
The European Union’s INTERREG VA Programme has funded the COMPASS project to better manage marine protected areas (MPAs) in peripheral areas. Involving five organisations spread across two nations, the project has developed a cross-border network for marine monitoring. Three of those organisations are UK-based and bound for Brexit (the Agri-Food and Biosciences Institute, Marine Scotland Science and the Scottish Association of Marine Science). With that network under construction, significant effort has been placed on harmonising data management processes and procedures between the partners.
A data management quality management framework (DM-QMF) was introduced to guide this harmonization and ensure adequate quality controls would be enforced. As lead partner on data management, the Irish Marine Institute (MI) initially shared guidelines for infrastructure, architecture and metadata. The implementation of those requirements was then left to the other four partners, with the MI acting as facilitator. This led to the following being generated for each process in the project:
- Data management plan: information on how and what data were to be generated, as well as where they would be stored.
- Flow diagrams: diagrammatic overview of the flow of data through the project.
- Standard Operating Procedures: detailed explanatory documents on the precise workings of a process.
Data management processes were allowed to evolve naturally out of a need to adhere to this set standard. Organisations were able to work within their operational limitations, without being required to alter their existing procedures, but were encouraged to learn from each other. Very quickly it was found that there were similarities in processes where previously it was thought there were significant differences. This process of sharing data management information has created mutually beneficial synergies and enabled the convergence of procedures within the separate organisations.
The downstream data management synergies that COMPASS has produced have already taken effect. Sister INTERREG VA projects, SeaMonitor and MarPAMM, have felt the benefits. The same data management systems cultivated as part of the COMPASS project are being reused, while the groundwork in creating strong cross-border channels of communication and cooperation is saving significant amounts of time in project coordination.
Through data management, personal and institutional relationships have been strengthened, both of which should persist beyond the project terminus in 2021, well into a post-Brexit Europe. The COMPASS project has been an exemplar of how close collaboration can persist and thrive in a changing political environment, in spite of the ongoing uncertainty surrounding Brexit.
How to cite: Conway, A., Leadbetter, A., and Keena, T.: Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21387, https://doi.org/10.5194/egusphere-egu2020-21387, 2020.
EGU2020-21908 | Displays | ESSI1.1
EMODnet: FAIR and open source marine data, digital products and services
Jan-Bart Calewaert, Kate Larkin, Conor Delaney, Andree Anne Marsan, and Tim Collart
Unlocking the potential of big ocean data relies on Findable, Accessible, Interoperable and Reusable (FAIR) data. This is a core principle of the European Marine Observation and Data network (EMODnet), a leading long-term marine data service provider funded by the EU. Over 150 organizations deliver harmonized data through seven portals spanning bathymetry, geology, physics, chemistry, biology, seabed habitats and human activities, together with a central portal. Recent data and data products include a high-resolution digital terrain model bathymetry product, digital vessel density maps, marine litter maps and products on geological features. International use cases include sustainable fisheries management and offshore wind farm development. The EMODnet Data Ingestion Service enhances data sharing, and the EMODnet Associated Partnership Scheme offers benefits for industry and wider stakeholders. Increasingly, EMODnet is interacting and collaborating with other key marine data initiatives in Europe and globally. This includes collaborations with the Copernicus Marine Service (CMEMS), SeaDataCloud and others to develop the pilot Blue Cloud as a marine component of the European Open Science Cloud (EOSC), as well as with China, the USA and international organisations such as IODE/IOC. This contribution will provide an update on EMODnet developments, give a future outlook considering the main challenges and opportunities, and touch upon key collaborations with other marine data initiatives in Europe and globally.
How to cite: Calewaert, J.-B., Larkin, K., Delaney, C., Marsan, A. A., and Collart, T.: EMODnet: FAIR and open source marine data, digital products and services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21908, https://doi.org/10.5194/egusphere-egu2020-21908, 2020.
ESSI1.12 – Innovative Evaluation Frameworks and Platforms for Weather and Climate Research
EGU2020-729 | Displays | ESSI1.12 | Highlight
A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data
Sathyaseelan Mayilvahanam, Sanjay Kumar Ghosh, and Chandra Shekhar Prasad Ojha
Abstract
In general, modelling climate change and its impacts within a hydrological unit brings out an understanding of the system and its behaviour under various model constraints. Climate change and global warming studies remain in a research and development phase because of their complex and dynamic nature. The IPCC 5th Assessment Report on global warming states that in the 21st century there may be an increase in temperature of the order of ~1.5°C. This transient climate may cause significant impacts on, or discrepancies in, the water availability of the hydrological unit. This may lead to severe impacts in countries with high populations such as India and China. Remote sensing datasets play an essential role in modelling climatic changes for a river basin at different spatial and temporal scales. This study aims to propose a conceptual framework for the above-defined problem with an emphasis on remote sensing datasets. The framework involves five entities: the data component, the process component, the impact component, the feedback component and the uncertainty component. The framework flow begins with the data component entity, which involves two significant inputs: the hydro-meteorological data and the land-hydrology data. The essential attributes of the hydro-meteorological data are precipitation, temperature, relative humidity, wind speed and solar radiation. These datasets may be obtained and analysed by empirical or statistical methods and from in-situ or satellite-based measurements. Mathematical models applied to long-run historical climate data may provide knowledge on climate change detection or its trends. Meteorological data derived from satellites may have a measurable bias relative to the in-situ data. The satellite-based land-hydrology data component involves various attributes such as topography, soil, vegetation, water bodies, other land use / land cover, soil moisture and evapotranspiration. The process component involves complex land-hydrology processes that may be well established and modelled by customizable hydrological models. Here, we emphasise the use of remote-sensing-based model parameter values in the equations, either directly or indirectly. The land-atmosphere process component likewise involves various complex processes that may be well established and solved by customizable atmospheric weather models. The land components play a significant role in modelling climate change, because these land processes may trigger global warming through various anthropogenic agents. The main objective of this framework is to emphasise climate change impacts using remote sensing; hence, the impact component entity plays an essential role in this conceptual framework. Climate change impacts within a river basin at various spatial and temporal scales are identified using different hydrological responses. The feedback entity is the most sensitive part of this framework, because it may alter the climate forcing either positively or negatively. An uncertainty component handles the uncertainty in the model framework. The highlight of this conceptual framework is the use of remote sensing datasets in climate change studies. A limitation is that verifying the remote sensing data against in-situ data at every location is not feasible.
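To make the structure of the proposed framework concrete (an illustration added here, not part of the abstract), the sketch below models the five entities and their connections as plain Python classes; all class names, fields and numbers are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the five framework entities; all names and fields are hypothetical.

@dataclass
class DataComponent:
    hydro_meteorological: dict = field(default_factory=dict)  # precipitation, temperature, ...
    land_hydrology: dict = field(default_factory=dict)        # topography, soil moisture, ...

class ProcessComponent:
    def run(self, data: DataComponent) -> dict:
        # placeholder for the land-hydrology / land-atmosphere model runs
        return {"runoff": 0.0, "evapotranspiration": 0.0}

class ImpactComponent:
    def assess(self, responses: dict) -> dict:
        # derive basin-scale impact indicators from the hydrological responses
        return {"water_availability_change": responses.get("runoff", 0.0)}

class FeedbackComponent:
    def adjust_forcing(self, impacts: dict) -> float:
        # positive or negative adjustment of the climate forcing
        return -0.1 if impacts["water_availability_change"] < 0 else 0.1

class UncertaintyComponent:
    def quantify(self, impacts: dict) -> dict:
        # toy uncertainty estimate attached to each impact indicator
        return {name: 0.2 * abs(value) for name, value in impacts.items()}

# Wiring the entities together in the order described in the abstract:
data = DataComponent(hydro_meteorological={"precipitation": []}, land_hydrology={"soil": []})
responses = ProcessComponent().run(data)
impacts = ImpactComponent().assess(responses)
forcing_change = FeedbackComponent().adjust_forcing(impacts)
uncertainty = UncertaintyComponent().quantify(impacts)
```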
How to cite: Mayilvahanam, S., Ghosh, S. K., and Ojha, C. S. P.: A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-729, https://doi.org/10.5194/egusphere-egu2020-729, 2020.
EGU2020-1612 | Displays | ESSI1.12
Building Web Processing Services with Birdhouse
Carsten Ehbrecht, Stephan Kindermann, Ag Stephens, and David Huard
The Web Processing Service (WPS) is an OGC interface standard for providing processing tools as a web service. The WPS interface standardizes the way processes and their inputs/outputs are described, how a client can request the execution of a process, and how the output from a process is handled.
Birdhouse tools enable you to build your own customised WPS compute service in support of remote climate data analysis.
Birdhouse offers you:
- A Cookiecutter template to create your own WPS compute service.
- An Ansible script to deploy a full-stack WPS service.
- A Python library, Birdy, suitable for Jupyter notebooks to interact with WPS compute services.
- An OWS security proxy, Twitcher, to provide access control to WPS compute services.
Birdhouse uses the PyWPS Python implementation of the Web Processing Service standard. PyWPS is part of the OSGeo project.
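As an illustration (added here, not part of the abstract), the sketch below shows how a Birdhouse WPS might be used from a Jupyter notebook with the Birdy client; the service URL and the process name are placeholders assumed for this example.

```python
# Minimal sketch: talk to a WPS compute service from a notebook using Birdy.
# The URL and the process name "subset_bbox" are hypothetical placeholders.
from birdy import WPSClient

wps = WPSClient("https://example.org/wps")   # connect to a (hypothetical) Birdhouse service
help(wps)                                    # processes are exposed as Python methods

# Request a bounding-box subset of a remote dataset (process name assumed):
response = wps.subset_bbox(resource="https://example.org/data/tas.nc",
                           lon0=-10, lon1=30, lat0=35, lat1=70)
output_url = response.get()[0]               # reference to the produced NetCDF file
print(output_url)
```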
The Birdhouse tools are used by several partners and projects. A Web Processing Service will be used in the Copernicus Climate Change Service (C3S) to provide subsetting operations on climate model data (CMIP5, CORDEX) as a service to the Climate Data Store (CDS). The Canadian non-profit organization Ouranos is using a Web Processing Service to provide climate indices calculation to be used remotely from Jupyter notebooks.
In this session we want to show how a Web Processing Service can be used with the Freva evaluation system. Freva plugins can be made available as processes in a Web Processing Service. These plugins can be run using a standard WPS client from a terminal and Jupyter notebooks with remote access to the Freva system.
We want to emphasise the integrational aspects of the Birdhouse tools: supporting existing processing frameworks to add a standardized web service for remote computation.
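To make this integration idea concrete (an illustration added here, not part of the abstract), the sketch below wraps a hypothetical processing-framework plugin as a PyWPS process; the plugin call and the identifiers are assumptions, only the PyWPS classes are taken from the library.

```python
# Minimal sketch: expose an existing analysis routine as a WPS process with PyWPS.
# "run_plugin" and the identifiers are hypothetical placeholders.
from pywps import Process, LiteralInput, LiteralOutput


def run_plugin(name: str) -> str:
    """Stand-in for a call into an existing processing framework (e.g. a Freva plugin)."""
    return f"result of plugin '{name}'"


class PluginProcess(Process):
    def __init__(self):
        super().__init__(
            self._handler,
            identifier="run_plugin",
            title="Run a framework plugin remotely",
            inputs=[LiteralInput("plugin", "Plugin name", data_type="string")],
            outputs=[LiteralOutput("summary", "Plugin result summary", data_type="string")],
        )

    def _handler(self, request, response):
        plugin_name = request.inputs["plugin"][0].data
        response.outputs["summary"].data = run_plugin(plugin_name)
        return response
```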
Links:
- http://bird-house.github.io
- http://pywps.org
- https://www.osgeo.org/
- http://climate.copernicus.eu
- https://www.ouranos.ca/en
- https://freva.met.fu-berlin.de/
How to cite: Ehbrecht, C., Kindermann, S., Stephens, A., and Huard, D.: Building Web Processing Services with Birdhouse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1612, https://doi.org/10.5194/egusphere-egu2020-1612, 2020.
EGU2020-3501 | Displays | ESSI1.12 | Highlight
The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O)
Clara Burgard, Dirk Notz, Leif T. Pedersen, and Rasmus T. Tonboe
The diversity in sea-ice concentration observational estimates retrieved from brightness temperatures measured from space is a challenge for our understanding of past and future sea-ice evolution as it inhibits reliable climate model evaluation and initialisation. To address this challenge, we introduce a new tool: the Arctic Ocean Observation Operator (ARC3O).
ARC3O allows us to simulate brightness temperatures at 6.9 GHz at vertical polarisation from standard output of an Earth System Model to be compared to observations from space at this frequency. We use simple temperature and salinity profiles inside the snow and ice column based on the output of the Earth System Model to compute these brightness temperatures.
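For illustration only (this is not the ARC3O formulation), the sketch below shows the kind of forward calculation an observation operator performs, reducing a snow and ice column to a single 6.9 GHz vertical-polarisation brightness temperature through a toy emissivity model; all numbers, weights and the emissivity value are assumptions.

```python
import numpy as np

# Toy forward operator: weighted effective temperature of the emitting layers times an
# assumed emissivity. This is NOT the ARC3O algorithm, only an illustration of mapping
# model columns (temperature, salinity) to one brightness temperature.
def toy_brightness_temperature(temp_profile_k, weights, emissivity=0.95):
    """Weighted column temperature scaled by an assumed 6.9 GHz V-pol emissivity."""
    weights = np.asarray(weights) / np.sum(weights)
    t_eff = np.sum(np.asarray(temp_profile_k) * weights)   # effective emitting temperature
    return emissivity * t_eff

# Example: a simple 3-layer snow/ice column (all values are made up)
print(toy_brightness_temperature([255.0, 260.0, 271.0], weights=[0.2, 0.3, 0.5]))
```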
In this study, we evaluate ARC3O by simulating brightness temperatures based on three assimilation runs of the MPI Earth System Model (MPI-ESM), assimilated with three different sea-ice concentration products. We then compare these three sets of simulated brightness temperatures to brightness temperatures measured from space by the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E). We find that they differ by up to 10 K in the period between October and June, depending on the region and the assimilation run. However, we show that these discrepancies between simulated and observed brightness temperatures can be attributed mainly to the underlying observational uncertainty in sea-ice concentration and, to a lesser extent, to the data assimilation process, rather than to biases in ARC3O itself. In summer, the discrepancies between simulated and observed brightness temperatures are larger than in winter and locally reach up to 20 K. This is caused by the very large observational uncertainty in summer sea-ice concentration but also by the melt-pond parametrisation in MPI-ESM, which is not necessarily realistic.
ARC3O is therefore capable of realistically translating the simulated Arctic Ocean climate state into an observable quantity, enabling a more comprehensive climate model evaluation and initialisation and offering an exciting perspective for further developing this and similar methods.
How to cite: Burgard, C., Notz, D., Pedersen, L. T., and Tonboe, R. T.: The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3501, https://doi.org/10.5194/egusphere-egu2020-3501, 2020.
EGU2020-4658 | Displays | ESSI1.12
Integrating e-infrastructures for remote climate data processing
Christian Pagé, Wim Som de Cerff, Maarten Plieger, Alessandro Spinuso, Iraklis Klampanos, Malcolm Atkinson, and Vangelis Karkaletsis
Accessing and processing large volumes of climate data has nowadays become a particularly challenging task for end users, due to the rapidly increasing volumes being produced and made available. Access to climate data is crucial for sustaining research and performing climate change impact assessments. These activities have a strong societal impact, as climate change affects almost all economic and social sectors and requires them to adapt.
The whole climate data archive is expected to reach a volume of 30 PB in 2020 and up to an estimated 2000 PB in 2024, evolving from 0.03 PB (30 TB) in 2007 and 2 PB in 2014. Data processing and analysis must therefore take place remotely for the users, who typically have to rely on heterogeneous infrastructures and services between the data and their physical location. Developers of Research Infrastructures have to provide services to those users and hence have to define standards and generic services to fulfil those requirements.
It will be shown how the DARE eScience Platform (http://project-dare.eu) will help developers to develop needed services more quickly and transparently for a large range of scientific researchers. The platform is designed for efficient and traceable development of complex experiments and domain-specific services. Most importantly, the DARE Platform integrates the following e-infrastructure services: the climate IS-ENES (https://is.enes.org) Research Infrastructure front-end climate4impact (C4I: https://climate4impact.eu), the EUDAT CDI (https://www.eudat.eu/eudat-collaborative-data-infrastructure-cdi) B2DROP Service, as well as the ESGF (https://esgf.llnl.gov). The DARE Platform itself can be deployed by research communities on local, public or commercial clouds, thanks to its containerized architecture.
More specifically, two distinct use cases for the climate science domain will be presented. The first will show how an open-source software package for computing climate indices and indicators (icclim: https://github.com/cerfacs-globc/icclim) is leveraged using the DARE Platform to enable users to build their own workflows. The second use case will demonstrate how more complex tools, such as an extra-tropical and tropical cyclone tracking software (https://github.com/cerfacs-globc/cyclone_tracking), can easily be made available to end users by infrastructure and front-end software developers.
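As an illustration of the kind of computation such a workflow wraps (an example added here, not taken from the abstract), the sketch below calls icclim to compute a standard climate index from a NetCDF file; the file name is a placeholder and the exact function signature may differ between icclim versions.

```python
# Minimal sketch: compute the "summer days" (SU) climate index with icclim.
# "tasmax.nc" is a placeholder input file; the API shown follows recent icclim
# releases and may differ in older versions.
import icclim

icclim.index(
    index_name="SU",                 # number of days with daily max temperature > 25 degC
    in_files="tasmax.nc",            # hypothetical input with a 'tasmax' variable
    var_name="tasmax",
    slice_mode="year",               # annual index values
    out_file="su_yearly.nc",
)
```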
How to cite: Pagé, C., Som de Cerff, W., Plieger, M., Spinuso, A., Klampanos, I., Atkinson, M., and Karkaletsis, V.: Integrating e-infrastructures for remote climate data processing, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4658, https://doi.org/10.5194/egusphere-egu2020-4658, 2020.
EGU2020-8543 | Displays | ESSI1.12
CDOs for CMIP6 and Climate Extremes Indices
Fabian Wachsmann
The Climate Data Operators [1] tool kit (CDO) is a widely used infrastructure software package developed and maintained at the Max Planck Institute for Meteorology (MPI-M). It comprises a large number of command-line operators for gridded data, covering statistics, interpolation and arithmetic. Users benefit from the extensive support facilities provided by the MPI-M and the DKRZ.
As a part of the sixth phase of the Coupled Model Intercomparison Project (CMIP6), the German Federal Ministry of Education and Research (BMBF) is funding activities promoting the use of the CDOs for CMIP6 data preparation and analysis.
The operator ‘cmor’ has been developed to enable users to prepare their data according to the CMIP6 data standard. It is part of the web-based CMIP6 post-processing infrastructure [2] which is developed at DKRZ and used by different Earth System Models. The CDO metadata and its data model have been expanded to include the CMIP6 data standard so that users can use the tool for project data evaluation.
As a second activity, operators for 27 climate extremes indices, which were defined by the Expert Team on Climate Change Detection and Indices (ETCCDI), have been integrated into the tool. As with CMIP5, the ETCCDI climate extremes indices will be part of CMIP6 model analyses due to their robustness and straightforward interpretation.
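As an illustration (added here, not part of the abstract), the sketch below calls two of these climate extremes operators through the python-cdo bindings rather than the shell; the file names are placeholders, and using the Python bindings is a choice made for this example.

```python
# Minimal sketch: compute ETCCDI climate extremes indices with the python-cdo bindings.
# File names are hypothetical placeholders.
from cdo import Cdo

cdo = Cdo()

# Number of frost days (ETCCDI index FD) from a daily minimum temperature file.
cdo.eca_fd(input="tasmin_daily.nc", output="frost_days.nc")

# Number of summer days (ETCCDI index SU) from a daily maximum temperature file.
cdo.eca_su(input="tasmax_daily.nc", output="summer_days.nc")

# The CMIP6 rewriting operator can be chained in the same way, e.g. cdo.cmor(...),
# with its table and metadata parameters taken from the project documentation.
```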
This contribution provides an insight into advanced CDO application and offers ideas for post-processing optimization.
[1] Schulzweida, U. (2019): CDO user guide. code.mpimet.mpg.de/projects/cdo , last access: 01.13.2020.
[2] Schupfner, M. (2020): The CMIP6 Data Request WebGUI. c6dreq.dkrz.de , last access: 01.13.2020.
How to cite: Wachsmann, F.: CDOs for CMIP6 and Climate Extremes Indices, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8543, https://doi.org/10.5194/egusphere-egu2020-8543, 2020.
EGU2020-9532 | Displays | ESSI1.12 | Highlight
Flexible Java based platform for integration of models and datasets in Earth Systems Science Prediction Systems: methodology and implementation for predicting spreading of radioactive contamination from accidents
Dmytro Trybushnyi, Wolfgang Raskob, Ievgen Ievdin, Tim Müller, Oleksandr Pylypenko, and Mark Zheleznyak
An important aspect of an Earth Systems Science Prediction System (ESSPS) is to describe and predict the behavior of contaminants in different environmental compartments following severe accidents at chemical and nuclear installations. Such an ESSPS could be designed as a platform allowing the integration of models describing atmospheric, hydrological and oceanographic processes, the physical-chemical transformation of pollutants in the environment, contamination of the food chain and, finally, the overall exposure of the population to harmful substances. Such a chain of connected simulation models, needed to describe the consequences of severe accidents in the different phases of an emergency, should use different input data, ranging from real-time online meteorological observations to long-term numerical weather prediction or ocean data.
One example of an ESSPS is the Decision Support System JRODOS for off-site emergency management after nuclear emergencies. It integrates many different simulation models, real-time monitoring, regional GIS information, source term databases, and geospatial data for population and environmental characteristics.
The development of the system started in 1992, supported by the European Commission’s RTD Framework programmes. As the system attracted more and more end users, its technical basis had to be considerably improved. For this, Java was selected as a high-level software language suitable for the development of distributed, cross-platform, enterprise-quality applications. On the other hand, a great deal of scientific computational software is available only as C/C++/FORTRAN packages. Moreover, it is a common scenario that some outputs of model A should act as inputs of model B, but the two models do not share common exchange containers and/or are written in different programming languages.
To combine the flexibility of the Java language with the speed and availability of scientific codes, and to be able to connect different computational codes into one chain of models, the notion of distributed wrapper objects (DWO) has been introduced. DWO provides logical, visual and technical means for the integration of computational models into the core of the system, even if the models and the system use different programming languages. The DWO technology allows various levels of interactivity, including pull- and push-driven chains, user interaction support, and sub-model calls. All DWO data exchange is realized in memory and does not involve disk IO operations, thus eliminating redundant reader/writer code and minimizing slow disk access. These features improve the stability and performance of an ESSPS used for decision support.
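To illustrate the general idea of chaining models through in-memory exchange containers rather than files (a generic sketch added here, not the actual DWO API, and written in Python rather than Java for consistency with the other examples in this document):

```python
# Generic sketch of chaining two models through a shared in-memory container,
# so model B consumes model A's outputs without any intermediate disk IO.
# The model functions, field names and numbers are hypothetical.
from typing import Dict, List

Exchange = Dict[str, List[float]]  # simple in-memory exchange container

def atmospheric_dispersion_model(source_term: float) -> Exchange:
    """Model A: produce an air concentration field (toy numbers)."""
    return {"air_concentration": [source_term * f for f in (0.5, 0.3, 0.2)]}

def deposition_model(data: Exchange) -> Exchange:
    """Model B: derive ground deposition directly from model A's in-memory output."""
    data["ground_deposition"] = [0.01 * c for c in data["air_concentration"]]
    return data

chain_result = deposition_model(atmospheric_dispersion_model(source_term=1.0e12))
print(chain_result["ground_deposition"])
```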
The current status of the DWO realization in JRODOS is presented focusing on the added value compared to traditional integration of different simulation models into one system.
How to cite: Trybushnyi, D., Raskob, W., Ievdin, I., Müller, T., Pylypenko, O., and Zheleznyak, M.: Flexible Java based platform for integration of models and datasets in Earth Systems Science Prediction Systems: methodology and implementation for predicting spreading of radioactive contamination from accidents, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9532, https://doi.org/10.5194/egusphere-egu2020-9532, 2020.
EGU2020-13105 | Displays | ESSI1.12
Web-based post-processing workflow composition for CMIP6
Martin Schupfner and Fabian Wachsmann
CMIP6 defines a data standard as well as a data request (DReq) in order to facilitate analysis across results from different climate models. For most model output, post-processing is required to make it CMIP6 compliant. The German Federal Ministry of Education and Research (BMBF) is funding a project [1] providing services which help with the production of quality-assured CMIP6 compliant data according to the DReq.
In that project, a web-based GUI [2] has been developed which guides the modelers through the different steps of the data post-processing workflow, allowing the aggregation, diagnosis and standardization of the model data to be orchestrated in a modular manner. Therefore, the website provides several functionalities:
1. A DReq generator, based on Martin Juckes’ DreqPy API [3], can be used to tailor the DReq according to the envisaged experiments and supported MIPs. Moreover, the expected data volume can be calculated (a minimal sketch of this step follows the list).
2. The mapping between variables of the DReq and of the raw model output can be specified. These specifications (model variable names, units, etc.) may include diagnostic algorithms and are stored in a database.
3. The variable mapping information can be retrieved as a mapping table (MT). Additionally, this information can be used to create post-processing script fragments. One of the script fragments contains processing commands based on the diagnostic algorithms entered into the mapping GUI, whereas the other rewrites the (diagnosed) data in a CMIP6-compliant format. Both script fragments use the CDO tool kit [4] developed at the Max Planck Institute for Meteorology, namely the CDO expr and cmor [5] operators. The latter makes use of the CMOR3 library [6] and parses the MT. The script fragments are meant to be integrated into CMIP6 data workflows or scripts. A template for such a script, which allows modular and flexible process control of the individual workflow steps, is included when downloading the script fragments.
4. User specific metadata can be generated, which supply the CDO cmor operator with the required and correct metadata as specified in the CMIP6 controlled vocabulary (CV).
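For point 1 above, the sketch below shows how the data request can be loaded and inspected with dreqPy (an illustration added here; the collection and attribute names beyond loadDreq follow the dreqPy documentation and may differ between data request versions).

```python
# Minimal sketch: load the CMIP6 data request with dreqPy and count requested variables.
# Collection and attribute names ('CMORvar', .mipTable) follow the dreqPy documentation
# and may differ between data request versions.
from dreqPy import dreq

dq = dreq.loadDreq()                      # load the bundled CMIP6 data request
cmor_vars = dq.coll['CMORvar'].items      # all CMOR variables in the request
print(len(cmor_vars), "CMOR variables requested")
print(sorted({v.mipTable for v in cmor_vars})[:5], "...")  # a few of the MIP tables
```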
[1] National CMIP6 Support Activities. https://www.dkrz.de/c6de , last access 9.1.2020.
[2] Martin Schupfner (2018): CMIP6 Data Request WebGUI. https://c6dreq.dkrz.de/ , last access 9.1.2020.
[3] Martin Juckes (2018): Data Request Python API. Vers. 01.00.28. http://proj.badc.rl.ac.uk/svn/exarch/CMIP6dreq/tags/latest/dreqPy/docs/dreqPy.pdf , last access 9.1.2020.
[4] Uwe Schulzweida (2019): CDO User Guide. Climate Data Operators. Vers. 1.9.8. https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf , last access 9.1.2020.
[5] Fabian Wachsmann (2017): The cdo cmor operator. https://code.mpimet.mpg.de/attachments/19411/cdo_cmor.pdf , last access 9.1.2020.
[6] Denis Nadeau (2018): CMOR version 3.3. https://cmor.llnl.gov/pdf/mydoc.pdf , last access 9.1.2020.
How to cite: Schupfner, M. and Wachsmann, F.: Web-based post-processing workflow composition for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13105, https://doi.org/10.5194/egusphere-egu2020-13105, 2020.
EGU2020-13306 | Displays | ESSI1.12 | Highlight
CMIP model evaluation with the ESMValTool v2.0
Axel Lauer, Fernando Iglesias-Suarez, Veronika Eyring, and the ESMValTool development team
The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background of implemented diagnostics and metrics and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community continuously improving the tool supported by multiple national and European projects. The latest version (2.0) of the ESMValTool has been developed as a large community effort to specifically target the increased data volume of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the related challenges posed by analysis and evaluation of output from multiple high-resolution and complex ESMs. For this, the core functionalities have been completely rewritten in order to take advantage of state-of-the-art computational libraries and methods to allow for efficient and user-friendly data processing. Common operations on the input data such as regridding or computation of multi-model statistics are now centralized in a highly optimized preprocessor written in Python. The diagnostic part of the ESMValTool includes a large collection of standard recipes for reproducing peer-reviewed analyses of many variables across atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean-state, trends, variability and important processes, phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation some are also based on model-to-model comparisons. This presentation introduces the diagnostics newly implemented into ESMValTool v2.0 including an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of ESMs, new diagnostics for extreme events, regional model and impact evaluation and analysis of ESMs, as well as diagnostics for emergent constraints and analysis of future projections from ESMs. The new diagnostics are illustrated with examples using results from the well-established CMIP5 and the newly available CMIP6 data sets.
How to cite: Lauer, A., Iglesias-Suarez, F., Eyring, V., and development team, T. E.: CMIP model evaluation with the ESMValTool v2.0, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13306, https://doi.org/10.5194/egusphere-egu2020-13306, 2020.
EGU2020-14745 | Displays | ESSI1.12
ESMValTool pre-processing functions for eWaterCycle
Fakhereh Alidoost, Jerom Aerts, Bouwe Andela, Jaro Camphuijsen, Nick van De Giesen, Gijs van Den Oord, Niels Drost, Yifat Dzigan, Ronald van Haren, Rolf Hut, Peter C. Kalverla, Inti Pelupessy, Stefan Verhoeven, Berend Weel, and Ben van Werkhoven
eWaterCycle is a framework in which hydrological modelers can work together in a collaborative environment. In this environment, they can, for example, compare and analyze the results of models that use different sources of (meteorological) forcing data. The final goal of eWaterCycle is to advance the state of FAIR (Findable, Accessible, Interoperable, and Reusable) and open science in hydrological modeling.
Comparing hydrological models has always been a challenging task. Hydrological models exhibit great complexity and diversity in the exact methodologies applied, competing hypotheses of hydrologic behavior, technology stacks, and programming languages used in those models. Pre-processing of forcing data is one of the roadblocks that was identified during the FAIR Hydrological Modelling workshop organized by the Lorentz Center in April 2019. Forcing data can be retrieved from a wide variety of sources with discrepant variable names and frequencies, and spatial and temporal resolutions. Moreover, some hydrological models make specific assumptions about the definition of the forcing variables. The pre-processing is often performed by various sets of scripts that may or may not be included with model source codes, making it hard to reproduce results. Generally, there are common steps in the data preparation among different models. Therefore, it would be a valuable asset to the hydrological community if the pre-processing of FAIR input data could also be done in a FAIR manner.
Within the context of the eWaterCycle II project, a common pre-processing system has been created for hydrological modeling based on ESMValTool (Earth System Model Evaluation Tool). ESMValTool is a community diagnostic and performance metrics tool developed for the evaluation of Earth system models. The ESMValTool pre-processing functions cover a broad range of operations on data before diagnostics or metrics are applied; for example, vertical interpolation, land-sea masking, re-gridding, multi-model statistics, temporal and spatial manipulations, variable derivation and unit conversion. The pre-processor performs these operations in a centralized, documented and efficient way. The current pre-processing pipeline of eWaterCycle using ESMValTool consists of hydrological model-specific recipes and supports ERA5 and ERA-Interim data provided by the ECMWF (European Centre for Medium-Range Weather Forecasts). The pipeline starts with the downloading and CMORization (Climate Model Output Rewriter) of input data. Then a recipe is prepared to find the data and run the preprocessors. When ESMValTool runs a recipe, it will also run the diagnostic script that contains model-specific analysis to derive required forcing variables, and it will store provenance information to ensure transparency and reproducibility. In the near future, the pipeline will be extended to include Earth observation data, as these data are paramount to the data assimilation in eWaterCycle.
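As an illustration of the kind of centralized pre-processing step this pipeline relies on (an example added here, not part of the abstract), the sketch below applies two ESMValCore preprocessor functions to a CMORized forcing file; the file name and region are placeholders, and the exact function arguments may differ between ESMValCore versions.

```python
# Minimal sketch: apply ESMValCore preprocessor functions to a forcing dataset.
# "era5_tas.nc" is a placeholder file; argument names follow recent ESMValCore
# releases and may differ between versions.
import iris
from esmvalcore.preprocessor import regrid, extract_region

cube = iris.load_cube("era5_tas.nc")                      # CMORized near-surface temperature
cube = regrid(cube, target_grid="1x1", scheme="linear")   # common grid for model comparison
cube = extract_region(cube, start_longitude=3, end_longitude=8,
                      start_latitude=50, end_latitude=54) # e.g. a Rhine-basin bounding box
print(cube.summary(shorten=True))
```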
In this presentation we will show how using the ESMValTool pre-processor for hydrological modeling connects hydrology and climate sciences and increases the impact and sustainability of ESMValTool.
How to cite: Alidoost, F., Aerts, J., Andela, B., Camphuijsen, J., van De Giesen, N., van Den Oord, G., Drost, N., Dzigan, Y., van Haren, R., Hut, R., Kalverla, P. C., Pelupessy, I., Verhoeven, S., Weel, B., and van Werkhoven, B.: ESMValTool pre-processing functions for eWaterCycle, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14745, https://doi.org/10.5194/egusphere-egu2020-14745, 2020.
EGU2020-17472 | Displays | ESSI1.12
ESMValCore: analyzing CMIP data made easy
Bouwe Andela, Lisa Bock, Björn Brötz, Faruk Diblen, Laura Dreyer, Niels Drost, Paul Earnshaw, Veronika Eyring, Birgit Hassler, Nikolay Koldunov, Axel Lauer, Bill Little, Saskia Loosveldt-Tomas, Lee de Mora, Valeriu Predoi, Mattia Righi, Manuel Schlund, Javier Vegas-Regidor, and Klaus Zimmermann
The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2019, www.esmvaltool.org) features a brand new design, consisting of ESMValCore (https://github.com/esmvalgroup/esmvalcore), a package for working with CMIP data and ESMValTool (https://github.com/esmvalgroup/esmvaltool), a package containing the scientific analysis scripts. This new version has been specifically developed to handle the increased data volume of CMIP Phase 6 (CMIP6) and the related challenges posed by the analysis and the evaluation of output from multiple high-resolution or complex Earth system models. The tool also supports CMIP5 and CMIP3 datasets, as well as a large number of re-analysis and observational datasets that can be formatted according to the same standards (CMOR) on-the-fly or through scripts currently included in the ESMValTool package.
At the heart of this new version is the ESMValCore software package, which provides a configurable framework for finding CMIP files using a “data reference syntax”, applying commonly used pre-processing functions to them, running analysis scripts, and recording provenance. Numerous pre-processing functions, e.g. for data selection, regridding, and statistics are readily available and the modular design makes it easy to add more. The ESMValCore package is easy to install with relatively few dependencies, written in Python 3, based on state-of-the-art open-source libraries such as Iris and Dask, and widely used standards such as YAML, NetCDF, CF-Conventions, and W3C PROV. An extensive set of automated tests and code quality checks ensure the reliability of the package. Documentation is available at https://esmvaltool.readthedocs.io.
The ESMValCore package uses human-readable recipes to define which variables and datasets to use, how to pre-process that data, and what scientific analysis scripts to run. The package provides convenient interfaces, based on the YAML and NetCDF/CF-convention file formats, for running diagnostic scripts written in any programming language. Because the ESMValCore framework takes care of running the workflow defined in the recipe in parallel, most analyses run much faster, with no additional programming effort required from the authors of the analysis scripts. For example, benchmarks show a factor of 30 speedup with respect to version 1 of the tool for a representative recipe on a 24-core machine. A large collection of standard recipes and associated analysis scripts is available in the ESMValTool package for reproducing selected peer-reviewed analyses. The ESMValCore package can also be used with any other script that implements its easy-to-use interface. All pre-processing functions of the ESMValCore can also be used directly from any Python program. These features allow for use by a wide community of scientific users and developers with different levels of programming skills and experience.
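As a minimal illustration of the last point, the sketch below calls two ESMValCore preprocessor functions on an Iris cube from a plain Python script; the function names follow the ESMValCore documentation, but the exact signatures and the input file name should be treated as assumptions.

# A minimal sketch of using ESMValCore preprocessor functions directly from Python.
# The file name is a placeholder; signatures shown here are assumptions.
import iris
from esmvalcore.preprocessor import regrid, area_statistics

cube = iris.load_cube("tas_Amon_SomeModel_historical_r1i1p1f1_gn_185001-201412.nc")

# Regrid to a regular 1x1 degree grid, then reduce to a global area mean.
cube = regrid(cube, target_grid="1x1", scheme="linear")
cube = area_statistics(cube, operator="mean")

iris.save(cube, "tas_global_mean.nc")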
Future plans involve extending the public Python API (application programming interface) from just preprocessor functions to include all functionality, including finding the data and running diagnostic scripts. This would make ESMValCore suitable for interactive data exploration from a Jupyter Notebook.
How to cite: Andela, B., Bock, L., Brötz, B., Diblen, F., Dreyer, L., Drost, N., Earnshaw, P., Eyring, V., Hassler, B., Koldunov, N., Lauer, A., Little, B., Loosveldt-Tomas, S., de Mora, L., Predoi, V., Righi, M., Schlund, M., Vegas-Regidor, J., and Zimmermann, K.: ESMValCore: analyzing CMIP data made easy, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17472, https://doi.org/10.5194/egusphere-egu2020-17472, 2020.
EGU2020-18454 | Displays | ESSI1.12
A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness
Salomon Eliasson, Karl Göran Karlsson, and Ulrika Willén
One of the primary purposes of satellite simulators is to emulate the inability of retrievals, based on visible and infrared sensors, to detect subvisible clouds from space by removing them from the model. The current simulators in the COSP rely on a single visible cloud optical depth (τ)-threshold (τ=0.3) applied globally to delineate cloudy and cloud-free conditions. However, in reality, the cloud sensitivity of a retrieval varies regionally.
This presentation describes the satellite simulator for the CLARA-A2 climate data record (CDR). The CLARA simulator takes into account the variable skill in cloud detection of the CLARA-A2 CDR using long/lat-gridded values separated by daytime and nighttime, which enable it to filter out clouds from climate models that would be undetectable by observations. We introduce two methods of cloud mask simulation, one that depends on a spatially variable τ-threshold and one that uses the cloud probability of detection (POD) as a function of the model τ and long/lat. The gridded POD values are from the CLARA-A2 validation study by Karlsson and Hakansson (2018).
Both methods replicate the relative ease or difficulty for cloud retrievals, depending on the region and illumination. They increase the cloud sensitivity where the cloud retrievals are relatively straightforward, such as over mid-latitude oceans, and they decrease the sensitivity where cloud retrievals are notoriously tricky, such as where thick clouds may be inseparable from cold, snow-covered surfaces, as well as in areas with an abundance of broken and small-scale cumulus clouds such as the atmospheric subsidence regions over the ocean.
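A schematic numpy sketch of the POD-based masking idea, assuming a gridded POD lookup table indexed by τ bin: a model cloud is kept only with the probability that the retrieval would detect it at that location and optical depth. The array shapes and toy data are invented; this is not the actual CLARA simulator code.

# Schematic sketch of POD-based cloud-mask simulation; shapes and data are toy values.
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_cloud_mask(model_tau, pod_lut, tau_bins):
    """model_tau: (lat, lon) model cloud optical depth.
    pod_lut:   (n_tau_bins, lat, lon) probability of detection per tau bin and grid cell.
    tau_bins:  bin edges used to index pod_lut along its first axis.
    Returns a boolean mask of model clouds the retrieval would plausibly detect."""
    bin_idx = np.clip(np.digitize(model_tau, tau_bins) - 1, 0, pod_lut.shape[0] - 1)
    pod = np.take_along_axis(pod_lut, bin_idx[None, ...], axis=0)[0]
    detected = rng.random(model_tau.shape) < pod
    return detected & (model_tau > 0.0)

# Example with toy data on a 4x8 grid and three tau bins.
tau = rng.gamma(shape=1.0, scale=2.0, size=(4, 8))
lut = rng.uniform(0.2, 1.0, size=(3, 4, 8))
mask = simulate_cloud_mask(tau, lut, tau_bins=np.array([0.0, 0.3, 1.0, 5.0]))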
The CLARA simulator, together with the International Satellite Cloud Climatology Project (ISCCP) simulator of the COSP, is used to assess Arctic clouds in the EC-Earth climate model compared to the CLARA-A2 and ISCCP H-Series CDRs. Compared to CLARA-A2, EC-Earth generally underestimates cloudiness in the Arctic. However, compared to ISCCP and its simulator, the opposite conclusion is reached. Based on EC-Earth, this paper shows that the simulated cloud mask of CLARA-A2 is more representative of the CDR than using a global optical depth threshold, such as used by the ISCCP simulator.
The simulator substantially improves the simulation of the CLARA-A2-detected clouds compared to a global optical depth threshold, especially in the polar regions, by accounting for the variable cloud detection skill over the year.
How to cite: Eliasson, S., Karlsson, K. G., and Willén, U.: A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18454, https://doi.org/10.5194/egusphere-egu2020-18454, 2020.
EGU2020-19181 | Displays | ESSI1.12
ESMValTool - introducing a powerful model evaluation tool
Valeriu Predoi, Bouwe Andela, Lee De Mora, and Axel Lauer
The Earth System Model eValuation Tool (ESMValTool) is a powerful community-driven diagnostics and performance metrics tool. It is used for the evaluation of Earth System Models (ESMs) and allows for routine comparisons of either multiple model versions or observational datasets. ESMValTool's design is highly modular and flexible so that additional analyses can easily be added; in fact, this is essential to encourage the community-based approach to its scientific development. A set of standardized recipes for each scientific topic reproduces specific diagnostics or performance metrics that have demonstrated their importance in ESM evaluation in the peer-reviewed literature. Scientific themes include selected Essential Climate Variables, a range of known systematic biases common to ESMs such as coupled tropical climate variability, monsoons, Southern Ocean processes, continental dry biases and soil hydrology-climate interactions, as well as atmospheric CO2 budgets, tropospheric and stratospheric ozone, and tropospheric aerosols. We will outline the main functional characteristics of ESMValTool Version 2; we will also introduce the reader to the current set of diagnostics and the methods they can use to contribute to its development.
How to cite: Predoi, V., Andela, B., De Mora, L., and Lauer, A.: ESMValTool - introducing a powerful model evaluation tool, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19181, https://doi.org/10.5194/egusphere-egu2020-19181, 2020.
EGU2020-19298 | Displays | ESSI1.12
Integrating Model Evaluation and Observations into a Production-Release Pipeline
Philipp S. Sommer, Ronny Petrik, Beate Geyer, Ulrike Kleeberg, Dietmar Sauer, Linda Baldewein, Robin Luckey, Lars Möller, Housam Dibeh, and Christopher Kadow
The complexity of Earth System and Regional Climate Models represents a considerable challenge for developers. Tuning but also improving one aspect of a model can unexpectedly decrease the performance of others and introduce hidden errors. Reasons include, in particular, the multitude of output parameters and the shortage of reliable and complete observational datasets. One possibility to overcome these issues is a rigorous and continuous scientific evaluation of the model. This requires standardized model output and, most notably, standardized observational datasets. Additionally, in order to reduce the extra burden for the single scientist, this evaluation has to be as close as possible to the standard workflow of the researcher, and it needs to be flexible enough to adapt to new scientific questions.
We present the Free Evaluation System Framework (Freva) implementation within the Helmholtz Coastal Data Center (HCDC) at the Institute of Coastal Research in the Helmholtz-Zentrum Geesthacht (HZG). Various plugins into the Freva software, namely the HZG-EvaSuite, use observational data to perform a standardized evaluation of the model simulation. We present a comprehensive data management infrastructure that copes with the heterogeneity of observations and simulations. This web framework comprises a FAIR and standardized database of both large-scale and in-situ observations exported to a format suitable for data-model intercomparisons (particularly netCDF following the CF-conventions). Our pipeline links the raw data of the individual model simulations (i.e. the production of the results) to the finally published results (i.e. the released data).
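As an illustration of the kind of export format targeted here, the sketch below writes a small in-situ time series to a CF-style netCDF file with xarray; the variable names and attributes are examples and do not reflect the HCDC database schema.

# Illustrative export of an in-situ time series to a CF-style netCDF file.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2020-01-01", periods=10, freq="D")
temperature = np.random.default_rng(0).normal(10.0, 0.5, times.size)

ds = xr.Dataset(
    data_vars={
        "sea_water_temperature": (
            "time",
            temperature,
            {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
        )
    },
    coords={"time": ("time", times)},
    attrs={"Conventions": "CF-1.8", "title": "Example coastal observation export"},
)
ds.to_netcdf("example_observations.nc")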
Another benefit of the Freva-based evaluation is the enhanced exchange between the different compartments of the institute, particularly between the model developers and the data collectors, as Freva contains built-in functionalities to share and discuss results with colleagues. We will furthermore use the tool to strengthen the active communication with the data and software managers of the institute to generate or adapt the evaluation plugins.
How to cite: Sommer, P. S., Petrik, R., Geyer, B., Kleeberg, U., Sauer, D., Baldewein, L., Luckey, R., Möller, L., Dibeh, H., and Kadow, C.: Integrating Model Evaluation and Observations into a Production-Release Pipeline , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19298, https://doi.org/10.5194/egusphere-egu2020-19298, 2020.
EGU2020-21666 | Displays | ESSI1.12
Freva - Free Evaluation System Framework - New Aspects and Features
Christopher Kadow, Sebastian Illing, Oliver Kunst, Thomas Schartner, Jens Grieger, Mareike Schuster, Andy Richling, Ingo Kirchner, Henning Rust, Ulrich Cubasch, and Uwe Ulbrich
The Free Evaluation System Framework (Freva - freva.met.fu-berlin.de) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high performance computers to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, including all features present in the shell and web environment. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). Freva indexes different data projects into one common search environment by storing the meta data information of the self-describing model, reanalysis and observational data sets in a database. This implemented meta data system with its advanced but easy-to-handle search tool supports users, developers and their plugins to retrieve the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface of the evaluation system without any need to understand the different scripting languages. Facilitation of the provision and usage of tools and climate data automatically increases the number of scientists working with the data sets and identifying discrepancies. The integrated webshell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gate to the research project's HPC. Plugins are able to integrate, e.g., their post-processed results into the database of the user. This allows, e.g., post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via the shell or web system. Therefore, plugged-in tools benefit from transparency and reproducibility. Furthermore, if configurations match while starting an evaluation plugin, the system suggests using results already produced by other users – saving CPU/h, I/O, disk space and time. The efficient interaction between different technologies improves the Earth system modeling science framed by Freva.
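A purely hypothetical sketch of how an analysis tool written in another language could be wrapped behind such a generic plugin interface; the class and method names are invented for illustration and are not Freva's actual API.

# Hypothetical plugin wrapper; names and structure are invented for illustration only.
class ExamplePlugin:
    """Wraps an external analysis script so that a framework could run it with a
    validated parameter dictionary and collect the produced files."""

    parameters = {"variable": "tas", "start_year": 1980, "end_year": 2010}

    def build_command(self, config):
        # The wrapped tool can be written in any language; here an R script is assumed.
        return ["Rscript", "analyse_bias.R", config["variable"],
                str(config["start_year"]), str(config["end_year"])]

    def run(self, config):
        command = self.build_command(config)
        # A real plugin would execute the command (e.g. subprocess.run(command, check=True))
        # and register the produced files in the framework's history database.
        return {"command": command, "expected_outputs": ["bias_map.png", "bias_stats.nc"]}

print(ExamplePlugin().run(ExamplePlugin.parameters))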
New Features and aspects of further development and collaboration are discussed.
How to cite: Kadow, C., Illing, S., Kunst, O., Schartner, T., Grieger, J., Schuster, M., Richling, A., Kirchner, I., Rust, H., Cubasch, U., and Ulbrich, U.: Freva - Free Evaluation System Framework - New Aspects and Features, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21666, https://doi.org/10.5194/egusphere-egu2020-21666, 2020.
EGU2020-22155 | Displays | ESSI1.12
Climate Index Metadata and its Implementation
Klaus Zimmermann and Lars Bärring
Climate indices play an important role in the practical use of climate and weather data. Their application spans a wide range of topics, from impact assessment in agriculture and urban planning, through indispensable advice in the energy sector, to important evaluation in the climate science community. Several widely used standard sets of indices exist through long-standing efforts of WMO and WCRP Expert Teams (ETCCDI and ET-SCI), as well as European initiatives (ECA&D) and more recently Copernicus C3S activities. They, however, focus on the data themselves, leaving much of the metadata to the individual user. Moreover, these core sets of indices lack a coherent metadata framework that would allow for the consistent inclusion of the new indices that continue to be proposed.
In the meantime, the treatment of metadata in the wider community has received much attention. Within the climate community efforts such as the CF convention and the much-expanded scope and detail of metadata in CMIP6 have improved the clarity and long-term usability of many aspects of climate data a great deal.
We present a novel approach to metadata for climate indices. Our format describes the existing climate indices consistent with the established standards, adding metadata along the lines of existing metadata specifications. The formulation of these additions in a coherent framework encompassing most of the existing climate index standards allows for its easy extension and inclusion of new climate indices as they are developed.
We also present Climix, a new Python software for the calculation of indices based on this description. It can be seen as an example implementation of the proposed standard and features high-performance calculations based on state-of-the-art infrastructure, such as Iris and Dask. This way, it offers shared memory and distributed parallel and out-of-core computations, enabling the efficient treatment of large data volumes as incurred by the high resolution, long time-series of current and future datasets.
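As a minimal illustration of the kind of index such software computes, the sketch below derives the ETCCDI frost-days index (annual count of days with daily minimum temperature below 0 °C), written with xarray and dask for brevity rather than with the Iris-based Climix implementation itself; the file names are placeholders.

# Frost days (FD): annual count of days with tasmin < 0 degC, computed lazily with dask.
import xarray as xr

ds = xr.open_dataset("tasmin_day_model_historical.nc", chunks={"time": 365})
frost_days = (ds["tasmin"] < 273.15).resample(time="YS").sum(dim="time")
frost_days.attrs.update({"long_name": "Number of frost days", "units": "days"})
frost_days.to_netcdf("fd_annual.nc")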
How to cite: Zimmermann, K. and Bärring, L.: Climate Index Metadata and its Implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22155, https://doi.org/10.5194/egusphere-egu2020-22155, 2020.
ESSI1.15 – Towards SMART Monitoring and Integrated Data Exploration of the Earth System
EGU2020-21816 | Displays | ESSI1.15 | Highlight
The challenge of sensor selection, long-term sensor operation and data evaluation in inter-institutional long-term monitoring projects (lessons learned in the MOSES project)
Philipp Fischer, Madlen Friedrich, Markus Brand, Uta Koedel, Peter Dietrich, Holger Brix, Dorit Kerschke, and Ingeborg Bussmann
Measuring environmental variables over longer times in coastal marine environments is a challenge with regard to sensor maintenance and the data processing of continuously produced, comprehensive datasets. In the project “MOSES” (Modular Observation Solutions for Earth Systems), this procedure became even more complicated because seven large Helmholtz centers from the research field Earth and Environment (E&E), within the framework of the German Federal Ministry of Education and Research (BMBF), work together to design and construct a large-scale monitoring network across earth compartments to study the effects of short-term events on long-term environmental trends. This requires the development of robust, standardized and automated data acquisition and processing routines to ensure reliable, accurate and precise data.
Here, the results of two intercomparison workshops on sensor accuracy and precision for selected environmental variables are presented. Environmental sensors that were to be used in MOSES campaigns on hydrological extremes (floods and droughts) in the Elbe catchment and the adjacent coastal areas of the North Sea in 2019 and 2020 were compared for selected parameters (temperature, salinity, chlorophyll-a, turbidity and methane) in the same experimentally controlled water body, assuming that all sensors provide comparable data. The results were analyzed with respect to individual sensor accuracy and precision relative to an “assumed” real value, as well as with respect to a cost versus accuracy/precision index for measuring specific environmental data. The results show that the accuracy and precision of sensors do not necessarily correlate with their price and that low-cost sensors may provide the same or even higher accuracy and precision than the highest-priced sensor types.
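The sketch below illustrates, with invented toy readings, the kind of evaluation described above: accuracy as the mean deviation from an assumed reference value, precision as the standard deviation of repeated readings, and a simple cost-versus-performance ratio. It is not the workshop's actual analysis code.

# Toy evaluation of sensor accuracy, precision and a cost/performance index.
import numpy as np
import pandas as pd

reference_temp = 15.00  # assumed "true" water temperature in degC
sensors = {
    "sensor_A": {"price_eur": 4500, "readings": [15.02, 15.01, 14.99, 15.03]},
    "sensor_B": {"price_eur": 800,  "readings": [15.05, 14.96, 15.04, 14.97]},
}

rows = []
for name, info in sensors.items():
    readings = np.asarray(info["readings"])
    accuracy = readings.mean() - reference_temp   # systematic offset (bias)
    precision = readings.std(ddof=1)              # spread of repeated readings
    rows.append({"sensor": name, "price_eur": info["price_eur"],
                 "accuracy_degC": accuracy, "precision_degC": precision,
                 "price_per_precision": info["price_eur"] / precision})

print(pd.DataFrame(rows))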
How to cite: Fischer, P., Friedrich, M., Brand, M., Koedel, U., Dietrich, P., Brix, H., Kerschke, D., and Bussmann, I.: The challenge of sensor selection, long term-sensor operation and data evaluation in inter- -institutional long term monitoring projects (lessons learned in the MOSES project) , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21816, https://doi.org/10.5194/egusphere-egu2020-21816, 2020.
EGU2020-9338 | Displays | ESSI1.15
Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026
Pei Liu, Ruimei Han, and Leiku Yang
Rapid urbanization has become a major urban sustainability concern due to environmental impacts such as the development of urban heat islands (UHI) and the reduction of urban security. To date, most research on urban sustainable development has focused on dynamic change monitoring or UHI state characterization, while there is little literature on UHI change analysis. In addition, there has been little research on the impact of land use and land cover changes (LULCCs) on the UHI, especially on simulating future trends of LULCCs, UHI change, and the dynamic relationship between LULCCs and UHI. The purpose of this research is to design a remote sensing based framework that investigates and analyses how the LULCCs in the process of urbanization affect the thermal environment. To assess and predict the impact of LULCCs on the urban thermal environment, multi-temporal remotely sensed data from 1986 to 2016 were selected as source data, and Geographic Information System (GIS) methods such as the CA-Markov model were employed to construct the proposed framework. The results show that (1) there has been substantial urban expansion during the 40-year study period; (2) the largest movement of the urban center of gravity was from north-northeast (NNE) to west-southwest (WSW); (3) the dominant temperature levels in the research area were the middle, sub-high and high levels; (4) there was a higher changing frequency and range from east to west; and (5) there was a significant negative correlation between land surface temperature and vegetation, and a significant positive correlation between temperature and human settlement.
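As a schematic illustration of one building block of such a CA-Markov projection, the sketch below estimates a Markov transition matrix from two classified land-cover maps of the same area at different dates; the three-class toy maps are random placeholders for the classified satellite scenes described above.

# Estimate a Markov transition matrix from two classified land-cover maps (toy data).
import numpy as np

classes = [0, 1, 2]                      # e.g. 0 = built-up, 1 = vegetation, 2 = water
lc_1986 = np.random.default_rng(1).integers(0, 3, size=(100, 100))
lc_2016 = np.random.default_rng(2).integers(0, 3, size=(100, 100))

n = len(classes)
transition = np.zeros((n, n))
for i in classes:
    for j in classes:
        transition[i, j] = np.sum((lc_1986 == i) & (lc_2016 == j))

# Normalise rows to probabilities P(class j in 2016 | class i in 1986).
transition = transition / transition.sum(axis=1, keepdims=True)
print(np.round(transition, 3))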
How to cite: Liu, P., Han, R., and Yang, L.: Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9338, https://doi.org/10.5194/egusphere-egu2020-9338, 2020.
EGU2020-22587 | Displays | ESSI1.15
Deep neural networks for total organic carbon prediction and data-driven sampling
Everardo González Ávalos and Ewa Burwicz
Over the past decade deep learning has been used to solve a wide array of regression and classification tasks. Compared to classical machine learning approaches (k-Nearest Neighbours, Random Forests, …), deep learning algorithms excel at learning complex, non-linear internal representations, in part due to the highly over-parametrised nature of their underlying models; this advantage often comes at the cost of interpretability. In this work we used deep neural networks to construct a global total organic carbon (TOC) seafloor concentration map. By implementing Softmax distributions on implicitly continuous data (regression tasks), we were able to obtain probability distributions to assess prediction reliability. A variation of Dropout called Monte Carlo Dropout is also used during the inference step, providing a tool to model prediction uncertainties. We used these techniques to create a model information map, which is a key element to develop new data-driven sampling strategies for data acquisition.
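A minimal PyTorch sketch of the Monte Carlo Dropout idea, assuming a small fully connected network and random placeholder inputs rather than the actual TOC model: dropout stays active at inference time and repeated stochastic forward passes yield a predictive mean and an uncertainty estimate.

# Monte Carlo Dropout: keep dropout active at inference and average stochastic passes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

x = torch.randn(32, 8)             # 32 samples with 8 predictor features (placeholder)
model.train()                      # keeps the dropout layers active during inference
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # 100 stochastic forward passes

prediction = samples.mean(dim=0)   # MC estimate of the prediction
uncertainty = samples.std(dim=0)   # spread used as a model-uncertainty proxy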
How to cite: González Ávalos, E. and Burwicz, E.: Deep neural networks for total organic carbon prediction and data-driven sampling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22587, https://doi.org/10.5194/egusphere-egu2020-22587, 2020.
EGU2020-19631 | Displays | ESSI1.15
Implementing FAIR in a Collaborative Data Management Framework
Angela Schäfer, Norbert Anselm, Janik Eilers, Stephan Frickenhaus, Peter Gerchow, Frank Oliver Glöckner, Antonie Haas, Isabel Herrarte, Roland Koppe, Ana Macario, Christian Schäfer-Neth, Brenner Silva, and Philipp Fischer
Today's fast digital growth has made data the most essential tool for scientific progress in Earth system science. Hence, we strive to assemble a modular research infrastructure comprising a collection of tools and services that allow researchers to turn big data into scientific outcomes.
Major roadblocks are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous project-driven requirements regarding, e.g., satellite data, sensor monitoring, quality assessment and control, processing, analysis and visualization, and (iii) the demand for near-real-time analyses.
These requirements have led us to build a generic and cost-effective framework O2A (Observation to Archive) to enable, control, and access the flow of sensor observations to archives and repositories.
By establishing O2A within major cooperative projects like MOSES and Digital Earth in the research field Earth and Environment of the German Helmholtz Association, we extend research data management services, computing powers, and skills to connect with the evolving software and storage services for data science. This fully supports the typical scientific workflow from its very beginning to its very end, that is, from data acquisition to final data publication.
The key modules of O2A's digital research infrastructure established by AWI to enable Digital Earth Science implement the FAIR principles:
- Sensor Web, to register sensor applications and capture controlled meta data before and alongside any measurement in the field
- Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, at best in controlled NRT data streams (a hypothetical ingest sketch follows after this list)
- Dashboards, allowing researchers to find and access data and share and collaborate among partners
- Workspace, enabling researchers to access and use data with research software in a cloud-based virtualized infrastructure that allows them to analyse massive amounts of data on the spot
- Archiving and publishing data via repositories and Digital Object Identifiers (DOI).
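A purely hypothetical sketch of a near-real-time ingest step for the "Data ingest" module listed above: a single observation record, tagged with the identifier of a registered sensor, is prepared for submission to an ingest endpoint. The URL, token and payload fields are invented and do not describe the actual O2A interface.

# Hypothetical NRT ingest: build (but do not send) an HTTP request with one observation.
import json
import urllib.request

record = {
    "sensor_urn": "vessel:polarstern:ctd_01",   # identifier from the sensor registry (invented)
    "timestamp": "2020-05-04T12:00:00Z",
    "parameter": "sea_water_temperature",
    "value": 4.82,
    "unit": "degC",
}

request = urllib.request.Request(
    "https://ingest.example.org/observations",   # placeholder endpoint, not the real service
    data=json.dumps(record).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
# A real pipeline would send the request and check the response status:
# with urllib.request.urlopen(request) as response:
#     assert response.status in (200, 201)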
How to cite: Schäfer, A., Anselm, N., Eilers, J., Frickenhaus, S., Gerchow, P., Glöckner, F. O., Haas, A., Herrarte, I., Koppe, R., Macario, A., Schäfer-Neth, C., Silva, B., and Fischer, P.: Implementing FAIR in a Collaborative Data Management Framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19631, https://doi.org/10.5194/egusphere-egu2020-19631, 2020.
EGU2020-19648 | Displays | ESSI1.15
From source to sink - Sustainable and reproducible data pipelines with SaQC
David Schäfer, Bert Palm, Lennart Schmidt, Peter Lünenschloß, and Jan Bumberger
The number of sensors used in the environmental system sciences is increasing rapidly, and while this trend undoubtedly provides a great potential to broaden the understanding of complex spatio-temporal processes, it comes with its own set of new challenges. The flow of data from a source to its sink, from sensors to databases, involves many, usually error-prone intermediate steps. From data acquisition with its specific scientific and technical challenges, through data transfer from often remote locations, to the final data processing, all steps carry great potential to introduce errors and disturbances into the actual environmental signal.
Quantifying these errors becomes a crucial part of the later evaluation of all measured data. While many large environmental observatories are moving from manual to more automated ways of data processing and quality assurance, these systems are usually highly customized and hand-written. This approach is non-ideal in several ways: first, it is a waste of resources, as the same algorithms are implemented over and over again; second, it imposes great challenges to reproducibility. If the relevant programs are made available at all, they expose all the problems of software reuse: correctness of the implementation, readability and comprehensibility for future users, as well as transferability between different computing environments. Besides these problems, which relate to software development in general, another crucial factor comes into play: the end product, a processed and quality-controlled data set, is closely tied to the current version of the programs in use. Even small changes to the source code can lead to vastly differing results. If this is not approached responsibly, data and programs will inevitably fall out of sync.
The presented software, the 'System for automated Quality Control' (SaQC) (www.ufz.git.de/rdm-software/saqc), helps to either solve or massively simplify the solution to the presented challenges. As a mainly no-code platform with a large set of implemented functionality, SaQC lowers the entry barrier for the non-programming scientific practitioner without sacrificing the possibility of fine-grained adaptation to project-specific needs. The text-based configuration allows easy integration into version control systems and thus opens the opportunity to use well-established software for data lineage. We will give a short overview of the program's unique features and showcase possibilities to build reliable and reproducible processing and quality assurance pipelines for real-world data from a spatially distributed, heterogeneous sensor network.
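To illustrate the general idea of a text-configured, no-code quality-control run (not SaQC's actual configuration syntax), the sketch below reads a small mapping of variables to checks and applies range and spike tests to a toy time series with pandas.

# Illustrative text-configured QC run; check names and parameters are invented.
import numpy as np
import pandas as pd

config = {
    "soil_moisture": [("range", {"min": 0.0, "max": 100.0}),
                      ("spike", {"max_jump": 20.0})],
}

rng = np.random.default_rng(0)
data = pd.DataFrame(
    {"soil_moisture": np.clip(rng.normal(35, 5, 200), -5, None)},
    index=pd.date_range("2020-01-01", periods=200, freq="15min"),
)
flags = pd.DataFrame(False, index=data.index, columns=data.columns)

for var, checks in config.items():
    series = data[var]
    for name, params in checks:
        if name == "range":
            flags[var] |= (series < params["min"]) | (series > params["max"])
        elif name == "spike":
            flags[var] |= series.diff().abs() > params["max_jump"]

print(f"flagged values: {int(flags['soil_moisture'].sum())}")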
How to cite: Schäfer, D., Palm, B., Schmidt, L., Lünenschloß, P., and Bumberger, J.: From source to sink - Sustainable and reproducible data pipelines with SaQC, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19648, https://doi.org/10.5194/egusphere-egu2020-19648, 2020.
EGU2020-9251 | Displays | ESSI1.15
An integrative framework for data-driven investigation of environmental systems
Daniel Eggert and Doris Dransch
Environmental scientists aim at understanding not only single components but entire systems. One example is the flood system: scientists investigate the conditions, drivers and effects of flood events and the relations between them. Investigating environmental systems with a data-driven research approach requires linking a variety of data, analytical methods, and derived results.
Several obstacles in the current scientific work environment hinder scientists from easily creating these links: distributed and heterogeneous data sets, separated analytical tools, discontinuous analytical workflows, and isolated views of data and data products. We address these obstacles, with the exception of distributed and heterogeneous data, since this is part of other ongoing initiatives.
Our goal is to develop a framework supporting the data-driven investigation of environmental systems. First, we integrate separated analytical tools and methods by means of a component-based software framework. Furthermore, we allow for seamless and continuous analytical workflows by applying the concept of digital workflows, which also demands the aforementioned integration of separated tools and methods. Finally, we provide integrated views of data and data products through interactive visual interfaces with multiple linked views. The combination of these three concepts from computer science allows us to create a digital research environment that enables scientists to create the initially mentioned links in a flexible way. We developed a generic concept for our approach, implemented a corresponding framework and finally applied both to realize a “Flood Event Explorer” prototype supporting the comprehensive investigation of a flood system.
In order to implement a digital workflow, our approach starts by precisely defining the workflow’s requirements, mostly through informal interviews with the domain scientists. The defined requirements also include the needed analytical tools and methods, as well as the data and data products to be used. For the technical integration of the needed tools and methods, our software framework provides a modularization approach based on a messaging system. This allows us to create custom modules or wrap existing implementations and tools. The messaging system (e.g. Apache Pulsar) then connects these individual modules, which enables us to combine multiple methods and tools into a seamless digital workflow. The approach, of course, demands the proper definition of interfaces to modules and data sources. Finally, our software framework provides multiple generic visual front-end components (e.g. tables, maps and charts) to create interactive linked views supporting the visual analysis of the workflow’s data.
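As a rough sketch of how two analysis modules might be coupled through a message broker such as Apache Pulsar, the snippet below lets one hypothetical module publish an intermediate result that another module consumes; the broker URL, topic name and payload are assumptions, not the project's actual configuration.

```python
import json
import pulsar  # pip install pulsar-client

# Connect to an assumed local broker; in the framework, modules and the broker
# address would come from the workflow configuration.
client = pulsar.Client("pulsar://localhost:6650")

# Module A: publish an intermediate result of an analysis step.
producer = client.create_producer("flood-explorer/discharge-anomalies")
producer.send(json.dumps({"gauge_id": "E123", "anomaly": 2.7}).encode("utf-8"))

# Module B: consume the result and hand it to the next step (e.g. a linked view).
consumer = client.subscribe("flood-explorer/discharge-anomalies",
                            subscription_name="visualization-module")
msg = consumer.receive()
payload = json.loads(msg.data().decode("utf-8"))
consumer.acknowledge(msg)
print("received:", payload)

client.close()
```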
How to cite: Eggert, D. and Dransch, D.: An integrative framework for data-driven investigation of environmental systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9251, https://doi.org/10.5194/egusphere-egu2020-9251, 2020.
EGU2020-1412 | Displays | ESSI1.15
Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019
Hai-Po Chan and Kostas Konstantinou
Mayon Volcano on eastern Luzon Island is the most active volcano in the Philippines. It is known and renowned as the "perfect cone" for its symmetric conical shape and has recorded more than 50 eruptions in the past 500 years. Geographically, the volcano is surrounded by eight cities and municipalities with 1 million inhabitants. Currently, its activity is monitored daily by on-site observations such as seismometers installed on Mayon's slopes, electronic distance meters (EDMs), precise leveling benchmarks, and portable fly spectrometers. Compared to existing direct on-site measurements, satellite remote sensing is currently assuming an essential role in understanding the whole picture of volcanic processes. The vulnerability to volcanic hazards is high for Mayon, given that it is located in an area of high population density on Luzon Island. However, satellite remote sensing methods and datasets have not been integrated into Mayon's hazard mapping and monitoring system, despite abundant open-access satellite data archives. Here, we perform multiscale and multitemporal monitoring based on the analysis of a nineteen-year Land Surface Temperature (LST) time series derived from satellite-retrieved thermal infrared imagery. Both Landsat thermal imagery (30-meter spatial resolution) and MODIS (Moderate Resolution Imaging Spectroradiometer) LST products (1-kilometer spatial resolution) are used for the analysis. The Ensemble Empirical Mode Decomposition (EEMD) is applied to decompose oscillatory components of various timescales within the LST time series. The physical interpretation of the decomposed LST components at various periods is explored and compared with Mayon's eruption records. Results show that the annual-period components of LST tend to lose their regularity following an eruption, and the amplitudes of short-period LST components are very responsive to eruption events. The satellite remote sensing approach provides more insights at larger spatial and temporal scales on this renowned active volcano. This study not only demonstrates the advantages and effectiveness of satellite remote sensing for volcano monitoring but also provides valuable surface information for exploring the subsurface volcanic structure of Mayon.
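A minimal sketch of the decomposition step, assuming the PyEMD (EMD-signal) implementation of EEMD and a synthetic stand-in for the satellite-derived LST series; the signal, sampling and ensemble size are illustrative assumptions.

```python
import numpy as np
from PyEMD import EEMD  # pip install EMD-signal

# Synthetic stand-in for a monthly LST series: annual cycle + weak trend + noise.
t = np.arange(19 * 12) / 12.0                                # ~19 years, monthly
lst = 25 + 8 * np.sin(2 * np.pi * t) + 0.05 * t + np.random.randn(t.size)

# Ensemble Empirical Mode Decomposition into intrinsic mode functions (IMFs).
eemd = EEMD(trials=100)      # number of noise-assisted ensemble members (assumed)
imfs = eemd.eemd(lst)

# Short-period IMFs come first; the annual cycle typically appears in a
# mid-order IMF whose amplitude and regularity can be compared with eruption dates.
print("number of IMFs:", imfs.shape[0])
```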
How to cite: Chan, H.-P. and Konstantinou, K.: Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1412, https://doi.org/10.5194/egusphere-egu2020-1412, 2020.
EGU2020-3049 | Displays | ESSI1.15
Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning
Erik Nixdorf, Marco Hannemann, Uta Ködel, Martin Schrön, and Thomas Kalbacher
Soil moisture is a critical hydrological component for determining hydrological state conditions and a crucial variable in controlling land-atmosphere interactions, including evapotranspiration, infiltration and groundwater recharge.
At the catchment scale, the spatio-temporal distribution of soil moisture is highly variable due to the influence of various factors such as soil heterogeneity, climate conditions, vegetation and geomorphology. Among the various existing soil moisture monitoring techniques, the application of vehicle-mounted Cosmic Ray Neutron Sensors (CRNS) allows monitoring soil moisture noninvasively by surveying larger regions within a reasonable time. However, the measured data and their corresponding footprints are often located along the existing road network, leaving inaccessible parts of a catchment unobserved, and surveying larger areas at short intervals is often hindered by limited manpower.
In this study, data from more than 200,000 CRNS rover readings measured over different regions of Germany within the last 4 years have been employed to characterize the trends of the soil moisture distribution in the 209 km² Mueglitz River Basin in Eastern Germany. Subsets of the data have been used to train three different supervised machine learning algorithms (multiple linear regression, random forest and artificial neural network) based on 85 independent, relevant dynamic and stationary features derived from public databases. The random forest model outperforms the other models (R² ≈ 0.8), relying on day-of-year, altitude, air temperature, humidity, soil organic carbon content and soil temperature as the most influential predictors.
After training and testing the models, CRNS records for each day of the last decade are predicted on a 250 × 250 m grid of the Mueglitz River Basin using the same type of features. The derived CRNS record distributions are compared with both spatial soil moisture estimates from a hydrological model and point estimates from a sensor network operated during spring 2019. After variable standardization, preliminary results show that the random forest model is able to reproduce the spatio-temporal trends estimated by the hydrological model and the point measurements. These findings demonstrate that training machine learning models on domain-unspecific large datasets of CRNS records using spatially and temporally available predictors has the potential to fill measurement gaps and to improve predictions of soil moisture dynamics at the catchment scale.
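A compact sketch of the model-training step using scikit-learn's random forest regressor; the feature table, target values and hyperparameters below are invented placeholders for the 85 predictors and CRNS readings described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table: one row per CRNS rover footprint (values invented).
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "day_of_year": rng.integers(1, 366, n),
    "altitude_m": rng.uniform(100, 800, n),
    "air_temperature_c": rng.normal(10, 8, n),
    "humidity_pct": rng.uniform(30, 100, n),
    "soil_organic_carbon_pct": rng.uniform(0.5, 5, n),
    "soil_temperature_c": rng.normal(9, 6, n),
})
y = rng.normal(1500, 200, n)  # stand-in for the CRNS rover readings

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

# With real data, this is where a skill of roughly R² = 0.8 would be evaluated;
# prediction on the 250 m grid reuses the same feature columns per cell and day.
print("R² on held-out data:", r2_score(y_test, rf.predict(X_test)))
```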
How to cite: Nixdorf, E., Hannemann, M., Ködel, U., Schrön, M., and Kalbacher, T.: Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3049, https://doi.org/10.5194/egusphere-egu2020-3049, 2020.
EGU2020-5028 | Displays | ESSI1.15
Modeling methane from the North Sea region with ICON-ART
Christian Scharun, Roland Ruhnke, Jennifer Schröter, Michael Weimer, and Peter Braesicke
Methane (CH4) is the second most important greenhouse gas after CO2 affecting global warming. Various sources (e.g. fossil fuel production, agriculture and waste, biomass burning and natural wetlands) and sinks (the reaction with the OH radical, the main sink, which contributes to tropospheric ozone production) determine the methane budget. Due to its long lifetime in the atmosphere, methane can be transported over long distances.
Disused and active offshore platforms can emit methane, in amounts that are difficult to quantify. In addition, explorations of the sea floor in the North Sea showed a release of methane near the boreholes of both oil and gas producing platforms. The basis of this study is the established emission database EDGAR (Emission Database for Global Atmospheric Research), an inventory that includes methane emission fluxes in the North Sea region. While methane emission fluxes in the EDGAR inventory and platform locations match for most of the oil platforms, almost all of the gas platform sources are missing from the database. We develop a method for estimating the missing sources based on the EDGAR emission inventory.
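Purely as a hypothetical illustration of such a gap-filling idea (this is not the estimation method developed in the study), one could assign each gas platform lacking an EDGAR source a first-guess flux taken from the nearest oil platform with a known flux; all coordinates and values below are invented.

```python
import pandas as pd

# Invented tables: EDGAR-listed oil-platform fluxes and gas platforms that have
# no matching source in the inventory.
oil = pd.DataFrame({"lon": [2.1, 3.4, 4.0], "lat": [55.2, 56.0, 54.8],
                    "ch4_flux_kg_per_h": [3.2, 1.8, 2.5]})
gas = pd.DataFrame({"lon": [2.8, 3.9], "lat": [55.6, 55.1]})

def nearest_flux(row, reference):
    # Crude nearest-neighbour assignment in degrees (sufficient for a sketch).
    d2 = (reference["lon"] - row["lon"])**2 + (reference["lat"] - row["lat"])**2
    return reference.loc[d2.idxmin(), "ch4_flux_kg_per_h"]

gas["ch4_flux_first_guess"] = gas.apply(nearest_flux, axis=1, reference=oil)
print(gas)
```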
In this study, the global model ICON-ART (ICOsahedral Nonhydrostatic model - Aerosols and Reactive Trace gases) will be used. ART is an online-coupled model extension for ICON that includes chemical gases and aerosols. One aim of the model is the simulation of interactions between the trace substances and the state of the atmosphere by coupling the spatiotemporal evolution of tracers with atmospheric processes. ICON-ART sensitivity simulations are performed with inserted and adjusted sources to assess their influence on the methane and OH-radical distribution on regional (North Sea) and global scales.
How to cite: Scharun, C., Ruhnke, R., Schröter, J., Weimer, M., and Braesicke, P.: Modeling methane from the North Sea region with ICON-ART, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5028, https://doi.org/10.5194/egusphere-egu2020-5028, 2020.
EGU2020-10239 | Displays | ESSI1.15 | Highlight
Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project
Wolfgang Kurtz, Stephan Hachinger, Anton Frank, Wolfram Mauser, Jens Weismüller, and Christine Werner
The ViWA (Virtual Water Values) project aims to provide a global-scale assessment of the current usage of water resources, of the efficiency of water use and of agricultural yields, as well as of the flow and trade of ‘virtual’ water across country boundaries. This is achieved by establishing a global management and monitoring system which combines high-resolution (1 km²) agro-hydrological model simulations with information from high-resolution remote-sensing data from Copernicus satellites. The monitoring system is used to judge the progress in achieving water-related UN Sustainable Development Goals at local and global scales. Specific goals of the project are, for example, to:
- evaluate possible inefficiencies of the current water use in agriculture, industry and water management and its economic consequences.
- assess the vulnerability of agriculture and ecosystems to climate variability with a special emphasis on water availability.
- identify regional hot-spots of unsustainable water use and to analyze possible institutional obstacles for a sustainable and efficient water use.
- identify trade-offs between the commercial water use and protection of ecosystem services.
A cornerstone for reaching these project goals is the set of high-resolution global ensemble simulations with an agro-hydrological model for a variety of crop types and management practices. These simulations provide the relevant information on agricultural yields and water demands at different scales. In this context, a considerable amount of data is generated, and subsets of these data may also be of direct relevance for different external interest groups.
In this presentation, we describe our approach to managing the simulation data, with a special focus on possible strategies for data provisioning to interested stakeholders, scientists, practitioners and the general public. We will give an overview of the corresponding simulation and data storage workflows on the HPC systems used, and we will discuss methods for providing the data to the different interest groups. Among other aspects, we address findability (in the sense of the FAIR principles) of simulation results for the scientific community in indexed search portals through proper metadata annotation. We also discuss a prototypical interactive web portal for visualizing, subsetting and downloading selected parts of the data set.
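As one possible flavour of such metadata annotation and subsetting, the sketch below attaches CF/ACDD-style discovery attributes to a small synthetic xarray dataset and cuts a spatial subset as a portal backend might do; variable names, attribute values and the file name are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical field from one ensemble member (names, units and values invented).
da = xr.DataArray(
    np.random.rand(10, 20, 20).astype("float32"),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2018-01-01", periods=10),
            "lat": np.linspace(47.0, 49.0, 20),
            "lon": np.linspace(10.0, 12.0, 20)},
    name="crop_water_use_efficiency",
)
ds = da.to_dataset()

# Discovery-relevant global attributes in the style of CF/ACDD conventions.
ds.attrs.update({
    "title": "Agro-hydrological ensemble member (illustrative subset)",
    "summary": "Simulated crop water-use efficiency on a regular lat/lon grid.",
    "Conventions": "CF-1.8, ACDD-1.3",
    "keywords": "water use efficiency, agriculture, virtual water",
    "license": "CC-BY-4.0",
})
ds["crop_water_use_efficiency"].attrs.update(
    {"units": "1", "long_name": "crop water use efficiency"})

# Spatial subsetting as a portal backend might perform before delivery.
subset = ds.sel(lat=slice(47.5, 48.5), lon=slice(10.5, 11.5))
subset.to_netcdf("viwa_subset_example.nc")  # requires a netCDF backend
```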
How to cite: Kurtz, W., Hachinger, S., Frank, A., Mauser, W., Weismüller, J., and Werner, C.: Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10239, https://doi.org/10.5194/egusphere-egu2020-10239, 2020.
EGU2020-12328 | Displays | ESSI1.15
Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring
Valeriy Kovalskyy and Xiaoyuan Yang
Imagery products are critical for digital agriculture, as they help deliver value and insights to growers. The use of publicly available satellite data feeds by digital agriculture companies helps keep imagery services affordable for a broader base of farmers. The optimal use of public and private imagery data sources plays a critical role in the success of image-based services for agriculture.
At the Climate Corporation we have established a program focused on intelligence about the satellite image coverage and frequency expected in different geographies and times of the year, which is becoming critical for the global expansion of the company. In this talk we report the results of our analysis of publicly available imagery data sources for key agricultural regions of the globe. We also demonstrate how these results can guide commercial imagery acquisition decisions in a case study in Brazil, where some growers run the risk of going through the growing season without receiving usable imagery if relying on a single source of satellite imagery. The study clearly shows the validity of the approaches taken, as the results matched factual image deliveries to within single digits of percent coverage at the regional level. Our analysis also captured realistic temporal and spatial details of changes in image frequency resulting from the addition of alternative satellite imagery sources to the production stream. The optimization of imagery acquisitions enables filling data gaps for research and development while contributing to delivering greater value for growers in crop health monitoring and other image-based services.
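A small sketch of the kind of revisit bookkeeping described above, using an invented acquisition log to compare the longest gap between usable (low-cloud) images with and without an additional commercial source.

```python
import pandas as pd

# Invented acquisition log for one field: date, source and cloud cover.
log = pd.DataFrame({
    "date": pd.to_datetime(["2019-11-02", "2019-11-07", "2019-11-12",
                            "2019-11-14", "2019-11-22", "2019-12-02"]),
    "source": ["Sentinel-2", "Sentinel-2", "Sentinel-2",
               "commercial", "Sentinel-2", "Sentinel-2"],
    "cloud_cover_pct": [80, 15, 95, 10, 20, 100],
})

# Only low-cloud acquisitions are considered usable (threshold assumed).
usable = log[log["cloud_cover_pct"] < 30].sort_values("date")

def max_gap_days(df):
    # Longest waiting time between consecutive usable images.
    return df["date"].diff().dt.days.max()

print("max gap, all sources [days]:", max_gap_days(usable))
print("max gap, public only [days]:",
      max_gap_days(usable[usable["source"] == "Sentinel-2"]))
```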
How to cite: Kovalskyy, V. and Yang, X.: Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12328, https://doi.org/10.5194/egusphere-egu2020-12328, 2020.
EGU2020-22594 | Displays | ESSI1.15
Machine learning as supporting method for UXO mapping and detection
Daniela Henkel, Everardo González Ávalos, Mareike Kampmeier, Patrick Michaelis, and Jens Greinert
Marine munitions, or unexploded ordnance (UXO), were massively disposed of in coastal waters after World War II; they are still being introduced into the marine environment during war activities and military exercises. UXO detection and removal has gained great interest during the ongoing efforts to install offshore wind parks for energy generation as well as cable routing through coastal waters. Additionally, 70 years after the World War II munition dumping events, more and more chemical and conventional munition is rusting away, increasing the risk of toxic contamination.
The general detection methodology includes high-resolution multibeam mapping, hydroacoustic sub-bottom mapping, electromagnetic surveys with gradiometers, as well as visual inspections by divers or remotely operated vehicles (ROVs). Using autonomous underwater vehicles (AUVs) for autonomous underwater inspections with multibeam, camera and EM systems is the next technological step in acquiring meaningful high-resolution data independently of a mother ship. However, it would be beneficial for the use of such technology to be able to better predict potential hot spots of munition targets and distinguish them from other objects such as rocks, small artificial constructions or metallic waste (wires, barrels, etc.).
The above-mentioned data sources could be utilized as predictor layers for machine learning with existing and accessible algorithms. The structure of the data is highly similar to image data, an area where neural networks are the benchmark. As a first approach, we therefore trained convolutional neural networks in a supervised manner to detect seafloor areas contaminated with UXO. For this, we manually annotated known UXO locations as well as known non-UXO locations to generate a training dataset, which was later augmented by rotating and flipping each annotated tile. We achieved a high accuracy with this approach using only a subset of the data sources mentioned above as input layers. We also explored the use of further input layers and larger training datasets, and their impact on performance. This is a good example of machine learning enabling us to classify large areas in a short time and with minimal need for manual annotation.
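A minimal sketch of tile augmentation by rotation and flipping followed by training a small convolutional classifier; it uses TensorFlow/Keras as an example framework, and the tile size, channel count, labels and architecture are assumptions rather than the network actually used.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical annotated tiles: bathymetry-derived layers stacked as channels
# (tile size, channel count and labels are invented for illustration).
tiles = np.random.rand(200, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, 200)  # 1 = UXO-contaminated, 0 = background

def augment(x, y):
    # Augmentation by rotating and flipping each annotated tile.
    rots = [np.rot90(x, k, axes=(1, 2)) for k in range(4)]
    flips = [np.flip(r, axis=2) for r in rots]
    return np.concatenate(rots + flips), np.tile(y, 8)

x_aug, y_aug = augment(tiles, labels)

# Small CNN classifier; the architecture is an illustrative choice.
model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_aug, y_aug, epochs=2, batch_size=32, verbose=0)
```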
How to cite: Henkel, D., González Ávalos, E., Kampmeier, M., Michaelis, P., and Greinert, J.: Machine learning as supporting method for UXO mapping and detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22594, https://doi.org/10.5194/egusphere-egu2020-22594, 2020.
EGU2020-11084 | Displays | ESSI1.15
Significance and implementation of SMART Monitoring Tools
Uta Koedel, Peter Dietrich, Erik Nixdorf, and Philipp Fischer
The term “SMART Monitoring” is often used in digital projects to survey and analyze data flows in near-real time or real time. The term is also adopted in the project Digital Earth (DE), which was jointly launched in 2018 by the eight Helmholtz centers of the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF). Within DE, the “SMART Monitoring” sub-project aims at developing workflows and processes to make scientific parameters and the related datasets specific, measurable, accepted, relevant, and trackable (SMART).
“SMART Monitoring” in DE comprises a combination of hardware and software tools to enhance the traditional sequential monitoring approach, in which data are analyzed and processed step by step from the sensor towards a repository, into an integrated analysis approach in which information on the measured value, the status of each sensor and possibly relevant auxiliary sensor data in a sensor network are available and used in real time to enhance the sensor output with respect to data accuracy, precision and availability. SMART Monitoring could thus be defined as a computer-enhanced monitoring network with automatic data flow control from individual sensors to databases, enhanced by automated (machine learning) and near-real-time interactive data analysis and exploration using the full potential of all available sensors within the network. In addition, “SMART Monitoring” aims to support better adjustment of sensor settings and monitoring strategies in time and space through iterative feedback.
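As a toy illustration of an automated, near-real-time plausibility check that such a monitoring chain could run between sensor and repository, the sketch below flags values that deviate strongly from a rolling baseline; the series, window length and threshold are invented.

```python
import numpy as np
import pandas as pd

# Invented stream of 5-minute measurements from one sensor.
idx = pd.date_range("2020-03-01", periods=500, freq="5min")
series = pd.Series(np.random.normal(12.0, 0.3, 500), index=idx, name="water_temp_c")
series.iloc[250] = 18.0  # implausible outlier

# Near-real-time plausibility check: rolling z-score against the last 2 hours.
window = 24                     # 24 x 5 min = 2 h (assumed window length)
baseline = series.rolling(window).mean()
spread = series.rolling(window).std()
zscore = (series - baseline) / spread
suspect = zscore.abs() > 4      # assumed threshold

# Flagged values could trigger feedback to the sensor or an adapted sampling rate.
print(series[suspect])
```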
This poster presentation will show general concepts, workflows, and possible visualization tools based on examples that support the SMART Monitoring idea.
How to cite: Koedel, U., Dietrich, P., Nixdorf, E., and Fischer, P.: Significance and implementation of SMART Monitoring Tools, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11084, https://doi.org/10.5194/egusphere-egu2020-11084, 2020.
EGU2020-22618 | Displays | ESSI1.15
Towards easily accessible interactive big-data analysis on supercomputers
Katharina Höflich, Martin Claus, Willi Rath, Dorian Krause, Benedikt von St. Vieth, and Kay Thust
Demand on high-end high performance computer (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and the joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analysis. Aiming for interactive big-data analysis on HPC will also help the scientific community in utilizing increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focussing on the aspects of usability and interactive use of HPC systems on the basis of typical use cases from the ocean science community.
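To illustrate the kind of interactive, elastic analysis workflow referred to above, the sketch below requests Dask workers through a SLURM batch system with dask_jobqueue and computes a chunked reduction, e.g. from a JupyterHub session; the queue name, resource sizes and the synthetic array are placeholders for site- and use-case-specific settings.

```python
import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Request workers through the batch system; queue, resources and walltime are
# placeholders for a site-specific configuration.
cluster = SLURMCluster(queue="compute", cores=16, memory="64GB",
                       walltime="01:00:00")
cluster.scale(jobs=4)   # elastic: workers appear as the scheduler grants jobs
client = Client(cluster)

# Interactive analysis, e.g. from a JupyterHub session: a chunked reduction over
# a large (here synthetic) ocean-model-sized array.
field = da.random.random((365, 2000, 2000), chunks=(30, 500, 500))
print(field.mean(axis=0).mean().compute())

client.close()
cluster.close()
```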
How to cite: Höflich, K., Claus, M., Rath, W., Krause, D., von St. Vieth, B., and Thust, K.: Towards easily accessible interactive big-data analysis on supercomputers, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22618, https://doi.org/10.5194/egusphere-egu2020-22618, 2020.
EGU2020-11117 | Displays | ESSI1.15 | Highlight
Going beyond FAIR to increase data reliability
Uta Koedel and Peter Dietrich
The FAIR principle is on its way to becoming a conventional standard for all kinds of data. However, it is often forgotten that this principle does not consider data quality or data reliability issues. If the data quality is not sufficiently described, misinterpretation and misuse of these data in a joint interpretation can lead to false scientific conclusions. Hence, a statement about data reliability is an essential component for secondary data processing and joint interpretation efforts. Information on data reliability, uncertainty and quality, as well as information on the devices used, is essential and needs to be introduced or even implemented in the workflow from the sensor to a database if the data are to be considered in a broader context.
In the past, many publications have shown that the same devices at the same location do not necessarily provide the same measurement data. Likewise, statistical quantities and confidence intervals that would allow assessing the reliability of the data are rarely given in publications. Many secondary users of measurement data assume that calibration data and the measurement of other auxiliary variables are sufficient to estimate the data reliability. However, even if some devices require on-site field calibration, that does not mean that the data are comparable. Heat, cold and internal processes on electronic components can lead to differences in measurement data recorded with devices of the same type at the same location, especially as the devices themselves become increasingly complex.
Data reliability can be increased by addressing data uncertainty issues within the FAIR principle. The poster presentation will show the importance of comparative measurements, the information needed for the application of proxy-transfer functions, and suitable uncertainty analyses for databases.
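A small sketch of the kind of comparative-measurement statistics argued for here: the mean difference between two co-located devices and its 95 % confidence interval, computed with NumPy/SciPy on invented readings.

```python
import numpy as np
from scipy import stats

# Invented side-by-side readings from two devices of the same type at one site.
rng = np.random.default_rng(1)
device_a = rng.normal(20.00, 0.05, 50)
device_b = rng.normal(20.06, 0.08, 50)   # small systematic offset assumed

diff = device_a - device_b
mean_diff = diff.mean()
sem = stats.sem(diff)

# 95 % confidence interval of the mean difference (t-distribution, df = n - 1).
ci_low, ci_high = stats.t.interval(0.95, diff.size - 1, loc=mean_diff, scale=sem)
print(f"mean difference: {mean_diff:.3f}  95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```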
How to cite: Koedel, U. and Dietrich, P.: Going beyond FAIR to increase data reliability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11117, https://doi.org/10.5194/egusphere-egu2020-11117, 2020.
ESSI2.1 – Metadata, Data Models, Semantics, and Collaboration
EGU2020-3663 | Displays | ESSI2.1
Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding
Aaron Kaulfus, Kaylin Bugbee, Alyssa Harris, Rahul Ramachandran, Sean Harkins, Aimee Barciauskas, and Deborah Smith
Algorithm Theoretical Basis Documents (ATBDs) accompany Earth observation data generated from algorithms. ATBDs describe the physical theory, mathematical procedures and assumptions made for the algorithms that convert radiances received by remote sensing instruments into geophysical quantities. While ATBDs are critical to scientific reproducibility and data reuse, there have been technical, social and informational issues surrounding the creation and maintenance of these key documents. A standard ATBD structure has been lacking, resulting in inconsistent documents of varying levels of detail. Due to the lack of a minimum set of requirements, there has been very little formal guidance on the ATBD publication process. Additionally, ATBDs have typically been provided as static documents that are not machine readable, making search and discovery of the documents and of the content within the documents difficult for users. To address the challenges surrounding ATBDs, NASA has prototyped the Algorithm Publication Tool (APT), a centralized cloud-based publication tool that standardizes the ATBD content model and streamlines the ATBD authoring process. This presentation will describe our approach to developing a common information model for ATBDs and our efforts to provide ATBDs as dynamic documents that are available for both human and machine utilization. We will also present our vision for APT within the broader NASA Earth science data system and how this tool may assist in standardizing and easing the ATBD creation and maintenance process.
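To illustrate what a machine-readable ATBD record could look like, the sketch below serializes a minimal, structured document to JSON; the field names are assumptions made for the example and are not the actual APT content model.

```python
import json

# Illustrative machine-readable ATBD record; field names are assumptions made
# for this example, not the actual APT content model.
atbd = {
    "title": "Example Level-2 Sea Surface Temperature ATBD",
    "version": "1.0",
    "algorithm_description": {
        "scientific_theory": "Radiative transfer in the thermal infrared ...",
        "mathematical_theory": "Split-window regression of brightness temperatures.",
        "assumptions": ["cloud-free pixels", "known surface emissivity"],
    },
    "algorithm_inputs": [{"name": "brightness_temperature_11um", "units": "K"}],
    "algorithm_outputs": [{"name": "sea_surface_temperature", "units": "K"}],
    "performance_assessment": {"validation_approach": "match-ups with drifting buoys"},
}

# Serialized documents like this stay searchable and parseable by both humans
# and machines.
with open("atbd_example.json", "w") as fh:
    json.dump(atbd, fh, indent=2)
```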
How to cite: Kaulfus, A., Bugbee, K., Harris, A., Ramachandran, R., Harkins, S., Barciauskas, A., and Smith, D.: Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3663, https://doi.org/10.5194/egusphere-egu2020-3663, 2020.
EGU2020-18976 | Displays | ESSI2.1
Managing the knowledge created by the users through Geospatial User Feedback system. The NEXTGEOSS use case
Alaitz Zabala Torres, Joan Masó Pau, and Xavier Pons
The first approach to metadata was based on the producer's point of view, since producers were responsible for documenting and sharing metadata about their products. Since 2012 (starting in the EU FP7 GeoViQua project), the Geospatial User Feedback approach has described the user perspective on datasets and services (GUF, an OGC standard since 2016). In the past, users of the data gained knowledge about and with the data, but they lacked the means to easily and automatically share this knowledge in a formal way.
In the EU H2020 NextGEOSS project, the NiMMbus system has been matured as an interoperable solution to manage and store feedback items following the OGC GUF standard. NiMMbus can be used as a component for any geospatial portal, and, so far, has been integrated in several H2020 project catalogues or portals (NextGEOSS, ECOPotential, GeoEssential and GroundTruth2.0).
User feedback metadata complements the producer's metadata and adds value to the resource description in a geospatial portal by collecting the knowledge gained by the user while using the data, whether for the purpose originally foreseen by the producer or for an innovative one.
The current GEOSS platform provides access to a vast range of data resources. But to truly assist decision making, GEOSS wants to add a knowledge base. We believe that the NiMMbus system is a significant NextGEOSS contribution in this direction.
This communication describes how to extend the GUF to provide a set of knowledge elements and connect them to the original data, creating a network of knowledge. These elements can be citations (publications and policy briefs), quality indications (QualityML vocabulary and ISO 19157), usage reports (code and analytical processes), etc. NiMMbus offers tools to create different levels of feedback, starting with comments, providing citations or extracting quality indicators for the different quality classes (positional, temporal and attribute accuracy, completeness, consistency), and to share them with other users as part of the user feedback and usage report. Usage reports in the GUF standard can be extended to include code fragments that other users can apply to reproduce a previous usage. For example, in the ECOPotential Protected Areas from Space map browser (continued in the H2020 e-Shape project), a vegetation index suited to observing phenological blooms can be encoded by a user in the layer calculation using a combination of original Sentinel-2 bands. The portal stores this as JavaScript code (serialized as JSON) that describes which layers and formula were used. Once a user has validated the new layer, they can decide to make it available to everyone by publishing it as open-source JavaScript code in the NiMMbus system; from then on, any other user of the portal can import it and use it. As the usage description is a full feedback item, the user creating the dynamic layer can also describe any other related information, such as comments, or advertise a related publication.
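A minimal sketch of how such a shared dynamic-layer usage report might be serialized is shown below in Python; the property names and the example formula are illustrative assumptions and do not reproduce the NiMMbus or OGC GUF schemas.

# Illustrative sketch of a shared "dynamic layer" usage report serialized as JSON.
# Property names are hypothetical; they do not reproduce the NiMMbus or OGC GUF schemas.
import json

usage_report = {
    "target_dataset": "Sentinel-2 L2A surface reflectance",      # resource the feedback refers to
    "description": "Custom vegetation index highlighting phenological blooms",
    "layer_formula": "(B08 - B04) / (B08 + B04 + 0.5)",          # combination of original bands
    "bands_used": ["B04", "B08"],
    "author": "portal user (anonymized)",
    "license": "open source, reusable by other portal users",
}

print(json.dumps(usage_report, indent=2))

Publishing a structure like this alongside the feedback item is what would let other portal users import and re-run the same layer definition.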
The system shifts the focus to the users sharing their experience with the data and complements the producers' documentation with the richness of the knowledge that users gain in their data-driven research. In addition to augmenting GEOSS data, the system enables a social network of knowledge.
How to cite: Zabala Torres, A., Masó Pau, J., and Pons, X.: Managing the knowledge created by the users trough Geospatial User Feedback system. The NEXTGEOSS use case, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18976, https://doi.org/10.5194/egusphere-egu2020-18976, 2020.
EGU2020-19636 | Displays | ESSI2.1
Advances in Collaborative Documentation Support for CMIP6
Charlotte Pascoe, David Hassell, Martina Stockhause, and Mark Greenslade
Citation Service: http://cmip6cite.wdc-climate.de
How to cite: Pascoe, C., Hassell, D., Stockhause, M., and Greenslade, M.: Advances in Collaborative Documentation Support for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19636, https://doi.org/10.5194/egusphere-egu2020-19636, 2020.
EGU2020-9412 | Displays | ESSI2.1
Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies
Martin Schiegl, Gerold W. Diepolder, Abdelfettah Feliachi, José Román Hernández Manchado, Christine Hörfarter, Olov Johansson, Andreas-Alexander Maul, Marco Pantaloni, László Sőrés, and Rob van Ede
In the geosciences, where nomenclature has naturally grown from regional approaches with limited cross-border harmonization, descriptive texts are often used for coding data whose meanings in an international context are not conclusively clarified. This leads to difficulties when cross-border datasets are compiled. On the one hand, this is caused by the national-language, regional and historical descriptions in geological map legends. On the other hand, it is related to the interdisciplinary orientation of the geosciences, e.g. when concepts adopted from different fields carry a different meaning. Consistent use and interpretation of data according to international standards creates the potential for semantic interoperability; datasets then fit into international data infrastructures. But what if interpretation according to international standards is not possible, because no standard exists or existing standards are not applicable? Then efforts can be made to create machine-readable data using knowledge representations based on Semantic Web and Linked Data principles.
By making concepts referenceable via uniform identifiers (HTTP URIs) and crosslinking them to other resources published on the web, Linked Data offers the necessary context for clarifying the meaning of concepts. This approach ideally complements mainstream GIS (Geographic Information System) and relational database technologies in making data findable and semantically interoperable.
The GeoERA project (Establishing the European Geological Surveys Research Area to deliver a Geological Service for Europe, https://geoera.eu/) therefore provides the opportunity to clarify expert knowledge and terminology at a scientific level in the form of project-specific vocabulary concepts and to use them to code data in datasets. At the same time, parts of this vocabulary might later be included in international standards (e.g. INSPIRE or GeoSciML), if desired. So-called “GeoERA Project Vocabularies” are open collections of knowledge that, for example, may also contain deprecated, historical or only regionally relevant terms. Ideally, the sum of all vocabularies results in a knowledge base of bibliographically referenced terms that have been developed through scientific projects. Thanks to the consistent application of Semantic Web and Linked Data standards, nothing stands in the way of further use by modern technologies such as AI.
Project Vocabularies could also form an initial part of a future EGDI (European Geological Data Infrastructure, http://www.europe-geology.eu/) knowledge graph. They are restricted to linguistically labelled concepts, described in SKOS (Simple Knowledge Organization System) plus metadata properties, with a focus on scientific reusability. To extend this knowledge graph, they could additionally be supplemented by RDF data files to support project-related applications and functionality.
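As a minimal sketch of what a linguistically labelled project-vocabulary concept could look like in SKOS, the following Python example builds one concept with rdflib; the namespace, labels and hierarchy links are hypothetical placeholders rather than actual GeoERA vocabulary entries.

# Minimal sketch of a project-vocabulary concept described in SKOS with rdflib.
# The URIs and labels below are hypothetical examples, not actual GeoERA vocabulary entries.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/geoera/vocab/")   # placeholder namespace

g = Graph()
g.bind("skos", SKOS)

concept = EX["glacial_till"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("glacial till", lang="en")))
g.add((concept, SKOS.definition, Literal("Unsorted sediment deposited directly by a glacier.", lang="en")))
g.add((concept, SKOS.broader, EX["glacial_deposit"]))  # link into the concept hierarchy

print(g.serialize(format="turtle"))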
How to cite: Schiegl, M., Diepolder, G. W., Feliachi, A., Hernández Manchado, J. R., Hörfarter, C., Johansson, O., Maul, A.-A., Pantaloni, M., Sőrés, L., and van Ede, R.: Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9412, https://doi.org/10.5194/egusphere-egu2020-9412, 2020.
EGU2020-10227 | Displays | ESSI2.1
Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information
Rainer Haener, Henning Lorenz, Sylvain Grellet, Marc Urvois, and Eberhard Kunz
This study presents an approach to establishing Conceptual Interoperability for autonomous, multidisciplinary systems participating in Research Infrastructures, Early Warning, or Risk Management Systems. Although promising implementations already exist, true interoperability is far from being achieved. Therefore, reference architectures and principles of Systems-of-Systems are adapted for a fully specified, yet implementation-independent Conceptual Model, establishing interoperability to the highest possible degree. The approach utilises use cases and requirements from geological information processing and modelling within the European Plate Observing System (EPOS).
Conceptual Interoperability can be accomplished by enabling Service Composability. Unlike integration, composability allows interactive data processing and, beyond that, evolving systems that enable interpretation and evaluation by any potential participant. Integrating data from different domains often leads to monolithic services that are implemented only for a specific purpose (Stovepipe System). Consequently, composability is essential for collaborative information processing, especially in modern interactive computing and exploration environments. A major design principle for achieving composability is Dependency Injection, allowing flexible combinations (Loose Coupling) of services that implement common, standardised interfaces (abstractions). Another decisive factor for establishing interoperability is the use of Metamodels of data models that specify data and semantics regardless of their domain, based on a common, reusable approach. Thus, data from different domains can be represented by one common encoding that, e.g., abstracts landslides (geophysical models) or buildings (urban planning) based on their geometry. An indispensable part of a Conceptual Model is detailed semantics, which requires not only terms from Domain-Controlled Vocabularies but also ontologies providing qualified statements about the relationship between data and associated concepts. This is of major importance for evolutionary systems that are able to comprehend and react to state changes. Maximum interoperability also requires strict modularisation for a clear separation of semantics, metadata and the data itself.
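A minimal, generic illustration of dependency injection and loose coupling (not EPOS code) is sketched below in Python: a processing component depends only on an abstract data-source interface, so any service implementing that interface can be composed with it.

# Minimal illustration (not EPOS code) of dependency injection enabling loose coupling:
# a processing service depends only on an abstract data-source interface, so any
# implementation that satisfies the interface can be composed with it.
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Abstract interface shared by all composable data services."""

    @abstractmethod
    def fetch(self, feature_id: str) -> dict: ...


class BoreholeService(DataSource):
    def fetch(self, feature_id: str) -> dict:
        return {"id": feature_id, "type": "borehole", "depth_m": 120.0}


class LandslideService(DataSource):
    def fetch(self, feature_id: str) -> dict:
        return {"id": feature_id, "type": "landslide", "area_km2": 0.4}


class Processor:
    def __init__(self, source: DataSource):     # the dependency is injected, not hard-wired
        self.source = source

    def describe(self, feature_id: str) -> str:
        feature = self.source.fetch(feature_id)
        return f"{feature['type']} {feature['id']}: {feature}"


# The same processor works with any injected implementation.
print(Processor(BoreholeService()).describe("BH-001"))
print(Processor(LandslideService()).describe("LS-042"))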
Conceptual models for geological information governed by the described principles, and their implementations, are still a long way off. Moreover, the route to such models is not straightforward. They span a multitude of communities and are far too complex for conventional implementation in project form. A first step could be applying modern design principles to new developments in the various scientific communities and joining the results under a common stewardship such as the Open Geospatial Consortium (OGC). Recently, a Metamodel has been developed within the OGC's Borehole Interoperability Experiment (BoreholeIE), initiated and led by the French Geological Survey (BRGM). It combines the ISO standard for linear referencing (ISO 19148:2012) for localisation along borehole paths with the adaptation of different encodings of borehole logs based on well-established OGC standards. Further developments aim at correlating borehole logs, geological or geotechnical surveys, and geoscientific models. Since the results of surveys are often only available as non-schematised interpretations in text form, interoperability requires formal classifications, which can be derived from machine learning methods applied to the interpretations. As part of a Conceptual Model, such classifications can be used for an automated exchange of standard-conformant borehole logs or to support the generation of expert opinions on soil investigations.
How to cite: Haener, R., Lorenz, H., Grellet, S., Urvois, M., and Kunz, E.: Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10227, https://doi.org/10.5194/egusphere-egu2020-10227, 2020.
EGU2020-10600 | Displays | ESSI2.1
Information Model Governance for Diverse Disciplines
John S. Hughes and Daniel J. Crichton
The PDS4 Information Model (IM) Version 1.13.0.0 was released for use in December 2019. The ontology-based IM remains true to its foundational principles found in the Open Archival Information System (OAIS) Reference Model (ISO 14721) and the Metadata Registry (MDR) standard (ISO/IEC 11179). The standards generated from the IM have become the de facto data archiving standards for the international planetary science community and have successfully scaled to meet the requirements of the diverse and evolving planetary science disciplines.
A key foundational principle is the use of a multi-level governance scheme that partitions the IM into semi-independent dictionaries. The governance scheme first partitions the IM vertically into three levels, the common, discipline, and project/mission levels. The IM is then partitioned horizontally across both discipline and project/mission levels into individual Local Data Dictionaries (LDDs).
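A schematic Python sketch of this vertical and horizontal partitioning is given below; the class and attribute names are simplified placeholders, and the structure is only meant to illustrate how a class can be resolved to its governing dictionary and level, not to reproduce the actual PDS4 IM.

# Schematic sketch (not the actual PDS4 IM) of multi-level dictionary governance:
# a vertical split into common / discipline / mission levels, and a horizontal split
# of the lower levels into individual Local Data Dictionaries (LDDs).
information_model = {
    "common": {
        "Product_Observational": {"attributes": ["logical_identifier", "version_id"]},
        "Unit_Of_Measure": {"attributes": ["unit_id"]},
    },
    "discipline": {               # one LDD per discipline steward
        "geometry_ldd": {"Geometry_Orbiter": {"attributes": ["incidence_angle"]}},
        "spectral_ldd": {"Spectral_Characteristics": {"attributes": ["band_width"]}},
    },
    "mission": {                  # one LDD per project/mission
        "example_mission_ldd": {"Observation_Information": {"attributes": ["sol_number"]}},
    },
}


def resolve(class_name: str) -> str:
    """Find which governance level and dictionary defines a class."""
    for level, dictionaries in information_model.items():
        if level == "common" and class_name in dictionaries:
            return f"{class_name} is governed at the common level"
        if level != "common":
            for ldd, classes in dictionaries.items():
                if class_name in classes:
                    return f"{class_name} is governed by {ldd} at the {level} level"
    return f"{class_name} is not defined"


print(resolve("Geometry_Orbiter"))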
The Common dictionary defines the classes used across the science disciplines such as product, collection, bundle, data formats, data types, and units of measurement. The dictionary resulted from a large collaborative effort involving domain experts across the community. An ontology modeling tool was used to enforce a modeling discipline, for configuration management, to ensure consistency and extensibility, and to enable interoperability. The Common dictionary encompasses the information categories defined in the OAIS RM, specifically data representation, provenance, fixity, identification, reference, and context. Over the last few years, the Common dictionary has remained relatively stable in spite of requirements levied by new missions, instruments, and more complex data types.
Since the release of the Common dictionary, the creation of a significant number of LDDs has proved the effectiveness of multi-level, steward-based governance. This scheme is allowing the IM to scale to meet the archival and interoperability demands of the evolving disciplines. In fact, an LDD development “cottage industry” has emerged that required improvements to the development processes and configuration management. An LDD development tool now allows dictionary stewards to quickly produce specialized LDDs that are consistent with the Common dictionary.
The PDS4 Information Model is a world-class knowledge base that governs the Planetary Science community's trusted digital repositories. This presentation will provide an overview of the model and additional information about its multi-level governance scheme, including the topics of stewardship, configuration management, processes, and oversight.
How to cite: Hughes, J. S. and Crichton, D. J.: Information Model Governance for Diverse Disciplines, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10600, https://doi.org/10.5194/egusphere-egu2020-10600, 2020.
The geosciences are facing a Big Data problem, particularly in the areas of data Volume (huge observational datasets and numerical model outputs), Variety (large numbers of disparate datasets from multiple sources with inconsistent standards), and Velocity (need for rapid processing of continuous data streams). These challenges make it difficult to perform scientific research and to make decisions about serious environmental issues facing our planet. We need to enable science at the scale of our large, disparate, and continuous data.
One part of the solution relates to infrastructure, such as by making large datasets available in a shared environment co-located with computational resources so that we can bring the analysis code to the data instead of copying data. The other part relies on improvements in metadata, data models, semantics, and collaboration. Individual datasets must have comprehensive, accurate, and machine-readable metadata to enable assessment of their relevance to a specific problem. Multiple datasets must be mapped into an overarching data model rooted in the geographical and temporal attributes to enable us to seamlessly find and access data for the appropriate location and time. Semantic mapping is necessary to enable data from different disciplines to be brought to bear on the same problem. Progress in all these areas will require collaboration on technical methods, interoperability standards, and analysis software that bridges information communities -- collaboration driven by a willingness to make data usable by those outside of the original scientific discipline.
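As a small illustration of data access rooted in geographical and temporal attributes, the following Python sketch subsets a dataset by time and space with xarray; the variable and coordinate names and the values are synthetic placeholders standing in for a real archived dataset.

# Illustrative sketch: selecting data by geographical and temporal attributes with xarray.
# The variable ("sst") and coordinate names ("time", "lat", "lon") are assumptions; the
# synthetic values below stand in for a large archived dataset.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2019-06-01", "2019-09-30", freq="D")
lats = np.arange(38.0, 50.0, 1.0)
lons = np.arange(25.0, 43.0, 1.0)
data = np.random.rand(len(times), len(lats), len(lons))
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), data)},
    coords={"time": times, "lat": lats, "lon": lons},
)

# Select only the region and period relevant to the problem, instead of copying everything.
subset = ds.sel(
    time=slice("2019-06-01", "2019-08-31"),
    lat=slice(40.0, 48.0),
    lon=slice(27.0, 42.0),
)
print(subset)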
How to cite: de La Beaujardiere, J.: Enabling Science at Scale, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12079, https://doi.org/10.5194/egusphere-egu2020-12079, 2020.
EGU2020-2117 | Displays | ESSI2.1
Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method
Fan Wu, Hong Gao, and Zhaoyuan Yu
A conceptual consensus, as well as a unified representation, of a certain geographic concept across multiple contexts can be of great significance for the communication, retrieval, combination, and reuse of geographic information and knowledge. However, a geographic concept is a rich synthesis of semantics, semiotics and quality (e.g., vagueness or approximation). The generation, representation, calculation and application of a certain geographic concept can consequently be highly heterogeneous, especially considering different interests, domains, languages, etc. In light of these semantic heterogeneity problems, coding core concepts uniquely can be a lighter-weight alternative to traditional ontology-based methods, because numeric codes can symbolize a consensus on a concept across domains and even languages. Consequently, this paper proposes a unified semantic model as well as an encoding framework for the representation, reasoning, and computation of geographic concepts based on geometric algebra (GA). In this method, a geographic concept is represented as a collection of semantic elements, which are further encoded based on their hierarchy structure, and all the semantic information of the concept is preserved across the encoding process. On the basis of the encoding result, semantic information can be reasoned backward by well-defined operators, and semantic similarity can be computed for information inference as well as semantic association retrieval. In the case study, the implementation of the proposed framework shows that this GA-based semantic encoding model can be a promising method for the unified expression, reasoning, and calculation of geographic concepts, which can reasonably be regarded as a promising lighter-weight solution to semantic heterogeneity.
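The sketch below is a deliberately simplified, non-GA illustration of the general idea of encoding a concept's position in a hierarchy and comparing codes numerically; it does not reproduce the geometric-algebra encoding proposed in the abstract.

# Deliberately simplified illustration of hierarchy-based concept codes and similarity.
# This is NOT the authors' geometric-algebra encoding; it only sketches the general idea
# that a concept's position in a hierarchy can be encoded and compared numerically.

def encode(path):
    """Encode a concept as the tuple of branch indices from the root, e.g. (2, 1, 3)."""
    return tuple(path)


def similarity(code_a, code_b):
    """Share of the longer code covered by the common prefix (1.0 means identical)."""
    common = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        common += 1
    return common / max(len(code_a), len(code_b))


river = encode([1, 2, 1])      # hypothetical: hydrography -> watercourse -> river
stream = encode([1, 2, 2])     # hypothetical: hydrography -> watercourse -> stream
lake = encode([1, 3, 1])       # hypothetical: hydrography -> waterbody -> lake

print(similarity(river, stream))  # approx. 0.67: same parent branch
print(similarity(river, lake))    # approx. 0.33: diverge higher in the hierarchy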
How to cite: Wu, F., Gao, H., and Yu, Z.: Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2117, https://doi.org/10.5194/egusphere-egu2020-2117, 2020.
Sending data to a secure long-term archive is increasingly a necessity for science projects due to funding body and publishing requirements. It is also good practice for long-term scientific aims and enables the preservation and re-use of valuable research data. The Centre for Environmental Data Analysis (CEDA) hosts a data archive holding vast atmospheric and Earth observation data from sources including aircraft campaigns, satellites, pollution monitoring, automatic weather stations, climate models, etc. The CEDA archive currently holds 14 PB of data in over 250 million files, which makes it challenging to discover and access specific data. In order to manage this, it is necessary to use standard formats and descriptions of the data. This poster will explore best practice in data management at CEDA and show the tools used to archive and share data.
How to cite: Winfield, K.: Data Management at CEDA, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2375, https://doi.org/10.5194/egusphere-egu2020-2375, 2020.
EGU2020-5131 | Displays | ESSI2.1
Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS)
Marc Urvois, Sylvain Grellet, Abdelfettah Feliachi, Henning Lorenz, Rainer Haener, Christian Brogaard Pedersen, Martin Hansen, Luca Guerrieri, Carlo Cipolloni, and Mary Carter
The European Plate Observing System (EPOS, www.epos-ip.org) is a multidisciplinary pan-European research infrastructure for solid Earth science. It integrates a series of domain-specific service hubs, such as the Geological Information and Modelling Technical Core Service (TCS GIM), dedicated to providing access to data, data products and services on European boreholes, geological and geohazard maps, and mineral resources, as well as a catalogue of 3D models. These are hosted by European Geological Surveys and national research organisations.
Even though interoperability implementation frameworks are well described and used (ISO, OGC, IUGS/CGI, INSPIRE …), it proved difficult for several data providers to deploy from the outset the OGC services supporting the full semantic definition (OGC Complex Feature) required to discover and view millions of geological entities. Instead, data are collected and exposed using a simpler yet standardised description (GeoSciML-Lite and EarthResourceML-Lite). Subsequently, the more complex data flows are deployed with the corresponding semantics.
This approach was applied to design and implement the European Borehole Index and associated web services (View-WMS and Discovery-WFS) and extended to 3D Models. TCS GIM exposes to EPOS Central Integrated Core Services infrastructure a metadata catalogue service, a series of “index services”, a codeList registry and a Linked Data resolver. These allow EPOS end users to search and locate boreholes, geological maps and features, 3D models, etc., based on the information held by the index services.
In addition to these services, TCS GIM focussed particularly on sharing European geological data using the Linked Data approach. Each instance is associated with a URI and points to other information resources also using URIs. The Linked Data principles ensure the best semantic description (e.g. URIs to shared codeList registries entries) and also enrich an initial “information seed” (e.g. a set of Borehole entries matching a search) with more contents (e.g. URIs to more Features or a more complex description). As a result, this pattern including Simple Feature and Linked Data has a positive effect on the IT architecture: interoperable services are simpler and faster to deploy and there is no need to harvest a full OGC Complex Feature dataset. This architecture is also more scalable and sustainable.
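The following Python sketch shows how a client could query a borehole index exposed as an OGC WFS 2.0 service and pick up identifiers that a Linked Data resolver can then dereference; the endpoint URL and feature type name are placeholders, and only the request parameters follow the WFS standard.

# Sketch of querying a (hypothetical) borehole index exposed as an OGC WFS 2.0 service.
# The endpoint URL and feature type name are placeholders; only the request parameters
# (service, version, request, typeNames, bbox, outputFormat) follow the WFS standard.
import requests

WFS_ENDPOINT = "https://example.org/geoserver/ows"     # placeholder endpoint

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "gsmlb:Borehole",                      # hypothetical simple-feature type name
    "bbox": "45.0,5.0,48.0,10.0,urn:ogc:def:crs:EPSG::4326",
    "outputFormat": "application/json",
}

response = requests.get(WFS_ENDPOINT, params=params, timeout=30)
response.raise_for_status()
for feature in response.json().get("features", []):
    # Each index entry could carry a URI that a Linked Data resolver can dereference.
    print(feature.get("id"), feature.get("properties", {}).get("identifier"))

Lightweight index services of this kind are what make the deployment simpler and faster than harvesting a full OGC Complex Feature dataset.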
The European Geological Surveys' codeList registries have been enriched with new vocabularies as part of the European Geoscience Registry. In compliance with the relevant European INSPIRE rules, this registry is now part of the INSPIRE Register Federation, the central access point to repositories for vocabularies and resources. The European Geoscience Registry is available for reuse and extension by other geoscientific projects.
During the EPOS project, this approach has been developed and implemented for the Borehole and Model data services. The TCS GIM team provided feedback on INSPIRE through the Earth Science Cluster, contributed to the creation of the OGC GeoScience Domain Working Group in 2017 and to the launch of the OGC Borehole Interoperability Experiment in 2018, and proposed evolutions to the OGC GeoSciML and IUGS/CGI EarthResourceML standards.
How to cite: Urvois, M., Grellet, S., Feliachi, A., Lorenz, H., Haener, R., Brogaard Pedersen, C., Hansen, M., Guerrieri, L., Cipolloni, C., and Carter, M.: Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5131, https://doi.org/10.5194/egusphere-egu2020-5131, 2020.
EGU2020-6324 | Displays | ESSI2.1
A classification and predictive model of the complex REE mineral system
Hassan Babaie and Armita Davarpanah
We model the intermittent, non-linear interactions and feedback loops of the complex rare earth element (REE) mineral system applying the self-organized criticality concept. Our semantic knowledge model (the REE_MinSys ontology) represents the dynamic primary and secondary processes that occur over a wide range of spatial and temporal scales and produce the emergent REE deposits and their geometry, tonnage, and grade. These include the scale-invariant, out-of-equilibrium geodynamic and magmatic processes that lead to the formation of orthomagmatic (carbonatite, alkaline igneous rocks) and syn- and post-magmatic hydrothermal REE deposits. The ontology also represents the redistribution of the REE from these primary ores by metamorphic fluids and/or post-depositional surface and supergene processes in sedimentary basins, fluvial channels, coastal areas, and/or the regolith around or above them. The ontology applies concepts of complex systems theory to represent the spatial and spatio-temporal elements of the REE mineral system, such as source, driver, threshold barriers, trigger, avalanche, conduit, relaxation, critical point attractor, and self-organization, for the alkaline igneous, iron oxide (a subcategory of IOCG), hydrothermal, marine placer, alluvial placer (including paleo-placer), phosphorite, laterite, and ion-adsorption clay REE deposits. The ontology is instantiated with diverse data drawn from globally distributed, well-studied small to giant REE deposits to build the REE_MinSys knowledge base. Users can query the data in the knowledge base to extract explicit and inferred facts about each type of REE deposit, for example by asking: “Which rare earth elements are in REE phosphate deposits?”; “Which rare earth elements are largely explored in REE placer deposits?” Data from the knowledge base will be divided into training and testing sets after they are preprocessed and trends and data patterns are identified through data analytical procedures. The training and test datasets will be used to build models applying machine learning algorithms to predict potential REE deposits of different kinds in unexposed or covered areas.
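To illustrate the kind of competency question mentioned above, the following Python sketch builds a tiny graph and runs a SPARQL query with rdflib; the namespace, class and property names are hypothetical stand-ins for the REE_MinSys vocabulary, so only the query pattern itself is meant to be indicative.

# Sketch of a competency question expressed as SPARQL via rdflib.
# The ontology namespace, class and property names are hypothetical stand-ins for the
# REE_MinSys vocabulary; only the query pattern is meant to be illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

REE = Namespace("https://example.org/ree_minsys#")     # placeholder namespace

g = Graph()
deposit = REE["ExamplePhosphoriteDeposit"]
g.add((deposit, RDF.type, REE.PhosphoriteDeposit))
g.add((deposit, REE.containsElement, Literal("Ce")))
g.add((deposit, REE.containsElement, Literal("La")))

query = """
PREFIX ree: <https://example.org/ree_minsys#>
SELECT DISTINCT ?element WHERE {
    ?deposit a ree:PhosphoriteDeposit ;
             ree:containsElement ?element .
}
"""

for row in g.query(query):
    print(row.element)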
How to cite: Babaie, H. and Davarpanah, A.: A classification and predictive model of the complex REE mineral system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6324, https://doi.org/10.5194/egusphere-egu2020-6324, 2020.
EGU2020-7058 | Displays | ESSI2.1
Enabling Data Reuse Through Semantic Enrichment of Instrumentation
Robert Huber, Anusuriya Devaraju, Michael Diepenbroek, Uwe Schindler, Roland Koppe, Tina Dohna, Egor Gordeev, and Marianne Rehage
Pressing environmental and societal challenges demand the reuse of data on a much larger scale. Central to improvements on this front are approaches that support structured and detailed descriptions of published data. In general, the reusability of scientific datasets, such as measurements generated by instruments, observations collected in the field, and model simulation outputs, requires information about the contexts through which they were produced. These contexts include the instrumentation, methods, and analysis software used. In current data curation practice, data providers often put significant effort into capturing descriptive metadata about datasets. Nonetheless, the metadata about instruments and methods provided by data authors are limited, and in most cases unstructured.
The ‘Interoperability’ principle of FAIR emphasizes the importance of using formal vocabularies to enable machine-understandability of data and metadata, and of establishing links between data and related research entities to provide their contextual information (e.g., devices and methods). To support FAIR data, PANGAEA is currently elaborating workflows to enrich the instrument information of scientific datasets utilizing internal as well as third-party services, ontologies and their identifiers. This abstract presents our ongoing development within the projects FREYA and FAIRsFAIR as follows:
- Integrating the AWI O2A (Observations to Archives) framework and its associated suite of tools within PANGAEA’s curatorial workflow, as well as semi-automated ingestion of observatory data.
- Linking data with their observation sources (devices) by recording the persistent identifiers (PID) from the O2A sensor registry system (sensor.awi.de) as part of the PANGAEA instrumentation database.
- Enriching device and method descriptions of scientific data by annotating them with appropriate vocabularies such as the NERC device type and device vocabularies or scientific methodology classifications (a schematic example of such an annotation is sketched below).
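The following Python sketch outlines such an enrichment record linking a dataset to a device PID and controlled vocabulary terms; the field names are hypothetical (not the PANGAEA data model) and the example URIs merely follow the general pattern of sensor registries and the NERC Vocabulary Server.

# Sketch of enriching a dataset's instrument description with persistent identifiers and
# controlled vocabulary terms. Field names are hypothetical (not the PANGAEA data model),
# and the URIs below are placeholders following the general pattern of sensor registries
# and the NERC Vocabulary Server.
dataset_metadata = {
    "dataset_doi": "10.1594/PANGAEA.000000",                        # placeholder DOI
    "instrument": {
        "label": "CTD/Rosette water sampler",
        "device_pid": "https://sensor.awi.de/example-device-id",    # placeholder sensor PID
        "device_type_uri": "http://vocab.nerc.ac.uk/collection/L05/current/130/",  # example term URI
    },
    "method": {
        "label": "Conductivity-temperature-depth profiling",
        "vocabulary_uri": "https://example.org/methods/ctd-profiling",  # placeholder
    },
}

# A curation service could, for instance, check that every linked URI resolves before publication.
for block in dataset_metadata.values():
    if isinstance(block, dict):
        for field, value in block.items():
            if field.endswith("_uri") or field.endswith("_pid"):
                print(f"check resolvable: {value}")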
In our contribution we will also outline the challenges to be addressed in enabling FAIR vocabularies of instruments and methods. These include questions regarding the reliability and trustworthiness of third-party ontologies and services, challenges in content synchronisation across linked resources, and the implications for the FAIRness levels of datasets, such as dependencies on interlinked data sources and vocabularies.
We will show to what extent adapting, harmonizing and controlling the vocabularies used, as well as the identifier systems shared between data provider and data publisher, improves the findability and reusability of datasets while keeping the curational overhead as low as possible. This use case is a valuable example of how improving interoperability through harmonization efforts, though initially problematic and labour-intensive, can benefit a multitude of stakeholders in the long run: data users, publishers, research institutes, and funders.
How to cite: Huber, R., Devaraju, A., Diepenbroek, M., Schindler, U., Koppe, R., Dohna, T., Gordeev, E., and Rehage, M.: Enabling Data Reuse Through Semantic Enrichment of Instrumentation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7058, https://doi.org/10.5194/egusphere-egu2020-7058, 2020.
EGU2020-7937 | Displays | ESSI2.1
A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental DatasetsAlexander Götz, Johannes Munke, Mohamad Hayek, Hai Nguyen, Tobias Weber, Stephan Hachinger, and Jens Weismüller
LTDS ("Let the Data Sing") is a lightweight, microservice-based Research Data Management (RDM) architecture which augments previously isolated data stores ("data silos") with FAIR research data repositories. The core components of LTDS include a metadata store as well as dissemination services such as a landing page generator and an OAI-PMH server. As these core components were designed to be independent of one another, a central control system has been implemented which handles data flows between components. LTDS is developed at LRZ (Leibniz Supercomputing Centre, Garching, Germany) with the aim of allowing researchers to make massive amounts of data (e.g. HPC simulation results) on different storage backends FAIR. Owing to their size, such data often cannot easily be transferred into conventional repositories. As a result, they remain "hidden", while only, for example, final results are published - a major problem for the reproducibility of simulation-based science. The LTDS architecture uses open-source and standardized components and follows best practices in FAIR data (and metadata) handling. We present our experience with our first three use cases: the Alpine Environmental Data Analysis Centre (AlpEnDAC) platform, the ClimEx dataset with 400 TB of climate ensemble simulation data, and the Virtual Water Value (ViWA) hydrological model ensemble.
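As a minimal sketch of how metadata disseminated by an OAI-PMH server such as the one mentioned above could be harvested, the snippet below issues a standard ListRecords request. The endpoint URL is a placeholder and not the actual LTDS service address.

```python
# Minimal OAI-PMH harvesting sketch using only the standard protocol verbs.
# The endpoint URL is a placeholder; substitute the repository's real
# OAI-PMH base URL.
import urllib.parse
import urllib.request

BASE_URL = "https://example.org/oai"  # placeholder endpoint

params = {
    "verb": "ListRecords",        # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",   # Dublin Core, mandatory for all OAI-PMH servers
}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    xml_payload = response.read().decode("utf-8")

# The response is an XML document; each <record> carries a header with an
# identifier and datestamp plus the Dublin Core metadata of one dataset.
print(xml_payload[:500])
```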
How to cite: Götz, A., Munke, J., Hayek, M., Nguyen, H., Weber, T., Hachinger, S., and Weismüller, J.: A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7937, https://doi.org/10.5194/egusphere-egu2020-7937, 2020.
EGU2020-9750 | Displays | ESSI2.1
A modular approach to cataloguing oceanographic dataAdam Leadbetter, Andrew Conway, Sarah Flynn, Tara Keena, Will Meaney, Elizabeth Tray, and Rob Thomas
The ability to access and search metadata for marine science data is a key requirement for addressing fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting domain-specific, community-defined standards and legislative requirements placed on data publishers. In the sphere of oceanographic data management there is therefore a clear need for a modular approach to data cataloguing designed to meet a number of requirements. In this paper we describe a data cataloguing system developed and in use at the Marine Institute, Ireland, to meet legislative requirements including the European Spatial Data Infrastructure (INSPIRE) and the Marine Spatial Planning directive.
The data catalogue described here makes use of a metadata model focussed on the oceanographic domain. It comprises a number of key classes which will be described in detail in the paper, and which include:
- Dataset - combines many different parameters, collected at multiple times and locations, using different instruments
- Dataset Collection - provides a link between a Dataset Collection Activity and a Dataset, as well as linking to the Device(s) used to sample the environment for a given range of parameters. An example of a Dataset Collection may be the Conductivity-Temperature-Depth profiles taken on a research vessel survey allowing the individual sensors to be connected to the activity and the calibration of those sensors to be connected with the associated measurements.
- Dataset Collection Activity - a specialised dataset to cover such activities as research vessel cruises; or the deployments of moored buoys at specific locations for given time periods
- Platform - an entity from which observations may be made, such as a research vessel or a satellite
- Programme - represents a formally recognized scientific effort receiving significant funding, requiring large scale coordination
- Device - aimed at providing enough metadata for a given instance of an instrument to provide a skeleton SensorML record
- Organisation - captures the details of research institutes, data holding centres, monitoring agencies, governmental and private organisations, that are in one way or another engaged in oceanographic and marine research activities, data & information management and/or data acquisition activities
The data model makes extensive use of controlled vocabularies to ensure both consistency and interoperability in the content of attribute fields for the Classes outlined above.
The data model has been implemented in a module for the Drupal open-source web content management system, and the paper will provide details of this application.
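To make the relationships between these classes concrete, the sketch below models a few of them as plain Python dataclasses. The attribute names, example values and vocabulary URI are illustrative assumptions; the production catalogue is a Drupal module whose field definitions may differ.

```python
# Illustrative sketch of a few catalogue classes as plain dataclasses.
# Attribute names and example values are assumptions for demonstration;
# the production system is a Drupal module whose schema may differ.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Device:
    name: str
    device_type_uri: str        # controlled-vocabulary term (illustrative URI)

@dataclass
class Platform:
    name: str                   # e.g. a research vessel or a satellite

@dataclass
class DatasetCollectionActivity:
    label: str                  # e.g. a research vessel cruise identifier
    platform: Platform

@dataclass
class DatasetCollection:
    # Links an activity to the devices used and the parameters sampled
    activity: DatasetCollectionActivity
    devices: List[Device] = field(default_factory=list)
    parameters: List[str] = field(default_factory=list)

@dataclass
class Dataset:
    title: str
    collections: List[DatasetCollection] = field(default_factory=list)

ctd = Device("CTD profiler", "http://vocab.nerc.ac.uk/collection/L05/current/130/")
survey = DatasetCollectionActivity("SURVEY-001", Platform("RV Example"))
profiles = DatasetCollection(survey, devices=[ctd],
                             parameters=["sea temperature", "salinity", "depth"])
catalogue_entry = Dataset("CTD profiles from SURVEY-001", [profiles])
print(catalogue_entry.collections[0].activity.platform.name)
```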
How to cite: Leadbetter, A., Conway, A., Flynn, S., Keena, T., Meaney, W., Tray, E., and Thomas, R.: A modular approach to cataloguing oceanographic data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9750, https://doi.org/10.5194/egusphere-egu2020-9750, 2020.
EGU2020-12281 | Displays | ESSI2.1
Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data CenterMaggie Davis, Richard Cederwall, Giri Prakash, and Ranjeet Devarakonda
Atmospheric Radiation Measurement (ARM), a U.S. Department of Energy (DOE) scientific user facility, is a key geophysical data source for national and international climate research. Utilizing a standardized schema that has evolved since ARM's inception in 1989, the ARM Data Center (ADC) processes over 1.8 petabytes of stored data across over 10,000 data products. Data sources include ARM-owned instruments, as well as field campaign datasets, Value Added Products, evaluation data to test new instrumentation or models, Principal Investigator data products, and external data products (e.g., NASA satellite data). In line with FAIR principles, a team of metadata experts classifies instruments and defines spatial and temporal metadata to ensure accessibility through the ARM Data Discovery. To enhance geophysical metadata collaboration across American and European organizations, this work will summarize processes and tools which enable the management of ARM data and metadata. For example, this presentation will highlight recent enhancements in field campaign metadata workflows to handle the ongoing Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) data. Other key elements of the ARM Data Center include the architecture of ARM data transfer and storage processes, the evaluation of data quality, and the ARM consolidated databases. We will also discuss tools developed for identifying and recommending datastreams and enhanced DOI assignments for all data types to assist an interdisciplinary user base in selecting, obtaining, and using data as well as citing the appropriate data source for reproducible atmospheric and climate research.
How to cite: Davis, M., Cederwall, R., Prakash, G., and Devarakonda, R.: Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data Center , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12281, https://doi.org/10.5194/egusphere-egu2020-12281, 2020.
EGU2020-14755 | Displays | ESSI2.1
WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomesEnrico Boldrini, Paolo Mazzetti, Stefano Nativi, Mattia Santoro, Fabrizio Papeschi, Roberto Roncella, Massimiliano Olivieri, Fabio Bordini, and Silvano Pecora
The WMO Hydrological Observing System (WHOS) is a service-oriented System of Systems (SoS) linking hydrological data providers and users by enabling harmonized and real time discovery and access functionalities at global, regional, national and local scale. WHOS is being realized through a coordinated and collaborative effort amongst:
- National Hydrological Services (NHS) willing to publish their data to the benefit of a larger audience,
- Hydrologists, decision makers, app and portal authors willing to gain access to world-wide hydrological data,
- ESSI-Lab of CNR-IIA responsible for the WHOS broker component: a software framework in charge of enabling interoperability amongst the distributed heterogeneous systems belonging to data providers (e.g. data publishing services) and data consumers (e.g. web portals, libraries and apps),
- WMO Commission of Hydrology (CHy) providing guidance to WMO Member countries in operational hydrology, including capacity building, NHSs engagement and coordination of WHOS implementation.
In the last years two additional WMO regional programmes have been targeted to benefit from WHOS, operating as successful applications for others to follow:
- Plata river basin,
- Arctic-HYCOS.
Each programme operates with a “view” of the whole WHOS: a virtual subset composed only of the data sources that are relevant to its context.
WHOS-Plata is currently brokering data sources from the following countries:
- Argentina (hydrological & meteorological data),
- Bolivia (meteorological data; hydrological data expected in the near future),
- Brazil (hydrological & meteorological data),
- Paraguay (meteorological data; hydrological data in process),
- Uruguay (hydrological & meteorological data).
WHOS-Arctic is currently brokering data sources from the following countries:
- Canada (historical and real time data),
- Denmark (historical data),
- Finland (historical and real time data),
- Iceland (historical and real time data),
- Norway (historical and real time data),
- Russia (historical and real time data),
- United States (historical and real time data).
Each data source publishes its data online according to specific hydrological service protocols and/or APIs (e.g. CUAHSI HydroServer, USGS Water Services, FTP, SOAP, REST API, OData, WAF, OGC SOS, …). Each service protocol and API in turn implies support for a specific metadata and data model (e.g. WaterML, CSV, XML, JSON, USGS RDB, ZRXP, Observations & Measurements, …).
The WHOS broker implements mediation and harmonization of all these heterogeneous standards in order to seamlessly support discovery and access of all the available data by a growing set of data consumer systems (applications and libraries), without any implementation effort on their part:
- 52North Helgoland (through SOS v.2.0.0),
- CUAHSI HydroDesktop (through CUAHSI WaterOneFlow),
- National Water Institute of Argentina (INA) node.js WaterML client (through CUAHSI WaterOneFlow),
- DAB JS API (through DAB REST API),
- USGS GWIS JS API plotting library (through RDB service),
- R scripts (through R WaterML library),
- C# applications (through CUAHSI WaterOneFlow),
- UCAR jOAI (through OAI-PMH/WIGOS metadata).
In particular, the support of WIGOS metadata standard provides a set of observational metadata elements for the effective interpretation of observational data internationally.
In addition to metadata and data model heterogeneity, WHOS also needs to tackle semantic heterogeneity. The WHOS broker makes use of a hydrology ontology (made available as a SPARQL endpoint) to augment WHOS discovery capabilities (e.g. to obtain translations of a hydrology search parameter in multiple languages).
Technical documentation for exercising the WHOS broker is already available online, while the official public launch with a dedicated WMO WHOS web portal is expected shortly.
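As an illustration of the data-consumer side described above, the sketch below issues a standard OGC SOS 2.0 key-value-pair GetCapabilities request of the kind an SOS client such as 52North Helgoland would send. The service URL is a placeholder rather than the actual WHOS broker endpoint.

```python
# Sketch of a standard OGC SOS 2.0 KVP request, as used by SOS clients to
# discover offerings and observed properties. The base URL is a placeholder,
# not the actual WHOS broker address.
import urllib.parse
import urllib.request

SOS_BASE = "https://example.org/whos/sos"  # placeholder endpoint

params = {
    "service": "SOS",
    "request": "GetCapabilities",
    "AcceptVersions": "2.0.0",
}
url = SOS_BASE + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    capabilities_xml = response.read().decode("utf-8")

# The Capabilities document lists observation offerings, procedures and
# observed properties that can then be queried with GetObservation.
print(capabilities_xml[:400])
```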
How to cite: Boldrini, E., Mazzetti, P., Nativi, S., Santoro, M., Papeschi, F., Roncella, R., Olivieri, M., Bordini, F., and Pecora, S.: WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14755, https://doi.org/10.5194/egusphere-egu2020-14755, 2020.
EGU2020-15226 | Displays | ESSI2.1
OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standardsAlizia Mantovani, Vincenzo Lombardo, and Fabrizio Piana
This contribution regards the encoding of an ontology for the GeologicStructure class. This is one of the sections of OntoGeonous, a larger ontology for the geosciences principally devoted to representing the knowledge contained in geological maps; the other sections regard the Geologic unit, Geomorphologic feature and Geologic event. OntoGeonous is developed by the University of Turin, Department of Computer Sciences, and the Institute of Geosciences and Earth Resources of the National Research Council of Italy (CNR-IGG).
The encoding of the knowledge is based on the definitions and hierarchical organization of the concepts proposed by the international standards: the GeoScienceML directive(1) and the INSPIRE Data Specification on Geology(2) drive the architecture at the more general levels, while the broader/narrower representation in the CGI vocabularies(3) provides the internal taxonomies of the specific sub-ontologies.
The first release of OntoGeonous had a complete hierarchy for the GeologicUnit class, which is partly different from the organization of knowledge in the international standards, and taxonomies for GeologicStructure, GeologicEvent and GeomorphologicFeature. The encoding process of OntoGeonous is presented in Lombardo et al. (2018) and on the WikiGeo website(4), while a method of application to geological maps is presented in Mantovani et al. (2020).
This contribution shows how the international standards guided the encoding of the sub-ontology for the GeologicStructure class, and the innovations introduced in the general organization of OntoGeonous compared to its first release. The main differences come from the analysis of the UML schemata for the GeologicStructure subclasses(5): first, the presence of the FoldSystem class inspired the creation of a more general class for associations of features; second, the attempt to describe the NonDirectionalStructure class led us to group all the remaining classes into a new class with opposite characteristics. Similar modifications have been made throughout the GeologicStructure ontology.
Our intent is to improve the formal description of geological knowledge in order to practically support the use of ontology-driven data models in the geological mapping task.
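As a purely illustrative sketch of the kind of class reorganization described above, the snippet below uses rdflib to declare a hypothetical fragment of a GeologicStructure hierarchy, with FoldSystem placed under a more general class for associations of structural features. The namespace and class names are assumptions for demonstration, not the published OntoGeonous terms.

```python
# Illustrative rdflib sketch of a small, hypothetical slice of a
# GeologicStructure hierarchy. Namespace and class names are assumptions
# for demonstration only; they are not the published OntoGeonous IRIs.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

OG = Namespace("https://example.org/ontogeonous#")  # placeholder namespace

g = Graph()
g.bind("og", OG)

# Top-level class and two branches: single structures vs. associations
for cls in ("GeologicStructure", "StructureAssociation", "FoldSystem", "Fold"):
    g.add((OG[cls], RDF.type, OWL.Class))

g.add((OG.StructureAssociation, RDFS.subClassOf, OG.GeologicStructure))
g.add((OG.Fold, RDFS.subClassOf, OG.GeologicStructure))
# A fold system is modelled as an association of individual folds
g.add((OG.FoldSystem, RDFS.subClassOf, OG.StructureAssociation))

print(g.serialize(format="turtle"))
```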
References
Lombardo, V., Piana, F., and Mimmo, D. (2018). Semantics-informed geological maps: Conceptual modelling and knowledge encoding. Computers & Geosciences, 116. https://doi.org/10.1016/j.cageo.2018.04.001
Mantovani, A., Lombardo, V., and Piana, F. (2020). Ontology-driven representation of knowledge for geological maps. (Submitted)
(1) http://www.geosciml.org.
(2) http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_GE_v3.0.pdf
(3) http://resource.geosciml.org/def/voc/
(4) https://www.di.unito.it/wikigeo/index.php?title=Pagina_principale
(5) http://www.geosciml.org/doc/geosciml/4.1/documentation/html/EARoot/EA1/EA1/EA4/EA4/EA356.htm
How to cite: Mantovani, A., Lombardo, V., and Piana, F.: OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standards, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15226, https://doi.org/10.5194/egusphere-egu2020-15226, 2020.
EGU2020-15466 | Displays | ESSI2.1
Using standards to model delayed mode sensor processesAlexandra Kokkinaki, Justin Buck, Emma Slater, Julie Collins, Raymond Cramer, and Louise Darroch
Ocean data are expensive to collect. Data reuse saves time and accelerates the pace of scientific discovery. For data to be re-usable the FAIR principles reassert the need for rich metadata and documentation that meet relevant community standards and provide information about provenance.
Approaches to describing sensor observations are often inadequate for meeting FAIR: they tend to be prescriptive, with a limited set of attributes, and make little or no provision for important metadata about sensor observations arising later in the data lifecycle.
As part of the EU ENVRIplus project, our work aimed at capturing the delayed-mode data curation process taking place at the National Oceanography Centre’s British Oceanographic Data Centre (BODC). Our solution uses unique URIs, OGC SWE standards and controlled vocabularies, commencing from the submitted originator’s input and ending with the archived and published dataset.
The BODC delayed-mode process is an example of a physical system composed of several components, such as sensors and other computational processes, for example an algorithm to compute salinity or absolute winds. All components are described in SensorML, identified by unique URIs and associated with the relevant datastreams, which in turn are exposed on the web via ERDDAP using unique URIs.
In this paper we intend to share our experience in using OGC standards and ERDDAP to model the above-mentioned process and publish the associated datasets in a unified way. The benefits attained allow greater automation of data transfer, easy access to large volumes of data from a chosen sensor, more precise capture of data provenance, and standardization, and they pave the way towards greater FAIRness of the sensor data and metadata, with a focus on delayed-mode processing.
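As a minimal sketch of how a datastream exposed through ERDDAP can be retrieved, the snippet below requests a CSV subset from a tabledap dataset using ERDDAP's standard RESTful URL pattern. The server address, dataset identifier and variable names are placeholders rather than BODC's actual published endpoints.

```python
# Sketch of accessing an ERDDAP tabledap dataset via its standard RESTful
# URL pattern: /erddap/tabledap/<datasetID>.<fileType>?<variables>&<constraints>
# The server, dataset ID and variable names are placeholders.
import urllib.parse
import urllib.request

ERDDAP_BASE = "https://example.org/erddap"      # placeholder server
DATASET_ID = "example_sensor_timeseries"        # placeholder dataset ID

# Percent-encode the constraint so characters like '>' are valid in the URL
constraint = urllib.parse.quote("time>=2020-01-01T00:00:00Z", safe="=")
url = f"{ERDDAP_BASE}/tabledap/{DATASET_ID}.csv?time,sea_water_temperature&{constraint}"

with urllib.request.urlopen(url) as response:
    csv_rows = response.read().decode("utf-8").splitlines()

# The first two rows are column names and units; the rest are data rows.
for line in csv_rows[:5]:
    print(line)
```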
How to cite: Kokkinaki, A., Buck, J., Slater, E., Collins, J., Cramer, R., and Darroch, L.: Using standards to model delayed mode sensor processes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15466, https://doi.org/10.5194/egusphere-egu2020-15466, 2020.
EGU2020-18848 | Displays | ESSI2.1
Continuous ocean monitoring from sensor arrays on the UK large research vesselsLouise Darroch, Juan Ward, Alexander Tate, and Justin Buck
More than 40% of the human population live within 100 km of the sea. Many of these communities intimately rely on the oceans for their food, climate and economy. However, the oceans are increasingly being adversely affected by human-driven activities such as climate change and pollution. Many targeted marine monitoring programmes (e.g. GOSHIP, OceanSITES) and pioneering observing technologies (e.g. autonomous underwater vehicles, Argo floats) are being used to assess the impact humans are having on our oceans. Such activities and platforms are deployed, calibrated and serviced by state-of-the-art research ships: multimillion-pound floating laboratories which operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. sea-floor depth, weather, ocean current velocity and hydrography). These sensors, coupled with event and environmental metadata provided by the ships’ logs and crew, are essential for understanding the wider context of the science they support, as well as directly contributing to crucial scientific understanding of the marine environment and key strategic policies (e.g. the United Nations’ Sustainable Development Goal 14). However, despite their high scientific value and cost, these data streams are not routinely brought together from UK large research vessels in the coordinated, reliable and accessible ways that are fundamental to ensuring user trust in the data and any products generated from the data.
The National Oceanography Centre (NOC) and British Antarctic Survey (BAS) are currently working together to improve the integrity of the data management workflow from sensor systems to end-users across the UK Natural Environment Research Council (NERC) large research vessel fleet, making cost-effective use of vessel time while improving the FAIRness of data from these sensor arrays. The solution is based upon an Application Programming Interface (API) framework with endpoints tailored towards different end-users, such as scientists on board the vessels as well as the public on land. Key features include: sensor triage using real-time automated monitoring systems, assuring sensors are working correctly and only the best data are output; standardised digital event logging systems allowing data quality issues to be identified and resolved quickly; novel open-source data transport formats embedded with well-structured metadata, common standards and provenance information (such as controlled vocabularies and persistent identifiers), reducing ambiguity and enhancing interoperability across platforms; an open-source data processing application that applies quality control to international standards (SAMOS or IOOS QARTOD); digital notebooks that manage and capture the processing applied to data, putting data into context; democratisation and brokering of data through open data APIs (e.g. ERDDAP, Sensor Web Enablement), allowing end-users to discover and access data, layer their own tools or generate products to meet their own needs; and unambiguous provenance maintained throughout the data management workflow using instrument persistent identifiers, part of the latest recommendations by the Research Data Alliance (RDA).
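As a minimal, hypothetical sketch of the kind of automated quality control mentioned above, the function below applies a simple gross-range test in the spirit of QARTOD-style checks, flagging each value as pass, suspect or fail. The thresholds and flag scheme are illustrative assumptions, not the project's operational configuration.

```python
# Minimal gross-range QC sketch in the spirit of QARTOD-style checks.
# Thresholds and the numeric flag scheme (1=pass, 3=suspect, 4=fail) are
# illustrative assumptions, not the operational configuration.
from typing import List

def gross_range_test(values: List[float],
                     fail_min: float, fail_max: float,
                     suspect_min: float, suspect_max: float) -> List[int]:
    """Flag each value: 1 = pass, 3 = suspect, 4 = fail."""
    flags = []
    for v in values:
        if v < fail_min or v > fail_max:
            flags.append(4)          # outside sensor/physical limits
        elif v < suspect_min or v > suspect_max:
            flags.append(3)          # plausible but climatologically unusual
        else:
            flags.append(1)          # within expected range
    return flags

# Example: sea surface temperature (deg C) with placeholder thresholds
sst = [12.1, 12.3, 45.0, 11.9, -3.2]
print(gross_range_test(sst, fail_min=-2.5, fail_max=40.0,
                       suspect_min=5.0, suspect_max=25.0))
```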
Access to universally interoperable oceanic data, with known quality and provenance, will empower a broad range of stakeholder communities, creating opportunities for innovation and impact through data use, re-use and exploitation.
How to cite: Darroch, L., Ward, J., Tate, A., and Buck, J.: Continuous ocean monitoring from sensor arrays on the UK large research vessels, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18848, https://doi.org/10.5194/egusphere-egu2020-18848, 2020.
EGU2020-19895 | Displays | ESSI2.1
Towards an interoperability framework for observable property terminologiesBarbara Magagna, Gwenaelle Moncoiffe, Anusuriya Devaraju, Pier Luigi Buttigieg, Maria Stoica, and Sirko Schindler
In October 2019, a new working group (InteroperAble Descriptions of Observable Property Terminology, or I-ADOPT WG [1]) officially launched its 18-month workplan under the auspices of the Research Data Alliance (RDA), co-led by ENVRI-FAIR [2] project members. The goal of the group is to develop a community-wide, consensus framework for representing observable properties and facilitating semantic mapping between disjoint terminologies used for data annotation. The group has been active for over two years and comprises research communities, data centers, and research infrastructures from the environmental sciences. The WG members have been heavily involved in developing or applying terminologies to semantically enrich the descriptions of measured, observed, derived, or computed environmental data. They all recognize the need to enhance interoperability between their efforts through the WG’s activities.
Ongoing activities of the WG include gathering user stories from research communities (Task 1), reviewing related terminologies and current annotation practices (Task 2) and - based on this - defining and iteratively refining requirements for a community-wide semantic interoperability framework (Task 3). Much like a generic blueprint, this framework will be a basis upon which terminology developers can formulate local design patterns while at the same time remaining globally aligned. This framework will assist interoperability between machine-actionable complex property descriptions observed across the environmental sciences, including Earth, space, and biodiversity science. The WG will seek to synthesize well-adopted but still disparate approaches into global best practice recommendations for improved alignment. Furthermore, the framework will help mediate between generic observation standards (O&M [3], SSNO [4], SensorML [5], OBOE [6], ...) and current community-led terminologies and annotation practices, fostering harmonized implementations of observable property descriptions. Altogether, the WG’s work will boost the Interoperability component of the FAIR principles (especially principle I3) by encouraging convergence and by enriching the terminologies with qualified references to other resources. We envisage that this will greatly enhance the global effectiveness and scope of tools operating across terminologies. The WG will thus strengthen existing collaborations and build new connections between terminology developers and providers, disciplinary experts, and representatives of scientific data user groups.
In this presentation, we introduce the working group to the EGU community, and invite them to join our efforts. We report the methodology applied, the results from our first three tasks and the first deliverable, namely a catalog of domain-specific terminologies in use in environmental research, which will enable us to systematically compare existing resources for building the interoperability framework.
[1] https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg
[2] https://envri.eu/home-envri-fair/
[3] https://www.iso.org/standard/32574.html
[4] https://www.w3.org/TR/vocab-ssn/
[5] https://www.opengeospatial.org/standards/sensorml
[6] https://github.com/NCEAS/oboe/
How to cite: Magagna, B., Moncoiffe, G., Devaraju, A., Buttigieg, P. L., Stoica, M., and Schindler, S.: Towards an interoperability framework for observable property terminologies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19895, https://doi.org/10.5194/egusphere-egu2020-19895, 2020.
EGU2020-20448 | Displays | ESSI2.1
Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data SetsJan Schulte, Laura Helene Zepner, Stephan Mäs, Simon Jirka, and Petra Sauer
Over the last few years, a broad range of open data portals has been set-up. The aim of these portals is to improve the discoverability of open data resources and to strengthen the re-use of data generated by public agencies as well as research activities.
Often, such open data portals offer an immense amount of different types of data that may be relevant for a user. Thus, in order to facilitate the efficient and user-friendly exploration of available data sets, it is essential to visualize the data as quickly and easily as possible. While the visualization of static data sets is already well covered, selecting appropriate visualization approaches for potentially highly-dynamic spatio-temporal data sets is often still a challenge.
Within our contribution, we will introduce a preliminary study conducted by the mVIZ project, which is funded by the German Federal Ministry of Transport and Digital Infrastructure as part of the mFUND programme. This project introduces a methodology to support the selection and creation of user-friendly visualizations for data discoverable via open data portals such as the mCLOUD. During this process, specific consideration is given to the properties and metadata of the datasets as input for a decision workflow that suggests appropriate visualization types. A resulting guideline will describe the methodology and serve as a basis for the conception, extension or improvement of visualization tools, or for their further development and integration into open data portals.
The project focuses particularly on the creation of an inventory of open spatio-temporal data in open data portals, an overview of available visualization and analysis tools, the development of a methodology for selecting appropriate visualizations for spatio-temporal data, and the development of a demonstrator for supporting the visualization of selected data sets.
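As a hypothetical sketch of what such a decision workflow might look like, the function below maps a few dataset properties (geometry type, temporal dimension, number of attributes) onto candidate visualization types. The property names and the rules themselves are illustrative assumptions, not the methodology developed by the project.

```python
# Hypothetical decision-workflow sketch: suggest visualization types from a
# handful of dataset properties. Property names and rules are illustrative
# assumptions, not the mVIZ methodology itself.
from typing import Dict, List

def suggest_visualizations(props: Dict[str, object]) -> List[str]:
    suggestions = []
    geometry = props.get("geometry")          # "point", "line", "polygon", "raster", None
    has_time = bool(props.get("has_time"))    # temporal dimension present?
    n_attrs = int(props.get("n_attributes", 1))

    if geometry == "raster":
        suggestions.append("animated map" if has_time else "static map")
    elif geometry in ("point", "line", "polygon"):
        suggestions.append("web map with time slider" if has_time else "web map")
        if n_attrs > 1:
            suggestions.append("choropleth / graduated symbols")
    if has_time:
        suggestions.append("time-series chart")
    if not suggestions:
        suggestions.append("table view")
    return suggestions

# Example: a station dataset with hourly measurements of several parameters
print(suggest_visualizations({"geometry": "point", "has_time": True, "n_attributes": 4}))
```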
How to cite: Schulte, J., Zepner, L. H., Mäs, S., Jirka, S., and Sauer, P.: Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data Sets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20448, https://doi.org/10.5194/egusphere-egu2020-20448, 2020.
EGU2020-21258 | Displays | ESSI2.1
Scaling metadata catalogues with web-based software version control and integration systemsTara Keena, Adam Leadbetter, Andrew Conway, and Will Meaney
The ability to access and search metadata for marine science data is a key requirement for addressing fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting domain-specific, community-defined standards and legislative requirements placed on data publishers. One of the foundations of effective data management is appropriate metadata cataloguing: the storing and publishing of descriptive metadata for end users to query online. However, with ocean observing systems constantly evolving and the number of autonomous platforms and sensors growing, the volume and variety of data is constantly increasing, and metadata catalogue volumes are therefore also expanding. The ability for data catalogue infrastructures to scale with data growth is a necessity, without causing significant additional overhead in terms of technical infrastructure and financial costs.
To address some of these challenges, GitHub and Travis CI offer a potential solution for maintaining scalable data catalogues and hosting a variety of file types, all with minimal overhead costs.
GitHub is a repository hosting platform for version control and collaboration, and can be used with documents, computer code, or many file formats
GitHub Pages is a static website hosting service designed to host web pages directly from a GitHub repository
Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub
GitHub supports the implementation of a data catalogue as it stores metadata records of different formats in an online repository which is openly accessible and version controlled. The base metadata of the data catalogue in the Marine Institute is ISO 19115/19139-based XML, which is compliant with the INSPIRE implementing rules for metadata. However, using Travis CI, hooks can be provided to build additional metadata records and formats from this base XML, which can also be hosted in the repository. These formats include:
- DataCite metadata schema – allowing a completed data description entry to be exported in support of the minting of Digital Object Identifiers (DOIs) for published data
- Resource Description Framework (RDF) – as part of the semantic web and linked data
- Ecological Metadata Language (EML) – for the Global Biodiversity Information Facility (GBIF), which is used to share information about where and when species have been recorded
- Schema.org XML – which creates a structured data mark-up schema to increase search engine optimisation (SEO)
- HTML – the standard mark-up language for web pages, which can be used to represent the XML as web pages so that end users can view the catalogue online
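As a hedged illustration of the build hooks mentioned above, the sketch below shows how a continuous integration step could derive a schema.org JSON-LD record from a base ISO 19115/19139 XML file. The XPath expressions are generic ISO 19139 paths, and the file names are placeholders; the Marine Institute's actual build scripts may differ.

```python
# Illustrative sketch (not the Marine Institute's actual build script): a
# Travis CI step could call a converter like this for each ISO 19139 record
# in the repository to produce a schema.org JSON-LD file for GitHub Pages.
import json
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def iso_to_schema_org(iso_xml_path: str) -> dict:
    root = ET.parse(iso_xml_path).getroot()
    # simplified lookups; production code would use the full citation paths
    title = root.findtext(".//gmd:title/gco:CharacterString", namespaces=NS)
    abstract = root.findtext(".//gmd:abstract/gco:CharacterString", namespaces=NS)
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": abstract,
    }

if __name__ == "__main__":
    # hypothetical file names; in practice the CI job loops over the catalogue
    record = iso_to_schema_org("catalogue/my_dataset_iso19139.xml")
    with open("docs/my_dataset.jsonld", "w") as fh:
        json.dump(record, fh, indent=2)
```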
As well as hosting the various file types, GitHub Pages renders the generated HTML as static web pages, allowing users to view and search the catalogue online via a generated static website.
GitHub's ability to host and version-control metadata files, and to render them as web pages, allows for an easier and more transparent generation of an online data catalogue while catering for scalability, hosting and security.
How to cite: Keena, T., Leadbetter, A., Conway, A., and Meaney, W.: Scaling metadata catalogues with web-based software version control and integration systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21258, https://doi.org/10.5194/egusphere-egu2020-21258, 2020.
EGU2020-21522 | Displays | ESSI2.1
OpenSearch API for Earth observation DataHub service
Jovanka Gulicoska, Koushik Panda, and Hervé Caumont
OpenSearch is a de-facto standard specification and a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format.
Evolved through extensions within an international standards organisation, the Open Geospatial Consortium (OGC), OpenSearch has become a reference for querying repositories that contain Earth Observation information, for sending and receiving structured, standardized search requests and results, and for allowing syndication of repositories. In this evolved form it is a shared API used by many applications, tools, portals and sites in the Earth sciences community. The OGC OpenSearch extensions have been implemented for the NextGEOSS DataHub following the OGC standards and have been validated to be fully compatible with the standard.
The OGC OpenSearch extensions implemented for CKAN, the open-source software solution supporting the NextGEOSS DataHub, add the standardized metadata models and the OpenSearch API endpoints that allow the indexing of distributed EO data sources (currently over 110 data collections) and make these available to client applications to perform queries and retrieve results. This allowed the development of a simple user interface as part of the NextGEOSS DataHub Portal, which implements the two-step search mechanism (leveraging data collection metadata and data product metadata) and translates the filtering done by users into a matching OpenSearch query. The user interface can render a general description document, which contains information about the collections available on the NextGEOSS DataHub, and then retrieve a more detailed description document for each collection separately.
To generate the structure of the description documents and the result feed, we use CKAN's templates, supplemented by additional files that list all available parameters and their options and validate a query before it is executed. The search endpoint that returns the results feed uses existing CKAN API calls to perform the validation and retrieve the filtered results, taking into consideration the parameters of the user's search.
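To illustrate the two-step search mechanism described above, here is a hedged sketch of an OpenSearch client session in Python. The endpoint URL, query parameters and collection identifier are assumptions for illustration; the authoritative templates are published in the DataHub's own OpenSearch description documents.

```python
# Sketch of a two-step OpenSearch query against a catalogue such as the
# NextGEOSS DataHub. URL, parameters and collection id below are illustrative
# assumptions, not the service's documented interface.
import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
OS = "{http://a9.com/-/spec/opensearch/1.1/}"

BASE = "https://catalogue.example-nextgeoss.eu/opensearch"   # placeholder endpoint

# Step 1: collection-level search (which collections match my keyword?)
collections = requests.get(f"{BASE}/collection_search.atom",
                           params={"q": "Sentinel-2"}, timeout=30)

# Step 2: product-level search within one collection, constrained in space and time
products = requests.get(f"{BASE}/search.atom",
                        params={"collection_id": "SENTINEL2_L1C",   # assumed id
                                "bbox": "20.0,35.0,30.0,45.0",
                                "start": "2020-01-01", "end": "2020-01-31"},
                        timeout=30)

feed = ET.fromstring(products.content)
print("total results:", feed.findtext(f"{OS}totalResults"))
for entry in feed.findall(f"{ATOM}entry"):
    print(entry.findtext(f"{ATOM}title"))
```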
The current NextGEOSS DataHub implementation therefore provides a user interface for users who are not familiar with Earth observation data collections and products, so that they can easily create queries and access the results. Moreover, the NextGEOSS project partners are constantly adding data connectors and collecting new data sources that will become available through the OGC OpenSearch extensions API. This will allow NextGEOSS to provide a variety of data for users and accommodate their needs.
NextGEOSS is an H2020 Research and Development Project from the European Community under grant agreement 730329.
How to cite: Gulicoska, J., Panda, K., and Caumont, H.: OpenSearch API for Earth observation DataHub service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21522, https://doi.org/10.5194/egusphere-egu2020-21522, 2020.
EGU2020-21882 | Displays | ESSI2.1
Research products across space missions – a prototype for central storage, visualization and usability
Mario D'Amore, Andrea Naß, Martin Mühlbauer, Torsten Heinen, Mathias Boeck, Jörn Helbert, Torsten Riedlinger, Ralf Jaumann, and Guenter Strunz
For planetary sciences, the main archives providing access to mission data are ESA's Planetary Science Archive (PSA) and the Planetary Data System (PDS) nodes in the USA. Along with recent and upcoming planetary missions, the amount of different data (remote sensing and in-situ data, derived products) increases constantly and serves as the basis for scientific research, resulting in derived scientific data and information. Within missions to Mercury (BepiColombo), the Outer Solar System moons (JUICE), and asteroids (NASA's DAWN), one line of scientific analysis, the systematic mapping of surfaces, has received new impulses, also in Europe. These systematic surface analyses are based on the numeric and visual comparison and combination of different remote sensing data sets, such as optical image data, spectral/hyperspectral sensor data, radar images, and/or derived products like digital terrain models. The analyses mainly result in map figures, data, and profiles/diagrams, and serve to describe research investigations within scientific publications.
To handle these research products equivalently to missions' base data in the main archives, web-based geographic information systems have become a common means to impart spatial knowledge to all kinds of possible users in recent years. Accordingly, further platforms and initiatives have emerged that handle planetary data within web-based GIS, services, and/or virtual infrastructures. Those systems are built either upon proprietary software environments or, more commonly, upon a well-established stack of open-source software such as PostgreSQL, GeoServer (a server for sharing geospatial data) and a graphical user interface based on JavaScript. Applicable standards developed by the Open Geospatial Consortium (OGC), such as the Web Map Service (WMS) and the Web Feature Service (WFS), serve as the interface between the server-based data storage and the user interface.
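As a hedged illustration of how a client could consume such OGC services, the sketch below requests a rendered map from a GeoServer WMS endpoint using the OWSLib library. The service URL and layer name are hypothetical placeholders, not the prototype's actual deployment.

```python
# Minimal sketch, assuming a GeoServer instance publishing planetary mapping
# layers via OGC WMS; the service URL and layer name are placeholders.
from owslib.wms import WebMapService

wms = WebMapService("https://example-planetary-gis.org/geoserver/wms",  # assumed URL
                    version="1.1.1")

# Request a rendered map of an assumed geological mapping layer
img = wms.getmap(layers=["planetary:ceres_geology"],                    # assumed layer
                 srs="EPSG:4326",
                 bbox=(-180, -90, 180, 90),
                 size=(1024, 512),
                 format="image/png",
                 transparent=True)

with open("ceres_geology.png", "wb") as out:
    out.write(img.read())
```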
This contribution presents a prototypical system for the structured storage and visualization of planetary data compiled and developed within, or with the contribution of, the Institute for Planetary Research (PF, DLR). It enables user groups to store and spatially explore research products centrally and sustainably across multiple missions and scientific disciplines [1].
Technically, the system is based on two components: 1) an infrastructure that provides data storage and management capabilities as well as OGC-compliant interfaces for collaborative and web-based data access services, such as the EOC Geoservice [2]; and 2) UKIS (Environmental and Crisis Information Systems), a framework developed at DFD for the implementation of geoscientific web applications [3]. Substantially, the prototype is based on a recent approach developed within PF [4] in which an existing database established at the Planetary Spectroscopy Laboratory (PSL), handling different kinds of spatial data, meets a vector-based data collection of thematic, mainly geologic and geomorphologic, mapping results [5].
An information system of this kind is essential to ensure the efficient and sustainable utilization of the information already obtained and published. This is considered a prerequisite for guaranteeing a continuous and long-term use of scientific information and knowledge within institutional frameworks.
[1] Naß, et al (2019) EPSC #1311
[2] Dengler et al. (2013) PV 2013, elib.dlr.de/86351/
[3] Mühlbauer (2019) dlr.de/eoc/UKIS/en/
[4] Naß, D'Amore, Helbert (2017) EPSC #646-1
[5] Naß, Dawn Science Team (2019) EPSC #1304
How to cite: D'Amore, M., Naß, A., Mühlbauer, M., Heinen, T., Boeck, M., Helbert, J., Riedlinger, T., Jaumann, R., and Strunz, G.: Research products across space missions – a prototype for central storage, visualization and usability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21882, https://doi.org/10.5194/egusphere-egu2020-21882, 2020.
EGU2020-22021 | Displays | ESSI2.1
An open-source database and collections management system for fish scale and otolith archives
Elizabeth Tray, Adam Leadbetter, Will Meaney, Andrew Conway, Caoimhín Kelly, Niall O’Maoileidigh, Elvira De Eyto, Siobhan Moran, and Deirdre Brophy
Scales and otoliths (ear stones) from fish are routinely sampled for age estimation and stock management purposes. Growth records from scales and otoliths can be used to generate long-term time series data and, in combination with environmental data, can reveal species-specific population responses to a changing climate. Additionally, scale and otolith microchemical data can be utilized to investigate fish habitat usage and migration patterns. A common problem associated with biological collections is that while sample intake grows, long-term digital and physical storage is rarely a priority. Material is often collected to meet short-term objectives and resources are seldom committed to maintaining and archiving collections. As a consequence, precious samples are frequently stored in many different and unsuitable locations, and may become lost or separated from associated metadata. The Marine Institute's ecological research station in Newport, Ireland, holds a multi-decadal (1928-2020) collection of scales and otoliths from various fish species, gathered from many geographic locations. Here we present an open-source database and archiving system to consolidate and digitize this collection, and show how this case study infrastructure could be used for other biological sample collections. The system follows the FAIR (Findable, Accessible, Interoperable and Reusable) open data principles, and includes a physical repository, a sample metadata catalogue, and an image library.
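Purely as an illustration of how a physical repository, metadata catalogue and image library can be linked in one open-source system, here is a minimal relational sketch in SQLite. The table and column names are assumptions for illustration and do not describe the Marine Institute's actual schema.

```python
# Illustrative sketch of a collections catalogue linking physical samples,
# their metadata and an image library; all names are hypothetical.
import sqlite3

con = sqlite3.connect("otolith_archive.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS sample (
    sample_id     TEXT PRIMARY KEY,          -- physical archive label
    species       TEXT NOT NULL,
    capture_date  TEXT,                      -- ISO 8601
    latitude      REAL,
    longitude     REAL,
    storage_box   TEXT                       -- location in the physical repository
);
CREATE TABLE IF NOT EXISTS image (
    image_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id     TEXT REFERENCES sample(sample_id),
    file_path     TEXT NOT NULL,             -- location in the image library
    magnification TEXT
);
""")
con.execute("INSERT OR IGNORE INTO sample VALUES (?,?,?,?,?,?)",
            ("NWP-1931-0042", "Salmo salar", "1931-06-12", 53.92, -9.55, "B07"))
con.commit()
```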
How to cite: Tray, E., Leadbetter, A., Meaney, W., Conway, A., Kelly, C., O’Maoileidigh, N., De Eyto, E., Moran, S., and Brophy, D.: An open-source database and collections management system for fish scale and otolith archives, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22021, https://doi.org/10.5194/egusphere-egu2020-22021, 2020.
ESSI2.10 – Data Integration: Enabling the Acceleration of Science Through Connectivity, Collaboration, and Convergent Science
EGU2020-11709 | Displays | ESSI2.10
Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products
Eugene Burger, Benjamin Pfeil, Kevin O'Brien, Linus Kamb, Steve Jones, and Karl Smith
Data assembly in support of global data products, such as GLODAP, and submission of data to national data centers to support long-term preservation demand significant effort. This is in addition to the effort required to perform quality control on the data prior to submission. Delays in data assembly can negatively affect the timely production of scientific indicators that depend upon these datasets, including products such as GLODAP. What if data submission, metadata assembly and quality control could all be rolled into a single application? To support more streamlined data management processes in the NOAA Ocean Acidification Program (OAP), we are developing such an application. This application has the potential to serve a broader community.
This application addresses the need for data contributing to analysis and synthesis products to be of high quality, well documented, and accessible from the applications scientists prefer to use. The Scientific Data Integration System (SDIS) application, developed by the PMEL Science Data Integration Group, allows scientists to submit their data in a number of formats. Submitted data are checked for common errors. Metadata are extracted from the data and can then be complemented into a complete metadata record using the integrated metadata entry tool, which collects rich metadata that meets the carbon science community's requirements. Quality control for standard biogeochemical parameters, which is still being developed, will be integrated into the application. The quality control routines will be implemented in close collaboration with colleagues from the Bjerknes Climate Data Centre (BCDC) within the Bjerknes Centre for Climate Research (BCCR). This presentation will highlight the capabilities that are now available, as well as the implementation of the archive automation workflow and its potential use in support of GLODAP data assembly efforts.
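To make the "checked for common errors, metadata extracted" step concrete, the sketch below shows the kind of automated submission check such a tool might run. The column names, thresholds and logic are illustrative assumptions, not SDIS's actual rules.

```python
# Hedged sketch of automated checks on a submitted tabular file; the required
# columns and range checks are hypothetical, not the SDIS implementation.
import pandas as pd

REQUIRED = ["latitude", "longitude", "time", "sea_water_temperature"]

def check_submission(csv_path: str):
    df = pd.read_csv(csv_path)
    errors = [f"missing column: {c}" for c in REQUIRED if c not in df.columns]
    if "latitude" in df.columns and not df["latitude"].between(-90, 90).all():
        errors.append("latitude out of range")
    if "longitude" in df.columns and not df["longitude"].between(-180, 180).all():
        errors.append("longitude out of range")
    # minimal metadata extracted from the data themselves
    metadata = {
        "n_records": len(df),
        "time_coverage": (df["time"].min(), df["time"].max()) if "time" in df else None,
        "variables": [c for c in df.columns if c not in ("latitude", "longitude", "time")],
    }
    return errors, metadata

# Example usage with a hypothetical submission file:
# errors, meta = check_submission("cruise_2019_bottle_data.csv")
```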
How to cite: Burger, E., Pfeil, B., O'Brien, K., Kamb, L., Jones, S., and Smith, K.: Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11709, https://doi.org/10.5194/egusphere-egu2020-11709, 2020.
EGU2020-20966 | Displays | ESSI2.10
NextGEOSS data hub and platform - connecting data providers with geosciences communities
Bente Bye, Elnaz Neinavaz, Alaitz Zabala, Joan Maso, Marie-Francoise Voidrot, Barth De Lathouwer, Nuno Catarino, Pedro Gonzalves, Michelle Cortes, Koushik Panda, Julian Meyer-Arnek, and Bram Janssen
The geosciences communities share common challenges related to the effective use of the vast and growing amount of data as well as the continuous development of new technology. There is therefore great potential in learning from the experiences and knowledge acquired across the various fields. The H2020 project NextGEOSS is building a European data hub and platform to support the Earth observation communities with a set of tools and services. The suite of tools on the platform allows scalability, interoperability and transparency in a flexible way, well suited to serving a multifaceted, interdisciplinary community. NextGEOSS is developed with and for multiple communities, the NextGEOSS pilots. This has resulted in, and continues to provide, transfer of experience and knowledge along the whole value chain, from data provision to applications and services based on multiple sources of data. We will introduce the NextGEOSS data hub and platform and show some illustrative examples of the exchange of knowledge that facilitates faster uptake of data and advances in the use of new technology. An onboarding system benefits existing and new users. A capacity building strategy is an integral part of both the onboarding and the individual services, which will be highlighted in this presentation.
How to cite: Bye, B., Neinavaz, E., Zabala, A., Maso, J., Voidrot, M.-F., De Lathouwer, B., Catarino, N., Gonzalves, P., Cortes, M., Panda, K., Meyer-Arnek, J., and Janssen, B.: NextGEOSS data hub and platform - connecting data providers with geosciences communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20966, https://doi.org/10.5194/egusphere-egu2020-20966, 2020.
EGU2020-12386 | Displays | ESSI2.10 | Highlight
Data dissemination best practices and challenges identified through NOAA’s Big Data Project
Meredith Richardson, Ed Kearns, and Jonathan O'Neil
Through satellites, ships, radars, and weather models, the National Oceanic and Atmospheric Administration (NOAA) generates and handles tens of terabytes of data per day. Many of NOAA’s key datasets have been made available to the public through partnerships with Google, Microsoft, Amazon Web Services, and more as part of the Big Data Project (BDP). This movement of data to the Cloud has enabled access for researchers from all over the world to vast amounts of NOAA data, initiating a new form of federal data management as well as exposing key challenges for the future of open-access data. NOAA researchers have run into challenges in providing “analysis-ready” datasets that researchers from varying fields can easily access, manipulate, and use for different purposes. This issue arises because there is no agreed-upon format or method of transforming traditional datasets for the cloud across research communities, with each scientific field or start-up expressing differing data formatting needs (cloud-optimized, cloud-native, etc.). Some possible solutions involve changing data formats into those widely used throughout the visualization community, such as Cloud-Optimized GeoTIFF. Initial findings have led NOAA to facilitate roundtable discussions with researchers, public and private stakeholders, and other key members of the data community, to encourage the development of best practices for the use of public data on commercial cloud platforms. Overall, by uploading NOAA data to the Cloud, the BDP has led to the recognition and ongoing development of new best practices for data authentication and dissemination and the identification of key areas for targeting collaboration and data use across scientific communities.
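To illustrate why cloud-optimized formats matter for "analysis-ready" access, the sketch below reads only a small window of a Cloud-Optimized GeoTIFF over HTTP with rasterio, instead of downloading the whole file. The URL is a placeholder, not an actual NOAA bucket object.

```python
# Sketch of partial, windowed reading from a Cloud-Optimized GeoTIFF over
# HTTP; the object URL is a placeholder, not a real NOAA dataset path.
import rasterio
from rasterio.windows import Window

url = "https://example-bucket.s3.amazonaws.com/noaa/some_product_cog.tif"  # placeholder

with rasterio.open(url) as src:
    print(src.profile["width"], src.profile["height"], src.crs)
    # read only a 512 x 512 pixel window instead of the full raster
    block = src.read(1, window=Window(col_off=0, row_off=0, width=512, height=512))
    print(block.shape, block.dtype)
```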
How to cite: Richardson, M., Kearns, E., and O'Neil, J.: Data dissemination best practices and challenges identified through NOAA’s Big Data Project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12386, https://doi.org/10.5194/egusphere-egu2020-12386, 2020.
EGU2020-5972 | Displays | ESSI2.10 | Highlight
Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem
Kaylin Bugbee, Aaron Kaulfus, Aimee Barciauskas, Manil Maskey, Rahul Ramachandran, Dai-Hai Ton That, Chris Lynnes, Katrina Virts, Kel Markert, and Amanda Whitehurst
The scientific method within the Earth sciences is rapidly evolving. Ever-increasing data volumes require new methods for processing and understanding data, while an almost 60-year Earth observation record makes more data-intensive retrospective analyses possible. These new methods of data analysis are made possible by technological innovations and interdisciplinary scientific collaborations. While scientists are beginning to adopt new technologies and collaborations to conduct data-intensive research more effectively, both the data information infrastructure and the supporting data stewardship model have been slow to change. Standard data products are generated by a processing system and are then ingested into local archives. These local archive centers then provide metadata to a centralized repository for search and discovery. Each step in the data process occurs independently and on different, siloed components. Similarly, the data stewardship process has a well-established but narrow view of data publication that may be too constrained for an ever-changing data environment. To overcome these obstacles, a new approach is needed for both the data information infrastructure and stewardship models. The data ecosystem approach offers a solution to these challenges by placing an emphasis on the relationships between data, technologies and people. In this presentation, we present the Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform's (MAAP) data system as a forward-looking ecosystem solution. We will present the components needed to support the MAAP data ecosystem along with the key capabilities the MAAP data ecosystem supports. These capabilities include the ability for users to share data and software within the MAAP, the creation of analysis-optimized data services, and the creation of an aggregated catalog for data discovery. We will also explore our data stewardship efforts within this new type of data system, which include developing a data management plan and a level-of-service plan.
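As a purely hypothetical sketch of what discovery against an aggregated catalogue could look like from an analysis notebook, the snippet below issues a keyword, space and time constrained search. The endpoint, parameters and response layout are assumptions and do not describe the actual MAAP API.

```python
# Hypothetical sketch of querying an aggregated catalogue from a platform
# notebook; endpoint, parameters and response structure are assumptions only.
import requests

CATALOG = "https://example-analysis-platform.org/catalog/search"   # placeholder

resp = requests.get(CATALOG, params={
    "keyword": "above-ground biomass",
    "bbox": "-60,-10,-50,0",                                        # illustrative subset
    "temporal": "2019-01-01T00:00:00Z,2019-12-31T23:59:59Z",
    "page_size": 10,
}, timeout=30)

for granule in resp.json().get("results", []):                      # assumed layout
    print(granule.get("id"), granule.get("download_url"))
```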
How to cite: Bugbee, K., Kaulfus, A., Barciauskas, A., Maskey, M., Ramachandran, R., Ton That, D.-H., Lynnes, C., Virts, K., Markert, K., and Whitehurst, A.: Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5972, https://doi.org/10.5194/egusphere-egu2020-5972, 2020.
EGU2020-4616 | Displays | ESSI2.10 | Highlight
Documentation of climate change data supporting cross-domain data reuse
Martina Stockhause, Mark Greenslade, David Hassell, and Charlotte Pascoe
Climate change data and information are among those of the highest interest to cross-domain researchers, policy makers and the general public. Serving climate projection data to these diverse users requires detailed and accessible documentation.
Thus, the CMIP6 (Coupled Model Intercomparison Project Phase 6) data infrastructure consists not only of the ESGF (Earth System Grid Federation) as the data dissemination component but additionally of ES-DOC (Earth System Documentation) and the Citation Service for describing the provenance of the data. These services provide further information on the data creation process (experiments, models, …) and data reuse (data references and licenses) and connect the data to other external resources like research papers.
The contribution will present the documentation of the climate change data workflow around the furtherInfoURL page, which serves as an entry point. The challenges are to collect quality-controlled information from the international research community in different infrastructure components and to display it seamlessly, side by side, on the furtherInfoURL page.
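As a minimal sketch of how a data user reaches this entry point: CMIP6 netCDF files carry a global attribute (further_info_url) that links the data file to its ES-DOC landing page. The file name below is a placeholder, and real files should be checked for the exact attribute spelling.

```python
# Minimal sketch: read the furtherInfoURL entry point directly from a CMIP6
# data file's global attributes. The file name is a placeholder.
import netCDF4
import webbrowser

path = "tas_Amon_EXAMPLE-MODEL_historical_r1i1p1f1_gn_185001-201412.nc"  # placeholder

with netCDF4.Dataset(path) as ds:
    url = ds.getncattr("further_info_url")   # CMIP6 global attribute
    print("Further info:", url)

webbrowser.open(url)   # opens the furtherInfoURL landing page in a browser
```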
References / Links:
- CMIP6: https://pcmdi.llnl.gov/CMIP6/
- ES-DOC: https://es-doc.org/
- Citation Service: http://cmip6cite.wdc-climate.de
How to cite: Stockhause, M., Greenslade, M., Hassell, D., and Pascoe, C.: Documentation of climate change data supporting cross-domain data reuse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4616, https://doi.org/10.5194/egusphere-egu2020-4616, 2020.
EGU2020-8638 | Displays | ESSI2.10
Managing collaborative research data for integrated, interdisciplinary environmental research
Michael Finkel, Albrecht Baur, Tobias K.D. Weber, Karsten Osenbrück, Hermann Rügner, Carsten Leven, Marc Schwientek, Johanna Schlögl, Ulrich Hahn, Thilo Streck, Olaf A. Cirpka, Thomas Walter, and Peter Grathwohl
The consistent management of research data is crucial for the success of long-term and large-scale collaborative research. Research data management is the basis for efficiency, continuity, and quality of the research, as well as for maximum impact and outreach, including the long-term publication of data and their accessibility. Both funding agencies and publishers increasingly require this long-term and open access to research data. Joint environmental studies typically take place in a fragmented research landscape of diverse disciplines; the researchers involved typically show a variety of attitudes towards, and previous experiences with, common data policies, and the extensive variety of data types in interdisciplinary research poses particular challenges for collaborative data management. We present organizational measures, data and metadata management concepts, and technical solutions that form a flexible research data management framework which allows the full range of data and metadata to be shared efficiently among all researchers of the project, and selected data and data streams to be published smoothly to publicly accessible sites. The concept is built upon data type-specific and hierarchical metadata using a common taxonomy agreed upon by all researchers of the project. The framework's concept has been developed according to the needs and demands of the scientists involved and aims to minimize their effort in data management. We illustrate this from the researchers' perspective by describing their typical workflow from the generation and preparation of data and metadata to the long-term preservation of data, including their metadata.
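To illustrate what "data type-specific and hierarchical metadata using a common taxonomy" can look like in practice, here is a hedged sketch of one metadata record as nested JSON. The field names and taxonomy terms are assumptions for illustration, not the project's actual schema.

```python
# Illustrative sketch of a hierarchical, data type-specific metadata record
# built on a shared project taxonomy; all names and terms are hypothetical.
import json

record = {
    "dataset_id": "CAT-HYD-2019-042",          # hypothetical identifier
    "taxonomy": {                              # agreed, project-wide vocabulary
        "compartment": "groundwater",
        "data_type": "time_series",
        "parameter": "nitrate_concentration",
    },
    "general": {                               # metadata shared by all data types
        "site": "example catchment",
        "responsible_scientist": "N.N.",
        "license": "CC-BY-4.0",
    },
    "type_specific": {                         # only for time-series data
        "sampling_interval": "15 min",
        "sensor": "UV photometer",
        "unit": "mg/L",
    },
}

print(json.dumps(record, indent=2))
```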
How to cite: Finkel, M., Baur, A., Weber, T. K. D., Osenbrück, K., Rügner, H., Leven, C., Schwientek, M., Schlögl, J., Hahn, U., Streck, T., Cirpka, O. A., Walter, T., and Grathwohl, P.: Managing collaborative research data for integrated, interdisciplinary environmental research, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8638, https://doi.org/10.5194/egusphere-egu2020-8638, 2020.
EGU2020-7115 | Displays | ESSI2.10 | Highlight
Data Management for Early Career Scientists – How to Tame the Elephant
Laia Comas-Bru and Marcus Schmidt
Data Management can be overwhelming, especially for Early Career Scientists. In order to give them a kick-start, the World Data System (WDS) organised a 3-day, EGU-sponsored workshop on current achievements and future challenges in November 2019 in Paris. The purpose of the workshop was to provide Early Career Scientists with practical skills in data curation and management through a combination of practical sessions, group discussions and lectures. Participants were introduced to what research data are and to a common vocabulary used during the workshop. Later, a World Café session provided an opportunity to discuss individual challenges in data management and expectations of the workshop in small groups of peers. Lectures and discussions revolved around Open Science, Data Management Plans (DMPs), data exchange, copyright and plagiarism, and the use of Big Data, ontologies and cloud platforms in science. Finally, the roles and responsibilities of the WDS as well as its Early Career Researcher Network were discussed. Wrapping up the workshop, attendees were walked through what a data repository is and how repositories obtain their certifications. This PICO presentation, given by two attendees of the workshop, will showcase the main topics of discussion on data management and curation, provide key examples with special emphasis on the importance of creating a DMP at an early stage of a research project, and share practical tools and advice on how to make data management more accessible.
How to cite: Comas-Bru, L. and Schmidt, M.: Data Management for Early Career Scientists – How to Tame the Elephant, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7115, https://doi.org/10.5194/egusphere-egu2020-7115, 2020.
EGU2020-22052 | Displays | ESSI2.10
Towards Seamless Planetary-Scale Services
Peter Baumann and the Peter Baumann team
Collaboration requires some minimum of common understanding; in the case of Earth data this means, in particular, common principles making data interchangeable, comparable, and combinable. Open standards help here, in the case of Big Earth Data specifically the OGC/ISO Coverages standard. This unifying framework establishes a common model for regular and irregular grids, point clouds, and meshes, in particular for spatio-temporal datacubes. Services grounded in such a common understanding can be more uniform to access and handle, thereby implementing a principle of "minimal surprise" for users visiting different portals. Further, data combination and fusion benefit from canonical metadata allowing alignment, e.g., between 2D DEMs, 3D satellite image timeseries, and 4D atmospheric data.
The EarthServer federation is an open data center network offering dozens of petabytes of a critical variety of data, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on OGC standards, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.
The underlying datacube engine, rasdaman, enables location-transparent federation: clients can submit queries to any node, regardless of where the data sit. Query evaluation is optimized automatically, including multi-data fusion of data residing on different nodes. Hence, users perceive one single, common information space. Thanks to the open standards, a broad spectrum of open-source and proprietary clients can utilize this federation, ranging from OpenLayers and NASA WorldWind over QGIS and ArcGIS to python and R.
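As a hedged sketch of how such a datacube query can be submitted from Python, the snippet below sends an OGC WCPS (Web Coverage Processing Service) request to a rasdaman endpoint. The server URL and coverage name are placeholders, and the exact request parameters may differ between deployments.

```python
# Sketch of a WCPS query sent from Python to a rasdaman OWS endpoint; the
# endpoint URL and datacube (coverage) name are illustrative placeholders.
import requests

ENDPOINT = "https://example-earthserver-node.org/rasdaman/ows"   # placeholder

# Trim an assumed NDVI datacube to one date and a lat/long box, encode as PNG
wcps = ('for c in (S2_NDVI_CUBE) '
        'return encode(c[ansi("2019-07-01"), Lat(40:45), Long(10:15)], "png")')

resp = requests.get(ENDPOINT, params={
    "service": "WCS", "version": "2.0.1",
    "request": "ProcessCoverages", "query": wcps,
}, timeout=60)

with open("ndvi_subset.png", "wb") as out:
    out.write(resp.content)
```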
In our talk we present technology, services, and governance of this unique intercontinental line-up of data centers. A demo will show distributed datacube fusion live.
How to cite: Baumann, P. and the Peter Baumann team: Towards Seamless Planetary-Scale Services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22052, https://doi.org/10.5194/egusphere-egu2020-22052, 2020.
EGU2020-18428 | Displays | ESSI2.10
Selection and integration of Earth Observation-based data for an operational disease forecasting system
Eleanor A Ainscoe, Barbara Hofmann, Felipe Colon, Iacopo Ferrario, Quillon Harpham, Samuel JW James, Darren Lumbroso, Sajni Malde, Francesca Moschini, and Gina Tsarouchi
The current increase in the volume and quality of Earth Observation (EO) data being collected by satellites offers the potential to contribute to applications across a wide range of scientific domains. It is well established that there are correlations between characteristics that can be derived from EO satellite data, such as land surface temperature or land cover, and the incidence of some diseases. Thanks to the reliable frequent acquisition and rapid distribution of EO data it is now possible for this field to progress from using EO in retrospective analyses of historical disease case counts to using it in operational forecasting systems.
However, bringing together EO-based and non-EO-based datasets, as is required for disease forecasting and many other fields, requires carefully designed data selection, formatting and integration processes. Similarly, it requires careful communication between collaborators to ensure that the priorities of that design process match the requirements of the application.
Here we will present work from the D-MOSS (Dengue forecasting MOdel Satellite-based System) project. D-MOSS is a dengue fever early warning system for South and South East Asia that will allow public health authorities to identify areas at high risk of disease epidemics before an outbreak occurs in order to target resources to reduce spreading of epidemics and improve disease control. The D-MOSS system uses EO, meteorological and seasonal weather forecast data, combined with disease statistics and static layers such as land cover, as the inputs into a dengue fever model and a water availability model. Water availability directly impacts dengue epidemics due to the provision of mosquito breeding sites. The datasets are regularly updated with the latest data and run through the models to produce a new monthly forecast. For this we have designed a system to reliably feed standardised data to the models. The project has involved a close collaboration between remote sensing scientists, geospatial scientists, hydrologists and disease modelling experts. We will discuss our approach to the selection of data sources, data source quality assessment, and design of a processing and ingestion system to produce analysis-ready data for input to the disease and water availability models.
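As a minimal sketch of the kind of data-preparation step such a pipeline performs (the file path, variable name and reference grid below are hypothetical placeholders, not the actual D-MOSS configuration), an EO time series can be aggregated to monthly values and aligned on a common model grid:

    import xarray as xr

    def monthly_analysis_ready(path, variable, reference_grid):
        """Aggregate an EO time series to monthly means and align it with a reference grid."""
        da = xr.open_dataset(path)[variable]
        monthly = da.resample(time="1MS").mean()          # monthly aggregation
        return monthly.interp_like(reference_grid)        # regrid onto the common model grid

    # lst_monthly = monthly_analysis_ready("lst_daily.nc", "LST", model_grid)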
How to cite: Ainscoe, E. A., Hofmann, B., Colon, F., Ferrario, I., Harpham, Q., James, S. J., Lumbroso, D., Malde, S., Moschini, F., and Tsarouchi, G.: Selection and integration of Earth Observation-based data for an operational disease forecasting system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18428, https://doi.org/10.5194/egusphere-egu2020-18428, 2020.
EGU2020-8937 | Displays | ESSI2.10
Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities
Manuela Köllner, Mayumi Wilms, Anne-Christin Schulz, Martin Moritz, Katrin Latarius, Holger Klein, Kai Herklotz, and Kerstin Jochumsen
Reliable data are the basis for successful research and scientific publishing. Open data policies ensure that publicly financed field measurements are available to the public, and thus to all interested scientists. However, the variety of data sources and the varying availability of detailed metadata mean that each scientist must spend considerable effort deciding whether the data are usable for their own research topic. Data end-user communities differ in their requirements for metadata detail and for how data are handled during processing. Data-providing institutes and agencies need to know these requirements if they want to reach a wide range of end-user communities.
The Federal Maritime and Hydrographic Agency (BSH, Bundesamt für Seeschifffahrt und Hydrographie, Hamburg, Germany) collects a large variety of field data in physical and chemical oceanography, regionally focused on the North Sea, Baltic Sea, and North Atlantic. Data types range from vertical profiles and time series to underway measurements, delivered in real-time or delayed mode from moored or ship-based instruments. Along with other oceanographic data, the BSH provides all physical data via the German Oceanographic Data Center (DOD). Aiming for maximum reliability of the published data is crucial to enhance their usage, especially in the scientific community.
Here, we present our newly established data processing and quality control procedures using agile project management and workflow techniques, and outline their implementation in metadata and accompanying documentation. To enhance the transparency of data quality control, we will apply a detailed quality flag alongside the common data quality flag. This detailed quality flag, established by Mayumi Wilms within the research project RAVE Offshore service (research at alpha ventus), enables data end-users to review the results of the individual quality control checks performed during processing and thus to identify easily whether the data are usable for their research.
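As a minimal sketch of how such a detailed flag could be encoded (the concrete check names and flag values below are illustrative assumptions, not the BSH/RAVE scheme itself), one bit per individual check can be stored next to a conventional overall flag:

    # One bit per individual QC test, so end-users can see exactly which checks a value failed.
    CHECKS = {"range": 0, "spike": 1, "stuck_value": 2, "gradient": 3}

    def detailed_flag(results):
        """results: dict mapping check name -> bool (True = check passed)."""
        flag = 0
        for name, passed in results.items():
            if not passed:
                flag |= 1 << CHECKS[name]      # set the bit of every failed check
        return flag                            # 0 means all checks passed

    def overall_flag(detailed):
        return 1 if detailed == 0 else 4       # e.g. 1 = good, 4 = bad (a common convention)

    # detailed_flag({"range": True, "spike": False, "stuck_value": True, "gradient": True}) -> 2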
How to cite: Köllner, M., Wilms, M., Schulz, A.-C., Moritz, M., Latarius, K., Klein, H., Herklotz, K., and Jochumsen, K.: Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8937, https://doi.org/10.5194/egusphere-egu2020-8937, 2020.
EGU2020-21975 | Displays | ESSI2.10 | Highlight
Managing oceanographic data for the Marine Strategy Framework Directive in Spain supported by the Spatial Data Infrastructure of the Spanish Institute of Oceanography (IEO) and the INSPIRE Directive
Gerardo Bruque and Olvido Tello
In Europe, the Marine Strategy Framework Directive (MSFD) seeks to achieve a good environmental status of European marine waters and protect the resource base on which economic and social activities related to the sea depend. With this legislative tool the European Parliament recognizes the vital importance of the management of human activities that have an impact on the marine environment, integrating the concepts of environmental protection and sustainable use.
The MSFD establishes a monitoring program of different descriptors for continuous evaluation and periodic updating of the objectives. In Spain, the Ministry for Ecological Transition (MITECO) is responsible for and coordinates the implementation of the MSFD, but it is the Spanish Institute of Oceanography (IEO) that carries out the research and study of the different indicators and therefore the collection of oceanographic data.
The Geographic Information Systems Unit of the IEO is responsible for storing, cleaning and standardizing these data and including them in the IEO Spatial Data Infrastructure (IDEO). IDEO offers useful and advanced tools to discover and manage the oceanographic data, spatial or non-spatial, that the IEO holds. To facilitate access to IDEO, the IEO Geoportal was developed; it essentially contains a metadata catalog and provides access to different IEO web services and data viewers.
Some examples of priority datasets for the MSFD are: species and habitat distribution, distribution of commercially exploited fish and shellfish species, nutrients, chlorophyll a, dissolved oxygen, spatial extent of seabed loss, contaminants, litter, noise, etc.
The correct preparation and harmonization of the mentioned data sets following the Implementing Rules adopted by the INSPIRE Directive is essential to ensure that the different Spatial Data Infrastructures (SDI) of the member states are compatible and interoperable in the community context.
The INSPIRE Directive was created to make relevant, harmonized and high-quality geographic information available in a way that supports the formulation, implementation, monitoring and evaluation of European Union policies with an environmental impact or territorial dimension.
The geographic data sets, together with their corresponding metadata, constitute the cartographic base on which the information collected for the update of the continuous evaluation of the different descriptors of the MSFD is structured.
Thus, although these datasets are intended for use by public institutions responsible for decision-making on the management of the marine environment, they can also be very useful for a wide range of stakeholders and reused for multiple purposes.
With all this in mind, the INSPIRE Directive is extremely valuable and indeed essential for the tasks required by the MSFD, as it is for our projects related to the Maritime Spatial Planning (MSP) Directive.
How to cite: Bruque, G. and Tello, O.: Managing oceanographic data for the Marine Strategy Framework Directive in Spain supported by the Spatial Data Infrastructure of the Spanish Institute of Oceanography (IEO) and the INSPIRE Directive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21975, https://doi.org/10.5194/egusphere-egu2020-21975, 2020.
EGU2020-9686 | Displays | ESSI2.10
Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS
Regina Kwee, Tobias Weigel, Hannes Thiemann, Karsten Peters, Sandro Fiore, and Donatello Elia
This contribution highlights the Python xarray library in the context of climate-specific applications (typical formats are NetCDF, GRIB and HDF).
We will show how to use in-file metadata and why they are so powerful for data analysis, in particular for community-specific problems; for example, one can select data purely by coordinate variable names. ECAS, the ENES Climate Analytics Service available at Deutsches Klimarechenzentrum (DKRZ), helps by enabling faster access to the high-volume simulation output of climate modeling experiments. In this respect, we can also make use of “dask”, which was developed for parallel computing and works smoothly with xarray. This is extremely useful when we want to fully exploit the advantages of our supercomputer.
Our fully integrated service offers an interface via Jupyter notebooks (ecaslab.dkrz.de). We provide an analysis environment without the need for costly transfers, working on CF-standardized data files that are accessible via the ESGF portal on our nodes (esgf-data.dkrz.de). We can analyse, for example, CMIP5, CMIP6, Grand Ensemble and observational data. ECAS was developed in the framework of the European Open Science Cloud (EOSC) hub project.
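As a minimal sketch of the kind of workflow described above (file pattern, variable name, region and chunk sizes are illustrative assumptions, not the service's actual layout), xarray plus dask allows lazy, parallel selection and aggregation driven purely by coordinate and variable names:

    import xarray as xr

    # Open a multi-file CF-compliant dataset lazily; dask chunks enable parallel computation.
    ds = xr.open_mfdataset("tas_Amon_*.nc", combine="by_coords",
                           parallel=True, chunks={"time": 120})

    # Selection works purely on the coordinate variable names defined in the in-file metadata.
    europe = ds["tas"].sel(lat=slice(30, 70), lon=slice(-10, 40))

    # A monthly climatology, built lazily and only computed (in parallel) on request.
    climatology = europe.groupby("time.month").mean("time").compute()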
How to cite: Kwee, R., Weigel, T., Thiemann, H., Peters, K., Fiore, S., and Elia, D.: Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9686, https://doi.org/10.5194/egusphere-egu2020-9686, 2020.
EGU2020-15961 | Displays | ESSI2.10
Automatic quality control and quality control schema in the Observation to Archive
Brenner Silva, Najmeh Kaffashzadeh, Erik Nixdorf, Sebastian Immoor, Philipp Fischer, Norbert Anselm, Peter Gerchow, Angela Schäfer, and Roland Koppe and the Computing and data center
The O2A (Observation to Archive) is a data-flow framework for heterogeneous sources, covering multiple institutions and scales of Earth observation. In the O2A, once data transmission is set up, processes are executed to automatically ingest (i.e. collect and harmonize) and quality-control data in near real time. We use a web-based sensor description application to support the transmission and harmonization of observational time-series data. We also pursue a product-oriented quality control, in which a standardized and scalable approach should accommodate the diversity of sensors connected to the framework. A review of the literature and of observation networks in marine and terrestrial environments is in progress and allows us, for example, to characterize the quality tests in use for generic and specific applications. In addition, we use a standardized quality flag scheme to serve both user and technical levels of information. In our outlook, a quality score will complement the quality flag, indicating the overall plausibility of each individual data value or quantifying the flagging uncertainty. In this work, we present the concepts under development and give insights into the data ingestion and quality control currently operating within the O2A framework.
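A minimal sketch of such a scheme is given below; the individual tests, flag values and the way the score is computed are illustrative assumptions rather than the operational O2A configuration. Each value receives one flag per test, an overall flag, and a simple plausibility score (here, the fraction of tests passed):

    import numpy as np

    def range_test(x, lo=-2.0, hi=35.0):
        return (x >= lo) & (x <= hi)

    def spike_test(x, threshold=5.0):
        return np.abs(np.diff(x, prepend=x[0])) < threshold

    def quality_control(x):
        tests = {"range": range_test(x), "spike": spike_test(x)}
        # Per-test flags: 1 = good, 4 = bad (a common flagging convention).
        per_test = {name: np.where(ok, 1, 4) for name, ok in tests.items()}
        passed = np.vstack(list(tests.values()))
        overall = np.where(passed.all(axis=0), 1, 4)     # overall flag per value
        score = passed.mean(axis=0)                      # fraction of tests passed per value
        return per_test, overall, score

    flags, overall, score = quality_control(np.array([10.2, 10.4, 39.0, 10.1, 22.0]))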
How to cite: Silva, B., Kaffashzadeh, N., Nixdorf, E., Immoor, S., Fischer, P., Anselm, N., Gerchow, P., Schäfer, A., and Koppe, R. and the Computing and data center: Automatic quality control and quality control schema in the Observation to Archive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15961, https://doi.org/10.5194/egusphere-egu2020-15961, 2020.
EGU2020-20814 | Displays | ESSI2.10
Bayesian Spatio-temporal Geostatistics-based Method for Multiple Satellite Products Fusion and Downscaling
Yanchen Bo
High-level satellite remote sensing products of the Earth's surface play an irreplaceable role in global climate change studies, hydrological cycle modeling, water resources management, and environmental monitoring and assessment. The Earth surface high-level products released by NASA, ESA and other agencies are routinely derived from a single remote sensor. Due to cloud contamination and the limitations of retrieval algorithms, products derived from a single sensor suffer from incompleteness, limited accuracy and limited consistency in space and time. Some land surface products, such as soil moisture derived from passive microwave remote sensing, have a spatial resolution too coarse to be applied at the local scale. Fusion and downscaling are effective ways of improving the quality of satellite remote sensing products.
We developed a Bayesian spatio-temporal geostatistics-based framework for multiple remote sensing products fusion and downscaling. Compared to the existing methods, the presented method has 2 major advantages. The first is that the method was developed in the Bayesian paradigm, so the uncertainties of the multiple remote sensing products being fused or downscaled could be quantified and explicitly expressed in the fusion and downscaling algorithms. The second advantage is that the spatio-temporal autocorrelation is exploited in the fusion approach so that more complete products could be produced by geostatistical estimation.
This method has been applied to the fusion of multiple satellite AOD products, multiple satellite SST products, multiple satellite LST products and downscaling of 25 km spatial resolution soil moisture products. The results were evaluated in both spatio-temporal completeness and accuracy.
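As a highly simplified illustration of the Bayesian weighting idea behind such fusion (ignoring the spatio-temporal covariance structure that the actual geostatistical framework exploits), co-located product values with independent Gaussian error variances can be combined by precision weighting; the numbers and names below are assumptions for the sketch only:

    import numpy as np

    def precision_weighted_fusion(means, variances):
        """Fuse co-located estimates from several products, pixel by pixel.

        means, variances: arrays of shape (n_products, ...) holding per-product
        values and their error variances (the uncertainty input to the Bayesian update).
        """
        means = np.asarray(means, dtype=float)
        w = 1.0 / np.asarray(variances, dtype=float)     # precision = inverse variance
        fused_mean = (w * means).sum(axis=0) / w.sum(axis=0)
        fused_variance = 1.0 / w.sum(axis=0)             # posterior variance of the fused value
        return fused_mean, fused_variance

    # Two AOD products over the same pixel: 0.31 with variance 0.05**2, 0.27 with variance 0.10**2.
    m, v = precision_weighted_fusion([0.31, 0.27], [0.05**2, 0.10**2])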
How to cite: Bo, Y.: Bayesian Spatio-temporal Geostatistics-based Method for Multiple Satellite Products Fusion and Downscaling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20814, https://doi.org/10.5194/egusphere-egu2020-20814, 2020.
EGU2020-8973 | Displays | ESSI2.10
Space debris monitoring based on inter-continental stereoscopic detections
Alessandro Sozza, Massimo Cencini, Leonardo Parisi, Marco Acernese, Fabio Santoni, Fabrizio Piergentili, Stefania Melillo, and Andrea Cavagna
The monitoring of space debris and satellites orbiting the Earth is an essential topic in space surveillance. The impact of debris, even of small size, on active space installations causes serious damage, malfunctions and potential service interruptions. Collision-avoidance maneuvers are often performed, but they require increasingly complex protocols. The density of space debris is now so high that even astronomical observations are often degraded by it. Although it does not affect space weather, it may interfere with weather satellites.
We have developed an innovative experimental technique based on stereometry at intercontinental scale to obtain simultaneous images from two optical observatories, installed in Italy (at the Urbe Airport in Rome and in Collepardo on the Apennines) and in Malindi (Kenya). From these ground-based observations it is possible to reconstruct the three-dimensional position and velocity of the objects. The distance between the two observatories is crucial for an accurate reconstruction. In particular, we used the sites of Urbe and Collepardo, with a baseline of 80 km, to detect objects in low Earth orbit (LEO), and a baseline of 6000 km, between Urbe and Malindi, to observe objects in geostationary orbit (GEO).
We will present the validation of the three-dimensional reconstruction method via a fully synthetic procedure that propagates the satellite trajectory, using the SGP4 model and TLE data (provided by NASA), and generates synthetic photographs of the satellite passage as seen from the two observatories. We will then compare the synthetic results with the experimental results obtained using the real optical systems. The procedure can be automated to identify unknown space objects and generalized to an arbitrary number of observation sites. The identified debris will be added to the DISCOS catalogue (Database and Information System Characterising Objects in Space), owned by the European Space Agency (ESA), to improve space surveillance and the ability to intervene in case of potential risk.
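The geometric core of such a stereoscopic reconstruction can be sketched as a generic two-ray triangulation (not the authors' full pipeline; observatory positions and line-of-sight directions are assumed to be given in a common Earth-centred frame):

    import numpy as np

    def triangulate(p1, d1, p2, d2):
        """Midpoint of the shortest segment between two lines of sight p_i + t_i * d_i."""
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        w = p1 - p2
        b = np.dot(d1, d2)
        denom = 1.0 - b * b                      # zero only if the two lines of sight are parallel
        t1 = (b * np.dot(d2, w) - np.dot(d1, w)) / denom
        t2 = (np.dot(d2, w) - b * np.dot(d1, w)) / denom
        q1, q2 = p1 + t1 * d1, p2 + t2 * d2
        return 0.5 * (q1 + q2), np.linalg.norm(q1 - q2)   # estimated position, residual distance

    # Velocities follow from differentiating positions triangulated at successive exposure times.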
How to cite: Sozza, A., Cencini, M., Parisi, L., Acernese, M., Santoni, F., Piergentili, F., Melillo, S., and Cavagna, A.: Space debris monitoring based on inter-continental stereoscopic detections, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8973, https://doi.org/10.5194/egusphere-egu2020-8973, 2020.
EGU2020-12012 | Displays | ESSI2.10
Automatic Monitoring of Mines Mining based on Multitemporal Remote Sensing Image Change Detection
Chengyi Li
Keeping mining activity controlled and rational is important and meaningful work for the state and for society. Otherwise, illegal mining and unreasonable abandonment cause waste and loss of resources. Being convenient, cheap, and timely, remote sensing technology makes automatic, large-scale monitoring of mining activity possible.
We propose a mining change detection framework based on multitemporal remote sensing images. In this framework, the status of a mine is divided into mining in progress and mining stopped. Based on multitemporal GF-2 satellite data and mining records from Beijing, China, we have built a mining change dataset (the BJMMC dataset), which includes two change types: from mining to mining, and from mining to discontinued mining. We then implement a new type of semantic change detection based on convolutional neural networks (CNNs), which intuitively inserts semantics into the detected change regions.
We applied our method to monitoring mining in the Beijing area in a further year and, combined with GIS data and field work, the results show that the proposed monitoring method performs very well on the BJMMC dataset.
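A minimal sketch of a bi-temporal, CNN-based semantic change detector of this kind is shown below (a generic Siamese architecture written in PyTorch for illustration; the layer sizes, four-band input and three change classes are assumptions, not the network used in the study):

    import torch
    import torch.nn as nn

    class SiameseChangeNet(nn.Module):
        """Shared encoder on both dates, feature difference, per-pixel semantic change classes."""
        def __init__(self, in_channels=4, n_classes=3):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            # Classes could be: no change, mining -> mining, mining -> discontinued mining.
            self.classifier = nn.Conv2d(64, n_classes, kernel_size=1)

        def forward(self, image_t1, image_t2):
            f1, f2 = self.encoder(image_t1), self.encoder(image_t2)
            return self.classifier(torch.abs(f1 - f2))   # (batch, n_classes, H, W) change scores

    # logits = SiameseChangeNet()(torch.randn(1, 4, 256, 256), torch.randn(1, 4, 256, 256))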
How to cite: Li, C.: Automatic Monitoring of Mines Mining based on Multitemporal Remote Sensing Image Change Detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12012, https://doi.org/10.5194/egusphere-egu2020-12012, 2020.
EGU2020-16632 | Displays | ESSI2.10
Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity
Elnaz Neinavaz, Andrew K. Skidmore, and Roshanak Darvishzadeh
Precise estimation of land surface emissivity (LSE) is essential to predict land surface energy budgets and land surface temperature, as LSE is an indicator of material composition. Several approaches to LSE estimation from remote sensing data exist; however, the prediction of LSE remains a challenging task. Among them, the NDVI threshold method appears to hold well over vegetated areas. To apply the NDVI threshold method, it is necessary to know the proportion of vegetation cover (Pv). This research investigates the impact of the prediction accuracy of Pv on the estimation of LSE over a forest ecosystem. To this end, a field campaign coinciding with a Landsat-8 overpass was undertaken in the mixed temperate forest of the Bavarian Forest National Park in southeastern Germany. In situ measurements of Pv were made for 37 plots. Four vegetation indices, namely the NDVI, the variable atmospherically resistant index, the wide dynamic range vegetation index, and the three-band gradient difference vegetation index, were applied to predict Pv for further use in LSE computation. Unlike previous studies, which suggested that the variable atmospherically resistant index estimates Pv with higher prediction accuracy than the NDVI over agricultural areas, our results showed that the prediction accuracy of Pv is not different when using the NDVI over forest (R²CV = 0.42, RMSECV = 0.06). Pv was predicted with the lowest accuracy using the wide dynamic range vegetation index (R²CV = 0.014, RMSECV = 0.197) and the three-band gradient difference vegetation index (R²CV = 0.032, RMSECV = 0.018). The results of this study also revealed that variation in the prediction accuracy of Pv has an impact on the LSE calculation.
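For reference, the NDVI threshold method in its commonly used form computes Pv from scaled NDVI and mixes vegetation and soil emissivities accordingly; the sketch below uses typical literature values for the thresholds and emissivities (assumptions for illustration, not the values adopted in this study) and omits the optional cavity term:

    import numpy as np

    def proportion_vegetation(ndvi, ndvi_soil=0.2, ndvi_veg=0.86):
        """Pv = ((NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil))**2, with NDVI clipped to the thresholds."""
        ndvi = np.clip(ndvi, ndvi_soil, ndvi_veg)
        return ((ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)) ** 2

    def emissivity_ndvi_threshold(ndvi, eps_veg=0.99, eps_soil=0.97):
        """LSE as the Pv-weighted mixture of vegetation and bare-soil emissivities."""
        pv = proportion_vegetation(ndvi)
        return eps_veg * pv + eps_soil * (1.0 - pv)

    # emissivity_ndvi_threshold(np.array([0.15, 0.5, 0.9]))  ->  approx. [0.97, 0.974, 0.99]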
How to cite: Neinavaz, E., Skidmore, A. K., and Darvishzadeh, R.: Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16632, https://doi.org/10.5194/egusphere-egu2020-16632, 2020.
ESSI2.11 – Earth/Environmental Science Applications on HPC and Cloud Infrastructures
EGU2020-11280 | Displays | ESSI2.11 | Highlight
Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion
Alexey Gokhberg, Laura Ermert, Jonas Igel, and Andreas Fichtner
The study of ambient seismic noise sources and their time- and space-dependent distribution is becoming a crucial component of the real-time monitoring of various geosystems, including active fault zones and volcanoes, as well as geothermal and hydrocarbon reservoirs. In this context, we have previously implemented a combined cloud - HPC infrastructure for production of ambient source maps with high temporal resolution. It covers the entire European continent and the North Atlantic, and is based on seismic data provided by the ORFEUS infrastructure. The solution is based on the Application-as-a-Service concept and includes (1) acquisition of data from distributed ORFEUS data archives, (2) noise source mapping, (3) workflow management, and (4) front-end Web interface to end users.
We present the new results of this ongoing project conducted with support of the Swiss National Supercomputing Centre (CSCS). Our recent goal has been transitioning from mapping the seismic noise sources towards modeling them based on our new method for near real-time finite-frequency ambient seismic noise source inversion. To invert for the power spectral density of the noise source distribution of the secondary microseisms we efficiently forward model global cross-correlation wavefields for any noise distribution. Subsequently, a gradient-based iterative inversion method employing finite-frequency sensitivity kernels is implemented to reduce the misfit between synthetic and observed cross correlations.
During this research we encountered substantial challenges related to the large data volumes and high computational complexity of involved algorithms. We handle these problems by using the CSCS massively parallel heterogeneous supercomputer "Piz Daint". We also apply various specialized numeric techniques which include: (1) using precomputed Green's functions databases generated offline with Axisem and efficiently extracted with Instaseis package and (2) our previously developed high performance package for massive cross correlation of seismograms using GPU accelerators. Furthermore, due to the inherent restrictions of supercomputers, some crucial components of the processing pipeline including the data acquisition and workflow management are deployed on the OpenStack cloud environment. The resulting solution combines the specific advantages of the supercomputer and cloud platforms thus providing a viable distributed platform for the large-scale modeling of seismic noise sources.
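The elementary operation that is massively parallelized here, the cross-correlation of two noise records, can be sketched in a few lines (a plain NumPy, FFT-based version for illustration; the GPU-accelerated production code and its exact conventions are not reproduced here):

    import numpy as np

    def cross_correlate(u1, u2, dt, max_lag_s):
        """Cross-correlation of two equally sampled noise records, returned around zero lag."""
        n = len(u1) + len(u2) - 1
        nfft = 1 << (n - 1).bit_length()                     # next power of two for the FFT
        spec = np.fft.rfft(u1, nfft) * np.conj(np.fft.rfft(u2, nfft))
        cc = np.roll(np.fft.irfft(spec, nfft), nfft // 2)    # put zero lag in the centre
        lags = (np.arange(nfft) - nfft // 2) * dt
        keep = np.abs(lags) <= max_lag_s
        return lags[keep], cc[keep]

    # lags, cc = cross_correlate(trace_a, trace_b, dt=1.0, max_lag_s=300.0)  # placeholder traces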
How to cite: Gokhberg, A., Ermert, L., Igel, J., and Fichtner, A.: Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11280, https://doi.org/10.5194/egusphere-egu2020-11280, 2020.
EGU2020-13518 | Displays | ESSI2.11
Supporting Multi-cloud Model Execution with VLab
Mattia Santoro, Paolo Mazzetti, Nicholas Spadaro, and Stefano Nativi
The VLab (Virtual Laboratory), developed in the context of the European projects ECOPOTENTIAL and ERA-PLANET, is a cloud-based platform that supports environmental scientists in sharing their models. The main challenges addressed by VLab are: (i) minimization of the interoperability requirements in the process of model porting (i.e. simplifying as much as possible the publishing and sharing of a model for model developers) and (ii) support for multiple programming languages and environments (it must be possible to port models developed in different programming languages and using an arbitrary set of libraries).
In this presentation we describe the VLab architecture and, in particular, how it enables a multi-cloud deployment approach and the benefits this brings.
Deploying VLab on different cloud environments allows model execution where it is most convenient, e.g. depending on the availability of required data (move code to data).
This was implemented in the web application for Protected Areas, developed by the Joint Research Centre of the European Commission (EC JRC) in the context of the EuroGEOSS Sprint to Ministerial activity and demonstrated at the last GEO-XVI Plenary meeting in Canberra. The web application demonstrates the use of Copernicus Sentinel data to calculate land cover and land cover change in a set of Protected Areas belonging to different ecosystems. Based on the user's selection of satellite products, the available cloud platforms on which the model can run are presented along with their data availability for the selected products. After the platform is selected, the web application uses the VLab APIs to launch the EODESM (Earth Observation Data for Ecosystem Monitoring) model (Lucas and Mitchell, 2017), monitor the execution status and retrieve the output.
Currently, VLab has been tested with the following cloud platforms: Amazon Web Services, three of the 4+1 Copernicus DIAS platforms (namely ONDA, Creodias and Sobloo), and the European Open Science Cloud (EOSC).
Another scenario enabled by this multi-platform deployment feature is letting the user choose the computational platform and use her/his own credentials to request the needed computational resources. Finally, it is also possible to exploit this feature to benchmark different cloud platforms with respect to their performance.
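A schematic client for the launch–monitor–retrieve workflow described above is sketched below; the base URL, route names and JSON fields are purely hypothetical placeholders (the real VLab API is not documented in this abstract), so the sketch only conveys the interaction pattern:

    import time
    import requests

    VLAB_API = "https://vlab.example.org/api"   # hypothetical base URL, not the real service

    def run_model(model_id, inputs, poll_every_s=30):
        # Route names and payload fields below are illustrative only.
        run = requests.post(f"{VLAB_API}/models/{model_id}/runs", json=inputs).json()
        while True:
            status = requests.get(f"{VLAB_API}/runs/{run['id']}/status").json()["status"]
            if status in ("COMPLETED", "FAILED"):
                break
            time.sleep(poll_every_s)            # poll until the execution finishes
        return requests.get(f"{VLAB_API}/runs/{run['id']}/outputs").json()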
References
Lucas, R. and Mitchell, A. (2017): "Integrated Land Cover and Change Classifications", in: The Roles of Remote Sensing in Nature Conservation, pp. 295–308.
How to cite: Santoro, M., Mazzetti, P., Spadaro, N., and Nativi, S.: Supporting Multi-cloud Model Execution with VLab, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13518, https://doi.org/10.5194/egusphere-egu2020-13518, 2020.
EGU2020-12904 | Displays | ESSI2.11
Accelerated hydrologic modeling: ParFlow GPU implementation
Jaro Hokkanen, Jiri Kraus, Andreas Herten, Dirk Pleiter, and Stefan Kollet
ParFlow is known as a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded Domain-Specific Language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed memory parallelism), on top of which the hydrologic numerical core has been built.
In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers such that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. The decision to embed GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation.
The eDSL implementation is based on the C host language, and the support for GPU acceleration is based on CUDA C++. CUDA C++ has been under intense development in recent years, and features such as Unified Memory and host-device lambdas were extensively leveraged in the ParFlow implementation in order to maximize productivity. Efficient intra- and inter-node data transfer between GPUs rests on a CUDA-aware MPI library and application-side, GPU-based data packing routines.
The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with 2 Intel Skylake processors and 4 NVIDIA V100 GPUs compared to the original version of ParFlow, where the GPUs are not used. The eDSL approach and ParFlow GPU implementation may serve as a blueprint to tackle the challenges of heterogeneous HPC hardware architectures on the path to exascale.
How to cite: Hokkanen, J., Kraus, J., Herten, A., Pleiter, D., and Kollet, S.: Accelerated hydrologic modeling: ParFlow GPU implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12904, https://doi.org/10.5194/egusphere-egu2020-12904, 2020.