ESSI – Earth & Space Science Informatics
ESSI1.1 – Informatics in Oceanography and Ocean Science
EGU2020-873 | Displays | ESSI1.1
The new online Black Sea Oceanographic Database
Elena Zhuk, Maxim Vecalo, and Andrey Ingerov
The new online Black Sea Oceanographic Database
Elena V. Zhuk
Marine Hydrophysical Institute, Russian Academy of Sciences, Russia
alenixx@gmail.com
We present recent improvements to the Black Sea Oceanographic Database (BSOD) dedicated to online access to hydrological and hydro-chemical data, taking into account user priorities, data types, and the methods and timing of data access.
Based on an analysis of free DBMSs, the PostgreSQL object-relational DBMS was selected for archiving the data in the BSOD. PostgreSQL provides high performance, reliability and the ability to work with big data. Moreover, PostgreSQL can handle GIS objects through the PostGIS extension and has built-in support for semi-structured data in JSON format. The development also had to provide the capability to select large data sets according to criteria specified through metadata selection. Taking these requirements into account, the part of the database responsible for metadata access was designed for interactive transaction processing (the OLTP access pattern), while the part responsible for in-situ data archiving was developed according to a star schema, which is typical of the OLAP access pattern.
After analyzing the oceanographic in-situ observations, the following main entities were identified: Cruise, Ship, Station, Measurements and Measured Parameters, together with the relationships between them. A set of attributes was compiled for each entity and the tables were designed. The BSOD includes the following (a simplified schema sketch follows the list):
- Metadata tables: cruises, ships, stations, stations_parameters.
- Data tables: measurements.
- Vocabularies: constructed using the SeaDataCloud BODC parameter vocabularies.
- Reference data tables: GEBCO, EDMO, p01_vocabuary, p02_vocabuary, p06_vocabuary, l05_vocabuary.
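Below is a minimal sketch, in Python with psycopg2, of how the table groups listed above could be laid out as a star schema with the measurements fact table at its centre. The column names and data types are illustrative assumptions, not the actual BSOD schema, and station positions could equally be stored as PostGIS geometries.

```python
# Hypothetical, simplified sketch of the BSOD star schema described above.
# Column names and types are assumptions for illustration only.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS cruises (
    cruise_id   SERIAL PRIMARY KEY,
    ship_id     INTEGER,
    cruise_name TEXT,
    start_date  DATE,
    end_date    DATE
);

CREATE TABLE IF NOT EXISTS stations (
    station_id  SERIAL PRIMARY KEY,
    cruise_id   INTEGER REFERENCES cruises (cruise_id),
    latitude    DOUBLE PRECISION,   -- could be a PostGIS geometry(Point, 4326) instead
    longitude   DOUBLE PRECISION,
    obs_time    TIMESTAMP
);

-- Fact table at the centre of the "star": one row per measured value
CREATE TABLE IF NOT EXISTS measurements (
    station_id   INTEGER REFERENCES stations (station_id),
    parameter_id INTEGER,            -- points into the parameter vocabulary tables
    depth_m      REAL,
    value        REAL
);
"""

def create_schema(dsn: str) -> None:
    """Create the simplified tables; dsn is a libpq connection string."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)

if __name__ == "__main__":
    create_schema("dbname=bsod user=postgres")  # hypothetical connection settings
```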
To provide online access to the Black Sea Oceanographic Database, a user interface (UI) was implemented. It was developed using the jQuery and Mapbox GL JavaScript libraries and provides visual data selection by date period, cruise, parameters such as temperature, salinity, oxygen, nitrates, nitrites and phosphates, and other metadata.
Acknowledgements: this work was carried out in the framework of Marine Hydrophysical Institute of the Russian Academy of Sciences task No. 0827-2018-0002.
Keywords: Black Sea, oceanographic database, PostgreSQL, online data access, Geo-information system.
How to cite: Zhuk, E., Vecalo, M., and Ingerov, A.: The new online Black Sea Oceanographic Database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-873, https://doi.org/10.5194/egusphere-egu2020-873, 2020.
EGU2020-2004 | Displays | ESSI1.1
VM-ADCP backscatter data management using QGIS
Paola Picco, Roberto Nardini, Sara Pensieri, Roberto Bozzano, Luca Repetti, and Maurizio Demarte
VM-ADCPs (Vessel Mounted Acoustic Doppler Current Profilers) operate regularly on board several research vessels with the aim of providing 3-D ocean current fields. Along with ocean currents, these instruments also measure the acoustic backscatter profile at a known frequency, which can be of great advantage for other environmental investigations such as zooplankton migration. The presence of zooplankton can be detected as a variation of acoustic backscatter with depth showing a periodic (diurnal or semidiurnal) variability related to the vertical migration of these organisms. GIS has proven to be a powerful tool to manage the huge amount of VM-ADCP backscatter data obtained during oceanographic campaigns. Moreover, it allows relevant information on zooplankton distribution and abundance to be extracted, even when the monitoring strategy of the experiment does not completely meet the temporal and spatial resolution required for these studies. The application described here has been developed in QGIS and tested in the Ligurian Sea (Mediterranean Sea). To make data from instruments operating at different frequencies and sampling set-ups comparable, echo intensity data are converted into volume backscatter strength and corrected for the slant range. High-resolution bathymetry rasters acquired and processed by the Italian Hydrographic Institute allow anomalously high backscatter values due to the presence of the bottom to be discarded. Another advantage of the GIS is the possibility to easily separate night-collected data from daytime data and examine their spatial distribution, as well as to distinguish the surface layer from the deeper one. All possible combinations can then be visualised and analysed.
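A minimal Python sketch of the two processing steps named above (conversion of echo intensity to volume backscatter strength, and masking of bins below the bathymetric depth) is given below. This is not the authors' QGIS workflow: the simplified sonar-equation form and the constants kc, er, alpha and c are instrument- and frequency-dependent assumptions used only for illustration.

```python
# Illustrative sketch (not the QGIS application described above): convert ADCP
# echo intensity to volume backscatter strength Sv with a simplified
# sonar-equation form, then blank bins that fall below the bathymetric depth.
import numpy as np

def echo_to_sv(echo_counts, slant_range_m, kc=0.45, er=40.0, alpha=0.048, c=-140.0):
    """Echo intensity (counts) to Sv (dB); kc, er, alpha and c are assumed constants."""
    spreading = 20.0 * np.log10(slant_range_m)   # two-way spherical spreading
    absorption = 2.0 * alpha * slant_range_m     # two-way absorption in water
    return c + kc * (echo_counts - er) + spreading + absorption

def mask_bottom(sv, bin_depth_m, water_depth_m, margin=0.9):
    """Discard bins deeper than a fraction of the local water depth (from the bathymetry raster)."""
    sv = sv.copy()
    sv[bin_depth_m > margin * water_depth_m] = np.nan
    return sv

# Example profile: 40 bins of 8 m, beams tilted 20 degrees from the vertical
beam_angle = np.deg2rad(20.0)
bin_depth = 8.0 * (np.arange(40) + 1)            # vertical depth of each bin (m)
slant_range = bin_depth / np.cos(beam_angle)     # distance travelled along the beam (m)
echo = np.random.randint(60, 180, size=40).astype(float)

sv = mask_bottom(echo_to_sv(echo, slant_range), bin_depth, water_depth_m=250.0)
```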
How to cite: Picco, P., Nardini, R., Pensieri, S., Bozzano, R., Repetti, L., and Demarte, M.: VM-ADCP backscatter data management using QGIS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2004, https://doi.org/10.5194/egusphere-egu2020-2004, 2020.
EGU2020-8073 | Displays | ESSI1.1
Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service
Peter Thijsse, Dick Schaap, and Michele Fichaut
SeaDataNet is an operational pan-European infrastructure for managing marine and ocean data; its core partners are National Oceanographic Data Centres (NODCs) and oceanographic data focal points from 34 coastal states in Europe. Currently SeaDataNet provides discovery of and access to more than 2.3 million data sets for physical oceanography, chemistry, geology, geophysics, bathymetry and biology from more than 650 data originators. The population has increased considerably through cooperation with, and involvement in, many associated EU projects and initiatives such as EMODnet. The SeaDataNet infrastructure has been set up in a series of projects over the last two decades. Currently the SeaDataNet core services and marine data management standards are being upgraded in the EU Horizon 2020 ‘SeaDataCloud’ project, which runs for four years from 1 November 2016. The upgraded services include a move “to the cloud” via a strategic and technical cooperation of the SeaDataNet consortium with the EUDAT consortium of e-infrastructure service providers. This is an important step into the EOSC domain.
One of the main components of SeaDataNet is the CDI Data Discovery and Access service, which provides users with access to marine data from 100 connected data centres. The previous version of the CDI service was appreciated for its harmonised datasets, but had some flaws in interface usability and performance. Under SeaDataCloud, the CDI Data Discovery and Access service has now been upgraded by introducing a central data buffer in the cloud that is continuously synchronised by replication from the data centres. The “datacache” itself is hosted and horizontally synchronised between five EUDAT data centres. During implementation of the replication process, additional quality control mechanisms have been included for the central metadata and associated data in the buffer.
In October 2019 the operational production version of the upgraded CDI Data Discovery and Access service was publicly launched. The user interface has been completely redeveloped, upgraded, reviewed and optimised, offering a very efficient query and shopping experience with strong performance. The import process for new and updated CDI metadata and associated data sets has also been redesigned, successfully introducing cloud technology.
The upgraded user interface has been developed and tested in close cooperation with users. It now also includes the “MySeaDataCloud” concept, in which various services are offered to meet the latest user demands: e.g. saving searches, sharing data searches and eventually even pushing data into the SDC VRE. The user interface and machine-to-machine interfaces have improved the overall quality, performance and ease of use of the CDI service for both human users and machine processes.
The presentation will provide more technical background on the upgrading of the CDI Data Discovery and Access service, and adopting the cloud. It will report on the current release (https://cdi.seadatanet.org), demonstrate the wealth of data, present the experiences of developing services in the cloud, and demonstrate the advantages of this system for the scientific community.
How to cite: Thijsse, P., Schaap, D., and Fichaut, M.: Delivering marine data from the cloud using the SeaDataCloud Discovery and Access service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8073, https://doi.org/10.5194/egusphere-egu2020-8073, 2020.
EGU2020-10292 | Displays | ESSI1.1
New BioGeoChemical products provided by the Copernicus Marine Service
Virginie Racapé, Vidar Lien, Nilsen Jan Even Øie, Havard Vindenes, Leonidas Perivoliotis, and Seppo Kaitala
The Copernicus Marine service is a “one-stop-shop” providing freely available operational data on the state of the marine environment for use by marine managers, advisors, and scientists, as well as intermediate and end users in marine businesses and operations. The Copernicus Marine service offers operationally updated and state-of-the-art products that are well documented and transparent. The European Commission’s long-term commitment to the Copernicus program offers long-term visibility and stability of the Copernicus Marine products. Furthermore, Copernicus Marine offers a dedicated service desk, in addition to training sessions and workshops.
Here, we present the in situ biogeochemical data products distributed by the Copernicus Marine Service since 2018. They offer chlorophyll-a, oxygen and nutrient data collected across the globe. These products integrate observations aggregated from the regional EuroGOOS consortia (Arctic-ROOS, BOOS, NOOS, IBI-ROOS, MONGOOS) and Black Sea GOOS, as well as from SeaDataNet2 National Oceanographic Data Centres (NODCs), the JCOMM global systems (Argo, GOSUD, OceanSITES, GTSPP, DBCP) and the Global Telecommunication System (GTS) used by the met offices.
The in situ Near Real Time biogeochemical product is updated every month, whereas the reprocessed product is updated twice per year. Products are delivered in NetCDF4 format compliant with the CF 1.7 standard, with well-documented quality control procedures.
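As an illustration of how such a NetCDF4/CF product can be used, the short Python sketch below reads one hypothetical in situ file with xarray; the file name and the variable names (DOX1 for dissolved oxygen and its DOX1_QC flag) are assumptions that vary between platforms and should be checked against the product documentation.

```python
# Minimal sketch, assuming a locally downloaded NetCDF4/CF file from the
# in situ BGC product; file and variable names are assumptions.
import xarray as xr

ds = xr.open_dataset("GL_PR_BO_example.nc")        # hypothetical file name
print(ds.attrs.get("Conventions"))                  # expected to include CF-1.7

oxygen = ds["DOX1"]                                 # dissolved oxygen, if present
good = oxygen.where(ds["DOX1_QC"].isin([1, 2]))     # keep good / probably good values
print(float(good.mean()))
```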
How to cite: Racapé, V., Lien, V., Jan Even Øie, N., Vindenes, H., Perivoliotis, L., and Kaitala, S.: New BioGeoChemical products provided by the Copernicus Marine Service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10292, https://doi.org/10.5194/egusphere-egu2020-10292, 2020.
EGU2020-10296 | Displays | ESSI1.1
EMODnet Bathymetry – further developing a high resolution digital bathymetry for European seas
Dick M. A. Schaap and Thierry Schmitt
Access to marine data is a key issue for the EU Marine Strategy Framework Directive and the EU Marine Knowledge 2020 agenda, which includes the European Marine Observation and Data Network (EMODnet) initiative. EMODnet aims at assembling European marine data, data products and metadata from diverse sources in a uniform way.
The EMODnet Bathymetry project has been active since 2008 and has developed Digital Terrain Models (DTMs) for the European seas, which are published at regular intervals, each time improving quality and precision and expanding functionalities for viewing, using and downloading. The DTMs are produced from survey and aggregated data sets that are referenced with metadata adopting the SeaDataNet catalogue services. SeaDataNet is a network of major oceanographic data centres around the European seas that manage, operate and further develop a pan-European infrastructure for marine and ocean data management. The latest EMODnet Bathymetry DTM release also includes Satellite Derived Bathymetry (SDB) and has a grid resolution of 1/16 arc minute (circa 125 metres), covering all European sea regions. Use has been made of circa 9400 gathered survey datasets, composite DTMs and SDB bathymetry. Catalogues and the EMODnet DTM are published at the dedicated EMODnet Bathymetry portal, including a versatile DTM viewing and downloading service.
As part of the expansion and innovation, more focus has been directed towards bathymetry for near-coastal waters and coastal zones, and Satellite Derived Bathymetry data have been produced and included to fill gaps in coverage of the coastal zones. The Bathymetry Viewing and Download service has been upgraded to provide a multi-resolution map and to include versatile 3D viewing. Moreover, best estimates of the European coastline have been determined for a range of tidal levels (HAT, MHW, MSL, Chart Datum, LAT), making use of a tidal model for Europe. In addition, a Quality Index layer has been formulated with indicators derived from the source data, which can be queried in the Bathymetry Viewing and Download service. Finally, extra functionality has been added to the mechanism for downloading DTM tiles in various formats and special high-resolution DTMs for areas of interest.
As a result, many users visit the portal, browse the DTM viewer, download DTM tiles and make use of the OGC web services to use the EMODnet Bathymetry in their applications.
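As a small illustration of the OGC web services mentioned above, the Python sketch below requests a map image of the EMODnet Bathymetry DTM through WMS with OWSLib; the service URL and layer name are assumptions and should be verified against the EMODnet Bathymetry portal documentation.

```python
# Hypothetical example of requesting an EMODnet Bathymetry map image via WMS.
from owslib.wms import WebMapService

WMS_URL = "https://ows.emodnet-bathymetry.eu/wms"   # assumed service endpoint
wms = WebMapService(WMS_URL, version="1.3.0")

img = wms.getmap(
    layers=["emodnet:mean"],                        # assumed DTM layer name
    srs="EPSG:4326",
    bbox=(-10.0, 45.0, 5.0, 55.0),                  # lon/lat box, NW European shelf
    size=(800, 600),
    format="image/png",
)
with open("emodnet_bathymetry.png", "wb") as out:
    out.write(img.read())
```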
The presentation will highlight key details of the EMODnet Bathymetry DTM production process and the Bathymetry portal with its extensive functionality.
How to cite: Schaap, D. M. A. and Schmitt, T.: EMODnet Bathymetry – further developing a high resolution digital bathymetry for European seas, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10296, https://doi.org/10.5194/egusphere-egu2020-10296, 2020.
EGU2020-13614 | Displays | ESSI1.1
Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment
Charles Troupin, Alexander Barth, Merret Buurman, Sebastian Mieruch, Léo Bruvry Lagadec, Themis Zamani, and Peter Thijsse
A typical hurdle faced by scientists when it comes to processing data is the installation and maintenance of software tools: installation procedures are sometimes poorly documented, and there are often several dependencies that may create incompatibility issues. To make the lives of scientists and experts easier, a Virtual Research Environment (VRE) is being developed in the frame of the SeaDataCloud project.
The goal is to provide them with a computing environment where the tools are already deployed and datasets are available for direct processing. In the context of SeaDataCloud, the tools are:
- WebODV, able to perform data reading, quality checking and subsetting, among many other possibilities.
- DIVAnd, for the spatial interpolation of in situ measurements.
- A visualisation toolbox for both the input data and the output gridded fields.
DIVAnd
DIVAnd (Data-Interpolating Variational Analysis in n dimensions) is a software tool designed to generate a set of gridded fields from in situ observations. The code is written in Julia, a high-performance programming language (https://julialang.org/) particularly suitable for the processing of large matrices.
The code, developed and improved on a regular basis, is distributed via the hosting platform GitHub: https://github.com/gher-ulg/DIVAnd.jl. It has supported Julia 1.0 since version 2.1.0 (September 2018).
Notebooks
Along with the source code, a set of Jupyter notebooks describing the different steps for the production of a climatology is provided, with an increasing level of complexity: https://github.com/gher-ulg/Diva-Workshops/tree/master/notebooks.
Deployment in the VRE
JupyterHub (https://jupyter.org/hub) is a multi-user instance of Jupyter notebooks. It has proven an adequate solution for allowing several users to work simultaneously with the DIVAnd tool, and it offers different ways to isolate users. The approach selected in the frame of this project is Docker containers, in which the software tools, as well as their dependencies, are stored. This solution allows multiple copies of a container to be run efficiently on a system and also makes it easier to perform the deployment in the VRE. The authentication step is also managed by JupyterHub.
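A minimal jupyterhub_config.py sketch of this pattern (JupyterHub spawning each user's server in a Docker container) is shown below; the image name matches the Docker Hub repository cited in the next section, while the volume and network settings are illustrative assumptions rather than the actual VRE configuration.

```python
# jupyterhub_config.py -- minimal sketch of JupyterHub + DockerSpawner.
c = get_config()  # noqa: F821  (injected by JupyterHub when the file is loaded)

# Spawn each single-user notebook server inside a Docker container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "abarth/divand-jupyterhub:latest"

# Persist each user's work on a named volume across container restarts
c.DockerSpawner.notebook_dir = "/home/jovyan/work"
c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}

# Hub and containers share a Docker network so they can reach each other
c.DockerSpawner.network_name = "jupyterhub-net"
c.JupyterHub.hub_ip = "jupyterhub"
```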
Docker container
The Docker container is distributed via Docker Hub (https://hub.docker.com/r/abarth/divand-jupyterhub) and includes the installation of:
- The Julia language (currently version 1.3.1);
- Libraries and tools such as netCDF, unzip, git;
- Various Julia packages such as PyPlot (plotting library), NCDatasets (manipulation of netCDF files) and DIVAnd.jl;
- The most recent version of the DIVAnd notebooks.
All in all, Docker makes it possible to provide a standardized computing environment to all users and has significantly helped the development of the VRE.
How to cite: Troupin, C., Barth, A., Buurman, M., Mieruch, S., Bruvry Lagadec, L., Zamani, T., and Thijsse, P.: Speeding-up data analysis: DIVAnd interpolation tool in the Virtual Research Environment, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13614, https://doi.org/10.5194/egusphere-egu2020-13614, 2020.
EGU2020-16449 | Displays | ESSI1.1
Blue-Cloud: Developing a marine thematic EOSC cloud to explore and demonstrate the potential of cloud based open science in the domain of ocean sustainability
Sara Garavelli and Dick M. A. Schaap
The European Open Science Cloud (EOSC) is an initiative launched by the European Commission in 2016, as part of the European Cloud Initiative. EOSC aims to provide a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines, leveraging and federating the existing data infrastructures.
Following its launch, several calls have been published and several projects have been granted for developing (parts of) the EOSC, for example ENVRI-FAIR. For the marine domain, a dedicated call was launched as part of ‘The Future of Seas and Oceans Flagship Initiative’, combining the interests of developing a thematic marine EOSC cloud and serving the Blue Economy, Marine Environment and Marine Knowledge agendas.
The winning H2020 Blue-Cloud project is dedicated to marine data management and is coordinated by Trust-IT, with MARIS as technical coordinator. Its aims are:
- To build and demonstrate a Pilot Blue Cloud by combining distributed marine data resources, computing platforms, and analytical services;
- To develop services for supporting research to better understand and manage the many aspects of ocean sustainability;
- To develop and validate a number of demonstrators of relevance for marine societal challenges;
- To formulate a roadmap for expansion and sustainability of the Blue Cloud infrastructure and services.
The project will federate leading European marine data management infrastructures (SeaDataNet, EurOBIS, Euro-Argo, Argo GDAC, EMODnet, ELIXIR-ENA, EuroBioImaging, CMEMS, C3S, and ICOS-Marine) and horizontal e-infrastructures (EUDAT, DIAS, D4Science) to capitalise on what already exists and to develop and deploy the Blue Cloud. The federation will be at the level of data resources, computing resources and analytical service resources. A Blue Cloud data discovery and access service will be developed to facilitate the sharing of multi-disciplinary datasets with users. A Blue Cloud Virtual Research Environment (VRE) will be established so that computing and analytical services can be shared and combined for specific applications.
This innovation potential will be explored and unlocked by developing five dedicated Demonstrators as Virtual Labs together with excellent marine researchers. There is already a large portfolio of existing services managed by the Blue Cloud founders which will be activated and integrated to serve the Blue-Cloud.
The modular architecture of the VRE will allow scalability and sustainability for near-future expansions, such as connecting additional infrastructures, implementing more and advanced blue analytical services, configuring more dedicated Virtual Labs, and targeting more (groups of) users.
The presentation will describe the vision of the Blue-Cloud framework, the Blue-Cloud data discovery and access service (to find and retrieve data sets from a diversified array of key marine data infrastructures dealing with physics, biology, biodiversity, chemistry, and bio genomics), the Blue-Cloud VRE (to facilitate collaborative research using a variety of data sets and analytical tools, complemented by generic services such as sub-setting, pre-processing, harmonizing, publishing and visualization). The technical architecture of Blue-Cloud will be presented via 5 real-life use-cases to demonstrate the impact that such innovation can have on science and society.
How to cite: Garavelli, S. and Schaap, D. M. A.: Blue-Cloud: Developing a marine thematic EOSC cloud to explore and demonstrate the potential of cloud based open science in the domain of ocean sustainability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16449, https://doi.org/10.5194/egusphere-egu2020-16449, 2020.
EGU2020-20215 | Displays | ESSI1.1
Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data
Benjamin Pfeil, Steve Jones, Maren Karlsen, Camilla Stegen Landa, Rocio Castano Primo, Alex Vermeulen, and Oleg Mirzov
Essential Ocean Variable Inorganic Carbon observations collected from instruments at sea are typically processed by individual PIs before being submitted to data centres and other data archives. Often this work is done on an ad hoc basis using unpublished, self-built software, and published in unique formats. This conflicts with the Interoperability and Reusability aspects of the FAIR data principles: such data require significant reformatting efforts by data centres and/or end users, and reproducibility is impossible without a full record of the processing performed and QC decisions made by PIs. The manual nature of this process also implies an additional workload for PIs who need to submit their data to multiple archives/data products. There is a clear need to standardise the data workflow from measurement to publication using common, open-source, documented tools whose algorithms are fully accessible and whose processing is fully recorded for transparency.
The Ocean Thematic Centre of the European Research Infrastructure ICOS (Integrated Carbon Observation System) is developing QuinCe, a browser-based tool for uploading, processing, automatic and manual quality control, and publication of data from underway pCO₂ systems on ships and moorings. Data can be uploaded directly from instruments in any text format, where they are standardised and processed using algorithms approved by the scientific community. Automatic QC algorithms can detect many obvious data errors; afterwards, PIs can perform full quality control of the data following Standard Operating Procedures and best practices. All QC decisions, with enforced explanatory notes, are recorded by the software to enable full traceability and reproducibility. The final quality-controlled dataset can be downloaded by the PI and is sent to the ICOS Carbon Portal and the SOCAT project for publication. The ICOS Carbon Portal integrates marine data with ICOS data from the ecosystem and atmosphere domains on a regional scale, and the data are integrated via SOCAT in the annual Global Carbon Budgets of the Global Carbon Project, where they inform policy and decision makers, the scientific community and the general public.
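The sketch below illustrates, in Python, the kind of simple automatic checks mentioned above (a range check and a spike check on underway pCO₂ values). The thresholds and the good=2 / bad=4 flag convention are assumptions for the example only, not QuinCe's actual algorithms.

```python
# Illustrative automatic QC checks; thresholds and flags are assumptions.
import numpy as np

def range_check(values, lower=100.0, upper=800.0):
    """Flag values outside a plausible pCO2 range (µatm): 2 = good, 4 = bad."""
    flags = np.full(values.shape, 2, dtype=int)
    flags[(values < lower) | (values > upper)] = 4
    return flags

def spike_check(values, threshold=25.0):
    """Flag points that jump by more than `threshold` µatm relative to both neighbours."""
    flags = np.full(values.shape, 2, dtype=int)
    d_prev = np.abs(np.diff(values, prepend=values[0]))
    d_next = np.abs(np.diff(values, append=values[-1]))
    flags[(d_prev > threshold) & (d_next > threshold)] = 4
    return flags

pco2 = np.array([380.2, 381.0, 379.8, 640.5, 380.4, 382.1])
flags = np.maximum(range_check(pco2), spike_check(pco2))
print(flags)   # the isolated 640.5 value is flagged as bad
```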
For platforms with operational data flows, the data are transmitted directly from ship to shore; QuinCe processes, quality controls and publishes Near Real Time data to the ICOS Carbon Portal and to the Copernicus Marine Environment Monitoring Service In Situ TAC as soon as they are received, with no human intervention, greatly reducing the time from measurement to data availability.
Full metadata records for instruments are kept and maintained at the ICOS Carbon Portal, utilising existing standardised vocabularies and version control to maintain a complete history. The correct metadata for any given dataset are available at any time and can be converted to any required format, allowing compliance with the United Nations Sustainable Development Goal 14.3.1 methodology (‘average marine acidity (pH) measured at agreed suite of representative sampling stations’), and ICOS data relevant to SDG 14.3 are distributed to IOC UNESCO’s IODE. While much of this work is currently performed manually, international efforts are underway to develop fully automated systems, and these will be integrated as they become available.
How to cite: Pfeil, B., Jones, S., Karlsen, M., Stegen Landa, C., Castano Primo, R., Vermeulen, A., and Mirzov, O.: Browser based state-of-the-art software for automated data reduction, quality control and dissemination for marine carbon data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20215, https://doi.org/10.5194/egusphere-egu2020-20215, 2020.
EGU2020-20337 | Displays | ESSI1.1
Enhancing SeaDataNet e-infrastructure for ocean and marine data, new opportunities and challenges to foster data re-use
Leda Pecci, Michele Fichaut, and Dick Schaap
Marine and ocean data represent a significant resource that can be used to improve global knowledge of the seas. A huge amount of data is produced every day by ocean observations all around Europe. The ability to leverage this valuable potential depends on the capacity of the already established European ocean data infrastructures to support new needs in the field of ocean data management and to adopt emerging technologies.
The SeaDataNet e-infrastructure (https://www.seadatanet.org), built up in the early 2000s, plays an important role for marine scientists and other ocean stakeholder communities, giving access to more than 2.2 million multidisciplinary harmonised marine and ocean data sets, coming mainly from the European seas and collected by more than 110 data centres, and offering data products and metadata services. Thanks to the 4-year SeaDataCloud Horizon 2020 project, started on 1 November 2016, a more efficient infrastructure has been developed, kept up to date and offering new services based on cloud and High Performance Computing (HPC) technologies. It has renewed the original SeaDataNet Information Technology (IT) architecture. The collaboration with the EUDAT consortium, composed of a number of research communities and large European computer and data centres, enabled the migration of data storage and services into the cloud environment; data from new instruments, such as High-Frequency Radar (HFR), flow cytometers and gliders, have been standardised in agreement with the respective user communities. Furthermore, a Virtual Research Environment will support research collaboration.
The SDN infrastructure focuses on historical digital ocean data and also supports the management of data streams from sensors based on the Sensor Web Enablement (SWE) standards of the Open Geospatial Consortium (OGC).
Harmonisation of ocean data allows more countries to use the data for scientific research and for decision-making purposes, but data re-use also depends on the trust that the ocean scientific community places in the data. The latter issue involves a well-defined process of data quality checks. In SDN, data producers have to label each individual measurement with a value according to the SDN Quality Check (QC) flags, following the specific procedures presented in the SDN QC guideline (https://www.seadatanet.org/Standards/Data-Quality-Control). Furthermore, a range of checks is carried out on the data as part of the data products generation process, to improve the overall quality.
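As a small illustration of this flagging step, the Python sketch below labels individual temperature values with SDN-style QC flags; the flag meanings listed are commonly cited L20 values (the authoritative definitions are in the SDN QC guideline linked above), and the validity range is an assumption for the example.

```python
# Illustrative labelling of measurements with SDN-style QC flag values.
SDN_FLAGS = {
    "0": "no quality control",
    "1": "good value",
    "2": "probably good value",
    "3": "probably bad value",
    "4": "bad value",
    "9": "missing value",
}

def flag_temperature(value, valid=(-2.5, 35.0)):
    """Return an SDN-style flag for a sea temperature in degrees C (range is assumed)."""
    if value is None:
        return "9"
    return "1" if valid[0] <= value <= valid[1] else "4"

samples = [7.8, 24.1, None, 48.0]
print([(v, flag_temperature(v), SDN_FLAGS[flag_temperature(v)]) for v in samples])
```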
A relevant issue that limits data re-use is that some researchers are reluctant to share their own data. The way to encourage them is to give them proper acknowledgement for the work done, by means of data citation; for this reason, a Digital Object Identifier (DOI) minting service is freely available from the SDN portal for every data producer who shares their data. In addition, data versioning is available on the cloud platform for reproducible analysis.
The SeaDataCloud project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement Nº 730960.
How to cite: Pecci, L., Fichaut, M., and Schaap, D.: Enhancing SeaDataNet e-infrastructure for ocean and marine data, new opportunities and challenges to foster data re-use, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20337, https://doi.org/10.5194/egusphere-egu2020-20337, 2020.
EGU2020-21387 | Displays | ESSI1.1
Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations
Andrew Conway, Adam Leadbetter, and Tara Keena
Integration of data management systems is a persistent problem in European projects that span multiple agencies. Months, if not years, of project time are often expended on the integration of disparate database structures, data types, methodologies and outputs. Moreover, this work is usually confined to a single effort, meaning it is needlessly repeated in subsequent projects. The legacy effect of removing these barriers could therefore yield monetary and time savings for all involved, far beyond a single cross-jurisdictional project.
The European Union’s INTERREG VA Programme has funded the COMPASS project to better manage marine protected areas (MPAs) in peripheral areas. Involving five organisations spread across two nations, the project has developed a cross-border network for marine monitoring. Three of those organisations are UK-based and bound for Brexit (the Agri-Food and Biosciences Institute, Marine Scotland Science and the Scottish Association of Marine Science). With that network under construction, significant effort has been placed on harmonising data management processes and procedures between the partners.
A data management quality management framework (DM-QMF) was introduced to guide this harmonization and ensure adequate quality controls would be enforced. As lead partner on data management, the Irish Marine Institute (MI) initially shared guidelines for infrastructure, architecture and metadata. The implementation of those requirements was then left to the other four partners, with the MI acting as facilitator. This led to the following being generated for each process in the project:
- Data management plan: information on how and what data were to be generated, as well as where they would be stored.
- Flow diagrams: diagrammatic overview of the flow of data through the project.
- Standard Operating Procedures: detailed explanatory documents on the precise workings of a process.
Data management processes were allowed to evolve naturally out of a need to adhere to this set standard. Organisations were able to work within their operational limitations, without being required to alter their existing procedures, but were encouraged to learn from each other. Very quickly it was found that there were similarities in processes where previously it was thought there were significant differences. This process of sharing data management information has created mutually beneficial synergies and enabled the convergence of procedures within the separate organisations.
The downstream data management synergies that COMPASS has produced have already taken effect. Sister INTERREG VA projects, SeaMonitor and MarPAMM, have felt the benefits. The same data management systems cultivated as part of the COMPASS project are being reused, while the groundwork in creating strong cross-border channels of communication and cooperation is saving significant amounts of time in project coordination.
Through data management, personal and institutional relationships have been strengthened, both of which should persist beyond the project terminus in 2021, well into a post-Brexit Europe. The COMPASS project has been an exemplar of how close collaboration can persist and thrive in a changing political environment, in spite of the ongoing uncertainty surrounding Brexit.
How to cite: Conway, A., Leadbetter, A., and Keena, T.: Cultivating a mutually beneficial ocean science data management relationship with Brexit Nations, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21387, https://doi.org/10.5194/egusphere-egu2020-21387, 2020.
EGU2020-21908 | Displays | ESSI1.1
EMODnet: FAIR and open source marine data, digital products and services
Jan-Bart Calewaert, Kate Larkin, Conor Delaney, Andree Anne Marsan, and Tim Collart
Unlocking the potential of big ocean data relies on Findable, Accessible, Interoperable and Reusable (FAIR) data. This is a core principle of the European Marine Observation and Data network (EMODnet), a leading long-term marine data service provider funded by the EU. Over 150 organizations deliver harmonized data through seven portals spanning bathymetry, geology, physics, chemistry, biology, seabed habitats and human activities, together with a central portal. Recent data and data products include a high-resolution digital terrain model bathymetry product, digital vessel density maps, marine litter maps and products on geological features. International use cases include sustainable fisheries management and offshore wind farm development. The EMODnet Data Ingestion Service enhances data sharing, and the EMODnet Associated Partnership Scheme offers benefits for industry and wider stakeholders. Increasingly, EMODnet is interacting and collaborating with other key marine data initiatives in Europe and globally. This includes collaborations with the Copernicus Marine Service (CMEMS), SeaDataCloud and others to develop the pilot Blue Cloud as a marine component of the European Open Science Cloud (EOSC), as well as with China, the USA and international organisations such as IODE/IOC. This contribution will provide an update on EMODnet developments, give a future outlook considering the main challenges and opportunities, and touch upon key collaborations with other marine data initiatives in Europe and globally.
How to cite: Calewaert, J.-B., Larkin, K., Delaney, C., Marsan, A. A., and Collart, T.: EMODnet: FAIR and open source marine data, digital products and services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21908, https://doi.org/10.5194/egusphere-egu2020-21908, 2020.
ESSI1.12 – Innovative Evaluation Frameworks and Platforms for Weather and Climate Research
EGU2020-729 | Displays | ESSI1.12 | Highlight
A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data
Sathyaseelan Mayilvahanam, Sanjay Kumar Ghosh, and Chandra Shekhar Prasad Ojha
Abstract
In general, modelling climate change and its impacts within a hydrological unit brings out an understanding of the system and its behaviour under various model constraints. Climate change and global warming studies remain in a research and development phase because of their complex and dynamic nature. The IPCC 5th Assessment Report on global warming states that in the 21st century there may be an increase in temperature of the order of ~1.5°C. This transient climate may cause significant impacts on, or discrepancies in, the water availability of the hydrological unit. This may lead to severe impacts in countries with high populations such as India and China. Remote sensing datasets play an essential role in modelling climatic changes for a river basin at different spatial and temporal scales. This study aims to propose a conceptual framework for the above-defined problem with an emphasis on remote sensing datasets. The framework involves five entities: the data component, the process component, the impact component, the feedback component and the uncertainty component. The framework flow begins with the data component entity, which involves two significant inputs: the hydro-meteorological data and the land-hydrology data. The essential attributes of the hydro-meteorological data are precipitation, temperature, relative humidity, wind speed and solar radiation. These datasets may be obtained and analysed by empirical or statistical methods and from in-situ or satellite-based measurements. Mathematical models applied to long-run historical climate data may provide knowledge on climate change detection or its trends. Meteorological data derived from satellites may have a measurable bias relative to the in-situ data. The satellite-based land-hydrology data component involves various attributes such as topography, soil, vegetation, water bodies, other land use / land cover, soil moisture and evapotranspiration. The process component involves complex land-hydrology processes that may be well established and modelled by customizable hydrological models. Here, we emphasise the use of remote-sensing-based model parameter values in the equations, either directly or indirectly. The land-atmosphere process component likewise involves various complex processes that may be well established and solved by customizable atmospheric weather models. The land components play a significant role in modelling climate change, because these land processes may trigger global warming through various anthropogenic agents. The main objective of this framework is to emphasise climate change impacts using remote sensing; hence, the impact component entity plays an essential role in this conceptual framework. Climate change impacts within a river basin at various spatial and temporal scales are identified using different hydrological responses. The feedback entity is the most sensitive part of this framework, because it may alter the climate forcing either positively or negatively. An uncertainty component handles the uncertainty in the model framework. The highlight of this conceptual framework is the use of remote sensing datasets in climate change studies. A limitation is that verifying the remote sensing data against in-situ data at every location is not feasible.
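To make the structure of the proposed framework concrete (an illustration added here, not part of the abstract), the sketch below models the five entities and their connections as plain Python classes; all class names, fields and numbers are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the five framework entities; all names and fields are hypothetical.

@dataclass
class DataComponent:
    hydro_meteorological: dict = field(default_factory=dict)  # precipitation, temperature, ...
    land_hydrology: dict = field(default_factory=dict)        # topography, soil moisture, ...

class ProcessComponent:
    def run(self, data: DataComponent) -> dict:
        # placeholder for the land-hydrology / land-atmosphere model runs
        return {"runoff": 0.0, "evapotranspiration": 0.0}

class ImpactComponent:
    def assess(self, responses: dict) -> dict:
        # derive basin-scale impact indicators from the hydrological responses
        return {"water_availability_change": responses.get("runoff", 0.0)}

class FeedbackComponent:
    def adjust_forcing(self, impacts: dict) -> float:
        # positive or negative adjustment of the climate forcing
        return -0.1 if impacts["water_availability_change"] < 0 else 0.1

class UncertaintyComponent:
    def quantify(self, impacts: dict) -> dict:
        # toy uncertainty estimate attached to each impact indicator
        return {name: 0.2 * abs(value) for name, value in impacts.items()}

# Wiring the entities together in the order described in the abstract:
data = DataComponent(hydro_meteorological={"precipitation": []}, land_hydrology={"soil": []})
responses = ProcessComponent().run(data)
impacts = ImpactComponent().assess(responses)
forcing_change = FeedbackComponent().adjust_forcing(impacts)
uncertainty = UncertaintyComponent().quantify(impacts)
```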
How to cite: Mayilvahanam, S., Ghosh, S. K., and Ojha, C. S. P.: A Conceptual Framework for Modelling the Climate Change and its Impacts within a River Basin using Remote Sensing data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-729, https://doi.org/10.5194/egusphere-egu2020-729, 2020.
EGU2020-1612 | Displays | ESSI1.12
Building Web Processing Services with Birdhouse
Carsten Ehbrecht, Stephan Kindermann, Ag Stephens, and David Huard
The Web Processing Service (WPS) is an OGC interface standard for providing processing tools as a web service. The WPS interface standardizes the way processes and their inputs/outputs are described, how a client can request the execution of a process, and how the output from a process is handled.
Birdhouse tools enable you to build your own customised WPS compute service in support of remote climate data analysis.
Birdhouse offers you:
- A Cookiecutter template to create your own WPS compute service.
- An Ansible script to deploy a full-stack WPS service.
- A Python library, Birdy, suitable for Jupyter notebooks to interact with WPS compute services.
- An OWS security proxy, Twitcher, to provide access control to WPS compute services.
Birdhouse uses the PyWPS Python implementation of the Web Processing Service standard. PyWPS is part of the OSGeo project.
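As an illustration (added here, not part of the abstract), the sketch below shows how a Birdhouse WPS might be used from a Jupyter notebook with the Birdy client; the service URL and the process name are placeholders assumed for this example.

```python
# Minimal sketch: talk to a WPS compute service from a notebook using Birdy.
# The URL and the process name "subset_bbox" are hypothetical placeholders.
from birdy import WPSClient

wps = WPSClient("https://example.org/wps")   # connect to a (hypothetical) Birdhouse service
help(wps)                                    # processes are exposed as Python methods

# Request a bounding-box subset of a remote dataset (process name assumed):
response = wps.subset_bbox(resource="https://example.org/data/tas.nc",
                           lon0=-10, lon1=30, lat0=35, lat1=70)
output_url = response.get()[0]               # reference to the produced NetCDF file
print(output_url)
```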
The Birdhouse tools are used by several partners and projects. A Web Processing Service will be used in the Copernicus Climate Change Service (C3S) to provide subsetting operations on climate model data (CMIP5, CORDEX) as a service to the Climate Data Store (CDS). The Canadian non-profit organization Ouranos is using a Web Processing Service to provide climate indices calculation to be used remotely from Jupyter notebooks.
In this session we want to show how a Web Processing Service can be used with the Freva evaluation system. Freva plugins can be made available as processes in a Web Processing Service. These plugins can be run using a standard WPS client from a terminal and Jupyter notebooks with remote access to the Freva system.
We want to emphasise the integrational aspects of the Birdhouse tools: supporting existing processing frameworks to add a standardized web service for remote computation.
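To make this integration idea concrete (an illustration added here, not part of the abstract), the sketch below wraps a hypothetical processing-framework plugin as a PyWPS process; the plugin call and the identifiers are assumptions, only the PyWPS classes are taken from the library.

```python
# Minimal sketch: expose an existing analysis routine as a WPS process with PyWPS.
# "run_plugin" and the identifiers are hypothetical placeholders.
from pywps import Process, LiteralInput, LiteralOutput


def run_plugin(name: str) -> str:
    """Stand-in for a call into an existing processing framework (e.g. a Freva plugin)."""
    return f"result of plugin '{name}'"


class PluginProcess(Process):
    def __init__(self):
        super().__init__(
            self._handler,
            identifier="run_plugin",
            title="Run a framework plugin remotely",
            inputs=[LiteralInput("plugin", "Plugin name", data_type="string")],
            outputs=[LiteralOutput("summary", "Plugin result summary", data_type="string")],
        )

    def _handler(self, request, response):
        plugin_name = request.inputs["plugin"][0].data
        response.outputs["summary"].data = run_plugin(plugin_name)
        return response
```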
Links:
- http://bird-house.github.io
- http://pywps.org
- https://www.osgeo.org/
- http://climate.copernicus.eu
- https://www.ouranos.ca/en
- https://freva.met.fu-berlin.de/
How to cite: Ehbrecht, C., Kindermann, S., Stephens, A., and Huard, D.: Building Web Processing Services with Birdhouse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1612, https://doi.org/10.5194/egusphere-egu2020-1612, 2020.
EGU2020-3501 | Displays | ESSI1.12 | Highlight
The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O)
Clara Burgard, Dirk Notz, Leif T. Pedersen, and Rasmus T. Tonboe
The diversity in sea-ice concentration observational estimates retrieved from brightness temperatures measured from space is a challenge for our understanding of past and future sea-ice evolution as it inhibits reliable climate model evaluation and initialisation. To address this challenge, we introduce a new tool: the Arctic Ocean Observation Operator (ARC3O).
ARC3O allows us to simulate brightness temperatures at 6.9 GHz at vertical polarisation from standard output of an Earth System Model to be compared to observations from space at this frequency. We use simple temperature and salinity profiles inside the snow and ice column based on the output of the Earth System Model to compute these brightness temperatures.
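For illustration only (this is not the ARC3O formulation), the sketch below shows the kind of forward calculation an observation operator performs, reducing a snow and ice column to a single 6.9 GHz vertical-polarisation brightness temperature through a toy emissivity model; all numbers, weights and the emissivity value are assumptions.

```python
import numpy as np

# Toy forward operator: weighted effective temperature of the emitting layers times an
# assumed emissivity. This is NOT the ARC3O algorithm, only an illustration of mapping
# model columns (temperature, salinity) to one brightness temperature.
def toy_brightness_temperature(temp_profile_k, weights, emissivity=0.95):
    """Weighted column temperature scaled by an assumed 6.9 GHz V-pol emissivity."""
    weights = np.asarray(weights) / np.sum(weights)
    t_eff = np.sum(np.asarray(temp_profile_k) * weights)   # effective emitting temperature
    return emissivity * t_eff

# Example: a simple 3-layer snow/ice column (all values are made up)
print(toy_brightness_temperature([255.0, 260.0, 271.0], weights=[0.2, 0.3, 0.5]))
```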
In this study, we evaluate ARC3O by simulating brightness temperatures based on three assimilation runs of the MPI Earth System Model (MPI-ESM), assimilated with three different sea-ice concentration products. We then compare these three sets of simulated brightness temperatures to brightness temperatures measured from space by the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E). We find that they differ by up to 10 K in the period between October and June, depending on the region and the assimilation run. However, we show that these discrepancies between simulated and observed brightness temperatures can be attributed mainly to the underlying observational uncertainty in sea-ice concentration and, to a lesser extent, to the data assimilation process, rather than to biases in ARC3O itself. In summer, the discrepancies between simulated and observed brightness temperatures are larger than in winter and locally reach up to 20 K. This is caused by the very large observational uncertainty in summer sea-ice concentration but also by the melt-pond parametrisation in MPI-ESM, which is not necessarily realistic.
ARC3O is therefore capable of realistically translating the simulated Arctic Ocean climate state into an observable quantity, enabling a more comprehensive climate model evaluation and initialisation and offering an exciting perspective for further developing this and similar methods.
How to cite: Burgard, C., Notz, D., Pedersen, L. T., and Tonboe, R. T.: The Arctic Ocean Observation Operator for 6.9 GHz (ARC3O), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3501, https://doi.org/10.5194/egusphere-egu2020-3501, 2020.
EGU2020-4658 | Displays | ESSI1.12
Integrating e-infrastructures for remote climate data processing
Christian Pagé, Wim Som de Cerff, Maarten Plieger, Alessandro Spinuso, Iraklis Klampanos, Malcolm Atkinson, and Vangelis Karkaletsis
Accessing and processing large volumes of climate data has nowadays become a particularly challenging task for end users, due to the rapidly increasing volumes being produced and made available. Access to climate data is crucial for sustaining research and performing climate change impact assessments. These activities have a strong societal impact, as climate change affects almost all economic and social sectors and requires them to adapt.
The whole climate data archive is expected to reach a volume of 30 PB in 2020 and up to an estimated 2000 PB in 2024, evolving from 0.03 PB (30 TB) in 2007 and 2 PB in 2014. Data processing and analysis must therefore take place remotely for the users, who typically have to rely on heterogeneous infrastructures and services between the data and their physical location. Developers of Research Infrastructures have to provide services to those users and hence have to define standards and generic services to fulfil those requirements.
It will be shown how the DARE eScience Platform (http://project-dare.eu) will help developers to develop needed services more quickly and transparently for a large range of scientific researchers. The platform is designed for efficient and traceable development of complex experiments and domain-specific services. Most importantly, the DARE Platform integrates the following e-infrastructure services: the climate IS-ENES (https://is.enes.org) Research Infrastructure front-end climate4impact (C4I: https://climate4impact.eu), the EUDAT CDI (https://www.eudat.eu/eudat-collaborative-data-infrastructure-cdi) B2DROP Service, as well as the ESGF (https://esgf.llnl.gov). The DARE Platform itself can be deployed by research communities on local, public or commercial clouds, thanks to its containerized architecture.
More specifically, two distinct use cases for the climate science domain will be presented. The first will show how an open-source software package for computing climate indices and indicators (icclim: https://github.com/cerfacs-globc/icclim) is leveraged using the DARE Platform to enable users to build their own workflows. The second use case will demonstrate how more complex tools, such as an extra-tropical and tropical cyclone tracking software (https://github.com/cerfacs-globc/cyclone_tracking), can easily be made available to end users by infrastructure and front-end software developers.
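As an illustration of the kind of computation such a workflow wraps (an example added here, not taken from the abstract), the sketch below calls icclim to compute a standard climate index from a NetCDF file; the file name is a placeholder and the exact function signature may differ between icclim versions.

```python
# Minimal sketch: compute the "summer days" (SU) climate index with icclim.
# "tasmax.nc" is a placeholder input file; the API shown follows recent icclim
# releases and may differ in older versions.
import icclim

icclim.index(
    index_name="SU",                 # number of days with daily max temperature > 25 degC
    in_files="tasmax.nc",            # hypothetical input with a 'tasmax' variable
    var_name="tasmax",
    slice_mode="year",               # annual index values
    out_file="su_yearly.nc",
)
```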
How to cite: Pagé, C., Som de Cerff, W., Plieger, M., Spinuso, A., Klampanos, I., Atkinson, M., and Karkaletsis, V.: Integrating e-infrastructures for remote climate data processing, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4658, https://doi.org/10.5194/egusphere-egu2020-4658, 2020.
EGU2020-8543 | Displays | ESSI1.12
CDOs for CMIP6 and Climate Extremes Indices
Fabian Wachsmann
The Climate Data Operators [1] tool kit (CDO) is a widely used infrastructure software package developed and maintained at the Max Planck Institute for Meteorology (MPI-M). It comprises a large number of command-line operators for gridded data, covering statistics, interpolation and arithmetic. Users benefit from the extensive support facilities provided by the MPI-M and the DKRZ.
As a part of the sixth phase of the Coupled Model Intercomparison Project (CMIP6), the German Federal Ministry of Education and Research (BMBF) is funding activities promoting the use of the CDOs for CMIP6 data preparation and analysis.
The operator ‘cmor’ has been developed to enable users to prepare their data according to the CMIP6 data standard. It is part of the web-based CMIP6 post-processing infrastructure [2] which is developed at DKRZ and used by different Earth System Models. The CDO metadata and its data model have been expanded to include the CMIP6 data standard so that users can use the tool for project data evaluation.
As a second activity, operators for 27 climate extremes indices, which were defined by the Expert Team on Climate Change Detection and Indices (ETCCDI), have been integrated into the tool. As with CMIP5, the ETCCDI climate extremes indices will be part of CMIP6 model analyses due to their robustness and straightforward interpretation.
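As an illustration (added here, not part of the abstract), the sketch below calls two of these climate extremes operators through the python-cdo bindings rather than the shell; the file names are placeholders, and using the Python bindings is a choice made for this example.

```python
# Minimal sketch: compute ETCCDI climate extremes indices with the python-cdo bindings.
# File names are hypothetical placeholders.
from cdo import Cdo

cdo = Cdo()

# Number of frost days (ETCCDI index FD) from a daily minimum temperature file.
cdo.eca_fd(input="tasmin_daily.nc", output="frost_days.nc")

# Number of summer days (ETCCDI index SU) from a daily maximum temperature file.
cdo.eca_su(input="tasmax_daily.nc", output="summer_days.nc")

# The CMIP6 rewriting operator can be chained in the same way, e.g. cdo.cmor(...),
# with its table and metadata parameters taken from the project documentation.
```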
This contribution provides an insight into advanced CDO application and offers ideas for post-processing optimization.
[1] Schulzweida, U. (2019): CDO user guide. code.mpimet.mpg.de/projects/cdo , last access: 01.13.2020.
[2] Schupfner, M. (2020): The CMIP6 Data Request WebGUI. c6dreq.dkrz.de , last access: 01.13.2020.
How to cite: Wachsmann, F.: CDOs for CMIP6 and Climate Extremes Indices, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8543, https://doi.org/10.5194/egusphere-egu2020-8543, 2020.
EGU2020-9532 | Displays | ESSI1.12 | Highlight
Flexible Java based platform for integration of models and datasets in Earth Systems Science Prediction Systems: methodology and implementation for predicting spreading of radioactive contamination from accidents
Dmytro Trybushnyi, Wolfgang Raskob, Ievgen Ievdin, Tim Müller, Oleksandr Pylypenko, and Mark Zheleznyak
An important aspect of an Earth Systems Science Prediction System (ESSPS) is to describe and predict the behavior of contaminants in different environmental compartments following severe accidents at chemical and nuclear installations. Such an ESSPS could be designed as a platform allowing the integration of models describing atmospheric, hydrological and oceanographic processes, the physical-chemical transformation of pollutants in the environment, contamination of the food chain and, finally, the overall exposure of the population to harmful substances. Such a chain of connected simulation models, needed to describe the consequences of severe accidents in the different phases of an emergency, should use different input data, ranging from real-time online meteorological observations to long-term numerical weather prediction or ocean data.
One example of an ESSPS is the Decision Support System JRODOS for off-site emergency management after nuclear emergencies. It integrates many different simulation models, real-time monitoring, regional GIS information, source term databases, and geospatial data for population and environmental characteristics.
The development of the system started in 1992, supported by the European Commission’s RTD Framework programmes. As the system attracted more and more end users, its technical basis had to be considerably improved. For this, Java was selected as a high-level software language suitable for the development of distributed, cross-platform, enterprise-quality applications. On the other hand, a great deal of scientific computational software is available only as C/C++/FORTRAN packages. Moreover, it is a common scenario that some outputs of model A should act as inputs of model B, but the two models do not share common exchange containers and/or are written in different programming languages.
To combine the flexibility of the Java language with the speed and availability of scientific codes, and to be able to connect different computational codes into one chain of models, the notion of distributed wrapper objects (DWO) has been introduced. DWO provides logical, visual and technical means for the integration of computational models into the core of the system, even if the models and the system use different programming languages. The DWO technology allows various levels of interactivity, including pull- and push-driven chains, user interaction support, and sub-model calls. All DWO data exchange is realized in memory and does not involve disk IO operations, thus eliminating redundant reader/writer code and minimizing slow disk access. These features improve the stability and performance of an ESSPS used for decision support.
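To illustrate the general idea of chaining models through in-memory exchange containers rather than files (a generic sketch added here, not the actual DWO API, and written in Python rather than Java for consistency with the other examples in this document):

```python
# Generic sketch of chaining two models through a shared in-memory container,
# so model B consumes model A's outputs without any intermediate disk IO.
# The model functions, field names and numbers are hypothetical.
from typing import Dict, List

Exchange = Dict[str, List[float]]  # simple in-memory exchange container

def atmospheric_dispersion_model(source_term: float) -> Exchange:
    """Model A: produce an air concentration field (toy numbers)."""
    return {"air_concentration": [source_term * f for f in (0.5, 0.3, 0.2)]}

def deposition_model(data: Exchange) -> Exchange:
    """Model B: derive ground deposition directly from model A's in-memory output."""
    data["ground_deposition"] = [0.01 * c for c in data["air_concentration"]]
    return data

chain_result = deposition_model(atmospheric_dispersion_model(source_term=1.0e12))
print(chain_result["ground_deposition"])
```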
The current status of the DWO realization in JRODOS is presented focusing on the added value compared to traditional integration of different simulation models into one system.
How to cite: Trybushnyi, D., Raskob, W., Ievdin, I., Müller, T., Pylypenko, O., and Zheleznyak, M.: Flexible Java based platform for integration of models and datasets in Earth Systems Science Prediction Systems: methodology and implementation for predicting spreading of radioactive contamination from accidents, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9532, https://doi.org/10.5194/egusphere-egu2020-9532, 2020.
EGU2020-13105 | Displays | ESSI1.12
Web-based post-processing workflow composition for CMIP6
Martin Schupfner and Fabian Wachsmann
CMIP6 defines a data standard as well as a data request (DReq) in order to facilitate analysis across results from different climate models. For most model output, post-processing is required to make it CMIP6 compliant. The German Federal Ministry of Education and Research (BMBF) is funding a project [1] providing services which help with the production of quality-assured CMIP6 compliant data according to the DReq.
In that project, a web-based GUI [2] has been developed which guides the modelers through the different steps of the data post-processing workflow, allowing the aggregation, diagnosis and standardization of the model data to be orchestrated in a modular manner. Therefore, the website provides several functionalities:
1. A DReq generator, based on Martin Juckes’ DreqPy API [3], can be used to tailor the DReq according to the envisaged experiments and supported MIPs. Moreover, the expected data volume can be calculated (a minimal sketch of this step follows the list).
2. The mapping between variables of the DReq and of the raw model output can be specified. These specifications (model variable names, units, etc.) may include diagnostic algorithms and are stored in a database.
3. The variable mapping information can be retrieved as a mapping table (MT). Additionally, this information can be used to create post-processing script fragments. One of the script fragments contains processing commands based on the diagnostic algorithms entered into the mapping GUI, whereas the other rewrites the (diagnosed) data in a CMIP6-compliant format. Both script fragments use the CDO tool kit [4] developed at the Max Planck Institute for Meteorology, namely the CDO expr and cmor [5] operators. The latter makes use of the CMOR3 library [6] and parses the MT. The script fragments are meant to be integrated into CMIP6 data workflows or scripts. A template for such a script, which allows modular and flexible process control of the individual workflow steps, is included when downloading the script fragments.
4. User specific metadata can be generated, which supply the CDO cmor operator with the required and correct metadata as specified in the CMIP6 controlled vocabulary (CV).
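For point 1 above, the sketch below shows how the data request can be loaded and inspected with dreqPy (an illustration added here; the collection and attribute names beyond loadDreq follow the dreqPy documentation and may differ between data request versions).

```python
# Minimal sketch: load the CMIP6 data request with dreqPy and count requested variables.
# Collection and attribute names ('CMORvar', .mipTable) follow the dreqPy documentation
# and may differ between data request versions.
from dreqPy import dreq

dq = dreq.loadDreq()                      # load the bundled CMIP6 data request
cmor_vars = dq.coll['CMORvar'].items      # all CMOR variables in the request
print(len(cmor_vars), "CMOR variables requested")
print(sorted({v.mipTable for v in cmor_vars})[:5], "...")  # a few of the MIP tables
```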
[1] National CMIP6 Support Activities. https://www.dkrz.de/c6de , last access 9.1.2020.
[2] Martin Schupfner (2018): CMIP6 Data Request WebGUI. https://c6dreq.dkrz.de/ , last access 9.1.2020.
[3] Martin Juckes (2018): Data Request Python API. Vers. 01.00.28. http://proj.badc.rl.ac.uk/svn/exarch/CMIP6dreq/tags/latest/dreqPy/docs/dreqPy.pdf , last access 9.1.2020.
[4] Uwe Schulzweida (2019): CDO User Guide. Climate Data Operators. Vers. 1.9.8. https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf , last access 9.1.2020.
[5] Fabian Wachsmann (2017): The cdo cmor operator. https://code.mpimet.mpg.de/attachments/19411/cdo_cmor.pdf , last access 9.1.2020.
[6] Denis Nadeau (2018): CMOR version 3.3. https://cmor.llnl.gov/pdf/mydoc.pdf , last access 9.1.2020.
How to cite: Schupfner, M. and Wachsmann, F.: Web-based post-processing workflow composition for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13105, https://doi.org/10.5194/egusphere-egu2020-13105, 2020.
EGU2020-13306 | Displays | ESSI1.12 | Highlight
CMIP model evaluation with the ESMValTool v2.0
Axel Lauer, Fernando Iglesias-Suarez, Veronika Eyring, and the ESMValTool development team
The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background of implemented diagnostics and metrics and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community continuously improving the tool supported by multiple national and European projects. The latest version (2.0) of the ESMValTool has been developed as a large community effort to specifically target the increased data volume of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the related challenges posed by analysis and evaluation of output from multiple high-resolution and complex ESMs. For this, the core functionalities have been completely rewritten in order to take advantage of state-of-the-art computational libraries and methods to allow for efficient and user-friendly data processing. Common operations on the input data such as regridding or computation of multi-model statistics are now centralized in a highly optimized preprocessor written in Python. The diagnostic part of the ESMValTool includes a large collection of standard recipes for reproducing peer-reviewed analyses of many variables across atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean-state, trends, variability and important processes, phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation some are also based on model-to-model comparisons. This presentation introduces the diagnostics newly implemented into ESMValTool v2.0 including an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of ESMs, new diagnostics for extreme events, regional model and impact evaluation and analysis of ESMs, as well as diagnostics for emergent constraints and analysis of future projections from ESMs. The new diagnostics are illustrated with examples using results from the well-established CMIP5 and the newly available CMIP6 data sets.
How to cite: Lauer, A., Iglesias-Suarez, F., Eyring, V., and development team, T. E.: CMIP model evaluation with the ESMValTool v2.0, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13306, https://doi.org/10.5194/egusphere-egu2020-13306, 2020.
EGU2020-14745 | Displays | ESSI1.12
ESMValTool pre-processing functions for eWaterCycle
Fakhereh Alidoost, Jerom Aerts, Bouwe Andela, Jaro Camphuijsen, Nick van De Giesen, Gijs van Den Oord, Niels Drost, Yifat Dzigan, Ronald van Haren, Rolf Hut, Peter C. Kalverla, Inti Pelupessy, Stefan Verhoeven, Berend Weel, and Ben van Werkhoven
eWaterCycle is a framework in which hydrological modelers can work together in a collaborative environment. In this environment, they can, for example, compare and analyze the results of models that use different sources of (meteorological) forcing data. The final goal of eWaterCycle is to advance the state of FAIR (Findable, Accessible, Interoperable, and Reusable) and open science in hydrological modeling.
Comparing hydrological models has always been a challenging task. Hydrological models exhibit great complexity and diversity in the exact methodologies applied, competing hypotheses of hydrologic behavior, technology stacks, and programming languages used in those models. Pre-processing of forcing data is one of the roadblocks that was identified during the FAIR Hydrological Modelling workshop organized by the Lorentz Center in April 2019. Forcing data can be retrieved from a wide variety of sources with discrepant variable names and frequencies, and spatial and temporal resolutions. Moreover, some hydrological models make specific assumptions about the definition of the forcing variables. The pre-processing is often performed by various sets of scripts that may or may not be included with model source codes, making it hard to reproduce results. Generally, there are common steps in the data preparation among different models. Therefore, it would be a valuable asset to the hydrological community if the pre-processing of FAIR input data could also be done in a FAIR manner.
Within the context of the eWaterCycle II project, a common pre-processing system has been created for hydrological modeling based on ESMValTool (Earth System Model Evaluation Tool). ESMValTool is a community diagnostic and performance metrics tool developed for the evaluation of Earth system models. The ESMValTool pre-processing functions cover a broad range of operations on data before diagnostics or metrics are applied; for example, vertical interpolation, land-sea masking, re-gridding, multi-model statistics, temporal and spatial manipulations, variable derivation and unit conversion. The pre-processor performs these operations in a centralized, documented and efficient way. The current pre-processing pipeline of eWaterCycle using ESMValTool consists of hydrological model-specific recipes and supports ERA5 and ERA-Interim data provided by the ECMWF (European Centre for Medium-Range Weather Forecasts). The pipeline starts with the downloading and CMORization (Climate Model Output Rewriter) of input data. Then a recipe is prepared to find the data and run the preprocessors. When ESMValTool runs a recipe, it will also run the diagnostic script that contains model-specific analysis to derive required forcing variables, and it will store provenance information to ensure transparency and reproducibility. In the near future, the pipeline will be extended to include Earth observation data, as these data are paramount to the data assimilation in eWaterCycle.
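As an illustration of the kind of centralized pre-processing step this pipeline relies on (an example added here, not part of the abstract), the sketch below applies two ESMValCore preprocessor functions to a CMORized forcing file; the file name and region are placeholders, and the exact function arguments may differ between ESMValCore versions.

```python
# Minimal sketch: apply ESMValCore preprocessor functions to a forcing dataset.
# "era5_tas.nc" is a placeholder file; argument names follow recent ESMValCore
# releases and may differ between versions.
import iris
from esmvalcore.preprocessor import regrid, extract_region

cube = iris.load_cube("era5_tas.nc")                      # CMORized near-surface temperature
cube = regrid(cube, target_grid="1x1", scheme="linear")   # common grid for model comparison
cube = extract_region(cube, start_longitude=3, end_longitude=8,
                      start_latitude=50, end_latitude=54) # e.g. a Rhine-basin bounding box
print(cube.summary(shorten=True))
```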
In this presentation we will show how using the ESMValTool pre-processor for hydrological modeling connects hydrology and climate sciences and increases the impact and sustainability of ESMValTool.
How to cite: Alidoost, F., Aerts, J., Andela, B., Camphuijsen, J., van De Giesen, N., van Den Oord, G., Drost, N., Dzigan, Y., van Haren, R., Hut, R., Kalverla, P. C., Pelupessy, I., Verhoeven, S., Weel, B., and van Werkhoven, B.: ESMValTool pre-processing functions for eWaterCycle, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14745, https://doi.org/10.5194/egusphere-egu2020-14745, 2020.
EGU2020-17472 | Displays | ESSI1.12
ESMValCore: analyzing CMIP data made easy
Bouwe Andela, Lisa Bock, Björn Brötz, Faruk Diblen, Laura Dreyer, Niels Drost, Paul Earnshaw, Veronika Eyring, Birgit Hassler, Nikolay Koldunov, Axel Lauer, Bill Little, Saskia Loosveldt-Tomas, Lee de Mora, Valeriu Predoi, Mattia Righi, Manuel Schlund, Javier Vegas-Regidor, and Klaus Zimmermann
The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2019, www.esmvaltool.org) features a brand new design, consisting of ESMValCore (https://github.com/esmvalgroup/esmvalcore), a package for working with CMIP data and ESMValTool (https://github.com/esmvalgroup/esmvaltool), a package containing the scientific analysis scripts. This new version has been specifically developed to handle the increased data volume of CMIP Phase 6 (CMIP6) and the related challenges posed by the analysis and the evaluation of output from multiple high-resolution or complex Earth system models. The tool also supports CMIP5 and CMIP3 datasets, as well as a large number of re-analysis and observational datasets that can be formatted according to the same standards (CMOR) on-the-fly or through scripts currently included in the ESMValTool package.
At the heart of this new version is the ESMValCore software package, which provides a configurable framework for finding CMIP files using a “data reference syntax”, applying commonly used pre-processing functions to them, running analysis scripts, and recording provenance. Numerous pre-processing functions, e.g. for data selection, regridding, and statistics are readily available and the modular design makes it easy to add more. The ESMValCore package is easy to install with relatively few dependencies, written in Python 3, based on state-of-the-art open-source libraries such as Iris and Dask, and widely used standards such as YAML, NetCDF, CF-Conventions, and W3C PROV. An extensive set of automated tests and code quality checks ensure the reliability of the package. Documentation is available at https://esmvaltool.readthedocs.io.
The ESMValCore package uses human-readable recipes to define which variables and datasets to use, how to pre-process that data, and what scientific analysis scripts to run. The package provides convenient interfaces, based on the YAML and NetCDF/CF-convention file formats, for running diagnostic scripts written in any programming language. Because the ESMValCore framework takes care of running the workflow defined in the recipe in parallel, most analyses run much faster, with no additional programming effort required from the authors of the analysis scripts. For example, benchmarks show a factor of 30 speedup with respect to version 1 of the tool for a representative recipe on a 24-core machine. A large collection of standard recipes and associated analysis scripts is available in the ESMValTool package for reproducing selected peer-reviewed analyses. The ESMValCore package can also be used with any other script that implements its easy-to-use interface. All pre-processing functions of the ESMValCore can also be used directly from any Python program. These features allow for use by a wide community of scientific users and developers with different levels of programming skills and experience.
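As a minimal illustration of the last point, the sketch below calls two ESMValCore preprocessor functions on an Iris cube from a plain Python script; the function names follow the ESMValCore documentation, but the exact signatures and the input file name should be treated as assumptions.

# A minimal sketch of using ESMValCore preprocessor functions directly from Python.
# The file name is a placeholder; signatures shown here are assumptions.
import iris
from esmvalcore.preprocessor import regrid, area_statistics

cube = iris.load_cube("tas_Amon_SomeModel_historical_r1i1p1f1_gn_185001-201412.nc")

# Regrid to a regular 1x1 degree grid, then reduce to a global area mean.
cube = regrid(cube, target_grid="1x1", scheme="linear")
cube = area_statistics(cube, operator="mean")

iris.save(cube, "tas_global_mean.nc")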
Future plans involve extending the public Python API (application programming interface) from just preprocessor functions to include all functionality, including finding the data and running diagnostic scripts. This would make ESMValCore suitable for interactive data exploration from a Jupyter Notebook.
How to cite: Andela, B., Bock, L., Brötz, B., Diblen, F., Dreyer, L., Drost, N., Earnshaw, P., Eyring, V., Hassler, B., Koldunov, N., Lauer, A., Little, B., Loosveldt-Tomas, S., de Mora, L., Predoi, V., Righi, M., Schlund, M., Vegas-Regidor, J., and Zimmermann, K.: ESMValCore: analyzing CMIP data made easy, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17472, https://doi.org/10.5194/egusphere-egu2020-17472, 2020.
EGU2020-18454 | Displays | ESSI1.12
A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness
Salomon Eliasson, Karl Göran Karlsson, and Ulrika Willén
One of the primary purposes of satellite simulators is to emulate the inability of retrievals, based on visible and infrared sensors, to detect subvisible clouds from space by removing them from the model. The current simulators in the COSP rely on a single visible cloud optical depth (τ)-threshold (τ=0.3) applied globally to delineate cloudy and cloud-free conditions. However, in reality, the cloud sensitivity of a retrieval varies regionally.
This presentation describes the satellite simulator for the CLARA-A2 climate data record (CDR). The CLARA simulator takes into account the variable skill in cloud detection of the CLARA-A2 CDR using long/lat-gridded values separated by daytime and nighttime, which enable it to filter out clouds from climate models that would be undetectable by observations. We introduce two methods of cloud mask simulation, one that depends on a spatially variable τ-threshold and one that uses the cloud probability of detection (POD) as a function of the model τ and long/lat. The gridded POD values are from the CLARA-A2 validation study by Karlsson and Hakansson (2018).
Both methods replicate the relative ease or difficulty for cloud retrievals, depending on the region and illumination. They increase the cloud sensitivity where the cloud retrievals are relatively straightforward, such as over mid-latitude oceans, and they decrease the sensitivity where cloud retrievals are notoriously tricky, such as where thick clouds may be inseparable from cold, snow-covered surfaces, as well as in areas with an abundance of broken and small-scale cumulus clouds such as the atmospheric subsidence regions over the ocean.
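A schematic numpy sketch of the POD-based masking idea, assuming a gridded POD lookup table indexed by τ bin: a model cloud is kept only with the probability that the retrieval would detect it at that location and optical depth. The array shapes and toy data are invented; this is not the actual CLARA simulator code.

# Schematic sketch of POD-based cloud-mask simulation; shapes and data are toy values.
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_cloud_mask(model_tau, pod_lut, tau_bins):
    """model_tau: (lat, lon) model cloud optical depth.
    pod_lut:   (n_tau_bins, lat, lon) probability of detection per tau bin and grid cell.
    tau_bins:  bin edges used to index pod_lut along its first axis.
    Returns a boolean mask of model clouds the retrieval would plausibly detect."""
    bin_idx = np.clip(np.digitize(model_tau, tau_bins) - 1, 0, pod_lut.shape[0] - 1)
    pod = np.take_along_axis(pod_lut, bin_idx[None, ...], axis=0)[0]
    detected = rng.random(model_tau.shape) < pod
    return detected & (model_tau > 0.0)

# Example with toy data on a 4x8 grid and three tau bins.
tau = rng.gamma(shape=1.0, scale=2.0, size=(4, 8))
lut = rng.uniform(0.2, 1.0, size=(3, 4, 8))
mask = simulate_cloud_mask(tau, lut, tau_bins=np.array([0.0, 0.3, 1.0, 5.0]))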
The CLARA simulator, together with the International Satellite Cloud Climatology Project (ISCCP) simulator of the COSP, is used to assess Arctic clouds in the EC-Earth climate model compared to the CLARA-A2 and ISCCP H-Series CDRs. Compared to CLARA-A2, EC-Earth generally underestimates cloudiness in the Arctic. However, compared to ISCCP and its simulator, the opposite conclusion is reached. Based on EC-Earth, this paper shows that the simulated cloud mask of CLARA-A2 is more representative of the CDR than using a global optical depth threshold, such as used by the ISCCP simulator.
The simulator substantially improves the simulation of the CLARA-A2-detected clouds compared to a global optical depth threshold, especially in the polar regions, by accounting for the variable cloud detection skill over the year.
How to cite: Eliasson, S., Karlsson, K. G., and Willén, U.: A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18454, https://doi.org/10.5194/egusphere-egu2020-18454, 2020.
EGU2020-19181 | Displays | ESSI1.12
ESMValTool - introducing a powerful model evaluation tool
Valeriu Predoi, Bouwe Andela, Lee De Mora, and Axel Lauer
The Earth System Model eValuation Tool (ESMValTool) is a powerful community-driven diagnostics and performance metrics tool. It is used for the evaluation of Earth System Models (ESMs) and allows for routine comparisons of either multiple model versions or observational datasets. ESMValTool's design is highly modular and flexible so that additional analyses can easily be added; in fact, this is essential to encourage the community-based approach to its scientific development. A set of standardized recipes for each scientific topic reproduces specific diagnostics or performance metrics that have demonstrated their importance in ESM evaluation in the peer-reviewed literature. Scientific themes include selected Essential Climate Variables, a range of known systematic biases common to ESMs such as coupled tropical climate variability, monsoons, Southern Ocean processes, continental dry biases and soil hydrology-climate interactions, as well as atmospheric CO2 budgets, tropospheric and stratospheric ozone, and tropospheric aerosols. We will outline the main functional characteristics of ESMValTool Version 2; we will also introduce the reader to the current set of diagnostics and the methods they can use to contribute to its development.
How to cite: Predoi, V., Andela, B., De Mora, L., and Lauer, A.: ESMValTool - introducing a powerful model evaluation tool, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19181, https://doi.org/10.5194/egusphere-egu2020-19181, 2020.
EGU2020-19298 | Displays | ESSI1.12
Integrating Model Evaluation and Observations into a Production-Release Pipeline
Philipp S. Sommer, Ronny Petrik, Beate Geyer, Ulrike Kleeberg, Dietmar Sauer, Linda Baldewein, Robin Luckey, Lars Möller, Housam Dibeh, and Christopher Kadow
The complexity of Earth System and Regional Climate Models represents a considerable challenge for developers. Tuning but also improving one aspect of a model can unexpectedly decrease the performance of others and introduce hidden errors. Reasons include, in particular, the multitude of output parameters and the shortage of reliable and complete observational datasets. One possibility to overcome these issues is a rigorous and continuous scientific evaluation of the model. This requires standardized model output and, most notably, standardized observational datasets. Additionally, in order to reduce the extra burden for the single scientist, this evaluation has to be as close as possible to the standard workflow of the researcher, and it needs to be flexible enough to adapt to new scientific questions.
We present the Free Evaluation System Framework (Freva) implementation within the Helmholtz Coastal Data Center (HCDC) at the Institute of Coastal Research in the Helmholtz-Zentrum Geesthacht (HZG). Various plugins into the Freva software, namely the HZG-EvaSuite, use observational data to perform a standardized evaluation of the model simulation. We present a comprehensive data management infrastructure that copes with the heterogeneity of observations and simulations. This web framework comprises a FAIR and standardized database of both large-scale and in-situ observations exported to a format suitable for data-model intercomparisons (particularly netCDF following the CF-conventions). Our pipeline links the raw data of the individual model simulations (i.e. the production of the results) to the finally published results (i.e. the released data).
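As an illustration of the kind of export format targeted here, the sketch below writes a small in-situ time series to a CF-style netCDF file with xarray; the variable names and attributes are examples and do not reflect the HCDC database schema.

# Illustrative export of an in-situ time series to a CF-style netCDF file.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2020-01-01", periods=10, freq="D")
temperature = np.random.default_rng(0).normal(10.0, 0.5, times.size)

ds = xr.Dataset(
    data_vars={
        "sea_water_temperature": (
            "time",
            temperature,
            {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
        )
    },
    coords={"time": ("time", times)},
    attrs={"Conventions": "CF-1.8", "title": "Example coastal observation export"},
)
ds.to_netcdf("example_observations.nc")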
Another benefit of the Freva-based evaluation is the enhanced exchange between the different compartments of the institute, particularly between the model developers and the data collectors, as Freva contains built-in functionalities to share and discuss results with colleagues. We will furthermore use the tool to strengthen the active communication with the data and software managers of the institute to generate or adapt the evaluation plugins.
How to cite: Sommer, P. S., Petrik, R., Geyer, B., Kleeberg, U., Sauer, D., Baldewein, L., Luckey, R., Möller, L., Dibeh, H., and Kadow, C.: Integrating Model Evaluation and Observations into a Production-Release Pipeline , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19298, https://doi.org/10.5194/egusphere-egu2020-19298, 2020.
EGU2020-21666 | Displays | ESSI1.12
Freva - Free Evaluation System Framework - New Aspects and Features
Christopher Kadow, Sebastian Illing, Oliver Kunst, Thomas Schartner, Jens Grieger, Mareike Schuster, Andy Richling, Ingo Kirchner, Henning Rust, Ulrich Cubasch, and Uwe Ulbrich
The Free Evaluation System Framework (Freva - freva.met.fu-berlin.de) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high performance computers to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, including all features present in the shell and web environment. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). Freva indexes different data projects into one common search environment by storing the meta data information of the self-describing model, reanalysis and observational data sets in a database. This implemented meta data system with its advanced but easy-to-handle search tool supports users, developers and their plugins to retrieve the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface of the evaluation system without any need to understand the different scripting languages. Facilitation of the provision and usage of tools and climate data automatically increases the number of scientists working with the data sets and identifying discrepancies. The integrated webshell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gate to the research project's HPC. Plugins are able to integrate, e.g., their post-processed results into the database of the user. This allows, e.g., post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via the shell or web system. Therefore, plugged-in tools benefit from transparency and reproducibility. Furthermore, if configurations match while starting an evaluation plugin, the system suggests using results already produced by other users – saving CPU/h, I/O, disk space and time. The efficient interaction between different technologies improves the Earth system modeling science framed by Freva.
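A purely hypothetical sketch of how an analysis tool written in another language could be wrapped behind such a generic plugin interface; the class and method names are invented for illustration and are not Freva's actual API.

# Hypothetical plugin wrapper; names and structure are invented for illustration only.
class ExamplePlugin:
    """Wraps an external analysis script so that a framework could run it with a
    validated parameter dictionary and collect the produced files."""

    parameters = {"variable": "tas", "start_year": 1980, "end_year": 2010}

    def build_command(self, config):
        # The wrapped tool can be written in any language; here an R script is assumed.
        return ["Rscript", "analyse_bias.R", config["variable"],
                str(config["start_year"]), str(config["end_year"])]

    def run(self, config):
        command = self.build_command(config)
        # A real plugin would execute the command (e.g. subprocess.run(command, check=True))
        # and register the produced files in the framework's history database.
        return {"command": command, "expected_outputs": ["bias_map.png", "bias_stats.nc"]}

print(ExamplePlugin().run(ExamplePlugin.parameters))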
New Features and aspects of further development and collaboration are discussed.
How to cite: Kadow, C., Illing, S., Kunst, O., Schartner, T., Grieger, J., Schuster, M., Richling, A., Kirchner, I., Rust, H., Cubasch, U., and Ulbrich, U.: Freva - Free Evaluation System Framework - New Aspects and Features, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21666, https://doi.org/10.5194/egusphere-egu2020-21666, 2020.
EGU2020-22155 | Displays | ESSI1.12
Climate Index Metadata and its Implementation
Klaus Zimmermann and Lars Bärring
Climate indices play an important role in the practical use of climate and weather data. Their application spans a wide range of topics, from impact assessment in agriculture and urban planning, through indispensable advice in the energy sector, to important evaluation in the climate science community. Several widely used standard sets of indices exist through long-standing efforts of WMO and WCRP Expert Teams (ETCCDI and ET-SCI), as well as European initiatives (ECA&D) and more recently Copernicus C3S activities. They, however, focus on the data themselves, leaving much of the metadata to the individual user. Moreover, these core sets of indices lack a coherent metadata framework that would allow for the consistent inclusion of the new indices that continue to be proposed.
In the meantime, the treatment of metadata in the wider community has received much attention. Within the climate community efforts such as the CF convention and the much-expanded scope and detail of metadata in CMIP6 have improved the clarity and long-term usability of many aspects of climate data a great deal.
We present a novel approach to metadata for climate indices. Our format describes the existing climate indices consistent with the established standards, adding metadata along the lines of existing metadata specifications. The formulation of these additions in a coherent framework encompassing most of the existing climate index standards allows for its easy extension and inclusion of new climate indices as they are developed.
We also present Climix, a new Python software for the calculation of indices based on this description. It can be seen as an example implementation of the proposed standard and features high-performance calculations based on state-of-the-art infrastructure, such as Iris and Dask. This way, it offers shared memory and distributed parallel and out-of-core computations, enabling the efficient treatment of large data volumes as incurred by the high resolution, long time-series of current and future datasets.
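As a minimal illustration of the kind of index such software computes, the sketch below derives the ETCCDI frost-days index (annual count of days with daily minimum temperature below 0 °C), written with xarray and dask for brevity rather than with the Iris-based Climix implementation itself; the file names are placeholders.

# Frost days (FD): annual count of days with tasmin < 0 degC, computed lazily with dask.
import xarray as xr

ds = xr.open_dataset("tasmin_day_model_historical.nc", chunks={"time": 365})
frost_days = (ds["tasmin"] < 273.15).resample(time="YS").sum(dim="time")
frost_days.attrs.update({"long_name": "Number of frost days", "units": "days"})
frost_days.to_netcdf("fd_annual.nc")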
How to cite: Zimmermann, K. and Bärring, L.: Climate Index Metadata and its Implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22155, https://doi.org/10.5194/egusphere-egu2020-22155, 2020.
ESSI1.15 – Towards SMART Monitoring and Integrated Data Exploration of the Earth System
EGU2020-21816 | Displays | ESSI1.15 | Highlight
The challenge of sensor selection, long-term sensor operation and data evaluation in inter-institutional long-term monitoring projects (lessons learned in the MOSES project)
Philipp Fischer, Madlen Friedrich, Markus Brand, Uta Koedel, Peter Dietrich, Holger Brix, Dorit Kerschke, and Ingeborg Bussmann
Measuring environmental variables over longer times in coastal marine environments is a challenge with regard to sensor maintenance and the data processing of continuously produced, comprehensive datasets. In the project “MOSES” (Modular Observation Solutions for Earth Systems), this procedure became even more complicated because seven large Helmholtz centers from the research field Earth and Environment (E&E), within the framework of the German Federal Ministry of Education and Research (BMBF), work together to design and construct a large-scale monitoring network across earth compartments to study the effects of short-term events on long-term environmental trends. This requires the development of robust, standardized and automated data acquisition and processing routines to ensure reliable, accurate and precise data.
Here, the results of two intercomparison workshops on sensor accuracy and precision for selected environmental variables are presented. Environmental sensors that were to be used in MOSES campaigns on hydrological extremes (floods and droughts) in the Elbe catchment and the adjacent coastal areas of the North Sea in 2019 and 2020 were compared for selected parameters (temperature, salinity, chlorophyll-a, turbidity and methane) in the same experimentally controlled water body, assuming that all sensors provide comparable data. The results were analyzed with respect to individual sensor accuracy and precision relative to an “assumed” real value, as well as with respect to a cost versus accuracy/precision index for measuring specific environmental data. The results show that the accuracy and precision of sensors do not necessarily correlate with their price and that low-cost sensors may provide the same or even higher accuracy and precision than the highest-priced sensor types.
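The sketch below illustrates, with invented toy readings, the kind of evaluation described above: accuracy as the mean deviation from an assumed reference value, precision as the standard deviation of repeated readings, and a simple cost-versus-performance ratio. It is not the workshop's actual analysis code.

# Toy evaluation of sensor accuracy, precision and a cost/performance index.
import numpy as np
import pandas as pd

reference_temp = 15.00  # assumed "true" water temperature in degC
sensors = {
    "sensor_A": {"price_eur": 4500, "readings": [15.02, 15.01, 14.99, 15.03]},
    "sensor_B": {"price_eur": 800,  "readings": [15.05, 14.96, 15.04, 14.97]},
}

rows = []
for name, info in sensors.items():
    readings = np.asarray(info["readings"])
    accuracy = readings.mean() - reference_temp   # systematic offset (bias)
    precision = readings.std(ddof=1)              # spread of repeated readings
    rows.append({"sensor": name, "price_eur": info["price_eur"],
                 "accuracy_degC": accuracy, "precision_degC": precision,
                 "price_per_precision": info["price_eur"] / precision})

print(pd.DataFrame(rows))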
How to cite: Fischer, P., Friedrich, M., Brand, M., Koedel, U., Dietrich, P., Brix, H., Kerschke, D., and Bussmann, I.: The challenge of sensor selection, long term-sensor operation and data evaluation in inter- -institutional long term monitoring projects (lessons learned in the MOSES project) , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21816, https://doi.org/10.5194/egusphere-egu2020-21816, 2020.
EGU2020-9338 | Displays | ESSI1.15
Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026
Pei Liu, Ruimei Han, and Leiku Yang
Rapid urbanization has become a major urban sustainability concern due to environmental impacts such as the development of urban heat islands (UHI) and the reduction of urban security. To date, most research on urban sustainable development has focused on dynamic change monitoring or UHI state characterization, while there is little literature on UHI change analysis. In addition, there has been little research on the impact of land use and land cover changes (LULCCs) on the UHI, especially on simulating future trends of LULCCs, UHI change, and the dynamic relationship between LULCCs and UHI. The purpose of this research is to design a remote sensing based framework that investigates and analyses how the LULCCs in the process of urbanization affect the thermal environment. To assess and predict the impact of LULCCs on the urban thermal environment, multi-temporal remotely sensed data from 1986 to 2016 were selected as source data, and Geographic Information System (GIS) methods such as the CA-Markov model were employed to construct the proposed framework. The results show that (1) there has been substantial urban expansion during the 40-year study period; (2) the largest movement of the urban center of gravity was from north-northeast (NNE) to west-southwest (WSW); (3) the dominant temperature levels in the research area were the middle, sub-high and high levels; (4) there was a higher changing frequency and range from east to west; and (5) there was a significant negative correlation between land surface temperature and vegetation, and a significant positive correlation between temperature and human settlement.
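As a schematic illustration of one building block of such a CA-Markov projection, the sketch below estimates a Markov transition matrix from two classified land-cover maps of the same area at different dates; the three-class toy maps are random placeholders for the classified satellite scenes described above.

# Estimate a Markov transition matrix from two classified land-cover maps (toy data).
import numpy as np

classes = [0, 1, 2]                      # e.g. 0 = built-up, 1 = vegetation, 2 = water
lc_1986 = np.random.default_rng(1).integers(0, 3, size=(100, 100))
lc_2016 = np.random.default_rng(2).integers(0, 3, size=(100, 100))

n = len(classes)
transition = np.zeros((n, n))
for i in classes:
    for j in classes:
        transition[i, j] = np.sum((lc_1986 == i) & (lc_2016 == j))

# Normalise rows to probabilities P(class j in 2016 | class i in 1986).
transition = transition / transition.sum(axis=1, keepdims=True)
print(np.round(transition, 3))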
How to cite: Liu, P., Han, R., and Yang, L.: Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9338, https://doi.org/10.5194/egusphere-egu2020-9338, 2020.
EGU2020-22587 | Displays | ESSI1.15
Deep neural networks for total organic carbon prediction and data-driven sampling
Everardo González Ávalos and Ewa Burwicz
Over the past decade deep learning has been used to solve a wide array of regression and classification tasks. Compared to classical machine learning approaches (k-Nearest Neighbours, Random Forests, …), deep learning algorithms excel at learning complex, non-linear internal representations, in part due to the highly over-parametrised nature of their underlying models; this advantage often comes at the cost of interpretability. In this work we used deep neural networks to construct a global total organic carbon (TOC) seafloor concentration map. By implementing Softmax distributions on implicitly continuous data (regression tasks), we were able to obtain probability distributions to assess prediction reliability. A variation of Dropout called Monte Carlo Dropout is also used during the inference step, providing a tool to model prediction uncertainties. We used these techniques to create a model information map, which is a key element to develop new data-driven sampling strategies for data acquisition.
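A minimal PyTorch sketch of the Monte Carlo Dropout idea, assuming a small fully connected network and random placeholder inputs rather than the actual TOC model: dropout stays active at inference time and repeated stochastic forward passes yield a predictive mean and an uncertainty estimate.

# Monte Carlo Dropout: keep dropout active at inference and average stochastic passes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

x = torch.randn(32, 8)             # 32 samples with 8 predictor features (placeholder)
model.train()                      # keeps the dropout layers active during inference
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # 100 stochastic forward passes

prediction = samples.mean(dim=0)   # MC estimate of the prediction
uncertainty = samples.std(dim=0)   # spread used as a model-uncertainty proxy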
How to cite: González Ávalos, E. and Burwicz, E.: Deep neural networks for total organic carbon prediction and data-driven sampling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22587, https://doi.org/10.5194/egusphere-egu2020-22587, 2020.
EGU2020-19631 | Displays | ESSI1.15
Implementing FAIR in a Collaborative Data Management Framework
Angela Schäfer, Norbert Anselm, Janik Eilers, Stephan Frickenhaus, Peter Gerchow, Frank Oliver Glöckner, Antonie Haas, Isabel Herrarte, Roland Koppe, Ana Macario, Christian Schäfer-Neth, Brenner Silva, and Philipp Fischer
Today's fast digital growth has made data the most essential tool for scientific progress in Earth system science. Hence, we strive to assemble a modular research infrastructure comprising a collection of tools and services that allow researchers to turn big data into scientific outcomes.
Major roadblocks are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous project-driven requirements regarding, e.g., satellite data, sensor monitoring, quality assessment and control, processing, analysis and visualization, and (iii) the demand for near-real-time analyses.
These requirements have led us to build a generic and cost-effective framework O2A (Observation to Archive) to enable, control, and access the flow of sensor observations to archives and repositories.
By establishing O2A within major cooperative projects like MOSES and Digital Earth in the research field Earth and Environment of the German Helmholtz Association, we extend research data management services, computing powers, and skills to connect with the evolving software and storage services for data science. This fully supports the typical scientific workflow from its very beginning to its very end, that is, from data acquisition to final data publication.
The key modules of O2A's digital research infrastructure established by AWI to enable Digital Earth Science implement the FAIR principles:
- Sensor Web, to register sensor applications and capture controlled meta data before and alongside any measurement in the field
- Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, at best in controlled NRT data streams (a hypothetical ingest sketch follows after this list)
- Dashboards, allowing researchers to find and access data and share and collaborate among partners
- Workspace, enabling researchers to access and use data with research software in a cloud-based virtualized infrastructure that allows them to analyse massive amounts of data on the spot
- Archiving and publishing data via repositories and Digital Object Identifiers (DOI).
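A purely hypothetical sketch of a near-real-time ingest step for the "Data ingest" module listed above: a single observation record, tagged with the identifier of a registered sensor, is prepared for submission to an ingest endpoint. The URL, token and payload fields are invented and do not describe the actual O2A interface.

# Hypothetical NRT ingest: build (but do not send) an HTTP request with one observation.
import json
import urllib.request

record = {
    "sensor_urn": "vessel:polarstern:ctd_01",   # identifier from the sensor registry (invented)
    "timestamp": "2020-05-04T12:00:00Z",
    "parameter": "sea_water_temperature",
    "value": 4.82,
    "unit": "degC",
}

request = urllib.request.Request(
    "https://ingest.example.org/observations",   # placeholder endpoint, not the real service
    data=json.dumps(record).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
# A real pipeline would send the request and check the response status:
# with urllib.request.urlopen(request) as response:
#     assert response.status in (200, 201)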
How to cite: Schäfer, A., Anselm, N., Eilers, J., Frickenhaus, S., Gerchow, P., Glöckner, F. O., Haas, A., Herrarte, I., Koppe, R., Macario, A., Schäfer-Neth, C., Silva, B., and Fischer, P.: Implementing FAIR in a Collaborative Data Management Framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19631, https://doi.org/10.5194/egusphere-egu2020-19631, 2020.
EGU2020-19648 | Displays | ESSI1.15
From source to sink - Sustainable and reproducible data pipelines with SaQC
David Schäfer, Bert Palm, Lennart Schmidt, Peter Lünenschloß, and Jan Bumberger
The number of sensors used in the environmental system sciences is increasing rapidly, and while this trend undoubtedly provides a great potential to broaden the understanding of complex spatio-temporal processes, it comes with its own set of new challenges. The flow of data from a source to its sink, from sensors to databases, involves many, usually error-prone intermediate steps. From data acquisition with its specific scientific and technical challenges, through data transfer from often remote locations, to the final data processing, all steps carry great potential to introduce errors and disturbances into the actual environmental signal.
Quantifying these errors becomes a crucial part of the later evaluation of all measured data. While many large environmental observatories are moving from manual to more automated ways of data processing and quality assurance, these systems are usually highly customized and hand-written. This approach is non-ideal in several ways: first, it is a waste of resources, as the same algorithms are implemented over and over again; second, it imposes great challenges to reproducibility. If the relevant programs are made available at all, they expose all the problems of software reuse: correctness of the implementation, readability and comprehensibility for future users, as well as transferability between different computing environments. Besides these problems, which relate to software development in general, another crucial factor comes into play: the end product, a processed and quality-controlled data set, is closely tied to the current version of the programs in use. Even small changes to the source code can lead to vastly differing results. If this is not approached responsibly, data and programs will inevitably fall out of sync.
The presented software, the 'System for automated Quality Control' (SaQC) (www.ufz.git.de/rdm-software/saqc), helps to either solve or massively simplify the solution to the presented challenges. As a mainly no-code platform with a large set of implemented functionality, SaQC lowers the entry barrier for the non-programming scientific practitioner without sacrificing the possibility of fine-grained adaptation to project-specific needs. The text-based configuration allows easy integration into version control systems and thus opens the opportunity to use well-established software for data lineage. We will give a short overview of the program's unique features and showcase possibilities to build reliable and reproducible processing and quality assurance pipelines for real-world data from a spatially distributed, heterogeneous sensor network.
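To illustrate the general idea of a text-configured, no-code quality-control run (not SaQC's actual configuration syntax), the sketch below reads a small mapping of variables to checks and applies range and spike tests to a toy time series with pandas.

# Illustrative text-configured QC run; check names and parameters are invented.
import numpy as np
import pandas as pd

config = {
    "soil_moisture": [("range", {"min": 0.0, "max": 100.0}),
                      ("spike", {"max_jump": 20.0})],
}

rng = np.random.default_rng(0)
data = pd.DataFrame(
    {"soil_moisture": np.clip(rng.normal(35, 5, 200), -5, None)},
    index=pd.date_range("2020-01-01", periods=200, freq="15min"),
)
flags = pd.DataFrame(False, index=data.index, columns=data.columns)

for var, checks in config.items():
    series = data[var]
    for name, params in checks:
        if name == "range":
            flags[var] |= (series < params["min"]) | (series > params["max"])
        elif name == "spike":
            flags[var] |= series.diff().abs() > params["max_jump"]

print(f"flagged values: {int(flags['soil_moisture'].sum())}")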
How to cite: Schäfer, D., Palm, B., Schmidt, L., Lünenschloß, P., and Bumberger, J.: From source to sink - Sustainable and reproducible data pipelines with SaQC, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19648, https://doi.org/10.5194/egusphere-egu2020-19648, 2020.
EGU2020-9251 | Displays | ESSI1.15
An integrative framework for data-driven investigation of environmental systems
Daniel Eggert and Doris Dransch
Environmental scientists aim at understanding not only single components but entire systems. One example is the flood system: scientists investigate the conditions, drivers and effects of flood events and the relations between them. Investigating environmental systems with a data-driven research approach requires linking a variety of data, analytical methods, and derived results.
Several obstacles in the current scientific work environment hinder scientists from easily creating these links: distributed and heterogeneous data sets, separated analytical tools, discontinuous analytical workflows, and isolated views of data and data products. We address these obstacles, with the exception of distributed and heterogeneous data, since this is part of other ongoing initiatives.
Our goal is to develop a framework supporting the data-driven investigation of environmental systems. First, we integrate separated analytical tools and methods by means of a component-based software framework. Furthermore, we allow for seamless and continuous analytical workflows by applying the concept of digital workflows, which also demands the aforementioned integration of separated tools and methods. Finally, we provide integrated views of data and data products through interactive visual interfaces with multiple linked views. The combination of these three concepts from computer science allows us to create a digital research environment that enables scientists to create the initially mentioned links in a flexible way. We developed a generic concept for our approach, implemented a corresponding framework and finally applied both to realize a “Flood Event Explorer” prototype supporting the comprehensive investigation of a flood system.
In order to implement a digital workflow, our approach starts by precisely defining the workflow’s requirements, mostly through informal interviews with the domain scientists. The defined requirements also include the needed analytical tools and methods, as well as the data and data products to be used. For the technical integration of the needed tools and methods, our software framework provides a modularization approach based on a messaging system. This allows us to create custom modules or wrap existing implementations and tools. The messaging system (e.g. Apache Pulsar) then connects these individual modules, which enables us to combine multiple methods and tools into a seamless digital workflow. The approach, of course, demands the proper definition of interfaces to modules and data sources. Finally, our software framework provides multiple generic visual front-end components (e.g. tables, maps and charts) to create interactive linked views supporting the visual analysis of the workflow’s data.
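As a rough sketch of how two analysis modules might be coupled through a message broker such as Apache Pulsar, the snippet below lets one hypothetical module publish an intermediate result that another module consumes; the broker URL, topic name and payload are assumptions, not the project's actual configuration.

```python
import json
import pulsar  # pip install pulsar-client

# Connect to an assumed local broker; in the framework, modules and the broker
# address would come from the workflow configuration.
client = pulsar.Client("pulsar://localhost:6650")

# Module A: publish an intermediate result of an analysis step.
producer = client.create_producer("flood-explorer/discharge-anomalies")
producer.send(json.dumps({"gauge_id": "E123", "anomaly": 2.7}).encode("utf-8"))

# Module B: consume the result and hand it to the next step (e.g. a linked view).
consumer = client.subscribe("flood-explorer/discharge-anomalies",
                            subscription_name="visualization-module")
msg = consumer.receive()
payload = json.loads(msg.data().decode("utf-8"))
consumer.acknowledge(msg)
print("received:", payload)

client.close()
```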
How to cite: Eggert, D. and Dransch, D.: An integrative framework for data-driven investigation of environmental systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9251, https://doi.org/10.5194/egusphere-egu2020-9251, 2020.
EGU2020-1412 | Displays | ESSI1.15
Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019
Hai-Po Chan and Kostas Konstantinou
Mayon Volcano on eastern Luzon Island is the most active volcano in the Philippines. It is known and renowned as the "perfect cone" for its symmetric conical shape and has recorded more than 50 eruptions in the past 500 years. Geographically, the volcano is surrounded by eight cities and municipalities with 1 million inhabitants. Currently, its activity is monitored daily by on-site observations such as seismometers installed on Mayon's slopes, electronic distance meters (EDMs), precise leveling benchmarks, and portable fly spectrometers. Compared to existing direct on-site measurements, satellite remote sensing is currently assuming an essential role in understanding the whole picture of volcanic processes. The vulnerability to volcanic hazards is high for Mayon, given that it is located in an area of high population density on Luzon Island. However, satellite remote sensing methods and datasets have not been integrated into Mayon's hazard mapping and monitoring system, despite abundant open-access satellite data archives. Here, we perform multiscale and multitemporal monitoring based on the analysis of a nineteen-year Land Surface Temperature (LST) time series derived from satellite-retrieved thermal infrared imagery. Both Landsat thermal imagery (30-meter spatial resolution) and MODIS (Moderate Resolution Imaging Spectroradiometer) LST products (1-kilometer spatial resolution) are used for the analysis. The Ensemble Empirical Mode Decomposition (EEMD) is applied to decompose oscillatory components of various timescales within the LST time series. The physical interpretation of the decomposed LST components at various periods is explored and compared with Mayon's eruption records. Results show that the annual-period components of LST tend to lose their regularity following an eruption, and the amplitudes of short-period LST components are very responsive to eruption events. The satellite remote sensing approach provides more insights at larger spatial and temporal scales on this renowned active volcano. This study not only demonstrates the advantages and effectiveness of satellite remote sensing for volcano monitoring but also provides valuable surface information for exploring the subsurface volcanic structure of Mayon.
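A minimal sketch of the decomposition step, assuming the PyEMD (EMD-signal) implementation of EEMD and a synthetic stand-in for the satellite-derived LST series; the signal, sampling and ensemble size are illustrative assumptions.

```python
import numpy as np
from PyEMD import EEMD  # pip install EMD-signal

# Synthetic stand-in for a monthly LST series: annual cycle + weak trend + noise.
t = np.arange(19 * 12) / 12.0                                # ~19 years, monthly
lst = 25 + 8 * np.sin(2 * np.pi * t) + 0.05 * t + np.random.randn(t.size)

# Ensemble Empirical Mode Decomposition into intrinsic mode functions (IMFs).
eemd = EEMD(trials=100)      # number of noise-assisted ensemble members (assumed)
imfs = eemd.eemd(lst)

# Short-period IMFs come first; the annual cycle typically appears in a
# mid-order IMF whose amplitude and regularity can be compared with eruption dates.
print("number of IMFs:", imfs.shape[0])
```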
How to cite: Chan, H.-P. and Konstantinou, K.: Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1412, https://doi.org/10.5194/egusphere-egu2020-1412, 2020.
EGU2020-3049 | Displays | ESSI1.15
Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning
Erik Nixdorf, Marco Hannemann, Uta Ködel, Martin Schrön, and Thomas Kalbacher
Soil moisture is a critical hydrological component for determining hydrological state conditions and a crucial variable in controlling land-atmosphere interactions, including evapotranspiration, infiltration and groundwater recharge.
At the catchment scale, the spatio-temporal distribution of soil moisture is highly variable due to the influence of various factors such as soil heterogeneity, climate conditions, vegetation and geomorphology. Among the various existing soil moisture monitoring techniques, the application of vehicle-mounted Cosmic Ray Neutron Sensors (CRNS) allows monitoring soil moisture noninvasively by surveying larger regions within a reasonable time. However, the measured data and their corresponding footprints are often located along the existing road network, leaving inaccessible parts of a catchment unobserved, and surveying larger areas at short intervals is often hindered by limited manpower.
In this study, data from more than 200,000 CRNS rover readings measured over different regions of Germany within the last 4 years have been employed to characterize the trends of the soil moisture distribution in the 209 km² Mueglitz River Basin in Eastern Germany. Subsets of the data have been used to train three different supervised machine learning algorithms (multiple linear regression, random forest and artificial neural network) based on 85 independent, relevant dynamic and stationary features derived from public databases. The random forest model outperforms the other models (R² ≈ 0.8), relying on day-of-year, altitude, air temperature, humidity, soil organic carbon content and soil temperature as the most influential predictors.
After training and testing the models, CRNS records for each day of the last decade are predicted on a 250 × 250 m grid of the Mueglitz River Basin using the same type of features. The derived CRNS record distributions are compared with both spatial soil moisture estimates from a hydrological model and point estimates from a sensor network operated during spring 2019. After variable standardization, preliminary results show that the random forest model is able to reproduce the spatio-temporal trends estimated by the hydrological model and the point measurements. These findings demonstrate that training machine learning models on domain-unspecific large datasets of CRNS records using spatially and temporally available predictors has the potential to fill measurement gaps and to improve predictions of soil moisture dynamics at the catchment scale.
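A compact sketch of the model-training step using scikit-learn's random forest regressor; the feature table, target values and hyperparameters below are invented placeholders for the 85 predictors and CRNS readings described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table: one row per CRNS rover footprint (values invented).
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "day_of_year": rng.integers(1, 366, n),
    "altitude_m": rng.uniform(100, 800, n),
    "air_temperature_c": rng.normal(10, 8, n),
    "humidity_pct": rng.uniform(30, 100, n),
    "soil_organic_carbon_pct": rng.uniform(0.5, 5, n),
    "soil_temperature_c": rng.normal(9, 6, n),
})
y = rng.normal(1500, 200, n)  # stand-in for the CRNS rover readings

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

# With real data, this is where a skill of roughly R² = 0.8 would be evaluated;
# prediction on the 250 m grid reuses the same feature columns per cell and day.
print("R² on held-out data:", r2_score(y_test, rf.predict(X_test)))
```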
How to cite: Nixdorf, E., Hannemann, M., Ködel, U., Schrön, M., and Kalbacher, T.: Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3049, https://doi.org/10.5194/egusphere-egu2020-3049, 2020.
EGU2020-5028 | Displays | ESSI1.15
Modeling methane from the North Sea region with ICON-ART
Christian Scharun, Roland Ruhnke, Jennifer Schröter, Michael Weimer, and Peter Braesicke
Methane (CH4) is the second most important greenhouse gas after CO2 affecting global warming. Various sources (e.g. fossil fuel production, agriculture and waste, biomass burning and natural wetlands) and sinks (the reaction with the OH radical, the main sink, which contributes to tropospheric ozone production) determine the methane budget. Due to its long lifetime in the atmosphere, methane can be transported over long distances.
Disused and active offshore platforms can emit methane, in amounts that are difficult to quantify. In addition, explorations of the sea floor in the North Sea showed a release of methane near the boreholes of both oil and gas producing platforms. The basis of this study is the established emission database EDGAR (Emission Database for Global Atmospheric Research), an inventory that includes methane emission fluxes in the North Sea region. While methane emission fluxes in the EDGAR inventory and platform locations match for most of the oil platforms, almost all of the gas platform sources are missing from the database. We develop a method for estimating the missing sources based on the EDGAR emission inventory.
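Purely as a hypothetical illustration of such a gap-filling idea (this is not the estimation method developed in the study), one could assign each gas platform lacking an EDGAR source a first-guess flux taken from the nearest oil platform with a known flux; all coordinates and values below are invented.

```python
import pandas as pd

# Invented tables: EDGAR-listed oil-platform fluxes and gas platforms that have
# no matching source in the inventory.
oil = pd.DataFrame({"lon": [2.1, 3.4, 4.0], "lat": [55.2, 56.0, 54.8],
                    "ch4_flux_kg_per_h": [3.2, 1.8, 2.5]})
gas = pd.DataFrame({"lon": [2.8, 3.9], "lat": [55.6, 55.1]})

def nearest_flux(row, reference):
    # Crude nearest-neighbour assignment in degrees (sufficient for a sketch).
    d2 = (reference["lon"] - row["lon"])**2 + (reference["lat"] - row["lat"])**2
    return reference.loc[d2.idxmin(), "ch4_flux_kg_per_h"]

gas["ch4_flux_first_guess"] = gas.apply(nearest_flux, axis=1, reference=oil)
print(gas)
```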
In this study, the global model ICON-ART (ICOsahedral Nonhydrostatic model - Aerosols and Reactive Trace gases) will be used. ART is an online-coupled model extension for ICON that includes chemical gases and aerosols. One aim of the model is the simulation of interactions between the trace substances and the state of the atmosphere by coupling the spatiotemporal evolution of tracers with atmospheric processes. ICON-ART sensitivity simulations are performed with inserted and adjusted sources to assess their influence on the methane and OH-radical distribution on regional (North Sea) and global scales.
How to cite: Scharun, C., Ruhnke, R., Schröter, J., Weimer, M., and Braesicke, P.: Modeling methane from the North Sea region with ICON-ART, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5028, https://doi.org/10.5194/egusphere-egu2020-5028, 2020.
EGU2020-10239 | Displays | ESSI1.15 | Highlight
Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project
Wolfgang Kurtz, Stephan Hachinger, Anton Frank, Wolfram Mauser, Jens Weismüller, and Christine Werner
The ViWA (Virtual Water Values) project aims to provide a global-scale assessment of the current usage of water resources, of the efficiency of water use and of agricultural yields, as well as of the flow and trade of ‘virtual’ water across country boundaries. This is achieved by establishing a global management and monitoring system which combines high-resolution (1 km²) agro-hydrological model simulations with information from high-resolution remote-sensing data from Copernicus satellites. The monitoring system is used to judge the progress in achieving water-related UN Sustainable Development Goals at local and global scales. Specific goals of the project are, for example, to:
- evaluate possible inefficiencies of the current water use in agriculture, industry and water management and its economic consequences.
- assess the vulnerability of agriculture and ecosystems to climate variability with a special emphasis on water availability.
- identify regional hot-spots of unsustainable water use and to analyze possible institutional obstacles for a sustainable and efficient water use.
- identify trade-offs between the commercial water use and protection of ecosystem services.
A cornerstone for reaching these project goals is the set of high-resolution global ensemble simulations with an agro-hydrological model for a variety of crop types and management practices. These simulations provide the relevant information on agricultural yields and water demands at different scales. In this context, a considerable amount of data is generated, and subsets of these data may also be of direct relevance for different external interest groups.
In this presentation, we describe our approach to managing the simulation data, with a special focus on possible strategies for data provisioning to interested stakeholders, scientists, practitioners and the general public. We will give an overview of the corresponding simulation and data storage workflows on the HPC systems used, and we will discuss methods for providing the data to the different interest groups. Among other aspects, we address findability (in the sense of the FAIR principles) of simulation results for the scientific community in indexed search portals through proper metadata annotation. We also discuss a prototypical interactive web portal for visualizing, subsetting and downloading selected parts of the data set.
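As one possible flavour of such metadata annotation and subsetting, the sketch below attaches CF/ACDD-style discovery attributes to a small synthetic xarray dataset and cuts a spatial subset as a portal backend might do; variable names, attribute values and the file name are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical field from one ensemble member (names, units and values invented).
da = xr.DataArray(
    np.random.rand(10, 20, 20).astype("float32"),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2018-01-01", periods=10),
            "lat": np.linspace(47.0, 49.0, 20),
            "lon": np.linspace(10.0, 12.0, 20)},
    name="crop_water_use_efficiency",
)
ds = da.to_dataset()

# Discovery-relevant global attributes in the style of CF/ACDD conventions.
ds.attrs.update({
    "title": "Agro-hydrological ensemble member (illustrative subset)",
    "summary": "Simulated crop water-use efficiency on a regular lat/lon grid.",
    "Conventions": "CF-1.8, ACDD-1.3",
    "keywords": "water use efficiency, agriculture, virtual water",
    "license": "CC-BY-4.0",
})
ds["crop_water_use_efficiency"].attrs.update(
    {"units": "1", "long_name": "crop water use efficiency"})

# Spatial subsetting as a portal backend might perform before delivery.
subset = ds.sel(lat=slice(47.5, 48.5), lon=slice(10.5, 11.5))
subset.to_netcdf("viwa_subset_example.nc")  # requires a netCDF backend
```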
How to cite: Kurtz, W., Hachinger, S., Frank, A., Mauser, W., Weismüller, J., and Werner, C.: Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10239, https://doi.org/10.5194/egusphere-egu2020-10239, 2020.
EGU2020-12328 | Displays | ESSI1.15
Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring
Valeriy Kovalskyy and Xiaoyuan Yang
Imagery products are critical for digital agriculture, as they help deliver value and insights to growers. The use of publicly available satellite data feeds by digital agriculture companies helps keep imagery services affordable for a broader base of farmers. The optimal use of public and private imagery data sources plays a critical role in the success of image-based services for agriculture.
At the Climate Corporation we have established a program focused on intelligence about the satellite image coverage and frequency expected in different geographies and times of the year, which is becoming critical for the global expansion of the company. In this talk we report the results of our analysis of publicly available imagery data sources for key agricultural regions of the globe. We also demonstrate how these results can guide commercial imagery acquisition decisions in a case study in Brazil, where some growers run the risk of going through the growing season without receiving usable imagery if relying on a single source of satellite imagery. The study clearly shows the validity of the approaches taken, as the results matched factual image deliveries to within single digits of percent coverage at the regional level. Our analysis also captured realistic temporal and spatial details of changes in image frequency resulting from the addition of alternative satellite imagery sources to the production stream. The optimization of imagery acquisitions enables filling data gaps for research and development while contributing to delivering greater value for growers in crop health monitoring and other image-based services.
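A small sketch of the kind of revisit bookkeeping described above, using an invented acquisition log to compare the longest gap between usable (low-cloud) images with and without an additional commercial source.

```python
import pandas as pd

# Invented acquisition log for one field: date, source and cloud cover.
log = pd.DataFrame({
    "date": pd.to_datetime(["2019-11-02", "2019-11-07", "2019-11-12",
                            "2019-11-14", "2019-11-22", "2019-12-02"]),
    "source": ["Sentinel-2", "Sentinel-2", "Sentinel-2",
               "commercial", "Sentinel-2", "Sentinel-2"],
    "cloud_cover_pct": [80, 15, 95, 10, 20, 100],
})

# Only low-cloud acquisitions are considered usable (threshold assumed).
usable = log[log["cloud_cover_pct"] < 30].sort_values("date")

def max_gap_days(df):
    # Longest waiting time between consecutive usable images.
    return df["date"].diff().dt.days.max()

print("max gap, all sources [days]:", max_gap_days(usable))
print("max gap, public only [days]:",
      max_gap_days(usable[usable["source"] == "Sentinel-2"]))
```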
How to cite: Kovalskyy, V. and Yang, X.: Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12328, https://doi.org/10.5194/egusphere-egu2020-12328, 2020.
EGU2020-22594 | Displays | ESSI1.15
Machine learning as supporting method for UXO mapping and detection
Daniela Henkel, Everardo González Ávalos, Mareike Kampmeier, Patrick Michaelis, and Jens Greinert
Marine munitions, or unexploded ordnance (UXO), were massively disposed of in coastal waters after World War II; they are still being introduced into the marine environment during war activities and military exercises. UXO detection and removal has gained great interest during the ongoing efforts to install offshore wind parks for energy generation as well as cable routing through coastal waters. Additionally, 70 years after the World War II munition dumping events, more and more chemical and conventional munition is rusting away, increasing the risk of toxic contamination.
The general detection methodology includes high-resolution multibeam mapping, hydroacoustic sub-bottom mapping, electromagnetic surveys with gradiometers, as well as visual inspections by divers or remotely operated vehicles (ROVs). Using autonomous underwater vehicles (AUVs) for autonomous underwater inspections with multibeam, camera and EM systems is the next technological step in acquiring meaningful high-resolution data independently of a mother ship. However, it would be beneficial for the use of such technology to be able to better predict potential hot spots of munition targets and distinguish them from other objects such as rocks, small artificial constructions or metallic waste (wires, barrels, etc.).
The above-mentioned data sources could be utilized as predictor layers for machine learning with existing and accessible algorithms. The structure of the data is highly similar to image data, an area where neural networks are the benchmark. As a first approach, we therefore trained convolutional neural networks in a supervised manner to detect seafloor areas contaminated with UXO. For this, we manually annotated known UXO locations as well as known non-UXO locations to generate a training dataset, which was later augmented by rotating and flipping each annotated tile. We achieved a high accuracy with this approach using only a subset of the data sources mentioned above as input layers. We also explored the use of further input layers and larger training datasets, and their impact on performance. This is a good example of machine learning enabling us to classify large areas in a short time and with minimal need for manual annotation.
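A minimal sketch of tile augmentation by rotation and flipping followed by training a small convolutional classifier; it uses TensorFlow/Keras as an example framework, and the tile size, channel count, labels and architecture are assumptions rather than the network actually used.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical annotated tiles: bathymetry-derived layers stacked as channels
# (tile size, channel count and labels are invented for illustration).
tiles = np.random.rand(200, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, 200)  # 1 = UXO-contaminated, 0 = background

def augment(x, y):
    # Augmentation by rotating and flipping each annotated tile.
    rots = [np.rot90(x, k, axes=(1, 2)) for k in range(4)]
    flips = [np.flip(r, axis=2) for r in rots]
    return np.concatenate(rots + flips), np.tile(y, 8)

x_aug, y_aug = augment(tiles, labels)

# Small CNN classifier; the architecture is an illustrative choice.
model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_aug, y_aug, epochs=2, batch_size=32, verbose=0)
```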
How to cite: Henkel, D., González Ávalos, E., Kampmeier, M., Michaelis, P., and Greinert, J.: Machine learning as supporting method for UXO mapping and detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22594, https://doi.org/10.5194/egusphere-egu2020-22594, 2020.
EGU2020-11084 | Displays | ESSI1.15
Significance and implementation of SMART Monitoring Tools
Uta Koedel, Peter Dietrich, Erik Nixdorf, and Philipp Fischer
The term “SMART Monitoring” is often used in digital projects to survey and analyze data flows in near-real time or real time. The term is also adopted in the project Digital Earth (DE), which was jointly launched in 2018 by the eight Helmholtz centers of the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF). Within DE, the “SMART Monitoring” sub-project aims at developing workflows and processes to make scientific parameters and the related datasets specific, measurable, accepted, relevant, and trackable (SMART).
“SMART Monitoring” in DE comprises a combination of hardware and software tools to enhance the traditional sequential monitoring approach, in which data are analyzed and processed step by step from the sensor towards a repository, into an integrated analysis approach in which information on the measured value, the status of each sensor and possibly relevant auxiliary sensor data in a sensor network are available and used in real time to enhance the sensor output with respect to data accuracy, precision and availability. SMART Monitoring could thus be defined as a computer-enhanced monitoring network with automatic data flow control from individual sensors to databases, enhanced by automated (machine learning) and near-real-time interactive data analysis and exploration using the full potential of all available sensors within the network. In addition, “SMART Monitoring” aims to support better adjustment of sensor settings and monitoring strategies in time and space through iterative feedback.
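As a toy illustration of an automated, near-real-time plausibility check that such a monitoring chain could run between sensor and repository, the sketch below flags values that deviate strongly from a rolling baseline; the series, window length and threshold are invented.

```python
import numpy as np
import pandas as pd

# Invented stream of 5-minute measurements from one sensor.
idx = pd.date_range("2020-03-01", periods=500, freq="5min")
series = pd.Series(np.random.normal(12.0, 0.3, 500), index=idx, name="water_temp_c")
series.iloc[250] = 18.0  # implausible outlier

# Near-real-time plausibility check: rolling z-score against the last 2 hours.
window = 24                     # 24 x 5 min = 2 h (assumed window length)
baseline = series.rolling(window).mean()
spread = series.rolling(window).std()
zscore = (series - baseline) / spread
suspect = zscore.abs() > 4      # assumed threshold

# Flagged values could trigger feedback to the sensor or an adapted sampling rate.
print(series[suspect])
```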
This poster presentation will show general concepts, workflows, and possible visualization tools based on examples that support the SMART Monitoring idea.
How to cite: Koedel, U., Dietrich, P., Nixdorf, E., and Fischer, P.: Significance and implementation of SMART Monitoring Tools, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11084, https://doi.org/10.5194/egusphere-egu2020-11084, 2020.
EGU2020-22618 | Displays | ESSI1.15
Towards easily accessible interactive big-data analysis on supercomputers
Katharina Höflich, Martin Claus, Willi Rath, Dorian Krause, Benedikt von St. Vieth, and Kay Thust
Demand on high-end high performance computer (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and the joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analysis. Aiming for interactive big-data analysis on HPC will also help the scientific community in utilizing increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focussing on the aspects of usability and interactive use of HPC systems on the basis of typical use cases from the ocean science community.
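To illustrate the kind of interactive, elastic analysis workflow referred to above, the sketch below requests Dask workers through a SLURM batch system with dask_jobqueue and computes a chunked reduction, e.g. from a JupyterHub session; the queue name, resource sizes and the synthetic array are placeholders for site- and use-case-specific settings.

```python
import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Request workers through the batch system; queue, resources and walltime are
# placeholders for a site-specific configuration.
cluster = SLURMCluster(queue="compute", cores=16, memory="64GB",
                       walltime="01:00:00")
cluster.scale(jobs=4)   # elastic: workers appear as the scheduler grants jobs
client = Client(cluster)

# Interactive analysis, e.g. from a JupyterHub session: a chunked reduction over
# a large (here synthetic) ocean-model-sized array.
field = da.random.random((365, 2000, 2000), chunks=(30, 500, 500))
print(field.mean(axis=0).mean().compute())

client.close()
cluster.close()
```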
How to cite: Höflich, K., Claus, M., Rath, W., Krause, D., von St. Vieth, B., and Thust, K.: Towards easily accessible interactive big-data analysis on supercomputers, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22618, https://doi.org/10.5194/egusphere-egu2020-22618, 2020.
EGU2020-11117 | Displays | ESSI1.15 | Highlight
Going beyond FAIR to increase data reliability
Uta Koedel and Peter Dietrich
The FAIR principle is on its way to becoming a conventional standard for all kinds of data. However, it is often forgotten that this principle does not consider data quality or data reliability issues. If the data quality is not sufficiently described, misinterpretation and misuse of these data in a joint interpretation can lead to false scientific conclusions. Hence, a statement about data reliability is an essential component for secondary data processing and joint interpretation efforts. Information on data reliability, uncertainty and quality, as well as information on the devices used, is essential and needs to be introduced or even implemented in the workflow from the sensor to a database if the data are to be considered in a broader context.
In the past, many publications have shown that the same devices at the same location do not necessarily provide the same measurement data. Likewise, statistical quantities and confidence intervals that would allow assessing the reliability of the data are rarely given in publications. Many secondary users of measurement data assume that calibration data and the measurement of other auxiliary variables are sufficient to estimate the data reliability. However, even if some devices require on-site field calibration, that does not mean that the data are comparable. Heat, cold and internal processes on electronic components can lead to differences in measurement data recorded with devices of the same type at the same location, especially as the devices themselves become increasingly complex.
Data reliability can be increased by addressing data uncertainty issues within the FAIR principle. The poster presentation will show the importance of comparative measurements, the information needed for the application of proxy-transfer functions, and suitable uncertainty analyses for databases.
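A small sketch of the kind of comparative-measurement statistics argued for here: the mean difference between two co-located devices and its 95 % confidence interval, computed with NumPy/SciPy on invented readings.

```python
import numpy as np
from scipy import stats

# Invented side-by-side readings from two devices of the same type at one site.
rng = np.random.default_rng(1)
device_a = rng.normal(20.00, 0.05, 50)
device_b = rng.normal(20.06, 0.08, 50)   # small systematic offset assumed

diff = device_a - device_b
mean_diff = diff.mean()
sem = stats.sem(diff)

# 95 % confidence interval of the mean difference (t-distribution, df = n - 1).
ci_low, ci_high = stats.t.interval(0.95, diff.size - 1, loc=mean_diff, scale=sem)
print(f"mean difference: {mean_diff:.3f}  95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```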
How to cite: Koedel, U. and Dietrich, P.: Going beyond FAIR to increase data reliability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11117, https://doi.org/10.5194/egusphere-egu2020-11117, 2020.
ESSI2.1 – Metadata, Data Models, Semantics, and Collaboration
EGU2020-3663 | Displays | ESSI2.1
Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding
Aaron Kaulfus, Kaylin Bugbee, Alyssa Harris, Rahul Ramachandran, Sean Harkins, Aimee Barciauskas, and Deborah Smith
Algorithm Theoretical Basis Documents (ATBDs) accompany Earth observation data generated from algorithms. ATBDs describe the physical theory, mathematical procedures and assumptions made for the algorithms that convert radiances received by remote sensing instruments into geophysical quantities. While ATBDs are critical to scientific reproducibility and data reuse, there have been technical, social and informational issues surrounding the creation and maintenance of these key documents. A standard ATBD structure has been lacking, resulting in inconsistent documents of varying levels of detail. Due to the lack of a minimum set of requirements, there has been very little formal guidance on the ATBD publication process. Additionally, ATBDs have typically been provided as static documents that are not machine readable, making search and discovery of the documents and of the content within the documents difficult for users. To address the challenges surrounding ATBDs, NASA has prototyped the Algorithm Publication Tool (APT), a centralized cloud-based publication tool that standardizes the ATBD content model and streamlines the ATBD authoring process. This presentation will describe our approach to developing a common information model for ATBDs and our efforts to provide ATBDs as dynamic documents that are available for both human and machine utilization. We will also present our vision for APT within the broader NASA Earth science data system and how this tool may assist in standardizing and easing the ATBD creation and maintenance process.
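To illustrate what a machine-readable ATBD record could look like, the sketch below serializes a minimal, structured document to JSON; the field names are assumptions made for the example and are not the actual APT content model.

```python
import json

# Illustrative machine-readable ATBD record; field names are assumptions made
# for this example, not the actual APT content model.
atbd = {
    "title": "Example Level-2 Sea Surface Temperature ATBD",
    "version": "1.0",
    "algorithm_description": {
        "scientific_theory": "Radiative transfer in the thermal infrared ...",
        "mathematical_theory": "Split-window regression of brightness temperatures.",
        "assumptions": ["cloud-free pixels", "known surface emissivity"],
    },
    "algorithm_inputs": [{"name": "brightness_temperature_11um", "units": "K"}],
    "algorithm_outputs": [{"name": "sea_surface_temperature", "units": "K"}],
    "performance_assessment": {"validation_approach": "match-ups with drifting buoys"},
}

# Serialized documents like this stay searchable and parseable by both humans
# and machines.
with open("atbd_example.json", "w") as fh:
    json.dump(atbd, fh, indent=2)
```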
How to cite: Kaulfus, A., Bugbee, K., Harris, A., Ramachandran, R., Harkins, S., Barciauskas, A., and Smith, D.: Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3663, https://doi.org/10.5194/egusphere-egu2020-3663, 2020.
EGU2020-18976 | Displays | ESSI2.1
Managing the knowledge created by the users through Geospatial User Feedback system. The NEXTGEOSS use case
Alaitz Zabala Torres, Joan Masó Pau, and Xavier Pons
The first approach to metadata was based on the producer's point of view, since producers were responsible for documenting and sharing metadata about their products. Since 2012 (starting in the EU FP7 GeoViQua project), the Geospatial User Feedback approach has described the user perspective on datasets and services (GUF, an OGC standard since 2016). In the past, users of the data gained knowledge about and with the data, but they lacked the means to easily and automatically share this knowledge in a formal way.
In the EU H2020 NextGEOSS project, the NiMMbus system has been matured as an interoperable solution to manage and store feedback items following the OGC GUF standard. NiMMbus can be used as a component for any geospatial portal, and, so far, has been integrated in several H2020 project catalogues or portals (NextGEOSS, ECOPotential, GeoEssential and GroundTruth2.0).
User feedback metadata complements the producer's metadata and adds value to the resource description in a geospatial portal by collecting the knowledge gained by the user while using the data, whether for the purpose originally foreseen by the producer or for an innovative one.
The current GEOSS platform provides access to a vast range of data resources. But to truly assist decision making, GEOSS wants to add a knowledge base. We believe that the NiMMbus system is a significant NextGEOSS contribution in this direction.
This communication describes how to extend the GUF to provide a set of knowledge elements and connect them to the original data, creating a network of knowledge. These elements can be citations (publications and policy briefs), quality indications (QualityML vocabulary and ISO 19157), usage reports (code and analytical processes), etc. NiMMbus offers tools to create different levels of feedback, starting with comments, providing citations or extracting quality indicators for the different quality classes (positional, temporal and attribute accuracy, completeness, consistency), and to share them with other users as part of the user feedback and usage report. Usage reports in the GUF standard can be extended to include code fragments that other users can apply to reproduce a previous usage. For example, in the ECOPotential Protected Areas from Space map browser (continued in the H2020 e-Shape project), a vegetation index suited to observing phenological blooms can be encoded by a user in the layer calculation using a combination of original Sentinel-2 bands. The portal stores this as JavaScript code (serialized as JSON) that describes which layers and formula were used. Once a user has validated the new layer, they can decide to make it available to everyone by publishing it as open-source JavaScript code in the NiMMbus system; from then on, any other user of the portal can import it and use it. As the usage description is a full feedback item, the user creating the dynamic layer can also describe any other related information, such as comments, or advertise a related publication.
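A minimal sketch of how such a shared dynamic-layer usage report might be serialized is shown below in Python; the property names and the example formula are illustrative assumptions and do not reproduce the NiMMbus or OGC GUF schemas.

# Illustrative sketch of a shared "dynamic layer" usage report serialized as JSON.
# Property names are hypothetical; they do not reproduce the NiMMbus or OGC GUF schemas.
import json

usage_report = {
    "target_dataset": "Sentinel-2 L2A surface reflectance",      # resource the feedback refers to
    "description": "Custom vegetation index highlighting phenological blooms",
    "layer_formula": "(B08 - B04) / (B08 + B04 + 0.5)",          # combination of original bands
    "bands_used": ["B04", "B08"],
    "author": "portal user (anonymized)",
    "license": "open source, reusable by other portal users",
}

print(json.dumps(usage_report, indent=2))

Publishing a structure like this alongside the feedback item is what would let other portal users import and re-run the same layer definition.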
The system shifts the focus to the users sharing their experience with the data and complements the producers' documentation with the richness of the knowledge that users gain in their data-driven research. In addition to augmenting GEOSS data, the system enables a social network of knowledge.
How to cite: Zabala Torres, A., Masó Pau, J., and Pons, X.: Managing the knowledge created by the users trough Geospatial User Feedback system. The NEXTGEOSS use case, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18976, https://doi.org/10.5194/egusphere-egu2020-18976, 2020.
EGU2020-19636 | Displays | ESSI2.1
Advances in Collaborative Documentation Support for CMIP6
Charlotte Pascoe, David Hassell, Martina Stockhause, and Mark Greenslade
Citation Service: http://cmip6cite.wdc-climate.de
How to cite: Pascoe, C., Hassell, D., Stockhause, M., and Greenslade, M.: Advances in Collaborative Documentation Support for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19636, https://doi.org/10.5194/egusphere-egu2020-19636, 2020.
EGU2020-9412 | Displays | ESSI2.1
Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies
Martin Schiegl, Gerold W. Diepolder, Abdelfettah Feliachi, José Román Hernández Manchado, Christine Hörfarter, Olov Johansson, Andreas-Alexander Maul, Marco Pantaloni, László Sőrés, and Rob van Ede
In the geosciences, where nomenclature has naturally grown from regional approaches with limited cross-border harmonization, descriptive texts are often used for coding data whose meanings in an international context are not conclusively clarified. This leads to difficulties when cross-border datasets are compiled. On the one hand, this is caused by the national-language, regional and historical descriptions in geological map legends. On the other hand, it is related to the interdisciplinary orientation of the geosciences, e.g. when concepts adopted from different fields carry a different meaning. Consistent use and interpretation of data according to international standards creates the potential for semantic interoperability; datasets then fit into international data infrastructures. But what if interpretation according to international standards is not possible, because no standard exists or existing standards are not applicable? Then efforts can be made to create machine-readable data using knowledge representations based on Semantic Web and Linked Data principles.
By making concepts referenceable via uniform identifiers (HTTP URIs) and crosslinking them to other resources published on the web, Linked Data offers the necessary context for clarifying the meaning of concepts. This approach ideally complements mainstream GIS (Geographic Information System) and relational database technologies in making data findable and semantically interoperable.
The GeoERA project (Establishing the European Geological Surveys Research Area to deliver a Geological Service for Europe, https://geoera.eu/) therefore provides the opportunity to clarify expert knowledge and terminology at a scientific level in the form of project-specific vocabulary concepts and to use them to code data in datasets. At the same time, parts of this vocabulary might later be included in international standards (e.g. INSPIRE or GeoSciML), if desired. So-called “GeoERA Project Vocabularies” are open collections of knowledge that, for example, may also contain deprecated, historical or only regionally relevant terms. Ideally, the sum of all vocabularies results in a knowledge base of bibliographically referenced terms that have been developed through scientific projects. Thanks to the consistent application of Semantic Web and Linked Data standards, nothing stands in the way of further use by modern technologies such as AI.
Project Vocabularies could also form an initial part of a future EGDI (European Geological Data Infrastructure, http://www.europe-geology.eu/) knowledge graph. They are restricted to linguistically labelled concepts, described in SKOS (Simple Knowledge Organization System) plus metadata properties, with a focus on scientific reusability. To extend this knowledge graph, they could additionally be supplemented by RDF data files to support project-related applications and functionality.
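As a minimal sketch of what a linguistically labelled project-vocabulary concept could look like in SKOS, the following Python example builds one concept with rdflib; the namespace, labels and hierarchy links are hypothetical placeholders rather than actual GeoERA vocabulary entries.

# Minimal sketch of a project-vocabulary concept described in SKOS with rdflib.
# The URIs and labels below are hypothetical examples, not actual GeoERA vocabulary entries.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/geoera/vocab/")   # placeholder namespace

g = Graph()
g.bind("skos", SKOS)

concept = EX["glacial_till"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("glacial till", lang="en")))
g.add((concept, SKOS.definition, Literal("Unsorted sediment deposited directly by a glacier.", lang="en")))
g.add((concept, SKOS.broader, EX["glacial_deposit"]))  # link into the concept hierarchy

print(g.serialize(format="turtle"))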
How to cite: Schiegl, M., Diepolder, G. W., Feliachi, A., Hernández Manchado, J. R., Hörfarter, C., Johansson, O., Maul, A.-A., Pantaloni, M., Sőrés, L., and van Ede, R.: Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9412, https://doi.org/10.5194/egusphere-egu2020-9412, 2020.
EGU2020-10227 | Displays | ESSI2.1
Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information
Rainer Haener, Henning Lorenz, Sylvain Grellet, Marc Urvois, and Eberhard Kunz
This study presents an approach to establishing Conceptual Interoperability for autonomous, multidisciplinary systems participating in Research Infrastructures, Early Warning, or Risk Management Systems. Although promising implementations already exist, true interoperability is far from being achieved. Therefore, reference architectures and principles of Systems-of-Systems are adapted for a fully specified, yet implementation-independent Conceptual Model, establishing interoperability to the highest possible degree. The approach utilises use cases and requirements from geological information processing and modelling within the European Plate Observing System (EPOS).
Conceptual Interoperability can be accomplished by enabling Service Composability. Unlike integration, composability allows interactive data processing and, beyond that, evolving systems that enable interpretation and evaluation by any potential participant. Integrating data from different domains often leads to monolithic services that are implemented only for a specific purpose (Stovepipe System). Consequently, composability is essential for collaborative information processing, especially in modern interactive computing and exploration environments. A major design principle for achieving composability is Dependency Injection, allowing flexible combinations (Loose Coupling) of services that implement common, standardised interfaces (abstractions). Another decisive factor for establishing interoperability is the use of Metamodels of data models that specify data and semantics regardless of their domain, based on a common, reusable approach. Thus, data from different domains can be represented by one common encoding that, e.g., abstracts landslides (geophysical models) or buildings (urban planning) based on their geometry. An indispensable part of a Conceptual Model is detailed semantics, which requires not only terms from Domain-Controlled Vocabularies but also ontologies providing qualified statements about the relationship between data and associated concepts. This is of major importance for evolutionary systems that are able to comprehend and react to state changes. Maximum interoperability also requires strict modularisation for a clear separation of semantics, metadata and the data itself.
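A minimal, generic illustration of dependency injection and loose coupling (not EPOS code) is sketched below in Python: a processing component depends only on an abstract data-source interface, so any service implementing that interface can be composed with it.

# Minimal illustration (not EPOS code) of dependency injection enabling loose coupling:
# a processing service depends only on an abstract data-source interface, so any
# implementation that satisfies the interface can be composed with it.
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Abstract interface shared by all composable data services."""

    @abstractmethod
    def fetch(self, feature_id: str) -> dict: ...


class BoreholeService(DataSource):
    def fetch(self, feature_id: str) -> dict:
        return {"id": feature_id, "type": "borehole", "depth_m": 120.0}


class LandslideService(DataSource):
    def fetch(self, feature_id: str) -> dict:
        return {"id": feature_id, "type": "landslide", "area_km2": 0.4}


class Processor:
    def __init__(self, source: DataSource):     # the dependency is injected, not hard-wired
        self.source = source

    def describe(self, feature_id: str) -> str:
        feature = self.source.fetch(feature_id)
        return f"{feature['type']} {feature['id']}: {feature}"


# The same processor works with any injected implementation.
print(Processor(BoreholeService()).describe("BH-001"))
print(Processor(LandslideService()).describe("LS-042"))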
Conceptual models for geological information governed by the described principles, and their implementations, are still a long way off. Moreover, the route to such models is not straightforward. They span a multitude of communities and are far too complex for conventional implementation in project form. A first step could be applying modern design principles to new developments in the various scientific communities and joining the results under a common stewardship such as the Open Geospatial Consortium (OGC). Recently, a Metamodel has been developed within the OGC's Borehole Interoperability Experiment (BoreholeIE), initiated and led by the French Geological Survey (BRGM). It combines the ISO standard for linear referencing (ISO 19148:2012) for localisation along borehole paths with the adaptation of different encodings of borehole logs based on well-established OGC standards. Further developments aim at correlating borehole logs, geological or geotechnical surveys, and geoscientific models. Since the results of surveys are often only available as non-schematised interpretations in text form, interoperability requires formal classifications, which can be derived from machine learning methods applied to the interpretations. As part of a Conceptual Model, such classifications can be used for an automated exchange of standard-conformant borehole logs or to support the generation of expert opinions on soil investigations.
How to cite: Haener, R., Lorenz, H., Grellet, S., Urvois, M., and Kunz, E.: Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10227, https://doi.org/10.5194/egusphere-egu2020-10227, 2020.
EGU2020-10600 | Displays | ESSI2.1
Information Model Governance for Diverse Disciplines
John S. Hughes and Daniel J. Crichton
The PDS4 Information Model (IM) Version 1.13.0.0 was released for use in December 2019. The ontology-based IM remains true to its foundational principles found in the Open Archival Information System (OAIS) Reference Model (ISO 14721) and the Metadata Registry (MDR) standard (ISO/IEC 11179). The standards generated from the IM have become the de facto data archiving standards for the international planetary science community and have successfully scaled to meet the requirements of the diverse and evolving planetary science disciplines.
A key foundational principle is the use of a multi-level governance scheme that partitions the IM into semi-independent dictionaries. The governance scheme first partitions the IM vertically into three levels, the common, discipline, and project/mission levels. The IM is then partitioned horizontally across both discipline and project/mission levels into individual Local Data Dictionaries (LDDs).
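A schematic Python sketch of this vertical and horizontal partitioning is given below; the class and attribute names are simplified placeholders, and the structure is only meant to illustrate how a class can be resolved to its governing dictionary and level, not to reproduce the actual PDS4 IM.

# Schematic sketch (not the actual PDS4 IM) of multi-level dictionary governance:
# a vertical split into common / discipline / mission levels, and a horizontal split
# of the lower levels into individual Local Data Dictionaries (LDDs).
information_model = {
    "common": {
        "Product_Observational": {"attributes": ["logical_identifier", "version_id"]},
        "Unit_Of_Measure": {"attributes": ["unit_id"]},
    },
    "discipline": {               # one LDD per discipline steward
        "geometry_ldd": {"Geometry_Orbiter": {"attributes": ["incidence_angle"]}},
        "spectral_ldd": {"Spectral_Characteristics": {"attributes": ["band_width"]}},
    },
    "mission": {                  # one LDD per project/mission
        "example_mission_ldd": {"Observation_Information": {"attributes": ["sol_number"]}},
    },
}


def resolve(class_name: str) -> str:
    """Find which governance level and dictionary defines a class."""
    for level, dictionaries in information_model.items():
        if level == "common" and class_name in dictionaries:
            return f"{class_name} is governed at the common level"
        if level != "common":
            for ldd, classes in dictionaries.items():
                if class_name in classes:
                    return f"{class_name} is governed by {ldd} at the {level} level"
    return f"{class_name} is not defined"


print(resolve("Geometry_Orbiter"))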
The Common dictionary defines the classes used across the science disciplines such as product, collection, bundle, data formats, data types, and units of measurement. The dictionary resulted from a large collaborative effort involving domain experts across the community. An ontology modeling tool was used to enforce a modeling discipline, for configuration management, to ensure consistency and extensibility, and to enable interoperability. The Common dictionary encompasses the information categories defined in the OAIS RM, specifically data representation, provenance, fixity, identification, reference, and context. Over the last few years, the Common dictionary has remained relatively stable in spite of requirements levied by new missions, instruments, and more complex data types.
Since the release of the Common dictionary, the creation of a significant number of LDDs has proved the effectiveness of multi-level, steward-based governance. This scheme is allowing the IM to scale to meet the archival and interoperability demands of the evolving disciplines. In fact, an LDD development “cottage industry” has emerged that required improvements to the development processes and configuration management. An LDD development tool now allows dictionary stewards to quickly produce specialized LDDs that are consistent with the Common dictionary.
The PDS4 Information Model is a world-class knowledge base that governs the Planetary Science community's trusted digital repositories. This presentation will provide an overview of the model and additional information about its multi-level governance scheme, including the topics of stewardship, configuration management, processes, and oversight.
How to cite: Hughes, J. S. and Crichton, D. J.: Information Model Governance for Diverse Disciplines, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10600, https://doi.org/10.5194/egusphere-egu2020-10600, 2020.
The geosciences are facing a Big Data problem, particularly in the areas of data Volume (huge observational datasets and numerical model outputs), Variety (large numbers of disparate datasets from multiple sources with inconsistent standards), and Velocity (need for rapid processing of continuous data streams). These challenges make it difficult to perform scientific research and to make decisions about serious environmental issues facing our planet. We need to enable science at the scale of our large, disparate, and continuous data.
One part of the solution relates to infrastructure, such as by making large datasets available in a shared environment co-located with computational resources so that we can bring the analysis code to the data instead of copying data. The other part relies on improvements in metadata, data models, semantics, and collaboration. Individual datasets must have comprehensive, accurate, and machine-readable metadata to enable assessment of their relevance to a specific problem. Multiple datasets must be mapped into an overarching data model rooted in the geographical and temporal attributes to enable us to seamlessly find and access data for the appropriate location and time. Semantic mapping is necessary to enable data from different disciplines to be brought to bear on the same problem. Progress in all these areas will require collaboration on technical methods, interoperability standards, and analysis software that bridges information communities -- collaboration driven by a willingness to make data usable by those outside of the original scientific discipline.
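As a small illustration of data access rooted in geographical and temporal attributes, the following Python sketch subsets a dataset by time and space with xarray; the variable and coordinate names and the values are synthetic placeholders standing in for a real archived dataset.

# Illustrative sketch: selecting data by geographical and temporal attributes with xarray.
# The variable ("sst") and coordinate names ("time", "lat", "lon") are assumptions; the
# synthetic values below stand in for a large archived dataset.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2019-06-01", "2019-09-30", freq="D")
lats = np.arange(38.0, 50.0, 1.0)
lons = np.arange(25.0, 43.0, 1.0)
data = np.random.rand(len(times), len(lats), len(lons))
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), data)},
    coords={"time": times, "lat": lats, "lon": lons},
)

# Select only the region and period relevant to the problem, instead of copying everything.
subset = ds.sel(
    time=slice("2019-06-01", "2019-08-31"),
    lat=slice(40.0, 48.0),
    lon=slice(27.0, 42.0),
)
print(subset)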
How to cite: de La Beaujardiere, J.: Enabling Science at Scale, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12079, https://doi.org/10.5194/egusphere-egu2020-12079, 2020.
EGU2020-2117 | Displays | ESSI2.1
Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method
Fan Wu, Hong Gao, and Zhaoyuan Yu
A conceptual consensus, as well as a unified representation, of a certain geographic concept across multiple contexts can be of great significance for the communication, retrieval, combination, and reuse of geographic information and knowledge. However, a geographic concept is a rich synthesis of semantics, semiotics and quality (e.g., vagueness or approximation). The generation, representation, calculation and application of a certain geographic concept can consequently be highly heterogeneous, especially considering different interests, domains, languages, etc. In light of these semantic heterogeneity problems, coding core concepts uniquely can be a lighter-weight alternative to traditional ontology-based methods, because numeric codes can symbolize a consensus on a concept across domains and even languages. Consequently, this paper proposes a unified semantic model as well as an encoding framework for the representation, reasoning, and computation of geographic concepts based on geometric algebra (GA). In this method, a geographic concept is represented as a collection of semantic elements, which are further encoded based on their hierarchy structure, and all the semantic information of the concept is preserved across the encoding process. On the basis of the encoding result, semantic information can be reasoned backward by well-defined operators, and semantic similarity can be computed for information inference as well as semantic association retrieval. In the case study, the implementation of the proposed framework shows that this GA-based semantic encoding model can be a promising method for the unified expression, reasoning, and calculation of geographic concepts, which can reasonably be regarded as a promising lighter-weight solution to semantic heterogeneity.
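The sketch below is a deliberately simplified, non-GA illustration of the general idea of encoding a concept's position in a hierarchy and comparing codes numerically; it does not reproduce the geometric-algebra encoding proposed in the abstract.

# Deliberately simplified illustration of hierarchy-based concept codes and similarity.
# This is NOT the authors' geometric-algebra encoding; it only sketches the general idea
# that a concept's position in a hierarchy can be encoded and compared numerically.

def encode(path):
    """Encode a concept as the tuple of branch indices from the root, e.g. (2, 1, 3)."""
    return tuple(path)


def similarity(code_a, code_b):
    """Share of the longer code covered by the common prefix (1.0 means identical)."""
    common = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        common += 1
    return common / max(len(code_a), len(code_b))


river = encode([1, 2, 1])      # hypothetical: hydrography -> watercourse -> river
stream = encode([1, 2, 2])     # hypothetical: hydrography -> watercourse -> stream
lake = encode([1, 3, 1])       # hypothetical: hydrography -> waterbody -> lake

print(similarity(river, stream))  # approx. 0.67: same parent branch
print(similarity(river, lake))    # approx. 0.33: diverge higher in the hierarchy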
How to cite: Wu, F., Gao, H., and Yu, Z.: Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2117, https://doi.org/10.5194/egusphere-egu2020-2117, 2020.
Sending data to a secure long-term archive is increasingly a necessity for science projects due to funding body and publishing requirements. It is also good practice for long-term scientific aims and enables the preservation and re-use of valuable research data. The Centre for Environmental Data Analysis (CEDA) hosts a data archive holding vast atmospheric and Earth observation data from sources including aircraft campaigns, satellites, pollution monitoring, automatic weather stations, climate models, etc. The CEDA archive currently holds 14 PB of data in over 250 million files, which makes it challenging to discover and access specific data. In order to manage this, it is necessary to use standard formats and descriptions of the data. This poster will explore best practice in data management at CEDA and show the tools used to archive and share data.
How to cite: Winfield, K.: Data Management at CEDA, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2375, https://doi.org/10.5194/egusphere-egu2020-2375, 2020.
EGU2020-5131 | Displays | ESSI2.1
Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS)
Marc Urvois, Sylvain Grellet, Abdelfettah Feliachi, Henning Lorenz, Rainer Haener, Christian Brogaard Pedersen, Martin Hansen, Luca Guerrieri, Carlo Cipolloni, and Mary Carter
The European Plate Observing System (EPOS, www.epos-ip.org) is a multidisciplinary pan-European research infrastructure for solid Earth science. It integrates a series of domain-specific service hubs, such as the Geological Information and Modelling Technical Core Service (TCS GIM), dedicated to providing access to data, data products and services on European boreholes, geological and geohazard maps, and mineral resources, as well as a catalogue of 3D models. These are hosted by European Geological Surveys and national research organisations.
Even though interoperability implementation frameworks are well described and used (ISO, OGC, IUGS/CGI, INSPIRE …), it proved difficult for several data providers to deploy from the outset the OGC services supporting the full semantic definition (OGC Complex Feature) required to discover and view millions of geological entities. Instead, data are collected and exposed using a simpler yet standardised description (GeoSciML-Lite and EarthResourceML-Lite). Subsequently, the more complex data flows are deployed with the corresponding semantics.
This approach was applied to design and implement the European Borehole Index and associated web services (View-WMS and Discovery-WFS) and extended to 3D Models. TCS GIM exposes to EPOS Central Integrated Core Services infrastructure a metadata catalogue service, a series of “index services”, a codeList registry and a Linked Data resolver. These allow EPOS end users to search and locate boreholes, geological maps and features, 3D models, etc., based on the information held by the index services.
In addition to these services, TCS GIM focussed particularly on sharing European geological data using the Linked Data approach. Each instance is associated with a URI and points to other information resources also using URIs. The Linked Data principles ensure the best semantic description (e.g. URIs to shared codeList registries entries) and also enrich an initial “information seed” (e.g. a set of Borehole entries matching a search) with more contents (e.g. URIs to more Features or a more complex description). As a result, this pattern including Simple Feature and Linked Data has a positive effect on the IT architecture: interoperable services are simpler and faster to deploy and there is no need to harvest a full OGC Complex Feature dataset. This architecture is also more scalable and sustainable.
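The following Python sketch shows how a client could query a borehole index exposed as an OGC WFS 2.0 service and pick up identifiers that a Linked Data resolver can then dereference; the endpoint URL and feature type name are placeholders, and only the request parameters follow the WFS standard.

# Sketch of querying a (hypothetical) borehole index exposed as an OGC WFS 2.0 service.
# The endpoint URL and feature type name are placeholders; only the request parameters
# (service, version, request, typeNames, bbox, outputFormat) follow the WFS standard.
import requests

WFS_ENDPOINT = "https://example.org/geoserver/ows"     # placeholder endpoint

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "gsmlb:Borehole",                      # hypothetical simple-feature type name
    "bbox": "45.0,5.0,48.0,10.0,urn:ogc:def:crs:EPSG::4326",
    "outputFormat": "application/json",
}

response = requests.get(WFS_ENDPOINT, params=params, timeout=30)
response.raise_for_status()
for feature in response.json().get("features", []):
    # Each index entry could carry a URI that a Linked Data resolver can dereference.
    print(feature.get("id"), feature.get("properties", {}).get("identifier"))

Lightweight index services of this kind are what make the deployment simpler and faster than harvesting a full OGC Complex Feature dataset.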
The European Geological Surveys' codeList registries have been enriched with new vocabularies as part of the European Geoscience Registry. In compliance with the relevant European INSPIRE rules, this registry is now part of the INSPIRE Register Federation, the central access point to repositories for vocabularies and resources. The European Geoscience Registry is available for reuse and extension by other geoscientific projects.
During the EPOS project, this approach has been developed and implemented for the Borehole and Model data services. The TCS GIM team provided feedback on INSPIRE through the Earth Science Cluster, contributed to the creation of the OGC GeoScience Domain Working Group in 2017 and to the launch of the OGC Borehole Interoperability Experiment in 2018, and proposed evolutions to the OGC GeoSciML and IUGS/CGI EarthResourceML standards.
How to cite: Urvois, M., Grellet, S., Feliachi, A., Lorenz, H., Haener, R., Brogaard Pedersen, C., Hansen, M., Guerrieri, L., Cipolloni, C., and Carter, M.: Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5131, https://doi.org/10.5194/egusphere-egu2020-5131, 2020.
EGU2020-6324 | Displays | ESSI2.1
A classification and predictive model of the complex REE mineral system
Hassan Babaie and Armita Davarpanah
We model the intermittent, non-linear interactions and feedback loops of the complex rare earth element (REE) mineral system applying the self-organized criticality concept. Our semantic knowledge model (the REE_MinSys ontology) represents the dynamic primary and secondary processes that occur over a wide range of spatial and temporal scales and produce the emergent REE deposits and their geometry, tonnage, and grade. These include the scale-invariant, out-of-equilibrium geodynamic and magmatic processes that lead to the formation of orthomagmatic (carbonatite, alkaline igneous rocks) and syn- and post-magmatic hydrothermal REE deposits. The ontology also represents the redistribution of the REE from these primary ores by metamorphic fluids and/or post-depositional surface and supergene processes in sedimentary basins, fluvial channels, coastal areas, and/or the regolith around or above them. The ontology applies concepts of complex systems theory to represent the spatial and spatio-temporal elements of the REE mineral system, such as source, driver, threshold barriers, trigger, avalanche, conduit, relaxation, critical point attractor, and self-organization, for the alkaline igneous, iron oxide (a subcategory of IOCG), hydrothermal, marine placer, alluvial placer (including paleo-placer), phosphorite, laterite, and ion-adsorption clay REE deposits. The ontology is instantiated with diverse data drawn from globally distributed, well-studied small to giant REE deposits to build the REE_MinSys knowledge base. Users can query the data in the knowledge base to extract explicit and inferred facts about each type of REE deposit, for example by asking: “Which rare earth elements are in REE phosphate deposits?”; “Which rare earth elements are largely explored in REE placer deposits?” Data from the knowledge base will be divided into training and testing sets after they are preprocessed and trends and data patterns are identified through data analytical procedures. The training and test datasets will be used to build models applying machine learning algorithms to predict potential REE deposits of different kinds in unexposed or covered areas.
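To illustrate the kind of competency question mentioned above, the following Python sketch builds a tiny graph and runs a SPARQL query with rdflib; the namespace, class and property names are hypothetical stand-ins for the REE_MinSys vocabulary, so only the query pattern itself is meant to be indicative.

# Sketch of a competency question expressed as SPARQL via rdflib.
# The ontology namespace, class and property names are hypothetical stand-ins for the
# REE_MinSys vocabulary; only the query pattern is meant to be illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

REE = Namespace("https://example.org/ree_minsys#")     # placeholder namespace

g = Graph()
deposit = REE["ExamplePhosphoriteDeposit"]
g.add((deposit, RDF.type, REE.PhosphoriteDeposit))
g.add((deposit, REE.containsElement, Literal("Ce")))
g.add((deposit, REE.containsElement, Literal("La")))

query = """
PREFIX ree: <https://example.org/ree_minsys#>
SELECT DISTINCT ?element WHERE {
    ?deposit a ree:PhosphoriteDeposit ;
             ree:containsElement ?element .
}
"""

for row in g.query(query):
    print(row.element)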
How to cite: Babaie, H. and Davarpanah, A.: A classification and predictive model of the complex REE mineral system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6324, https://doi.org/10.5194/egusphere-egu2020-6324, 2020.
EGU2020-7058 | Displays | ESSI2.1
Enabling Data Reuse Through Semantic Enrichment of Instrumentation
Robert Huber, Anusuriya Devaraju, Michael Diepenbroek, Uwe Schindler, Roland Koppe, Tina Dohna, Egor Gordeev, and Marianne Rehage
Pressing environmental and societal challenges demand the reuse of data on a much larger scale. Central to improvements on this front are approaches that support structured and detailed descriptions of published data. In general, the reusability of scientific datasets, such as measurements generated by instruments, observations collected in the field, and model simulation outputs, requires information about the contexts through which they were produced. These contexts include the instrumentation, methods, and analysis software used. In current data curation practice, data providers often put significant effort into capturing descriptive metadata about datasets. Nonetheless, the metadata about instruments and methods provided by data authors are limited, and in most cases unstructured.
The ‘Interoperability’ principle of FAIR emphasizes the importance of using formal vocabularies to enable machine-understandability of data and metadata, and of establishing links between data and related research entities to provide their contextual information (e.g., devices and methods). To support FAIR data, PANGAEA is currently elaborating workflows to enrich the instrument information of scientific datasets utilizing internal as well as third-party services, ontologies and their identifiers. This abstract presents our ongoing development within the projects FREYA and FAIRsFAIR as follows:
- Integrating the AWI O2A (Observations to Archives) framework and its associated suite of tools within PANGAEA’s curatorial workflow, as well as semi-automated ingestion of observatory data.
- Linking data with their observation sources (devices) by recording the persistent identifiers (PID) from the O2A sensor registry system (sensor.awi.de) as part of the PANGAEA instrumentation database.
- Enriching device and method descriptions of scientific data by annotating them with appropriate vocabularies such as the NERC device type and device vocabularies or scientific methodology classifications (a schematic example of such an annotation is sketched below).
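The following Python sketch outlines such an enrichment record linking a dataset to a device PID and controlled vocabulary terms; the field names are hypothetical (not the PANGAEA data model) and the example URIs merely follow the general pattern of sensor registries and the NERC Vocabulary Server.

# Sketch of enriching a dataset's instrument description with persistent identifiers and
# controlled vocabulary terms. Field names are hypothetical (not the PANGAEA data model),
# and the URIs below are placeholders following the general pattern of sensor registries
# and the NERC Vocabulary Server.
dataset_metadata = {
    "dataset_doi": "10.1594/PANGAEA.000000",                        # placeholder DOI
    "instrument": {
        "label": "CTD/Rosette water sampler",
        "device_pid": "https://sensor.awi.de/example-device-id",    # placeholder sensor PID
        "device_type_uri": "http://vocab.nerc.ac.uk/collection/L05/current/130/",  # example term URI
    },
    "method": {
        "label": "Conductivity-temperature-depth profiling",
        "vocabulary_uri": "https://example.org/methods/ctd-profiling",  # placeholder
    },
}

# A curation service could, for instance, check that every linked URI resolves before publication.
for block in dataset_metadata.values():
    if isinstance(block, dict):
        for field, value in block.items():
            if field.endswith("_uri") or field.endswith("_pid"):
                print(f"check resolvable: {value}")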
In our contribution we will also outline the challenges to be addressed in enabling FAIR vocabularies of instruments and methods. These include questions regarding the reliability and trustworthiness of third-party ontologies and services, challenges in content synchronisation across linked resources, and the implications for the FAIRness levels of datasets, such as dependencies on interlinked data sources and vocabularies.
We will show to what extent adapting, harmonizing and controlling the vocabularies used, as well as the identifier systems shared between data provider and data publisher, improves the findability and reusability of datasets while keeping the curational overhead as low as possible. This use case is a valuable example of how improving interoperability through harmonization efforts, though initially problematic and labour-intensive, can benefit a multitude of stakeholders in the long run: data users, publishers, research institutes, and funders.
How to cite: Huber, R., Devaraju, A., Diepenbroek, M., Schindler, U., Koppe, R., Dohna, T., Gordeev, E., and Rehage, M.: Enabling Data Reuse Through Semantic Enrichment of Instrumentation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7058, https://doi.org/10.5194/egusphere-egu2020-7058, 2020.
EGU2020-7937 | Displays | ESSI2.1
A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental DatasetsAlexander Götz, Johannes Munke, Mohamad Hayek, Hai Nguyen, Tobias Weber, Stephan Hachinger, and Jens Weismüller
LTDS ("Let the Data Sing") is a lightweight, microservice-based Research Data Management (RDM) architecture which augments previously isolated data stores ("data silos") with FAIR research data repositories. The core components of LTDS include a metadata store as well as dissemination services such as a landing page generator and an OAI-PMH server. As these core components were designed to be independent of one another, a central control system has been implemented which handles data flows between components. LTDS is developed at LRZ (Leibniz Supercomputing Centre, Garching, Germany) with the aim of allowing researchers to make massive amounts of data (e.g. HPC simulation results) on different storage backends FAIR. Owing to their size, such data often cannot easily be transferred into conventional repositories. As a result, they remain "hidden", while only, for example, final results are published - a major problem for the reproducibility of simulation-based science. The LTDS architecture uses open-source and standardized components and follows best practices in FAIR data (and metadata) handling. We present our experience with our first three use cases: the Alpine Environmental Data Analysis Centre (AlpEnDAC) platform, the ClimEx dataset with 400 TB of climate ensemble simulation data, and the Virtual Water Value (ViWA) hydrological model ensemble.
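As a minimal sketch of how metadata disseminated by an OAI-PMH server such as the one mentioned above could be harvested, the snippet below issues a standard ListRecords request. The endpoint URL is a placeholder and not the actual LTDS service address.

```python
# Minimal OAI-PMH harvesting sketch using only the standard protocol verbs.
# The endpoint URL is a placeholder; substitute the repository's real
# OAI-PMH base URL.
import urllib.parse
import urllib.request

BASE_URL = "https://example.org/oai"  # placeholder endpoint

params = {
    "verb": "ListRecords",        # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",   # Dublin Core, mandatory for all OAI-PMH servers
}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    xml_payload = response.read().decode("utf-8")

# The response is an XML document; each <record> carries a header with an
# identifier and datestamp plus the Dublin Core metadata of one dataset.
print(xml_payload[:500])
```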
How to cite: Götz, A., Munke, J., Hayek, M., Nguyen, H., Weber, T., Hachinger, S., and Weismüller, J.: A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7937, https://doi.org/10.5194/egusphere-egu2020-7937, 2020.
EGU2020-9750 | Displays | ESSI2.1
A modular approach to cataloguing oceanographic dataAdam Leadbetter, Andrew Conway, Sarah Flynn, Tara Keena, Will Meaney, Elizabeth Tray, and Rob Thomas
The ability to access and search metadata for marine science data is a key requirement for addressing fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting domain-specific, community-defined standards and legislative requirements placed on data publishers. In the sphere of oceanographic data management there is therefore a clear need for a modular approach to data cataloguing designed to meet a number of requirements. In this paper we describe a data cataloguing system developed and in use at the Marine Institute, Ireland, to meet legislative requirements including the European Spatial Data Infrastructure (INSPIRE) and the Marine Spatial Planning directive.
The data catalogue described here makes use of a metadata model focussed on the oceanographic domain. It comprises a number of key classes which will be described in detail in the paper, and which include:
- Dataset - combines many different parameters, collected at multiple times and locations, using different instruments
- Dataset Collection - provides a link between a Dataset Collection Activity and a Dataset, as well as linking to the Device(s) used to sample the environment for a given range of parameters. An example of a Dataset Collection may be the Conductivity-Temperature-Depth profiles taken on a research vessel survey allowing the individual sensors to be connected to the activity and the calibration of those sensors to be connected with the associated measurements.
- Dataset Collection Activity - a specialised dataset to cover such activities as research vessel cruises; or the deployments of moored buoys at specific locations for given time periods
- Platform - an entity from which observations may be made, such as a research vessel or a satellite
- Programme - represents a formally recognized scientific effort receiving significant funding, requiring large scale coordination
- Device - aimed at providing enough metadata for a given instance of an instrument to provide a skeleton SensorML record
- Organisation - captures the details of research institutes, data holding centres, monitoring agencies, governmental and private organisations, that are in one way or another engaged in oceanographic and marine research activities, data & information management and/or data acquisition activities
The data model makes extensive use of controlled vocabularies to ensure both consistency and interoperability in the content of attribute fields for the Classes outlined above.
The data model has been implemented in a module for the Drupal open-source web content management system, and the paper will provide details of this application.
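To make the relationships between these classes concrete, the sketch below models a few of them as plain Python dataclasses. The attribute names, example values and vocabulary URI are illustrative assumptions; the production catalogue is a Drupal module whose field definitions may differ.

```python
# Illustrative sketch of a few catalogue classes as plain dataclasses.
# Attribute names and example values are assumptions for demonstration;
# the production system is a Drupal module whose schema may differ.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Device:
    name: str
    device_type_uri: str        # controlled-vocabulary term (illustrative URI)

@dataclass
class Platform:
    name: str                   # e.g. a research vessel or a satellite

@dataclass
class DatasetCollectionActivity:
    label: str                  # e.g. a research vessel cruise identifier
    platform: Platform

@dataclass
class DatasetCollection:
    # Links an activity to the devices used and the parameters sampled
    activity: DatasetCollectionActivity
    devices: List[Device] = field(default_factory=list)
    parameters: List[str] = field(default_factory=list)

@dataclass
class Dataset:
    title: str
    collections: List[DatasetCollection] = field(default_factory=list)

ctd = Device("CTD profiler", "http://vocab.nerc.ac.uk/collection/L05/current/130/")
survey = DatasetCollectionActivity("SURVEY-001", Platform("RV Example"))
profiles = DatasetCollection(survey, devices=[ctd],
                             parameters=["sea temperature", "salinity", "depth"])
catalogue_entry = Dataset("CTD profiles from SURVEY-001", [profiles])
print(catalogue_entry.collections[0].activity.platform.name)
```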
How to cite: Leadbetter, A., Conway, A., Flynn, S., Keena, T., Meaney, W., Tray, E., and Thomas, R.: A modular approach to cataloguing oceanographic data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9750, https://doi.org/10.5194/egusphere-egu2020-9750, 2020.
EGU2020-12281 | Displays | ESSI2.1
Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data CenterMaggie Davis, Richard Cederwall, Giri Prakash, and Ranjeet Devarakonda
Atmospheric Radiation Measurement (ARM), a U.S. Department of Energy (DOE) scientific user facility, is a key geophysical data source for national and international climate research. Utilizing a standardized schema that has evolved since ARM's inception in 1989, the ARM Data Center (ADC) processes over 1.8 petabytes of stored data across over 10,000 data products. Data sources include ARM-owned instruments, as well as field campaign datasets, Value Added Products, evaluation data to test new instrumentation or models, Principal Investigator data products, and external data products (e.g., NASA satellite data). In line with FAIR principles, a team of metadata experts classifies instruments and defines spatial and temporal metadata to ensure accessibility through the ARM Data Discovery. To enhance geophysical metadata collaboration across American and European organizations, this work will summarize processes and tools which enable the management of ARM data and metadata. For example, this presentation will highlight recent enhancements in field campaign metadata workflows to handle the ongoing Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) data. Other key elements of the ARM Data Center include the architecture of ARM data transfer and storage processes, the evaluation of data quality, and the ARM consolidated databases. We will also discuss tools developed for identifying and recommending datastreams and enhanced DOI assignments for all data types to assist an interdisciplinary user base in selecting, obtaining, and using data as well as citing the appropriate data source for reproducible atmospheric and climate research.
How to cite: Davis, M., Cederwall, R., Prakash, G., and Devarakonda, R.: Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data Center , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12281, https://doi.org/10.5194/egusphere-egu2020-12281, 2020.
EGU2020-14755 | Displays | ESSI2.1
WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomesEnrico Boldrini, Paolo Mazzetti, Stefano Nativi, Mattia Santoro, Fabrizio Papeschi, Roberto Roncella, Massimiliano Olivieri, Fabio Bordini, and Silvano Pecora
The WMO Hydrological Observing System (WHOS) is a service-oriented System of Systems (SoS) linking hydrological data providers and users by enabling harmonized and real time discovery and access functionalities at global, regional, national and local scale. WHOS is being realized through a coordinated and collaborative effort amongst:
- National Hydrological Services (NHS) willing to publish their data to the benefit of a larger audience,
- Hydrologists, decision makers, app and portal authors willing to gain access to world-wide hydrological data,
- ESSI-Lab of CNR-IIA responsible for the WHOS broker component: a software framework in charge of enabling interoperability amongst the distributed heterogeneous systems belonging to data providers (e.g. data publishing services) and data consumers (e.g. web portals, libraries and apps),
- WMO Commission of Hydrology (CHy) providing guidance to WMO Member countries in operational hydrology, including capacity building, NHSs engagement and coordination of WHOS implementation.
In the last years two additional WMO regional programmes have been targeted to benefit from WHOS, operating as successful applications for others to follow:
- Plata river basin,
- Arctic-HYCOS.
Each programme operates with a “view” of the whole WHOS: a virtual subset composed only of the data sources that are relevant to its context.
WHOS-Plata is currently brokering data sources from the following countries:
- Argentina (hydrological & meteorological data),
- Bolivia (meteorological data; hydrological data expected in the near future),
- Brazil (hydrological & meteorological data),
- Paraguay (meteorological data; hydrological data in process),
- Uruguay (hydrological & meteorological data).
WHOS-Arctic is currently brokering data sources from the following countries:
- Canada (historical and real time data),
- Denmark (historical data),
- Finland (historical and real time data),
- Iceland (historical and real time data),
- Norway (historical and real time data),
- Russia (historical and real time data),
- United States (historical and real time data).
Each data source publishes its data online according to specific hydrological service protocols and/or APIs (e.g. CUAHSI HydroServer, USGS Water Services, FTP, SOAP, REST API, OData, WAF, OGC SOS, …). Each service protocol and API in turn implies support for a specific metadata and data model (e.g. WaterML, CSV, XML, JSON, USGS RDB, ZRXP, Observations & Measurements, …).
The WHOS broker implements mediation and harmonization of all these heterogeneous standards in order to seamlessly support discovery and access of all the available data by a growing set of data consumer systems (applications and libraries), without any implementation effort on their part:
- 52North Helgoland (through SOS v.2.0.0),
- CUAHSI HydroDesktop (through CUAHSI WaterOneFlow),
- National Water Institute of Argentina (INA) node.js WaterML client (through CUAHSI WaterOneFlow),
- DAB JS API (through DAB REST API),
- USGS GWIS JS API plotting library (through RDB service),
- R scripts (through R WaterML library),
- C# applications (through CUAHSI WaterOneFlow),
- UCAR jOAI (through OAI-PMH/WIGOS metadata).
In particular, the support of WIGOS metadata standard provides a set of observational metadata elements for the effective interpretation of observational data internationally.
In addition to metadata and data model heterogeneity, WHOS also needs to tackle semantic heterogeneity. The WHOS broker makes use of a hydrology ontology (made available as a SPARQL endpoint) to augment WHOS discovery capabilities (e.g. to obtain translations of a hydrology search parameter in multiple languages).
Technical documentation for exercising the WHOS broker is already available online, while the official public launch with a dedicated WMO WHOS web portal is expected shortly.
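As an illustration of the data-consumer side described above, the sketch below issues a standard OGC SOS 2.0 key-value-pair GetCapabilities request of the kind an SOS client such as 52North Helgoland would send. The service URL is a placeholder rather than the actual WHOS broker endpoint.

```python
# Sketch of a standard OGC SOS 2.0 KVP request, as used by SOS clients to
# discover offerings and observed properties. The base URL is a placeholder,
# not the actual WHOS broker address.
import urllib.parse
import urllib.request

SOS_BASE = "https://example.org/whos/sos"  # placeholder endpoint

params = {
    "service": "SOS",
    "request": "GetCapabilities",
    "AcceptVersions": "2.0.0",
}
url = SOS_BASE + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    capabilities_xml = response.read().decode("utf-8")

# The Capabilities document lists observation offerings, procedures and
# observed properties that can then be queried with GetObservation.
print(capabilities_xml[:400])
```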
How to cite: Boldrini, E., Mazzetti, P., Nativi, S., Santoro, M., Papeschi, F., Roncella, R., Olivieri, M., Bordini, F., and Pecora, S.: WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14755, https://doi.org/10.5194/egusphere-egu2020-14755, 2020.
EGU2020-15226 | Displays | ESSI2.1
OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standardsAlizia Mantovani, Vincenzo Lombardo, and Fabrizio Piana
This contribution regards the encoding of an ontology for the GeologicStructure class. This is one of the sections of OntoGeonous, a larger ontology for the geosciences principally devoted to representing the knowledge contained in geological maps; the other sections regard the Geologic unit, Geomorphologic feature and Geologic event. OntoGeonous is developed by the University of Turin, Department of Computer Sciences, and the Institute of Geosciences and Earth Resources of the National Research Council of Italy (CNR-IGG).
The encoding of the knowledge is based on the definitions and hierarchical organization of the concepts proposed by the international standards: the GeoScienceML directive(1) and the INSPIRE Data Specification on Geology(2) drive the architecture at the more general levels, while the broader/narrower representation in the CGI vocabularies(3) provides the internal taxonomies of the specific sub-ontologies.
The first release of OntoGeonous had a complete hierarchy for the GeologicUnit class, which is partly different from the organization of knowledge in the international standards, and taxonomies for GeologicStructure, GeologicEvent and GeomorphologicFeature. The encoding process of OntoGeonous is presented in Lombardo et al. (2018) and on the WikiGeo website(4), while a method of application to geological maps is presented in Mantovani et al. (2020).
This contribution shows how the international standards guided the encoding of the sub-ontology for the GeologicStructure class, and the innovations introduced in the general organization of OntoGeonous compared to its first release. The main differences come from the analysis of the UML schemata for the GeologicStructure subclasses(5): first, the presence of the FoldSystem class inspired the creation of a more general class for associations of features; second, the attempt to describe the NonDirectionalStructure class led us to group all the remaining classes into a new class with opposite characteristics. Similar modifications have been made throughout the GeologicStructure ontology.
Our intent is to improve the formal description of geological knowledge in order to practically support the use of ontology-driven data models in the geological mapping task.
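As a purely illustrative sketch of the kind of class reorganization described above, the snippet below uses rdflib to declare a hypothetical fragment of a GeologicStructure hierarchy, with FoldSystem placed under a more general class for associations of structural features. The namespace and class names are assumptions for demonstration, not the published OntoGeonous terms.

```python
# Illustrative rdflib sketch of a small, hypothetical slice of a
# GeologicStructure hierarchy. Namespace and class names are assumptions
# for demonstration only; they are not the published OntoGeonous IRIs.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

OG = Namespace("https://example.org/ontogeonous#")  # placeholder namespace

g = Graph()
g.bind("og", OG)

# Top-level class and two branches: single structures vs. associations
for cls in ("GeologicStructure", "StructureAssociation", "FoldSystem", "Fold"):
    g.add((OG[cls], RDF.type, OWL.Class))

g.add((OG.StructureAssociation, RDFS.subClassOf, OG.GeologicStructure))
g.add((OG.Fold, RDFS.subClassOf, OG.GeologicStructure))
# A fold system is modelled as an association of individual folds
g.add((OG.FoldSystem, RDFS.subClassOf, OG.StructureAssociation))

print(g.serialize(format="turtle"))
```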
References
Lombardo, V., Piana, F., and Mimmo, D. (2018). Semantics-informed geological maps: Conceptual modelling and knowledge encoding. Computers & Geosciences, 116. https://doi.org/10.1016/j.cageo.2018.04.001
Mantovani, A., Lombardo, V., and Piana, F. (2020). Ontology-driven representation of knowledge for geological maps. (Submitted)
(1) http://www.geosciml.org.
(2) http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_GE_v3.0.pdf
(3) http://resource.geosciml.org/def/voc/
(4) https://www.di.unito.it/wikigeo/index.php?title=Pagina_principale
(5) http://www.geosciml.org/doc/geosciml/4.1/documentation/html/EARoot/EA1/EA1/EA4/EA4/EA356.htm
How to cite: Mantovani, A., Lombardo, V., and Piana, F.: OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standards, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15226, https://doi.org/10.5194/egusphere-egu2020-15226, 2020.
EGU2020-15466 | Displays | ESSI2.1
Using standards to model delayed mode sensor processesAlexandra Kokkinaki, Justin Buck, Emma Slater, Julie Collins, Raymond Cramer, and Louise Darroch
Ocean data are expensive to collect. Data reuse saves time and accelerates the pace of scientific discovery. For data to be re-usable the FAIR principles reassert the need for rich metadata and documentation that meet relevant community standards and provide information about provenance.
Approaches to describing sensor observations are often inadequate for meeting FAIR: they tend to be prescriptive, with a limited set of attributes, and make little or no provision for important metadata about sensor observations arising later in the data lifecycle.
As part of the EU ENVRIplus project, our work aimed at capturing the delayed-mode data curation process taking place at the National Oceanography Centre’s British Oceanographic Data Centre (BODC). Our solution uses unique URIs, OGC SWE standards and controlled vocabularies, commencing from the submitted originator’s input and ending with the archived and published dataset.
The BODC delayed-mode process is an example of a physical system composed of several components, such as sensors and other computational processes, for example an algorithm to compute salinity or absolute winds. All components are described in SensorML, identified by unique URIs and associated with the relevant datastreams, which in turn are exposed on the web via ERDDAP using unique URIs.
In this paper we intend to share our experience in using OGC standards and ERDDAP to model the above-mentioned process and publish the associated datasets in a unified way. The benefits attained allow greater automation of data transfer, easy access to large volumes of data from a chosen sensor, more precise capture of data provenance, and standardization, and they pave the way towards greater FAIRness of the sensor data and metadata, with a focus on delayed-mode processing.
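As a minimal sketch of how a datastream exposed through ERDDAP can be retrieved, the snippet below requests a CSV subset from a tabledap dataset using ERDDAP's standard RESTful URL pattern. The server address, dataset identifier and variable names are placeholders rather than BODC's actual published endpoints.

```python
# Sketch of accessing an ERDDAP tabledap dataset via its standard RESTful
# URL pattern: /erddap/tabledap/<datasetID>.<fileType>?<variables>&<constraints>
# The server, dataset ID and variable names are placeholders.
import urllib.parse
import urllib.request

ERDDAP_BASE = "https://example.org/erddap"      # placeholder server
DATASET_ID = "example_sensor_timeseries"        # placeholder dataset ID

# Percent-encode the constraint so characters like '>' are valid in the URL
constraint = urllib.parse.quote("time>=2020-01-01T00:00:00Z", safe="=")
url = f"{ERDDAP_BASE}/tabledap/{DATASET_ID}.csv?time,sea_water_temperature&{constraint}"

with urllib.request.urlopen(url) as response:
    csv_rows = response.read().decode("utf-8").splitlines()

# The first two rows are column names and units; the rest are data rows.
for line in csv_rows[:5]:
    print(line)
```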
How to cite: Kokkinaki, A., Buck, J., Slater, E., Collins, J., Cramer, R., and Darroch, L.: Using standards to model delayed mode sensor processes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15466, https://doi.org/10.5194/egusphere-egu2020-15466, 2020.
EGU2020-18848 | Displays | ESSI2.1
Continuous ocean monitoring from sensor arrays on the UK large research vesselsLouise Darroch, Juan Ward, Alexander Tate, and Justin Buck
More than 40% of the human population live within 100 km of the sea. Many of these communities intimately rely on the oceans for their food, climate and economy. However, the oceans are increasingly being adversely affected by human-driven activities such as climate change and pollution. Many targeted marine monitoring programmes (e.g. GOSHIP, OceanSITES) and pioneering observing technologies (e.g. autonomous underwater vehicles, Argo floats) are being used to assess the impact humans are having on our oceans. Such activities and platforms are deployed, calibrated and serviced by state-of-the-art research ships: multimillion-pound floating laboratories which operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. sea-floor depth, weather, ocean current velocity and hydrography). These sensors, coupled with event and environmental metadata provided by the ships’ logs and crew, are essential for understanding the wider context of the science they support, as well as directly contributing to crucial scientific understanding of the marine environment and key strategic policies (e.g. the United Nations’ Sustainable Development Goal 14). However, despite their high scientific value and cost, these data streams are not routinely brought together from UK large research vessels in the coordinated, reliable and accessible ways that are fundamental to ensuring user trust in the data and any products generated from the data.
The National Oceanography Centre (NOC) and British Antarctic Survey (BAS) are currently working together to improve the integrity of the data management workflow from sensor systems to end-users across the UK Natural Environment Research Council (NERC) large research vessel fleet, making cost-effective use of vessel time while improving the FAIRness of data from these sensor arrays. The solution is based upon an Application Programming Interface (API) framework with endpoints tailored towards different end-users, such as scientists on board the vessels as well as the public on land. Key features include: sensor triage using real-time automated monitoring systems, assuring sensors are working correctly and only the best data are output; standardised digital event logging systems allowing data quality issues to be identified and resolved quickly; novel open-source data transport formats embedded with well-structured metadata, common standards and provenance information (such as controlled vocabularies and persistent identifiers), reducing ambiguity and enhancing interoperability across platforms; an open-source data processing application that applies quality control to international standards (SAMOS or IOOS QARTOD); digital notebooks that manage and capture the processing applied to data, putting data into context; democratisation and brokering of data through open data APIs (e.g. ERDDAP, Sensor Web Enablement), allowing end-users to discover and access data, layer their own tools or generate products to meet their own needs; and unambiguous provenance maintained throughout the data management workflow using instrument persistent identifiers, part of the latest recommendations by the Research Data Alliance (RDA).
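As a minimal, hypothetical sketch of the kind of automated quality control mentioned above, the function below applies a simple gross-range test in the spirit of QARTOD-style checks, flagging each value as pass, suspect or fail. The thresholds and flag scheme are illustrative assumptions, not the project's operational configuration.

```python
# Minimal gross-range QC sketch in the spirit of QARTOD-style checks.
# Thresholds and the numeric flag scheme (1=pass, 3=suspect, 4=fail) are
# illustrative assumptions, not the operational configuration.
from typing import List

def gross_range_test(values: List[float],
                     fail_min: float, fail_max: float,
                     suspect_min: float, suspect_max: float) -> List[int]:
    """Flag each value: 1 = pass, 3 = suspect, 4 = fail."""
    flags = []
    for v in values:
        if v < fail_min or v > fail_max:
            flags.append(4)          # outside sensor/physical limits
        elif v < suspect_min or v > suspect_max:
            flags.append(3)          # plausible but climatologically unusual
        else:
            flags.append(1)          # within expected range
    return flags

# Example: sea surface temperature (deg C) with placeholder thresholds
sst = [12.1, 12.3, 45.0, 11.9, -3.2]
print(gross_range_test(sst, fail_min=-2.5, fail_max=40.0,
                       suspect_min=5.0, suspect_max=25.0))
```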
Access to universally interoperable oceanic data, with known quality and provenance, will empower a broad range of stakeholder communities, creating opportunities for innovation and impact through data use, re-use and exploitation.
How to cite: Darroch, L., Ward, J., Tate, A., and Buck, J.: Continuous ocean monitoring from sensor arrays on the UK large research vessels, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18848, https://doi.org/10.5194/egusphere-egu2020-18848, 2020.
EGU2020-19895 | Displays | ESSI2.1
Towards an interoperability framework for observable property terminologiesBarbara Magagna, Gwenaelle Moncoiffe, Anusuriya Devaraju, Pier Luigi Buttigieg, Maria Stoica, and Sirko Schindler
In October 2019, a new working group (InteroperAble Descriptions of Observable Property Terminology, or I-ADOPT WG [1]) officially launched its 18-month workplan under the auspices of the Research Data Alliance (RDA), co-led by ENVRI-FAIR [2] project members. The goal of the group is to develop a community-wide, consensus framework for representing observable properties and facilitating semantic mapping between disjoint terminologies used for data annotation. The group has been active for over two years and comprises research communities, data centers, and research infrastructures from the environmental sciences. The WG members have been heavily involved in developing or applying terminologies to semantically enrich the descriptions of measured, observed, derived, or computed environmental data. They all recognize the need to enhance interoperability between their efforts through the WG’s activities.
Ongoing activities of the WG include gathering user stories from research communities (Task 1), reviewing related terminologies and current annotation practices (Task 2) and - based on this - defining and iteratively refining requirements for a community-wide semantic interoperability framework (Task 3). Much like a generic blueprint, this framework will be a basis upon which terminology developers can formulate local design patterns while at the same time remaining globally aligned. This framework will assist interoperability between machine-actionable complex property descriptions observed across the environmental sciences, including Earth, space, and biodiversity science. The WG will seek to synthesize well-adopted but still disparate approaches into global best practice recommendations for improved alignment. Furthermore, the framework will help mediate between generic observation standards (O&M [3], SSNO [4], SensorML [5], OBOE [6], ...) and current community-led terminologies and annotation practices, fostering harmonized implementations of observable property descriptions. Altogether, the WG’s work will boost the Interoperability component of the FAIR principles (especially principle I3) by encouraging convergence and by enriching the terminologies with qualified references to other resources. We envisage that this will greatly enhance the global effectiveness and scope of tools operating across terminologies. The WG will thus strengthen existing collaborations and build new connections between terminology developers and providers, disciplinary experts, and representatives of scientific data user groups.
In this presentation, we introduce the working group to the EGU community, and invite them to join our efforts. We report the methodology applied, the results from our first three tasks and the first deliverable, namely a catalog of domain-specific terminologies in use in environmental research, which will enable us to systematically compare existing resources for building the interoperability framework.
[1] https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg
[2] https://envri.eu/home-envri-fair/
[3] https://www.iso.org/standard/32574.html
[4] https://www.w3.org/TR/vocab-ssn/
[5] https://www.opengeospatial.org/standards/sensorml
[6] https://github.com/NCEAS/oboe/
How to cite: Magagna, B., Moncoiffe, G., Devaraju, A., Buttigieg, P. L., Stoica, M., and Schindler, S.: Towards an interoperability framework for observable property terminologies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19895, https://doi.org/10.5194/egusphere-egu2020-19895, 2020.
EGU2020-20448 | Displays | ESSI2.1
Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data SetsJan Schulte, Laura Helene Zepner, Stephan Mäs, Simon Jirka, and Petra Sauer
Over the last few years, a broad range of open data portals has been set-up. The aim of these portals is to improve the discoverability of open data resources and to strengthen the re-use of data generated by public agencies as well as research activities.
Often, such open data portals offer an immense amount of different types of data that may be relevant for a user. Thus, in order to facilitate the efficient and user-friendly exploration of available data sets, it is essential to visualize the data as quickly and easily as possible. While the visualization of static data sets is already well covered, selecting appropriate visualization approaches for potentially highly-dynamic spatio-temporal data sets is often still a challenge.
Within our contribution, we will introduce a preliminary study conducted by the mVIZ project, which is funded by the German Federal Ministry of Transport and Digital Infrastructure as part of the mFUND programme. This project introduces a methodology to support the selection and creation of user-friendly visualizations for data discoverable via open data portals such as the mCLOUD. During this process, specific consideration is given to the properties and metadata of the datasets as input for a decision workflow that suggests appropriate visualization types. A resulting guideline will describe the methodology and serve as a basis for the conception, extension or improvement of visualization tools, or for their further development and integration into open data portals.
The project focuses particularly on the creation of an inventory of open spatio-temporal data in open data portals, an overview of available visualization and analysis tools, the development of a methodology for selecting appropriate visualizations for spatio-temporal data, and the development of a demonstrator for supporting the visualization of selected data sets.
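As a hypothetical sketch of what such a decision workflow might look like, the function below maps a few dataset properties (geometry type, temporal dimension, number of attributes) onto candidate visualization types. The property names and the rules themselves are illustrative assumptions, not the methodology developed by the project.

```python
# Hypothetical decision-workflow sketch: suggest visualization types from a
# handful of dataset properties. Property names and rules are illustrative
# assumptions, not the mVIZ methodology itself.
from typing import Dict, List

def suggest_visualizations(props: Dict[str, object]) -> List[str]:
    suggestions = []
    geometry = props.get("geometry")          # "point", "line", "polygon", "raster", None
    has_time = bool(props.get("has_time"))    # temporal dimension present?
    n_attrs = int(props.get("n_attributes", 1))

    if geometry == "raster":
        suggestions.append("animated map" if has_time else "static map")
    elif geometry in ("point", "line", "polygon"):
        suggestions.append("web map with time slider" if has_time else "web map")
        if n_attrs > 1:
            suggestions.append("choropleth / graduated symbols")
    if has_time:
        suggestions.append("time-series chart")
    if not suggestions:
        suggestions.append("table view")
    return suggestions

# Example: a station dataset with hourly measurements of several parameters
print(suggest_visualizations({"geometry": "point", "has_time": True, "n_attributes": 4}))
```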
How to cite: Schulte, J., Zepner, L. H., Mäs, S., Jirka, S., and Sauer, P.: Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data Sets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20448, https://doi.org/10.5194/egusphere-egu2020-20448, 2020.
EGU2020-21258 | Displays | ESSI2.1
Scaling metadata catalogues with web-based software version control and integration systemsTara Keena, Adam Leadbetter, Andrew Conway, and Will Meaney
The ability to access and search metadata for marine science data is a key requirement for addressing fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting domain-specific, community-defined standards and legislative requirements placed on data publishers. One of the foundations of effective data management is appropriate metadata cataloguing: the storing and publishing of descriptive metadata for end users to query online. However, with ocean observing systems constantly evolving and the number of autonomous platforms and sensors growing, the volume and variety of data is constantly increasing, and metadata catalogue volumes are therefore also expanding. The ability for data catalogue infrastructures to scale with data growth is a necessity, without causing significant additional overhead in terms of technical infrastructure and financial costs.
To address some of these challenges, GitHub and Travis CI offer a potential solution for maintaining scalable data catalogues and hosting a variety of file types, all with minimal overhead costs.
GitHub is a repository hosting platform for version control and collaboration, and can be used with documents, computer code, or many file formats
GitHub Pages is a static website hosting service designed to host web pages directly from a GitHub repository
Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub
GitHub supports the implementation of a data catalogue as it stores metadata records of different formats in an online repository which is openly accessible and version controlled. The base metadata of the data catalogue in the Marine Institute is ISO 19115/19139-based XML, which is compliant with the INSPIRE implementing rules for metadata. However, using Travis CI, hooks can be provided to build additional metadata records and formats from this base XML, which can also be hosted in the repository. These formats include:
- DataCite metadata schema – allowing a completed data description entry to be exported in support of the minting of Digital Object Identifiers (DOIs) for published data
- Resource Description Framework (RDF) – as part of the semantic web and linked data
- Ecological Metadata Language (EML) – for the Global Biodiversity Information Facility (GBIF), which is used to share information about where and when species have been recorded
- Schema.org XML – which creates a structured data mark-up schema to increase search engine optimisation (SEO)
- HTML – the standard mark-up language for web pages, which can be used to represent the XML as web pages so that end users can view the catalogue online
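As a hedged illustration of the build hooks mentioned above, the sketch below shows how a continuous integration step could derive a schema.org JSON-LD record from a base ISO 19115/19139 XML file. The XPath expressions are generic ISO 19139 paths, and the file names are placeholders; the Marine Institute's actual build scripts may differ.

```python
# Illustrative sketch (not the Marine Institute's actual build script): a
# Travis CI step could call a converter like this for each ISO 19139 record
# in the repository to produce a schema.org JSON-LD file for GitHub Pages.
import json
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def iso_to_schema_org(iso_xml_path: str) -> dict:
    root = ET.parse(iso_xml_path).getroot()
    # simplified lookups; production code would use the full citation paths
    title = root.findtext(".//gmd:title/gco:CharacterString", namespaces=NS)
    abstract = root.findtext(".//gmd:abstract/gco:CharacterString", namespaces=NS)
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": abstract,
    }

if __name__ == "__main__":
    # hypothetical file names; in practice the CI job loops over the catalogue
    record = iso_to_schema_org("catalogue/my_dataset_iso19139.xml")
    with open("docs/my_dataset.jsonld", "w") as fh:
        json.dump(record, fh, indent=2)
```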
As well as hosting the various file types, GitHub Pages renders the generated HTML as static web pages, allowing users to view and search the catalogue online via a generated static website.
GitHub's ability to host and version-control metadata files, and to render them as web pages, allows for an easier and more transparent generation of an online data catalogue while catering for scalability, hosting and security.
How to cite: Keena, T., Leadbetter, A., Conway, A., and Meaney, W.: Scaling metadata catalogues with web-based software version control and integration systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21258, https://doi.org/10.5194/egusphere-egu2020-21258, 2020.
EGU2020-21522 | Displays | ESSI2.1
OpenSearch API for Earth observation DataHub service
Jovanka Gulicoska, Koushik Panda, and Hervé Caumont
OpenSearch is a de-facto standard specification and a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format.
Evolved through extensions within an international standards organisation, the Open Geospatial Consortium (OGC), OpenSearch has become a reference for querying repositories that contain Earth Observation information, for sending and receiving structured, standardized search requests and results, and for allowing syndication of repositories. In this evolved form it is a shared API used by many applications, tools, portals and sites in the Earth sciences community. The OGC OpenSearch extensions have been implemented for the NextGEOSS DataHub following the OGC standards and have been validated to be fully compatible with the standard.
The OGC OpenSearch extensions implemented for CKAN, the open-source software solution supporting the NextGEOSS DataHub, add the standardized metadata models and the OpenSearch API endpoints that allow the indexing of distributed EO data sources (currently over 110 data collections) and make these available to client applications to perform queries and retrieve results. This allowed the development of a simple user interface as part of the NextGEOSS DataHub Portal, which implements the two-step search mechanism (leveraging data collection metadata and data product metadata) and translates the filtering done by users into a matching OpenSearch query. The user interface can render a general description document, which contains information about the collections available on the NextGEOSS DataHub, and then retrieve a more detailed description document for each collection separately.
To generate the structure of the description documents and the result feed, we use CKAN's templates, supplemented by additional files that list all available parameters and their options and validate a query before it is executed. The search endpoint that returns the results feed uses existing CKAN API calls to perform the validation and retrieve the filtered results, taking into consideration the parameters of the user's search.
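To illustrate the two-step search mechanism described above, here is a hedged sketch of an OpenSearch client session in Python. The endpoint URL, query parameters and collection identifier are assumptions for illustration; the authoritative templates are published in the DataHub's own OpenSearch description documents.

```python
# Sketch of a two-step OpenSearch query against a catalogue such as the
# NextGEOSS DataHub. URL, parameters and collection id below are illustrative
# assumptions, not the service's documented interface.
import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
OS = "{http://a9.com/-/spec/opensearch/1.1/}"

BASE = "https://catalogue.example-nextgeoss.eu/opensearch"   # placeholder endpoint

# Step 1: collection-level search (which collections match my keyword?)
collections = requests.get(f"{BASE}/collection_search.atom",
                           params={"q": "Sentinel-2"}, timeout=30)

# Step 2: product-level search within one collection, constrained in space and time
products = requests.get(f"{BASE}/search.atom",
                        params={"collection_id": "SENTINEL2_L1C",   # assumed id
                                "bbox": "20.0,35.0,30.0,45.0",
                                "start": "2020-01-01", "end": "2020-01-31"},
                        timeout=30)

feed = ET.fromstring(products.content)
print("total results:", feed.findtext(f"{OS}totalResults"))
for entry in feed.findall(f"{ATOM}entry"):
    print(entry.findtext(f"{ATOM}title"))
```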
The current NextGEOSS DataHub implementation therefore provides a user interface for users who are not familiar with Earth observation data collections and products, so that they can easily create queries and access the results. Moreover, the NextGEOSS project partners are constantly adding data connectors and collecting new data sources that will become available through the OGC OpenSearch extensions API. This will allow NextGEOSS to provide a variety of data for users and accommodate their needs.
NextGEOSS is an H2020 Research and Development Project from the European Community under grant agreement 730329.
How to cite: Gulicoska, J., Panda, K., and Caumont, H.: OpenSearch API for Earth observation DataHub service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21522, https://doi.org/10.5194/egusphere-egu2020-21522, 2020.
EGU2020-21882 | Displays | ESSI2.1
Research products across space missions – a prototype for central storage, visualization and usability
Mario D'Amore, Andrea Naß, Martin Mühlbauer, Torsten Heinen, Mathias Boeck, Jörn Helbert, Torsten Riedlinger, Ralf Jaumann, and Guenter Strunz
For planetary sciences, the main archives providing access to mission data are ESA's Planetary Science Archive (PSA) and the Planetary Data System (PDS) nodes in the USA. Along with recent and upcoming planetary missions, the amount of different data (remote sensing and in-situ data, derived products) increases constantly and serves as the basis for scientific research, resulting in derived scientific data and information. Within missions to Mercury (BepiColombo), the Outer Solar System moons (JUICE), and asteroids (NASA's DAWN), one line of scientific analysis, the systematic mapping of surfaces, has received new impulses, also in Europe. These systematic surface analyses are based on the numeric and visual comparison and combination of different remote sensing data sets, such as optical image data, spectral/hyperspectral sensor data, radar images, and/or derived products like digital terrain models. The analyses mainly result in map figures, data, and profiles/diagrams, and serve to describe research investigations within scientific publications.
To handle these research products equivalently to missions' base data in the main archives, web-based geographic information systems have become a common means to impart spatial knowledge to all kinds of possible users in recent years. Accordingly, further platforms and initiatives have emerged that handle planetary data within web-based GIS, services, and/or virtual infrastructures. Those systems are built either upon proprietary software environments or, more commonly, upon a well-established stack of open-source software such as PostgreSQL, GeoServer (a server for sharing geospatial data) and a graphical user interface based on JavaScript. Applicable standards developed by the Open Geospatial Consortium (OGC), such as the Web Map Service (WMS) and the Web Feature Service (WFS), serve as the interface between the server-based data storage and the user interface.
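As a hedged illustration of how a client could consume such OGC services, the sketch below requests a rendered map from a GeoServer WMS endpoint using the OWSLib library. The service URL and layer name are hypothetical placeholders, not the prototype's actual deployment.

```python
# Minimal sketch, assuming a GeoServer instance publishing planetary mapping
# layers via OGC WMS; the service URL and layer name are placeholders.
from owslib.wms import WebMapService

wms = WebMapService("https://example-planetary-gis.org/geoserver/wms",  # assumed URL
                    version="1.1.1")

# Request a rendered map of an assumed geological mapping layer
img = wms.getmap(layers=["planetary:ceres_geology"],                    # assumed layer
                 srs="EPSG:4326",
                 bbox=(-180, -90, 180, 90),
                 size=(1024, 512),
                 format="image/png",
                 transparent=True)

with open("ceres_geology.png", "wb") as out:
    out.write(img.read())
```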
This contribution presents a prototypical system for the structured storage and visualization of planetary data compiled and developed within, or with the contribution of, the Institute for Planetary Research (PF, DLR). It enables user groups to store and spatially explore research products centrally and sustainably across multiple missions and scientific disciplines [1].
Technically, the system is based on two components: 1) an infrastructure that provides data storage and management capabilities as well as OGC-compliant interfaces for collaborative and web-based data access services, such as the EOC Geoservice [2]; and 2) UKIS (Environmental and Crisis Information Systems), a framework developed at DFD for the implementation of geoscientific web applications [3]. Substantially, the prototype is based on a recent approach developed within PF [4] in which an existing database established at the Planetary Spectroscopy Laboratory (PSL), handling different kinds of spatial data, meets a vector-based data collection of thematic, mainly geologic and geomorphologic, mapping results [5].
An information system of this kind is essential to ensure the efficient and sustainable utilization of the information already obtained and published. This is considered a prerequisite for guaranteeing a continuous and long-term use of scientific information and knowledge within institutional frameworks.
[1] Naß, et al (2019) EPSC #1311
[2] Dengler et al. (2013) PV 2013, elib.dlr.de/86351/
[3] Mühlbauer (2019) dlr.de/eoc/UKIS/en/
[4] Naß, D'Amore, Helbert (2017) EPSC #646-1
[5] Naß, Dawn Science Team (2019) EPSC #1304
How to cite: D'Amore, M., Naß, A., Mühlbauer, M., Heinen, T., Boeck, M., Helbert, J., Riedlinger, T., Jaumann, R., and Strunz, G.: Research products across space missions – a prototype for central storage, visualization and usability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21882, https://doi.org/10.5194/egusphere-egu2020-21882, 2020.
EGU2020-22021 | Displays | ESSI2.1
An open-source database and collections management system for fish scale and otolith archives
Elizabeth Tray, Adam Leadbetter, Will Meaney, Andrew Conway, Caoimhín Kelly, Niall O’Maoileidigh, Elvira De Eyto, Siobhan Moran, and Deirdre Brophy
Scales and otoliths (ear stones) from fish are routinely sampled for age estimation and stock management purposes. Growth records from scales and otoliths can be used to generate long-term time series data and, in combination with environmental data, can reveal species-specific population responses to a changing climate. Additionally, scale and otolith microchemical data can be utilized to investigate fish habitat usage and migration patterns. A common problem associated with biological collections is that while sample intake grows, long-term digital and physical storage is rarely a priority. Material is often collected to meet short-term objectives and resources are seldom committed to maintaining and archiving collections. As a consequence, precious samples are frequently stored in many different and unsuitable locations, and may become lost or separated from associated metadata. The Marine Institute's ecological research station in Newport, Ireland, holds a multi-decadal (1928-2020) collection of scales and otoliths from various fish species, gathered from many geographic locations. Here we present an open-source database and archiving system to consolidate and digitize this collection, and show how this case study infrastructure could be used for other biological sample collections. The system follows the FAIR (Findable, Accessible, Interoperable and Reusable) open data principles, and includes a physical repository, a sample metadata catalogue, and an image library.
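Purely as an illustration of how a physical repository, metadata catalogue and image library can be linked in one open-source system, here is a minimal relational sketch in SQLite. The table and column names are assumptions for illustration and do not describe the Marine Institute's actual schema.

```python
# Illustrative sketch of a collections catalogue linking physical samples,
# their metadata and an image library; all names are hypothetical.
import sqlite3

con = sqlite3.connect("otolith_archive.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS sample (
    sample_id     TEXT PRIMARY KEY,          -- physical archive label
    species       TEXT NOT NULL,
    capture_date  TEXT,                      -- ISO 8601
    latitude      REAL,
    longitude     REAL,
    storage_box   TEXT                       -- location in the physical repository
);
CREATE TABLE IF NOT EXISTS image (
    image_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id     TEXT REFERENCES sample(sample_id),
    file_path     TEXT NOT NULL,             -- location in the image library
    magnification TEXT
);
""")
con.execute("INSERT OR IGNORE INTO sample VALUES (?,?,?,?,?,?)",
            ("NWP-1931-0042", "Salmo salar", "1931-06-12", 53.92, -9.55, "B07"))
con.commit()
```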
How to cite: Tray, E., Leadbetter, A., Meaney, W., Conway, A., Kelly, C., O’Maoileidigh, N., De Eyto, E., Moran, S., and Brophy, D.: An open-source database and collections management system for fish scale and otolith archives, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22021, https://doi.org/10.5194/egusphere-egu2020-22021, 2020.
ESSI2.10 – Data Integration: Enabling the Acceleration of Science Through Connectivity, Collaboration, and Convergent Science
EGU2020-11709 | Displays | ESSI2.10
Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products
Eugene Burger, Benjamin Pfeil, Kevin O'Brien, Linus Kamb, Steve Jones, and Karl Smith
Data assembly in support of global data products, such as GLODAP, and submission of data to national data centers to support long-term preservation demand significant effort. This is in addition to the effort required to perform quality control on the data prior to submission. Delays in data assembly can negatively affect the timely production of scientific indicators that depend upon these datasets, including products such as GLODAP. What if data submission, metadata assembly and quality control could all be rolled into a single application? To support more streamlined data management processes in the NOAA Ocean Acidification Program (OAP), we are developing such an application. This application has the potential to serve a broader community.
This application addresses the need for data contributing to analysis and synthesis products to be of high quality, well documented, and accessible from the applications scientists prefer to use. The Scientific Data Integration System (SDIS) application, developed by the PMEL Science Data Integration Group, allows scientists to submit their data in a number of formats. Submitted data are checked for common errors. Metadata are extracted from the data and can then be complemented into a complete metadata record using the integrated metadata entry tool, which collects rich metadata that meets the carbon science community's requirements. Quality control for standard biogeochemical parameters, which is still being developed, will be integrated into the application. The quality control routines will be implemented in close collaboration with colleagues from the Bjerknes Climate Data Centre (BCDC) within the Bjerknes Centre for Climate Research (BCCR). This presentation will highlight the capabilities that are now available, as well as the implementation of the archive automation workflow and its potential use in support of GLODAP data assembly efforts.
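To make the "checked for common errors, metadata extracted" step concrete, the sketch below shows the kind of automated submission check such a tool might run. The column names, thresholds and logic are illustrative assumptions, not SDIS's actual rules.

```python
# Hedged sketch of automated checks on a submitted tabular file; the required
# columns and range checks are hypothetical, not the SDIS implementation.
import pandas as pd

REQUIRED = ["latitude", "longitude", "time", "sea_water_temperature"]

def check_submission(csv_path: str):
    df = pd.read_csv(csv_path)
    errors = [f"missing column: {c}" for c in REQUIRED if c not in df.columns]
    if "latitude" in df.columns and not df["latitude"].between(-90, 90).all():
        errors.append("latitude out of range")
    if "longitude" in df.columns and not df["longitude"].between(-180, 180).all():
        errors.append("longitude out of range")
    # minimal metadata extracted from the data themselves
    metadata = {
        "n_records": len(df),
        "time_coverage": (df["time"].min(), df["time"].max()) if "time" in df else None,
        "variables": [c for c in df.columns if c not in ("latitude", "longitude", "time")],
    }
    return errors, metadata

# Example usage with a hypothetical submission file:
# errors, meta = check_submission("cruise_2019_bottle_data.csv")
```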
How to cite: Burger, E., Pfeil, B., O'Brien, K., Kamb, L., Jones, S., and Smith, K.: Streamlining Oceanic Biogeochemical Dataset Assembly in Support of Global Data Products, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11709, https://doi.org/10.5194/egusphere-egu2020-11709, 2020.
EGU2020-20966 | Displays | ESSI2.10
NextGEOSS data hub and platform - connecting data providers with geosciences communities
Bente Bye, Elnaz Neinavaz, Alaitz Zabala, Joan Maso, Marie-Francoise Voidrot, Barth De Lathouwer, Nuno Catarino, Pedro Gonzalves, Michelle Cortes, Koushik Panda, Julian Meyer-Arnek, and Bram Janssen
The geosciences communities share common challenges related to the effective use of the vast and growing amount of data as well as the continuous development of new technology. There is therefore great potential in learning from the experiences and knowledge acquired across the various fields. The H2020 project NextGEOSS is building a European data hub and platform to support the Earth observation communities with a set of tools and services. The suite of tools on the platform allows scalability, interoperability and transparency in a flexible way, well suited to serving a multifaceted, interdisciplinary community. NextGEOSS is developed with and for multiple communities, the NextGEOSS pilots. This has resulted in, and continues to provide, transfer of experience and knowledge along the whole value chain, from data provision to applications and services based on multiple sources of data. We will introduce the NextGEOSS data hub and platform and show some illustrative examples of the exchange of knowledge that facilitates faster uptake of data and advances in the use of new technology. An onboarding system benefits existing and new users. A capacity building strategy is an integral part of both the onboarding and the individual services, which will be highlighted in this presentation.
How to cite: Bye, B., Neinavaz, E., Zabala, A., Maso, J., Voidrot, M.-F., De Lathouwer, B., Catarino, N., Gonzalves, P., Cortes, M., Panda, K., Meyer-Arnek, J., and Janssen, B.: NextGEOSS data hub and platform - connecting data providers with geosciences communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20966, https://doi.org/10.5194/egusphere-egu2020-20966, 2020.
EGU2020-12386 | Displays | ESSI2.10 | Highlight
Data dissemination best practices and challenges identified through NOAA’s Big Data Project
Meredith Richardson, Ed Kearns, and Jonathan O'Neil
Through satellites, ships, radars, and weather models, the National Oceanic and Atmospheric Administration (NOAA) generates and handles tens of terabytes of data per day. Many of NOAA’s key datasets have been made available to the public through partnerships with Google, Microsoft, Amazon Web Services, and more as part of the Big Data Project (BDP). This movement of data to the Cloud has enabled access for researchers from all over the world to vast amounts of NOAA data, initiating a new form of federal data management as well as exposing key challenges for the future of open-access data. NOAA researchers have run into challenges in providing “analysis-ready” datasets that researchers from varying fields can easily access, manipulate, and use for different purposes. This issue arises because there is no agreed-upon format or method of transforming traditional datasets for the cloud across research communities, with each scientific field or start-up expressing differing data formatting needs (cloud-optimized, cloud-native, etc.). Some possible solutions involve changing data formats into those widely used throughout the visualization community, such as Cloud-Optimized GeoTIFF. Initial findings have led NOAA to facilitate roundtable discussions with researchers, public and private stakeholders, and other key members of the data community, to encourage the development of best practices for the use of public data on commercial cloud platforms. Overall, by uploading NOAA data to the Cloud, the BDP has led to the recognition and ongoing development of new best practices for data authentication and dissemination and the identification of key areas for targeting collaboration and data use across scientific communities.
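To illustrate why cloud-optimized formats matter for "analysis-ready" access, the sketch below reads only a small window of a Cloud-Optimized GeoTIFF over HTTP with rasterio, instead of downloading the whole file. The URL is a placeholder, not an actual NOAA bucket object.

```python
# Sketch of partial, windowed reading from a Cloud-Optimized GeoTIFF over
# HTTP; the object URL is a placeholder, not a real NOAA dataset path.
import rasterio
from rasterio.windows import Window

url = "https://example-bucket.s3.amazonaws.com/noaa/some_product_cog.tif"  # placeholder

with rasterio.open(url) as src:
    print(src.profile["width"], src.profile["height"], src.crs)
    # read only a 512 x 512 pixel window instead of the full raster
    block = src.read(1, window=Window(col_off=0, row_off=0, width=512, height=512))
    print(block.shape, block.dtype)
```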
How to cite: Richardson, M., Kearns, E., and O'Neil, J.: Data dissemination best practices and challenges identified through NOAA’s Big Data Project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12386, https://doi.org/10.5194/egusphere-egu2020-12386, 2020.
EGU2020-5972 | Displays | ESSI2.10 | Highlight
Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem
Kaylin Bugbee, Aaron Kaulfus, Aimee Barciauskas, Manil Maskey, Rahul Ramachandran, Dai-Hai Ton That, Chris Lynnes, Katrina Virts, Kel Markert, and Amanda Whitehurst
The scientific method within the Earth sciences is rapidly evolving. Ever-increasing data volumes require new methods for processing and understanding data, while an almost 60-year Earth observation record makes more data-intensive retrospective analyses possible. These new methods of data analysis are made possible by technological innovations and interdisciplinary scientific collaborations. While scientists are beginning to adopt new technologies and collaborations to conduct data-intensive research more effectively, both the data information infrastructure and the supporting data stewardship model have been slow to change. Standard data products are generated by a processing system and are then ingested into local archives. These local archive centers then provide metadata to a centralized repository for search and discovery. Each step in the data process occurs independently and on different, siloed components. Similarly, the data stewardship process has a well-established but narrow view of data publication that may be too constrained for an ever-changing data environment. To overcome these obstacles, a new approach is needed for both the data information infrastructure and stewardship models. The data ecosystem approach offers a solution to these challenges by placing an emphasis on the relationships between data, technologies and people. In this presentation, we present the Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform's (MAAP) data system as a forward-looking ecosystem solution. We will present the components needed to support the MAAP data ecosystem along with the key capabilities the MAAP data ecosystem supports. These capabilities include the ability for users to share data and software within the MAAP, the creation of analysis-optimized data services, and the creation of an aggregated catalog for data discovery. We will also explore our data stewardship efforts within this new type of data system, which include developing a data management plan and a level-of-service plan.
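As a purely hypothetical sketch of what discovery against an aggregated catalogue could look like from an analysis notebook, the snippet below issues a keyword, space and time constrained search. The endpoint, parameters and response layout are assumptions and do not describe the actual MAAP API.

```python
# Hypothetical sketch of querying an aggregated catalogue from a platform
# notebook; endpoint, parameters and response structure are assumptions only.
import requests

CATALOG = "https://example-analysis-platform.org/catalog/search"   # placeholder

resp = requests.get(CATALOG, params={
    "keyword": "above-ground biomass",
    "bbox": "-60,-10,-50,0",                                        # illustrative subset
    "temporal": "2019-01-01T00:00:00Z,2019-12-31T23:59:59Z",
    "page_size": 10,
}, timeout=30)

for granule in resp.json().get("results", []):                      # assumed layout
    print(granule.get("id"), granule.get("download_url"))
```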
How to cite: Bugbee, K., Kaulfus, A., Barciauskas, A., Maskey, M., Ramachandran, R., Ton That, D.-H., Lynnes, C., Virts, K., Markert, K., and Whitehurst, A.: Data Systems to Enable Open Science: The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform’s Data Ecosystem, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5972, https://doi.org/10.5194/egusphere-egu2020-5972, 2020.
EGU2020-4616 | Displays | ESSI2.10 | Highlight
Documentation of climate change data supporting cross-domain data reuse
Martina Stockhause, Mark Greenslade, David Hassell, and Charlotte Pascoe
Climate change data and information are among those of the highest interest to cross-domain researchers, policy makers and the general public. Serving climate projection data to these diverse users requires detailed and accessible documentation.
Thus, the CMIP6 (Coupled Model Intercomparison Project Phase 6) data infrastructure consists not only of the ESGF (Earth System Grid Federation) as the data dissemination component but additionally of ES-DOC (Earth System Documentation) and the Citation Service for describing the provenance of the data. These services provide further information on the data creation process (experiments, models, …) and data reuse (data references and licenses) and connect the data to other external resources like research papers.
The contribution will present the documentation of the climate change data workflow around the furtherInfoURL page, which serves as an entry point. The challenges are to collect quality-controlled information from the international research community in different infrastructure components and to display it seamlessly, side by side, on the furtherInfoURL page.
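As a minimal sketch of how a data user reaches this entry point: CMIP6 netCDF files carry a global attribute (further_info_url) that links the data file to its ES-DOC landing page. The file name below is a placeholder, and real files should be checked for the exact attribute spelling.

```python
# Minimal sketch: read the furtherInfoURL entry point directly from a CMIP6
# data file's global attributes. The file name is a placeholder.
import netCDF4
import webbrowser

path = "tas_Amon_EXAMPLE-MODEL_historical_r1i1p1f1_gn_185001-201412.nc"  # placeholder

with netCDF4.Dataset(path) as ds:
    url = ds.getncattr("further_info_url")   # CMIP6 global attribute
    print("Further info:", url)

webbrowser.open(url)   # opens the furtherInfoURL landing page in a browser
```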
References / Links:
- CMIP6: https://pcmdi.llnl.gov/CMIP6/
- ES-DOC: https://es-doc.org/
- Citation Service: http://cmip6cite.wdc-climate.de
How to cite: Stockhause, M., Greenslade, M., Hassell, D., and Pascoe, C.: Documentation of climate change data supporting cross-domain data reuse, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4616, https://doi.org/10.5194/egusphere-egu2020-4616, 2020.
EGU2020-8638 | Displays | ESSI2.10
Managing collaborative research data for integrated, interdisciplinary environmental research
Michael Finkel, Albrecht Baur, Tobias K.D. Weber, Karsten Osenbrück, Hermann Rügner, Carsten Leven, Marc Schwientek, Johanna Schlögl, Ulrich Hahn, Thilo Streck, Olaf A. Cirpka, Thomas Walter, and Peter Grathwohl
The consistent management of research data is crucial for the success of long-term and large-scale collaborative research. Research data management is the basis for efficiency, continuity, and quality of the research, as well as for maximum impact and outreach, including the long-term publication of data and their accessibility. Both funding agencies and publishers increasingly require this long-term and open access to research data. Joint environmental studies typically take place in a fragmented research landscape of diverse disciplines; the researchers involved typically show a variety of attitudes towards, and previous experiences with, common data policies, and the extensive variety of data types in interdisciplinary research poses particular challenges for collaborative data management. We present organizational measures, data and metadata management concepts, and technical solutions that form a flexible research data management framework which allows the full range of data and metadata to be shared efficiently among all researchers of the project, and selected data and data streams to be published smoothly to publicly accessible sites. The concept is built upon data type-specific and hierarchical metadata using a common taxonomy agreed upon by all researchers of the project. The framework's concept has been developed according to the needs and demands of the scientists involved and aims to minimize their effort in data management. We illustrate this from the researchers' perspective by describing their typical workflow from the generation and preparation of data and metadata to the long-term preservation of data, including their metadata.
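To illustrate what "data type-specific and hierarchical metadata using a common taxonomy" can look like in practice, here is a hedged sketch of one metadata record as nested JSON. The field names and taxonomy terms are assumptions for illustration, not the project's actual schema.

```python
# Illustrative sketch of a hierarchical, data type-specific metadata record
# built on a shared project taxonomy; all names and terms are hypothetical.
import json

record = {
    "dataset_id": "CAT-HYD-2019-042",          # hypothetical identifier
    "taxonomy": {                              # agreed, project-wide vocabulary
        "compartment": "groundwater",
        "data_type": "time_series",
        "parameter": "nitrate_concentration",
    },
    "general": {                               # metadata shared by all data types
        "site": "example catchment",
        "responsible_scientist": "N.N.",
        "license": "CC-BY-4.0",
    },
    "type_specific": {                         # only for time-series data
        "sampling_interval": "15 min",
        "sensor": "UV photometer",
        "unit": "mg/L",
    },
}

print(json.dumps(record, indent=2))
```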
How to cite: Finkel, M., Baur, A., Weber, T. K. D., Osenbrück, K., Rügner, H., Leven, C., Schwientek, M., Schlögl, J., Hahn, U., Streck, T., Cirpka, O. A., Walter, T., and Grathwohl, P.: Managing collaborative research data for integrated, interdisciplinary environmental research, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8638, https://doi.org/10.5194/egusphere-egu2020-8638, 2020.
EGU2020-7115 | Displays | ESSI2.10 | Highlight
Data Management for Early Career Scientists – How to Tame the Elephant
Laia Comas-Bru and Marcus Schmidt
Data Management can be overwhelming, especially for Early Career Scientists. In order to give them a kick-start, the World Data System (WDS) organised a 3-day, EGU-sponsored workshop on current achievements and future challenges in November 2019 in Paris. The purpose of the workshop was to provide Early Career Scientists with practical skills in data curation and management through a combination of practical sessions, group discussions and lectures. Participants were introduced to what research data are and to a common vocabulary used during the workshop. Later, a World Café session provided an opportunity to discuss individual challenges in data management and expectations of the workshop in small groups of peers. Lectures and discussions revolved around Open Science, Data Management Plans (DMPs), data exchange, copyright and plagiarism, and the use of Big Data, ontologies and cloud platforms in science. Finally, the roles and responsibilities of the WDS as well as its Early Career Researcher Network were discussed. Wrapping up the workshop, attendees were walked through what a data repository is and how repositories obtain their certifications. This PICO presentation, given by two attendees of the workshop, will showcase the main topics of discussion on data management and curation, provide key examples with special emphasis on the importance of creating a DMP at an early stage of a research project, and share practical tools and advice on how to make data management more accessible.
How to cite: Comas-Bru, L. and Schmidt, M.: Data Management for Early Career Scientists – How to Tame the Elephant, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7115, https://doi.org/10.5194/egusphere-egu2020-7115, 2020.
EGU2020-22052 | Displays | ESSI2.10
Towards Seamless Planetary-Scale Services
Peter Baumann and the Peter Baumann team
Collaboration requires some minimum of common understanding; in the case of Earth data this means, in particular, common principles making data interchangeable, comparable, and combinable. Open standards help here, in the case of Big Earth Data specifically the OGC/ISO Coverages standard. This unifying framework establishes a common model for regular and irregular grids, point clouds, and meshes, in particular for spatio-temporal datacubes. Services grounded in such a common understanding can be more uniform to access and handle, thereby implementing a principle of "minimal surprise" for users visiting different portals. Further, data combination and fusion benefit from canonical metadata allowing alignment, e.g., between 2D DEMs, 3D satellite image timeseries, and 4D atmospheric data.
The EarthServer federation is an open data center network offering dozens of petabytes of a critical variety of data, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on OGC standards, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.
The underlying datacube engine, rasdaman, enables location-transparent federation: clients can submit queries to any node, regardless of where the data sit. Query evaluation is optimized automatically, including multi-data fusion of data residing on different nodes. Hence, users perceive one single, common information space. Thanks to the open standards, a broad spectrum of open-source and proprietary clients can utilize this federation, ranging from OpenLayers and NASA WorldWind over QGIS and ArcGIS to python and R.
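As a hedged sketch of how such a datacube query can be submitted from Python, the snippet below sends an OGC WCPS (Web Coverage Processing Service) request to a rasdaman endpoint. The server URL and coverage name are placeholders, and the exact request parameters may differ between deployments.

```python
# Sketch of a WCPS query sent from Python to a rasdaman OWS endpoint; the
# endpoint URL and datacube (coverage) name are illustrative placeholders.
import requests

ENDPOINT = "https://example-earthserver-node.org/rasdaman/ows"   # placeholder

# Trim an assumed NDVI datacube to one date and a lat/long box, encode as PNG
wcps = ('for c in (S2_NDVI_CUBE) '
        'return encode(c[ansi("2019-07-01"), Lat(40:45), Long(10:15)], "png")')

resp = requests.get(ENDPOINT, params={
    "service": "WCS", "version": "2.0.1",
    "request": "ProcessCoverages", "query": wcps,
}, timeout=60)

with open("ndvi_subset.png", "wb") as out:
    out.write(resp.content)
```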
In our talk we present technology, services, and governance of this unique intercontinental line-up of data centers. A demo will show distributed datacube fusion live.
How to cite: Baumann, P. and the Peter Baumann team: Towards Seamless Planetary-Scale Services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22052, https://doi.org/10.5194/egusphere-egu2020-22052, 2020.
EGU2020-18428 | Displays | ESSI2.10
Selection and integration of Earth Observation-based data for an operational disease forecasting system
Eleanor A Ainscoe, Barbara Hofmann, Felipe Colon, Iacopo Ferrario, Quillon Harpham, Samuel JW James, Darren Lumbroso, Sajni Malde, Francesca Moschini, and Gina Tsarouchi
The current increase in the volume and quality of Earth Observation (EO) data being collected by satellites offers the potential to contribute to applications across a wide range of scientific domains. It is well established that there are correlations between characteristics that can be derived from EO satellite data, such as land surface temperature or land cover, and the incidence of some diseases. Thanks to the reliable frequent acquisition and rapid distribution of EO data it is now possible for this field to progress from using EO in retrospective analyses of historical disease case counts to using it in operational forecasting systems.
However, bringing together EO-based and non-EO-based datasets, as is required for disease forecasting and many other fields, requires carefully designed data selection, formatting and integration processes. Similarly, it requires careful communication between collaborators to ensure that the priorities of that design process match the requirements of the application.
Here we will present work from the D-MOSS (Dengue forecasting MOdel Satellite-based System) project. D-MOSS is a dengue fever early warning system for South and South East Asia that will allow public health authorities to identify areas at high risk of disease epidemics before an outbreak occurs in order to target resources to reduce spreading of epidemics and improve disease control. The D-MOSS system uses EO, meteorological and seasonal weather forecast data, combined with disease statistics and static layers such as land cover, as the inputs into a dengue fever model and a water availability model. Water availability directly impacts dengue epidemics due to the provision of mosquito breeding sites. The datasets are regularly updated with the latest data and run through the models to produce a new monthly forecast. For this we have designed a system to reliably feed standardised data to the models. The project has involved a close collaboration between remote sensing scientists, geospatial scientists, hydrologists and disease modelling experts. We will discuss our approach to the selection of data sources, data source quality assessment, and design of a processing and ingestion system to produce analysis-ready data for input to the disease and water availability models.
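As a minimal sketch of the kind of data-preparation step such a pipeline performs (the file path, variable name and reference grid below are hypothetical placeholders, not the actual D-MOSS configuration), an EO time series can be aggregated to monthly values and aligned on a common model grid:

    import xarray as xr

    def monthly_analysis_ready(path, variable, reference_grid):
        """Aggregate an EO time series to monthly means and align it with a reference grid."""
        da = xr.open_dataset(path)[variable]
        monthly = da.resample(time="1MS").mean()          # monthly aggregation
        return monthly.interp_like(reference_grid)        # regrid onto the common model grid

    # lst_monthly = monthly_analysis_ready("lst_daily.nc", "LST", model_grid)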
How to cite: Ainscoe, E. A., Hofmann, B., Colon, F., Ferrario, I., Harpham, Q., James, S. J., Lumbroso, D., Malde, S., Moschini, F., and Tsarouchi, G.: Selection and integration of Earth Observation-based data for an operational disease forecasting system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18428, https://doi.org/10.5194/egusphere-egu2020-18428, 2020.
EGU2020-8937 | Displays | ESSI2.10
Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities
Manuela Köllner, Mayumi Wilms, Anne-Christin Schulz, Martin Moritz, Katrin Latarius, Holger Klein, Kai Herklotz, and Kerstin Jochumsen
Reliable data are the basis for successful research and scientific publishing. Open data policies ensure that publicly financed field measurements are available to the public, and thus to all interested scientists. However, the variety of data sources and the varying availability of detailed metadata mean that each scientist must spend considerable effort deciding whether the data are usable for their own research topic. Data end-user communities differ in their requirements for metadata detail and for how data are handled during processing. Data-providing institutes and agencies need to know these requirements if they want to reach a wide range of end-user communities.
The Federal Maritime and Hydrographic Agency (BSH, Bundesamt für Seeschifffahrt und Hydrographie, Hamburg, Germany) collects a large variety of field data in physical and chemical oceanography, regionally focused on the North Sea, Baltic Sea, and North Atlantic. Data types range from vertical profiles and time series to underway measurements, delivered in real-time or delayed mode from moored or ship-based instruments. Along with other oceanographic data, the BSH provides all physical data via the German Oceanographic Data Center (DOD). Aiming for maximum reliability of the published data is crucial to enhance their usage, especially in the scientific community.
Here, we present our newly established data processing and quality control procedures using agile project management and workflow techniques, and outline their implementation in metadata and accompanying documentation. To enhance the transparency of data quality control, we will apply a detailed quality flag alongside the common data quality flag. This detailed quality flag, established by Mayumi Wilms within the research project RAVE Offshore service (research at alpha ventus), enables data end-users to review the results of the individual quality control checks performed during processing and thus to identify easily whether the data are usable for their research.
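As a minimal sketch of how such a detailed flag could be encoded (the concrete check names and flag values below are illustrative assumptions, not the BSH/RAVE scheme itself), one bit per individual check can be stored next to a conventional overall flag:

    # One bit per individual QC test, so end-users can see exactly which checks a value failed.
    CHECKS = {"range": 0, "spike": 1, "stuck_value": 2, "gradient": 3}

    def detailed_flag(results):
        """results: dict mapping check name -> bool (True = check passed)."""
        flag = 0
        for name, passed in results.items():
            if not passed:
                flag |= 1 << CHECKS[name]      # set the bit of every failed check
        return flag                            # 0 means all checks passed

    def overall_flag(detailed):
        return 1 if detailed == 0 else 4       # e.g. 1 = good, 4 = bad (a common convention)

    # detailed_flag({"range": True, "spike": False, "stuck_value": True, "gradient": True}) -> 2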
How to cite: Köllner, M., Wilms, M., Schulz, A.-C., Moritz, M., Latarius, K., Klein, H., Herklotz, K., and Jochumsen, K.: Realizing Maximum Transparency of Oceanographic Data Processing and Data Quality Control for Different End-User Communities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8937, https://doi.org/10.5194/egusphere-egu2020-8937, 2020.
EGU2020-21975 | Displays | ESSI2.10 | Highlight
Managing oceanographic data for the Marine Strategy Framework Directive in Spain supported by the Spatial Data Infrastructure of the Spanish Institute of Oceanography (IEO) and the INSPIRE Directive
Gerardo Bruque and Olvido Tello
In Europe, the Marine Strategy Framework Directive (MSFD) seeks to achieve a good environmental status of European marine waters and protect the resource base on which economic and social activities related to the sea depend. With this legislative tool the European Parliament recognizes the vital importance of the management of human activities that have an impact on the marine environment, integrating the concepts of environmental protection and sustainable use.
The MSFD establishes a monitoring program of different descriptors for continuous evaluation and periodic updating of the objectives. In Spain, the Ministry for Ecological Transition (MITECO) is responsible for and coordinates the implementation of the MSFD, but it is the Spanish Institute of Oceanography (IEO) that carries out the research and study of the different indicators and therefore the collection of oceanographic data.
The Geographic Information Systems Unit of the IEO is responsible for storing, cleaning and standardizing these data and including them in the IEO Spatial Data Infrastructure (IDEO). IDEO offers useful and advanced tools to discover and manage the oceanographic data, spatial or non-spatial, that the IEO holds. To facilitate access to IDEO, the IEO Geoportal was developed; it essentially contains a metadata catalog and provides access to different IEO web services and data viewers.
Some examples of priority datasets for the MSFD are: species and habitat distribution, distribution of commercially exploited fish and shellfish species, nutrients, chlorophyll a, dissolved oxygen, spatial extent of seabed loss, contaminants, litter, noise, etc.
The correct preparation and harmonization of the mentioned data sets following the Implementing Rules adopted by the INSPIRE Directive is essential to ensure that the different Spatial Data Infrastructures (SDI) of the member states are compatible and interoperable in the community context.
The INSPIRE Directive was created to make relevant, harmonized and high-quality geographic information available in a way that supports the formulation, implementation, monitoring and evaluation of European Union policies with an environmental impact or territorial dimension.
The geographic data sets, together with their corresponding metadata, constitute the cartographic base on which the information collected for the update of the continuous evaluation of the different descriptors of the MSFD is structured.
Thus, although these datasets are intended for use by public institutions responsible for decision-making on the management of the marine environment, they can also be very useful for a wide range of stakeholders and reused for multiple purposes.
With all this in mind, the INSPIRE Directive is extremely valuable and indeed essential for the tasks required by the MSFD, as it is for our projects related to the Maritime Spatial Planning (MSP) Directive.
How to cite: Bruque, G. and Tello, O.: Managing oceanographic data for the Marine Strategy Framework Directive in Spain supported by the Spatial Data Infrastructure of the Spanish Institute of Oceanography (IEO) and the INSPIRE Directive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21975, https://doi.org/10.5194/egusphere-egu2020-21975, 2020.
EGU2020-9686 | Displays | ESSI2.10
Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS
Regina Kwee, Tobias Weigel, Hannes Thiemann, Karsten Peters, Sandro Fiore, and Donatello Elia
This contribution highlights the Python xarray library in the context of climate-specific applications (typical formats are NetCDF, GRIB and HDF).
We will show how to use in-file metadata and why they are so powerful for data analysis, in particular for community-specific problems; for example, one can select data purely by coordinate variable names. ECAS, the ENES Climate Analytics Service available at Deutsches Klimarechenzentrum (DKRZ), helps by enabling faster access to the high-volume simulation output of climate modeling experiments. In this respect, we can also make use of “dask”, which was developed for parallel computing and works smoothly with xarray. This is extremely useful when we want to fully exploit the advantages of our supercomputer.
Our fully integrated service offers an interface via Jupyter notebooks (ecaslab.dkrz.de). We provide an analysis environment without the need for costly transfers, working on CF-standardized data files that are accessible via the ESGF portal on our nodes (esgf-data.dkrz.de). We can analyse, for example, CMIP5, CMIP6, Grand Ensemble and observational data. ECAS was developed in the framework of the European Open Science Cloud (EOSC) hub project.
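As a minimal sketch of the kind of workflow described above (file pattern, variable name, region and chunk sizes are illustrative assumptions, not the service's actual layout), xarray plus dask allows lazy, parallel selection and aggregation driven purely by coordinate and variable names:

    import xarray as xr

    # Open a multi-file CF-compliant dataset lazily; dask chunks enable parallel computation.
    ds = xr.open_mfdataset("tas_Amon_*.nc", combine="by_coords",
                           parallel=True, chunks={"time": 120})

    # Selection works purely on the coordinate variable names defined in the in-file metadata.
    europe = ds["tas"].sel(lat=slice(30, 70), lon=slice(-10, 40))

    # A monthly climatology, built lazily and only computed (in parallel) on request.
    climatology = europe.groupby("time.month").mean("time").compute()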
How to cite: Kwee, R., Weigel, T., Thiemann, H., Peters, K., Fiore, S., and Elia, D.: Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9686, https://doi.org/10.5194/egusphere-egu2020-9686, 2020.
EGU2020-15961 | Displays | ESSI2.10
Automatic quality control and quality control schema in the Observation to Archive
Brenner Silva, Najmeh Kaffashzadeh, Erik Nixdorf, Sebastian Immoor, Philipp Fischer, Norbert Anselm, Peter Gerchow, Angela Schäfer, and Roland Koppe and the Computing and data center
The O2A (Observation to Archive) is a data-flow framework for heterogeneous sources, covering multiple institutions and scales of Earth observation. In the O2A, once data transmission is set up, processes are executed to automatically ingest (i.e. collect and harmonize) and quality-control data in near real time. We use a web-based sensor description application to support the transmission and harmonization of observational time-series data. We also pursue a product-oriented quality control, in which a standardized and scalable approach should accommodate the diversity of sensors connected to the framework. A review of the literature and of observation networks in marine and terrestrial environments is in progress and allows us, for example, to characterize the quality tests in use for generic and specific applications. In addition, we use a standardized quality flag scheme to serve both user and technical levels of information. In our outlook, a quality score will complement the quality flag, indicating the overall plausibility of each individual data value or quantifying the flagging uncertainty. In this work, we present the concepts under development and give insights into the data ingestion and quality control currently operating within the O2A framework.
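A minimal sketch of such a scheme is given below; the individual tests, flag values and the way the score is computed are illustrative assumptions rather than the operational O2A configuration. Each value receives one flag per test, an overall flag, and a simple plausibility score (here, the fraction of tests passed):

    import numpy as np

    def range_test(x, lo=-2.0, hi=35.0):
        return (x >= lo) & (x <= hi)

    def spike_test(x, threshold=5.0):
        return np.abs(np.diff(x, prepend=x[0])) < threshold

    def quality_control(x):
        tests = {"range": range_test(x), "spike": spike_test(x)}
        # Per-test flags: 1 = good, 4 = bad (a common flagging convention).
        per_test = {name: np.where(ok, 1, 4) for name, ok in tests.items()}
        passed = np.vstack(list(tests.values()))
        overall = np.where(passed.all(axis=0), 1, 4)     # overall flag per value
        score = passed.mean(axis=0)                      # fraction of tests passed per value
        return per_test, overall, score

    flags, overall, score = quality_control(np.array([10.2, 10.4, 39.0, 10.1, 22.0]))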
How to cite: Silva, B., Kaffashzadeh, N., Nixdorf, E., Immoor, S., Fischer, P., Anselm, N., Gerchow, P., Schäfer, A., and Koppe, R. and the Computing and data center: Automatic quality control and quality control schema in the Observation to Archive, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15961, https://doi.org/10.5194/egusphere-egu2020-15961, 2020.
EGU2020-20814 | Displays | ESSI2.10
Bayesian Spatio-temporal Geostatistics-based Method for Multiple Satellite Products Fusion and Downscaling
Yanchen Bo
High-level satellite remote sensing products of the Earth's surface play an irreplaceable role in global climate change studies, hydrological cycle modeling, water resources management, and environmental monitoring and assessment. The Earth surface high-level products released by NASA, ESA and other agencies are routinely derived from a single remote sensor. Due to cloud contamination and the limitations of retrieval algorithms, products derived from a single sensor suffer from incompleteness, limited accuracy and limited consistency in space and time. Some land surface products, such as soil moisture derived from passive microwave remote sensing, have a spatial resolution too coarse to be applied at the local scale. Fusion and downscaling are effective ways of improving the quality of satellite remote sensing products.
We developed a Bayesian spatio-temporal geostatistics-based framework for multiple remote sensing products fusion and downscaling. Compared to the existing methods, the presented method has 2 major advantages. The first is that the method was developed in the Bayesian paradigm, so the uncertainties of the multiple remote sensing products being fused or downscaled could be quantified and explicitly expressed in the fusion and downscaling algorithms. The second advantage is that the spatio-temporal autocorrelation is exploited in the fusion approach so that more complete products could be produced by geostatistical estimation.
This method has been applied to the fusion of multiple satellite AOD products, multiple satellite SST products, multiple satellite LST products and downscaling of 25 km spatial resolution soil moisture products. The results were evaluated in both spatio-temporal completeness and accuracy.
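As a highly simplified illustration of the Bayesian weighting idea behind such fusion (ignoring the spatio-temporal covariance structure that the actual geostatistical framework exploits), co-located product values with independent Gaussian error variances can be combined by precision weighting; the numbers and names below are assumptions for the sketch only:

    import numpy as np

    def precision_weighted_fusion(means, variances):
        """Fuse co-located estimates from several products, pixel by pixel.

        means, variances: arrays of shape (n_products, ...) holding per-product
        values and their error variances (the uncertainty input to the Bayesian update).
        """
        means = np.asarray(means, dtype=float)
        w = 1.0 / np.asarray(variances, dtype=float)     # precision = inverse variance
        fused_mean = (w * means).sum(axis=0) / w.sum(axis=0)
        fused_variance = 1.0 / w.sum(axis=0)             # posterior variance of the fused value
        return fused_mean, fused_variance

    # Two AOD products over the same pixel: 0.31 with variance 0.05**2, 0.27 with variance 0.10**2.
    m, v = precision_weighted_fusion([0.31, 0.27], [0.05**2, 0.10**2])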
How to cite: Bo, Y.: Bayesian Spatio-temporal Geostatistics-based Method for Multiple Satellite Products Fusion and Downscaling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20814, https://doi.org/10.5194/egusphere-egu2020-20814, 2020.
EGU2020-8973 | Displays | ESSI2.10
Space debris monitoring based on inter-continental stereoscopic detections
Alessandro Sozza, Massimo Cencini, Leonardo Parisi, Marco Acernese, Fabio Santoni, Fabrizio Piergentili, Stefania Melillo, and Andrea Cavagna
The monitoring of space debris and satellites orbiting the Earth is an essential topic in space surveillance. The impact of debris, even of small size, on active space installations causes serious damage, malfunctions and potential service interruptions. Collision-avoidance maneuvers are often performed, but they require increasingly complex protocols. The density of space debris is now so high that even astronomical observations are often degraded by it. Although it does not affect space weather, it may interfere with weather satellites.
We have developed an innovative experimental technique based on stereometry at intercontinental scale to obtain simultaneous images from two optical observatories, installed in Italy (at the Urbe Airport in Rome and in Collepardo on the Apennines) and in Malindi (Kenya). From these ground-based observations it is possible to reconstruct the three-dimensional position and velocity of the objects. The distance between the two observatories is crucial for an accurate reconstruction. In particular, we used the sites of Urbe and Collepardo, with a baseline of 80 km, to detect objects in low Earth orbit (LEO), and a baseline of 6000 km, between Urbe and Malindi, to observe objects in geostationary orbit (GEO).
We will present the validation of the three-dimensional reconstruction method via a fully synthetic procedure that propagates the satellite trajectory, using the SGP4 model and TLE data (provided by NASA), and generates synthetic photographs of the satellite passage as seen from the two observatories. We will then compare the synthetic results with the experimental results obtained using the real optical systems. The procedure can be automated to identify unknown space objects and generalized to an arbitrary number of observation sites. The identified debris will be added to the DISCOS catalogue (Database and Information System Characterising Objects in Space), owned by the European Space Agency (ESA), to improve space surveillance and the ability to intervene in case of potential risk.
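The geometric core of such a stereoscopic reconstruction can be sketched as a generic two-ray triangulation (not the authors' full pipeline; observatory positions and line-of-sight directions are assumed to be given in a common Earth-centred frame):

    import numpy as np

    def triangulate(p1, d1, p2, d2):
        """Midpoint of the shortest segment between two lines of sight p_i + t_i * d_i."""
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        w = p1 - p2
        b = np.dot(d1, d2)
        denom = 1.0 - b * b                      # zero only if the two lines of sight are parallel
        t1 = (b * np.dot(d2, w) - np.dot(d1, w)) / denom
        t2 = (np.dot(d2, w) - b * np.dot(d1, w)) / denom
        q1, q2 = p1 + t1 * d1, p2 + t2 * d2
        return 0.5 * (q1 + q2), np.linalg.norm(q1 - q2)   # estimated position, residual distance

    # Velocities follow from differentiating positions triangulated at successive exposure times.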
How to cite: Sozza, A., Cencini, M., Parisi, L., Acernese, M., Santoni, F., Piergentili, F., Melillo, S., and Cavagna, A.: Space debris monitoring based on inter-continental stereoscopic detections, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8973, https://doi.org/10.5194/egusphere-egu2020-8973, 2020.
EGU2020-12012 | Displays | ESSI2.10
Automatic Monitoring of Mines Mining based on Multitemporal Remote Sensing Image Change Detection
Chengyi Li
Keeping mining activity controlled and rational is important and meaningful work for the state and for society. Otherwise, illegal mining and unreasonable abandonment cause waste and loss of resources. Being convenient, cheap, and timely, remote sensing technology makes automatic, large-scale monitoring of mining activity possible.
We propose a mining change detection framework based on multitemporal remote sensing images. In this framework, the status of a mine is divided into mining in progress and mining stopped. Based on multitemporal GF-2 satellite data and mining records from Beijing, China, we have built a mining change dataset (the BJMMC dataset), which includes two change types: from mining to mining, and from mining to discontinued mining. We then implement a new type of semantic change detection based on convolutional neural networks (CNNs), which intuitively inserts semantics into the detected change regions.
We applied our method to monitoring mining in the Beijing area in a further year and, combined with GIS data and field work, the results show that the proposed monitoring method performs very well on the BJMMC dataset.
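A minimal sketch of a bi-temporal, CNN-based semantic change detector of this kind is shown below (a generic Siamese architecture written in PyTorch for illustration; the layer sizes, four-band input and three change classes are assumptions, not the network used in the study):

    import torch
    import torch.nn as nn

    class SiameseChangeNet(nn.Module):
        """Shared encoder on both dates, feature difference, per-pixel semantic change classes."""
        def __init__(self, in_channels=4, n_classes=3):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            # Classes could be: no change, mining -> mining, mining -> discontinued mining.
            self.classifier = nn.Conv2d(64, n_classes, kernel_size=1)

        def forward(self, image_t1, image_t2):
            f1, f2 = self.encoder(image_t1), self.encoder(image_t2)
            return self.classifier(torch.abs(f1 - f2))   # (batch, n_classes, H, W) change scores

    # logits = SiameseChangeNet()(torch.randn(1, 4, 256, 256), torch.randn(1, 4, 256, 256))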
How to cite: Li, C.: Automatic Monitoring of Mines Mining based on Multitemporal Remote Sensing Image Change Detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12012, https://doi.org/10.5194/egusphere-egu2020-12012, 2020.
EGU2020-16632 | Displays | ESSI2.10
Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity
Elnaz Neinavaz, Andrew K. Skidmore, and Roshanak Darvishzadeh
Precise estimation of land surface emissivity (LSE) is essential to predict land surface energy budgets and land surface temperature, as LSE is an indicator of material composition. Several approaches to LSE estimation from remote sensing data exist; however, the prediction of LSE remains a challenging task. Among them, the NDVI threshold method appears to hold well over vegetated areas. To apply the NDVI threshold method, it is necessary to know the proportion of vegetation cover (Pv). This research investigates the impact of the prediction accuracy of Pv on the estimation of LSE over a forest ecosystem. To this end, a field campaign coinciding with a Landsat-8 overpass was undertaken in the mixed temperate forest of the Bavarian Forest National Park in southeastern Germany. In situ measurements of Pv were made for 37 plots. Four vegetation indices, namely the NDVI, the variable atmospherically resistant index, the wide dynamic range vegetation index, and the three-band gradient difference vegetation index, were applied to predict Pv for further use in LSE computation. Unlike previous studies, which suggested that the variable atmospherically resistant index estimates Pv with higher prediction accuracy than the NDVI over agricultural areas, our results showed that the prediction accuracy of Pv is not different when using the NDVI over forest (R²CV = 0.42, RMSECV = 0.06). Pv was predicted with the lowest accuracy using the wide dynamic range vegetation index (R²CV = 0.014, RMSECV = 0.197) and the three-band gradient difference vegetation index (R²CV = 0.032, RMSECV = 0.018). The results of this study also revealed that variation in the prediction accuracy of Pv has an impact on the LSE calculation.
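For reference, the NDVI threshold method in its commonly used form computes Pv from scaled NDVI and mixes vegetation and soil emissivities accordingly; the sketch below uses typical literature values for the thresholds and emissivities (assumptions for illustration, not the values adopted in this study) and omits the optional cavity term:

    import numpy as np

    def proportion_vegetation(ndvi, ndvi_soil=0.2, ndvi_veg=0.86):
        """Pv = ((NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil))**2, with NDVI clipped to the thresholds."""
        ndvi = np.clip(ndvi, ndvi_soil, ndvi_veg)
        return ((ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)) ** 2

    def emissivity_ndvi_threshold(ndvi, eps_veg=0.99, eps_soil=0.97):
        """LSE as the Pv-weighted mixture of vegetation and bare-soil emissivities."""
        pv = proportion_vegetation(ndvi)
        return eps_veg * pv + eps_soil * (1.0 - pv)

    # emissivity_ndvi_threshold(np.array([0.15, 0.5, 0.9]))  ->  approx. [0.97, 0.974, 0.99]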
How to cite: Neinavaz, E., Skidmore, A. K., and Darvishzadeh, R.: Estimation of Vegetation Proportion Cover to Improve Land Surface Emissivity, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16632, https://doi.org/10.5194/egusphere-egu2020-16632, 2020.
ESSI2.11 – Earth/Environmental Science Applications on HPC and Cloud Infrastructures
EGU2020-11280 | Displays | ESSI2.11 | Highlight
Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion
Alexey Gokhberg, Laura Ermert, Jonas Igel, and Andreas Fichtner
The study of ambient seismic noise sources and their time- and space-dependent distribution is becoming a crucial component of the real-time monitoring of various geosystems, including active fault zones and volcanoes, as well as geothermal and hydrocarbon reservoirs. In this context, we have previously implemented a combined cloud - HPC infrastructure for production of ambient source maps with high temporal resolution. It covers the entire European continent and the North Atlantic, and is based on seismic data provided by the ORFEUS infrastructure. The solution is based on the Application-as-a-Service concept and includes (1) acquisition of data from distributed ORFEUS data archives, (2) noise source mapping, (3) workflow management, and (4) front-end Web interface to end users.
We present the new results of this ongoing project conducted with support of the Swiss National Supercomputing Centre (CSCS). Our recent goal has been transitioning from mapping the seismic noise sources towards modeling them based on our new method for near real-time finite-frequency ambient seismic noise source inversion. To invert for the power spectral density of the noise source distribution of the secondary microseisms we efficiently forward model global cross-correlation wavefields for any noise distribution. Subsequently, a gradient-based iterative inversion method employing finite-frequency sensitivity kernels is implemented to reduce the misfit between synthetic and observed cross correlations.
During this research we encountered substantial challenges related to the large data volumes and high computational complexity of involved algorithms. We handle these problems by using the CSCS massively parallel heterogeneous supercomputer "Piz Daint". We also apply various specialized numeric techniques which include: (1) using precomputed Green's functions databases generated offline with Axisem and efficiently extracted with Instaseis package and (2) our previously developed high performance package for massive cross correlation of seismograms using GPU accelerators. Furthermore, due to the inherent restrictions of supercomputers, some crucial components of the processing pipeline including the data acquisition and workflow management are deployed on the OpenStack cloud environment. The resulting solution combines the specific advantages of the supercomputer and cloud platforms thus providing a viable distributed platform for the large-scale modeling of seismic noise sources.
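The elementary operation that is massively parallelized here, the cross-correlation of two noise records, can be sketched in a few lines (a plain NumPy, FFT-based version for illustration; the GPU-accelerated production code and its exact conventions are not reproduced here):

    import numpy as np

    def cross_correlate(u1, u2, dt, max_lag_s):
        """Cross-correlation of two equally sampled noise records, returned around zero lag."""
        n = len(u1) + len(u2) - 1
        nfft = 1 << (n - 1).bit_length()                     # next power of two for the FFT
        spec = np.fft.rfft(u1, nfft) * np.conj(np.fft.rfft(u2, nfft))
        cc = np.roll(np.fft.irfft(spec, nfft), nfft // 2)    # put zero lag in the centre
        lags = (np.arange(nfft) - nfft // 2) * dt
        keep = np.abs(lags) <= max_lag_s
        return lags[keep], cc[keep]

    # lags, cc = cross_correlate(trace_a, trace_b, dt=1.0, max_lag_s=300.0)  # placeholder traces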
How to cite: Gokhberg, A., Ermert, L., Igel, J., and Fichtner, A.: Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11280, https://doi.org/10.5194/egusphere-egu2020-11280, 2020.
EGU2020-13518 | Displays | ESSI2.11
Supporting Multi-cloud Model Execution with VLab
Mattia Santoro, Paolo Mazzetti, Nicholas Spadaro, and Stefano Nativi
The VLab (Virtual Laboratory), developed in the context of the European projects ECOPOTENTIAL and ERA-PLANET, is a cloud-based platform that supports environmental scientists in sharing their models. The main challenges addressed by VLab are: (i) minimization of the interoperability requirements in the process of model porting (i.e. simplifying as much as possible the publishing and sharing of a model for model developers) and (ii) support for multiple programming languages and environments (it must be possible to port models developed in different programming languages and using an arbitrary set of libraries).
In this presentation we describe the VLab architecture and, in particular, how it enables a multi-cloud deployment approach and the benefits this brings.
Deploying VLab on different cloud environments allows model execution where it is most convenient, e.g. depending on the availability of required data (move code to data).
This was implemented in the web application for Protected Areas, developed by the Joint Research Centre of the European Commission (EC JRC) in the context of the EuroGEOSS Sprint to Ministerial activity and demonstrated at the last GEO-XVI Plenary meeting in Canberra. The web application demonstrates the use of Copernicus Sentinel data to calculate land cover and land cover change in a set of Protected Areas belonging to different ecosystems. Based on the user's selection of satellite products, the available cloud platforms on which the model can run are presented along with their data availability for the selected products. After the platform is selected, the web application uses the VLab APIs to launch the EODESM (Earth Observation Data for Ecosystem Monitoring) model (Lucas and Mitchell, 2017), monitor the execution status and retrieve the output.
Currently, VLab has been tested with the following cloud platforms: Amazon Web Services, three of the 4+1 Copernicus DIAS platforms (namely ONDA, Creodias and Sobloo), and the European Open Science Cloud (EOSC).
Another scenario enabled by this multi-platform deployment feature is letting the user choose the computational platform and use her/his own credentials to request the needed computational resources. Finally, it is also possible to exploit this feature to benchmark different cloud platforms with respect to their performance.
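A schematic client for the launch–monitor–retrieve workflow described above is sketched below; the base URL, route names and JSON fields are purely hypothetical placeholders (the real VLab API is not documented in this abstract), so the sketch only conveys the interaction pattern:

    import time
    import requests

    VLAB_API = "https://vlab.example.org/api"   # hypothetical base URL, not the real service

    def run_model(model_id, inputs, poll_every_s=30):
        # Route names and payload fields below are illustrative only.
        run = requests.post(f"{VLAB_API}/models/{model_id}/runs", json=inputs).json()
        while True:
            status = requests.get(f"{VLAB_API}/runs/{run['id']}/status").json()["status"]
            if status in ("COMPLETED", "FAILED"):
                break
            time.sleep(poll_every_s)            # poll until the execution finishes
        return requests.get(f"{VLAB_API}/runs/{run['id']}/outputs").json()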
References
Lucas, R. and Mitchell, A. (2017): "Integrated Land Cover and Change Classifications", in: The Roles of Remote Sensing in Nature Conservation, pp. 295–308.
How to cite: Santoro, M., Mazzetti, P., Spadaro, N., and Nativi, S.: Supporting Multi-cloud Model Execution with VLab, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13518, https://doi.org/10.5194/egusphere-egu2020-13518, 2020.
EGU2020-12904 | Displays | ESSI2.11
Accelerated hydrologic modeling: ParFlow GPU implementation
Jaro Hokkanen, Jiri Kraus, Andreas Herten, Dirk Pleiter, and Stefan Kollet
ParFlow is known as a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded Domain-Specific Language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed memory parallelism), on top of which the hydrologic numerical core has been built.
In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers such that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. The decision to embed GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation.
The eDSL implementation is based on the C host language, and the support for GPU acceleration is based on CUDA C++. CUDA C++ has been under intense development in recent years, and features such as Unified Memory and host-device lambdas were extensively leveraged in the ParFlow implementation in order to maximize productivity. Efficient intra- and inter-node data transfer between GPUs rests on a CUDA-aware MPI library and application-side, GPU-based data packing routines.
The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with 2 Intel Skylake processors and 4 NVIDIA V100 GPUs compared to the original version of ParFlow, where the GPUs are not used. The eDSL approach and ParFlow GPU implementation may serve as a blueprint to tackle the challenges of heterogeneous HPC hardware architectures on the path to exascale.
How to cite: Hokkanen, J., Kraus, J., Herten, A., Pleiter, D., and Kollet, S.: Accelerated hydrologic modeling: ParFlow GPU implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12904, https://doi.org/10.5194/egusphere-egu2020-12904, 2020.