Content:

ESSI – Earth & Space Science Informatics

ESSI1.2 – Addressing Training Data Challenges to Accelerate Earth Science Machine Learning

EGU21-12065 | vPICO presentations | ESSI1.2

Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance

Benjamin Kellenberger, Devis Tuia, and Dan Morris

Ecological research like wildlife censuses increasingly relies on data at the terabyte scale. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to be annotated with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves extensive coding and expert knowledge.

In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. First, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability up to the cloud. Second, it tightly integrates deep learning models into the annotation process through active learning [7]: models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.

Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.

AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers evaluation of both user annotations and model predictions, access control, customisable model training, and more, all through the web browser.

AIDE is fully open source and available at https://github.com/microsoft/aerial_wildlife_detection.

 


How to cite: Kellenberger, B., Tuia, D., and Morris, D.: Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12065, https://doi.org/10.5194/egusphere-egu21-12065, 2021.

EGU21-6853 | vPICO presentations | ESSI1.2

Curator: A No-Code Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery

Rudy Venguswamy, Mike Levy, Anirudh Koul, Satyarth Praveen, Tarun Narayanan, Ajay Krishnan, Jenessa Peterson, Siddha Ganju, and Meher Kasam

Machine learning modeling for Earth events at NASA is often limited by the availability of labeled examples. For example, training classifiers for forest fires or oil spills from satellite imagery requires curating a massive and diverse dataset of example forest fires, a tedious multi-month effort that involves careful review of over 196.9 million square miles of data per day for 20 years. While such images might exist in abundance within 40 petabytes of unlabeled satellite data, finding these positive examples to include in a training dataset for a machine learning model is extremely time-consuming and requires researchers to "hunt" for positive examples, like finding a needle in a haystack.

We present a no-code open-source tool, Curator, whose goal is to minimize the amount of manual image labeling needed to achieve a state-of-the-art classifier. The pipeline, purpose-built to take advantage of the massive amount of unlabeled images, consists of (1) self-supervised training to convert unlabeled images into meaningful representations, (2) search-by-example to collect a seed set of images, and (3) human-in-the-loop active learning to iteratively ask for labels on uncertain examples and train on them.

In step 1, a model capable of representing unlabeled images meaningfully is trained with a self-supervised algorithm (like SimCLR) on a random subset of the dataset (one that conforms to researchers’ specified “training budget”). Since real-world datasets are often imbalanced, leading to suboptimal models, the initial model is used to generate embeddings on the entire dataset. Then, images with equidistant embeddings are sampled. This iterative training and resampling strategy improves both the balance of the training data and the model at every iteration. In step 2, researchers supply an example image of interest, and the embedding generated from this image is used to find other images with embeddings near the reference image’s embedding in Euclidean space (hence images similar in appearance to the query image). These candidate images contain a higher density of positive examples and are annotated manually as a seed set. In step 3, the seed labels are used to train a classifier that identifies further candidate images for human inspection with active learning. In each classification training loop, candidate images for labeling are sampled from the larger unlabeled dataset based on the images the model is most uncertain about (p ≈ 0.5).
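To make steps 2 and 3 concrete, below is a minimal Python sketch of embedding-based search-by-example and uncertainty sampling. It illustrates the general technique rather than Curator's actual code; the array shapes, the k and budget parameters, and the brute-force NumPy search are all illustrative assumptions.

```python
import numpy as np

def search_by_example(query_emb, embeddings, k=100):
    """Step 2: return the indices of the k images whose embeddings lie
    closest to the query embedding in Euclidean space."""
    d = np.linalg.norm(embeddings - query_emb, axis=1)
    return np.argsort(d)[:k]

def uncertainty_sample(probs, budget=50):
    """Step 3: pick the unlabeled images the classifier is least sure
    about, i.e. with predicted probability closest to 0.5."""
    return np.argsort(np.abs(probs - 0.5))[:budget]

# Hypothetical usage: 40k unlabeled images with 128-d embeddings
rng = np.random.default_rng(0)
emb = rng.normal(size=(40_000, 128))
seed_candidates = search_by_example(emb[0], emb, k=100)  # annotate these as the seed set
probs = rng.uniform(size=40_000)                         # stand-in classifier outputs
to_label_next = uncertainty_sample(probs, budget=50)     # next active learning round
```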

Curator is released as an open-source package built on PyTorch Lightning. The pipeline uses GPU-based transforms from the NVIDIA DALI package for augmentation, leading to a 5-10x speed-up in self-supervised training, and is run from the command line.

By iteratively training a self-supervised model and a classifier in tandem with human manual annotation, this pipeline is able to unearth more positive examples from severely imbalanced datasets that were previously untrainable with self-supervision algorithms. In applications such as detecting wildfires or atmospheric dust, or turning outward with telescopic surveys, increasing the number of positive candidates presented to humans for manual inspection increases the efficacy of classifiers and multiplies the efficiency of researchers’ data curation efforts.

How to cite: Venguswamy, R., Levy, M., Koul, A., Praveen, S., Narayanan, T., Krishnan, A., Peterson, J., Ganju, S., and Kasam, M.: Curator: A No-Code Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6853, https://doi.org/10.5194/egusphere-egu21-6853, 2021.

EGU21-16326 | vPICO presentations | ESSI1.2

Programmatic Labeling of Dark Data for Artificial Intelligence in Spatial Informatics

J. Meil

The data preparation process generally consumes up to 80% of a data scientist's time, with 60% of that attributed to cleaning and labeling data [1]. Our solution is to use automated pipelines to prepare, annotate, and catalog data. The first step upon ingestion, especially in the case of real-world, unstructured and unlabeled datasets, is to leverage Snorkel, a tool specifically designed around a paradigm to rapidly create, manage, and model training data. Configured properly, Snorkel can be leveraged to temper this labeling bottleneck through a process called weak supervision. Weak supervision uses programmatic labeling functions (heuristics, distant supervision, SME input, or knowledge bases) scripted in Python to generate "noisy labels". The functions traverse the entirety of the dataset and feed the labeled data into a generative (conditionally probabilistic) model. The function of this model is to output the distribution of each response variable and predict the conditional probability based on a joint probability distribution algorithm. This is done by comparing the various labeling functions and the degree to which their outputs are congruent with each other. A single labeling function that has a high degree of congruence with other labeling functions will have a high learned accuracy, that is, the fraction of predictions that the model got right. Conversely, labeling functions that have a low degree of congruence with other functions will have low learned accuracy. The predictions are then combined, weighted by estimated accuracy, whereby the predictions of functions with higher learned accuracy are counted multiple times. The result is a transformation from a binary classification of 0 or 1 to a fuzzy label between 0 and 1: there is probability "x" that, based on heuristic "n", the response variable is "y". As data are added to this generative model, multi-class inference is made on the response variables (positive, negative, or abstain), assigning probabilistic labels to potentially millions of data points. Thus, we have generated a discriminative ground truth for all further labeling efforts and have improved the scalability of our models. The labeling functions can then be applied to unlabeled data to further machine learning efforts.
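As an illustration of this weak-supervision workflow, the following sketch uses Snorkel's labeling-function API to combine two toy heuristics into probabilistic labels. The heuristics, column names, and the KNOWN_EVENT_SITES lookup are hypothetical; only the Snorkel calls (labeling_function, PandasLFApplier, LabelModel) reflect the library's actual interface.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1
KNOWN_EVENT_SITES = {42}  # hypothetical knowledge base of confirmed events

@labeling_function()
def lf_keyword(x):
    # heuristic: flag records whose free text mentions a spill
    return POSITIVE if "spill" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_knowledge_base(x):
    # distant supervision: look the site up in a known-events table
    return POSITIVE if x.site_id in KNOWN_EVENT_SITES else NEGATIVE

df = pd.DataFrame({
    "text": ["possible oil spill near platform", "calm open water", "spill residue visible"],
    "site_id": [42, 7, 13],
})
applier = PandasLFApplier([lf_keyword, lf_knowledge_base])
L_train = applier.apply(df)  # noisy label matrix: one column per labeling function

# The generative label model weighs each function by its congruence with the
# others and emits fuzzy labels in [0, 1] rather than hard 0/1 decisions.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)
print(label_model.predict_proba(L_train))
```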
 
Once our datasets are labeled and a ground truth is established, we need to persist the data into our Delta Lake, since it combines the most performant aspects of a data warehouse with the low-cost storage of data lakes. In addition, the lake can accept unstructured, semi-structured, or structured data sources, and those sources can be further aggregated into raw-ingestion, cleaned, and feature-engineered data layers. By sectioning off the data sources into these "layers", the data engineering portion is abstracted away from the data scientist, who can access model-ready data at any time. Data can be ingested via batch or stream.
 
The design of the entire ecosystem is to eliminate as much technical debt in machine learning paradigms as possible in terms of configuration, data collection, verification, governance, extraction, analytics, process management, resource management, infrastructure, monitoring, and post-verification.

How to cite: Meil, J.: Programmatic Labeling of Dark Data for Artificial Intelligence in Spatial Informatics, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16326, https://doi.org/10.5194/egusphere-egu21-16326, 2021.

EGU21-15297 | vPICO presentations | ESSI1.2

Collecting training data to map forest management at global scale

Myroslava Lesiv, Dmitry Schepaschenko, Martina Dürauer, Marcel Buchhorn, Ivelina Georgieva, and Steffen Fritz

Spatially explicit information on forest management at a global scale is critical for understanding the current status of forests for sustainable forest management and restoration. Whereas remote sensing-based datasets, developed by applying ML and AI algorithms, can successfully depict tree cover and other land cover types, they have not yet been used to depict untouched forest and different degrees of forest management. We show for the first time that, with sufficient training data derived from very high-resolution imagery, a differentiation within the tree cover class of various levels of forest management is possible.

In this session, we would like to present our approach for labeling forest-related training data by using the Geo-Wiki application (https://www.geo-wiki.org/). Moreover, we would like to share a new open global training data set on forest management that we collected from a series of Geo-Wiki campaigns. In February 2019, we organized an expert workshop to (1) discuss the variety of forest management practices that take place in different parts of the world; (2) generalize the definitions for application at global scale; (3) finalize the Geo-Wiki interface for the crowdsourcing campaigns; and (4) build a data set of control points (the expert data set), which we used later to monitor the quality of the crowdsourced contributions by the volunteers. We involved forest experts from different regions around the world to explore what types of forest management information could be collected from visual interpretation of very high-resolution images from Google Maps and Microsoft Bing, in combination with Sentinel time series and Normalized Difference Vegetation Index (NDVI) profiles derived from Google Earth Engine (GEE). Based on the results of this analysis, we expanded these campaigns by involving a broader group of participants, mainly people recruited from remote sensing, geography, and forest research institutes and universities.

In total, we collected forest data for approximately 230,000 locations globally. These data are of sufficient density and quality and can therefore be used in many ML and AI applications for forests at regional and local scales. We also provide an example ML application: a remote sensing-based global forest management map at 100 m resolution (PROBA-V) for the year 2015. It includes such classes as intact forests, forests with signs of human impact (including clear cuts and logging), replanted forest, woody plantations with a rotation period of up to 15 years, oil palms, and agroforestry. The results of independent statistical validation show that the map's overall accuracy is 81%.

How to cite: Lesiv, M., Schepaschenko, D., Dürauer, M., Buchhorn, M., Georgieva, I., and Fritz, S.: Collecting training data to map forest management at global scale, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15297, https://doi.org/10.5194/egusphere-egu21-15297, 2021.

EGU21-10347 | vPICO presentations | ESSI1.2

gprMax: An Open Source Electromagnetic Simulator for Generating Big Data for Ground Penetrating Radar Applications

Craig Warren, Iraklis Giannakis, and Antonios Giannopoulos

Lack of well-labelled and coherent training data is the main reason why machine learning (ML) and data-driven interpretations are not established in the field of Ground-Penetrating Radar (GPR). Non-representative and limited datasets lead to unreliable ML schemes that overfit and are unable to compete with traditional deterministic approaches. To that end, numerical data can potentially complement insufficient measured datasets and overcome this lack of data, even in the presence of large feature spaces.

Using synthetic data in ML is not new, and it has been extensively applied in computer vision. Applying numerical data in ML requires a numerical framework capable of generating synthetic but nonetheless realistic datasets. For GPR, such a framework is possible using gprMax, an open-source electromagnetic solver fine-tuned for GPR applications [1], [2], [3]. gprMax is fully parallelised and can be run using multiple CPUs and GPUs. In addition, it has a flexible scriptable format that makes it easy to generate big data in a trivial manner. Stochastic geometries, realistic soils, vegetation, targets [3], and models of commercial antennas [4], [5] are some of the features that can easily be incorporated in the training data.
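The scriptable input format lends itself to dataset-generation loops like the following Python sketch, which writes randomized input files for a buried-rebar scenario. The command names follow gprMax's documented input syntax, but the geometry, material values, and file layout are simplified illustrations rather than a validated model.

```python
import random
from pathlib import Path

def write_model(path, depth, radius):
    """Write one gprMax input file with a single rebar (a PEC cylinder)
    buried at the given cover depth in a 2D concrete slab model."""
    path.write_text(f"""\
#domain: 0.5 0.4 0.002
#dx_dy_dz: 0.002 0.002 0.002
#time_window: 12e-9
#material: 6 0.01 1 0 concrete
#waveform: ricker 1 1.5e9 src_wave
#hertzian_dipole: z 0.10 0.35 0 src_wave
#rx: 0.14 0.35 0
#box: 0 0 0 0.5 0.30 0.002 concrete
#cylinder: 0.25 {0.30 - depth} 0 0.25 {0.30 - depth} 0.002 {radius} pec
""")

random.seed(0)
for i in range(1000):                      # 1000 random rebar scenarios for training
    depth = random.uniform(0.02, 0.15)     # cover depth in metres (illustrative range)
    radius = random.uniform(0.004, 0.016)  # rebar radius in metres (illustrative range)
    write_model(Path(f"rebar_{i:04d}.in"), depth, radius)
# each file can then be simulated with: python -m gprMax rebar_0000.in
```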

The capability of gprMax to generate realistic numerical datasets is demonstrated in [6], [7]. The investigated problem is assessing the depth and diameter of rebars in reinforced concrete. Estimating the diameter of rebars using GPR is particularly challenging, with no conclusive solution. Using a synthetic training set generated with gprMax, we managed to effectively train ML schemes capable of estimating the diameter of rebar in an accurate and efficient manner [6], [7]. The aforementioned case studies support the premise that gprMax has the potential to provide realistic training data for applications where well-labelled data are not available, such as landmine detection, non-destructive testing, and planetary sciences.

References

[1] Warren, C., Giannopoulos, A. & Giannakis, I., (2016). gprMax: Open Source software to simulate electromagnetic wave propagation for Ground Penetrating Radar, Computer Physics Communications, 209, 163-170.

[2] Warren, C., Giannopoulos, A., Gray, A., Giannakis, I., Patterson, A., Wetter, L. & Hamrah, A., (2018). A CUDA-based GPU engine for gprMax: Open source FDTD, electromagnetic simulation software. Computer Physics Communications, 237, 208-218.

[3] Giannakis, I., Giannopoulos, A. & Warren, C. (2016). A realistic FDTD numerical modeling framework of Ground Penetrating Radar for landmine detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 9(1), 37-51.

[4] Giannakis, I., Giannopoulos, A. & Warren, C., (2018). Realistic FDTD GPR antenna models optimized using a novel linear/non-linear full waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 207(3), 1768-1778.

[5] Warren, C. & Giannopoulos, A. (2011). Creating finite-difference time-domain models of commercial ground-penetrating radar antennas using Taguchi's optimization method. Geophysics, 76(2), G37-G47.

[6] Giannakis, I., Giannopoulos, A. & Warren, C. (2021). A Machine Learning Scheme for Estimating the Diameter of Reinforcing Bars Using Ground Penetrating Radar. IEEE Geoscience and Remote Sensing Letters.

[7] Giannakis, I., Giannopoulos, A., & Warren, C. (2019). A machine learning-based fast-forward solver for ground penetrating radar with application to full-waveform inversion. IEEE Transactions on Geoscience and Remote Sensing. 57(7), 4417-4426.

How to cite: Warren, C., Giannakis, I., and Giannopoulos, A.: gprMax: An Open Source Electromagnetic Simulator for Generating Big Data for Ground Penetrating Radar Applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10347, https://doi.org/10.5194/egusphere-egu21-10347, 2021.

EGU21-6490 | vPICO presentations | ESSI1.2 | Highlight

It's a Bird it's a Plane it's a Meteor

Surya Ambardar, Siddha Ganju, and Peter Jenniskens

Meteor showers are some of the most dazzling and memorable events occurring in the night sky. Caused by bits of celestial debris from comets and asteroids entering Earth's atmosphere at astronomical speeds, meteors are bright streaks of light in the night sky, sometimes called shooting stars. These meteors are recorded, tracked, and triangulated by low-light surveillance cameras in a project called CAMS: Cameras for Allsky Meteor Surveillance. CAMS offers insights into a universe of otherwise invisible solar system bodies, but that task has proven difficult due to the lack of automated supervision. Until recently, much of the data control was done by hand. Labeled training data, necessary to build supervised classification models, are essential because other man-made objects such as airplanes and satellites can be mistaken for meteors. To address this issue, we leverage one year's worth of meteor activity data from CAMS to provide weak supervision for over a decade of collected data, drastically reducing the amount of manual annotation necessary and expanding the available labelled meteor training data.

 

Founded in 2010, CAMS aims to automate video surveillance of the night sky to validate the International Astronomical Union's Working List of Meteor Showers, discover new meteor showers, and predict future meteor showers. Since 2010, CAMS has collected a decade's worth of night-sky activity data in the form of astrometric tracks and brightness profiles, a year of which has been manually annotated. We utilize this one year of labelled data to train a high-confidence LSTM meteor classifier to generate low-confidence labels for the remaining decade's worth of meteor data. Our classifier yields confidence levels for each prediction, and when the confidence lies above a statistically significant threshold, predicted labels can be treated as weak supervision for future training runs. Remaining predictions below the threshold can be manually annotated. Using a high threshold minimizes label noise and ensures instances are correctly labeled while considerably reducing the amount of data that needs to be annotated. Weak supervision can be confirmed by checking date ranges and data distributions for known meteor showers to verify predicted labels.
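A minimal sketch of this thresholding logic: predictions whose confidence clears a high threshold become weak labels, while the rest are queued for manual annotation. The threshold value and array shapes are illustrative; the actual CAMS pipeline operates on LSTM outputs over astrometric tracks.

```python
import numpy as np

def weak_labels(probs, threshold=0.95):
    """Split binary predictions into high-confidence weak labels and a
    remainder queued for manual annotation."""
    confidence = np.maximum(probs, 1.0 - probs)  # distance from the decision boundary
    accept = confidence >= threshold
    labels = (probs >= 0.5).astype(int)
    return labels[accept], np.where(~accept)[0]  # weak labels, indices to review

# Hypothetical classifier outputs for a decade of unlabeled tracks
rng = np.random.default_rng(1)
probs = rng.uniform(size=1_000_000)
pseudo, review_idx = weak_labels(probs, threshold=0.95)
print(f"{len(pseudo)} weak labels, {len(review_idx)} tracks left for manual review")
```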

 

To encourage discovery and distribution of training data and models, we additionally provide scripts to automate data ingestion and model training from raw camera data files. The data scripts handle processing of CAMS data, providing a pipeline to encourage open sharing and reproduction of our research. Additionally, we provide code for an LSTM classifier baseline model which can identify probable meteors. This baseline model script allows further exploration of CAMS data and an opportunity to experiment with other model types.

 

In conclusion, our contributions are (1) a weak supervision method utilizing a year's worth of labelled CAMS data to generate labels for a decade's worth of data, along with (2) baseline data processing and model scripts to encourage open discovery and distribution. Our unique contributions expand access to labeled meteor training data and make the data globally and publicly accessible through daily generated maps of meteor shower activity posted at http://cams.seti.org/FDL/.

How to cite: Ambardar, S., Ganju, S., and Jenniskens, P.: It's a Bird it's a Plane it's a Meteor, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6490, https://doi.org/10.5194/egusphere-egu21-6490, 2021.

EGU21-11957 | vPICO presentations | ESSI1.2

Conditional spatio-temporal random crop for weak labeled SAR datasets

Francesco Asaro, Gianluca Murdaca, and Claudio Maria Prati

This work presents a methodology to improve supervised learning of segmentation tasks for convolutional architectures in the unbalanced and weakly labeled synthetic aperture radar (SAR) dataset scenarios that characterize the Earth Observation (EO) domain. The presented methodology exploits multitemporality and stochasticity to regularize training by reducing overfitting, thus improving validation and test performance.

Traditional precisely annotated datasets are made of patches extracted from a set of image-label pairs, often in a deterministic fashion. Through a set of experiments, we show that this approach is sub-optimal when using weak labels, since it leads to early overfitting, mainly because weak labels only mark the simplest features of the target class.

The presented methodology builds up the dataset from a multitemporal stack of images aligned with the weakly labeled ground truth and samples the patches both in time and space. The patches are selected only if a given condition on the positive-class frequency is met. We show learning improvements over the traditional methodology by applying our strategy to a benchmark task, which consists of training a typical deep convolutional network, U-Net (Ronneberger et al., 2015), for the segmentation of water surfaces in SAR images.
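A minimal sketch of such a conditional spatio-temporal random crop, assuming a (time, height, width) image stack with a single time-invariant weak label map; the patch size and positive-class threshold are illustrative.

```python
import numpy as np

def conditional_crop(stack, labels, size=256, min_pos=0.05, rng=None):
    """Draw a random patch in time and space, accepting it only if the
    positive-class (water) frequency in the weak label meets the threshold."""
    rng = rng or np.random.default_rng()
    T, H, W = stack.shape
    while True:
        t = rng.integers(T)               # temporal sample from the stack
        y = rng.integers(H - size)        # spatial sample
        x = rng.integers(W - size)
        lab = labels[y:y+size, x:x+size]  # weak labels are time-invariant
        if lab.mean() >= min_pos:         # conditional selection rule
            return stack[t, y:y+size, x:x+size], lab

rng = np.random.default_rng(0)
stack = rng.normal(size=(8, 1024, 1024))  # 8-image multitemporal SAR stack (toy data)
labels = rng.random((1024, 1024)) < 0.03  # sparse weak water mask (toy data)
patch, lab = conditional_crop(stack, labels, size=256, min_pos=0.01, rng=rng)
```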

The dataset sources are Sentinel-1 calibrated sigma-nought, VV-VH polarized, single-look intensity images for the inputs, and Copernicus's "Water and Wetness High Resolution Layer" for the weak labels. To avoid spatial autocorrelation phenomena, the training set covers the Low Countries (Belgium, the Netherlands, and Luxembourg), while the validation and test sets span the Padana plain (Italy). The training dataset is built up according to the presented methodology, while the validation and test datasets are defined in the usual deterministic fashion.

We show the beneficial effects of multitemporality, stochasticity, and conditional selection in three different sets of experiments, as well as in a combined one. In particular, we observe performance improvements in terms of the F1 score, which increases together with the degree of multitemporality (number of images in the stack), as well as when stochasticity and conditional rules that compensate for the under-representation of the positive class are added. Furthermore, we show that in the specific framework of SAR data, the introduction of multitemporality improves the learned representation of the speckle, thus implicitly optimizing the U-Net for both the filtering and segmentation tasks. We prove this by comparing the number of looks of the input patch to that of the patch reconstructed before the classification layer.

Overall, in this framework, we show that solely by using the presented training strategy, the classifier's performance improves by up to 5% in terms of the F1 score.

How to cite: Asaro, F., Murdaca, G., and Prati, C. M.: Conditional spatio-temporal random crop for weak labeled SAR datasets, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11957, https://doi.org/10.5194/egusphere-egu21-11957, 2021.

EGU21-1762 | vPICO presentations | ESSI1.2

RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale

Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Alfredo Kalaitzis, Matthew Chantry, Duncan Watson-Parris, and Piotr Bilinski

Climate change is expected to aggravate extreme precipitation events, directly impacting the livelihood of millions. Without a global precipitation forecasting system in place, many regions – especially those constrained in resources to collect expensive ground station data – are left behind. To mitigate such unequal reach of climate change, one solution is to alleviate the reliance on numerical models (and by extension ground station data) by enabling machine-learning-based global forecasts from satellite imagery. Though prior work exists on regional precipitation nowcasting, work on global, medium-term precipitation forecasting is lacking. Importantly, a common, accessible baseline for meaningful comparison is absent. In this work, we present RainBench, a multi-modal benchmark dataset dedicated to advancing global precipitation forecasting. We establish baseline tasks and release PyRain, a data-handling pipeline that enables efficient processing of decades' worth of data by any modeling framework. Whilst our work serves as the basis for a new chapter on global precipitation forecasting from satellite imagery, the greater promise lies in the community joining forces to use our released datasets and tools in developing machine learning approaches to tackle this important challenge.

How to cite: Schroeder de Witt, C., Tong, C., Zantedeschi, V., De Martini, D., Kalaitzis, A., Chantry, M., Watson-Parris, D., and Bilinski, P.: RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1762, https://doi.org/10.5194/egusphere-egu21-1762, 2021.

EGU21-4683 | vPICO presentations | ESSI1.2

Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches.

Octavian Dumitru, Gottfried Schwarz, Mihai Datcu, Dongyang Ao, Zhongling Huang, and Mila Stillman

During the last years, much progress has been made with machine learning algorithms. Typical application fields of machine learning include many technical and commercial applications as well as Earth science analyses, where most often indirect and distorted detector data have to be converted into well-calibrated scientific data that are a prerequisite for a correct understanding of the desired physical quantities and their relationships.

However, the provision of sufficient calibrated data is not enough for the testing, training, and routine processing of most machine learning applications. In principle, one also needs a clear strategy for the selection of necessary and useful training data and an easily understandable quality control of the finally desired parameters.

At first glance, one could guess that this problem can be solved by a careful selection of representative test data covering many typical cases as well as some counterexamples. These test data can then be used for training the internal parameters of a machine learning application. At second glance, however, many researchers have found that a simple stacking-up of plain examples is not the best choice for many scientific applications.

To get improved machine learning results, we concentrated on the analysis of satellite images depicting the Earth’s surface under various conditions such as the selected instrument type, spectral bands, and spatial resolution. In our case, such data are routinely provided by the freely accessible European Sentinel satellite products (e.g., Sentinel-1, and Sentinel-2). Our basic work then included investigations of how some additional processing steps – to be linked with the selected training data – can provide better machine learning results.

To this end, we analysed and compared three different approaches to identify machine learning strategies for the joint selection and processing of training data for our Earth observation images:

  • One can optimize the training data selection by adapting it to the specific instrument, target, and application characteristics [1].
  • As an alternative, one can dynamically generate new training parameters by Generative Adversarial Networks. This is comparable to the role of a sparring partner in boxing [2].
  • One can also use a hybrid semi-supervised approach for Synthetic Aperture Radar images with limited labelled data. The method is split into: polarimetric scattering classification, topic modelling for scattering labels, unsupervised constraint learning, and supervised label prediction with constraints [3].

We applied these strategies in the ExtremeEarth sea-ice monitoring project (http://earthanalytics.eu/). As a result, we can demonstrate for which application cases these three strategies will provide a promising alternative to a simple conventional selection of available training data.

[1] C.O. Dumitru et al., "Understanding Satellite Images: A Data Mining Module for Sentinel Images", Big Earth Data, 2020, 4(4), pp. 367-408.

[2] D. Ao et al., "Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X", Remote Sensing, 2018, 10(10), pp. 1-23.

[3] Z. Huang et al., "HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Images", IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-18.

How to cite: Dumitru, O., Schwarz, G., Datcu, M., Ao, D., Huang, Z., and Stillman, M.: Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4683, https://doi.org/10.5194/egusphere-egu21-4683, 2021.

EGU21-12384 | vPICO presentations | ESSI1.2

AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data.

Alastair McKinstry, Oisin Boydell, Quan Le, Inder Preet, Jennifer Hanafin, Manuel Fernandez, Adam Warde, Venkatesh Kannan, and Patrick Griffiths

The ESA-funded AIREO project [1] sets out to produce AI-ready training dataset specifications and best practices to support the training and development of machine learning models on Earth Observation (EO) data. While the quality and quantity of EO data have increased drastically over the past decades, the availability of training data for machine learning applications is considered a major bottleneck. The goal is to move towards implementing FAIR data principles for training data in EO, enhancing especially the findability, interoperability, and reusability aspects. To achieve this goal, AIREO sets out to provide a training data specification and to develop best practices for the use of training datasets in EO. An additional goal is to make training datasets self-explanatory ("AI-ready") in order to expose challenging problems to a wider audience that does not have expert geospatial knowledge.

Key elements addressed in the AIREO specification are granular and interoperable metadata (based on STAC), innovative quality-assurance metrics, data provenance and processing history, as well as integrated feature-engineering recipes that optimize platform independence. Several initial pilot datasets are being developed following the AIREO data specifications. These pilot applications include, for example, forest biomass, sea-ice detection, and the estimation of atmospheric parameters. An API for the easy exploitation of these datasets will be provided, allowing the training datasets (TDS) to work against EO catalogs (based on OGC STAC catalogs and best practices from the ML community) and enabling updates and repeated model training over time.
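As a sketch of what such granular, interoperable metadata could look like, here is a pared-down, STAC-style item for a single training patch, built as a plain Python dictionary. The core fields (type, id, geometry, bbox, properties, assets) follow the STAC item layout; the "aireo:"-prefixed fields and all values are hypothetical placeholders, not the actual AIREO specification.

```python
from datetime import datetime, timezone
import json

item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "tds-biomass-patch-000123",
    "geometry": {"type": "Point", "coordinates": [8.54, 47.37]},
    "bbox": [8.53, 47.36, 8.55, 47.38],
    "properties": {
        "datetime": datetime(2018, 6, 1, tzinfo=timezone.utc).isoformat(),
        "aireo:quality": {"label_confidence": 0.9},    # hypothetical QA metric
        "aireo:provenance": "Sentinel-2 L2A, labels v1",  # hypothetical lineage note
    },
    "assets": {
        "image": {"href": "s3://bucket/patch123.tif", "type": "image/tiff"},
        "labels": {"href": "s3://bucket/patch123_labels.tif", "type": "image/tiff"},
    },
}
print(json.dumps(item, indent=2))
```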

 

This presentation will introduce the first version of the AIREO training dataset specification and showcase some elements of the best practices that were developed. The AIREO-compliant pilot datasets, which are openly accessible, will be presented, and community feedback is explicitly encouraged.



[1] https://aireo.net/

How to cite: McKinstry, A., Boydell, O., Le, Q., Preet, I., Hanafin, J., Fernandez, M., Warde, A., Kannan, V., and Griffiths, P.: AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12384, https://doi.org/10.5194/egusphere-egu21-12384, 2021.

ESSI1.6 – Spatio-temporal data science: advances in computational geosciences and innovative evaluation tools for weather and climate science

EGU21-13357 | vPICO presentations | ESSI1.6

Overcoming challenges in spatio-temporal modelling of large-scale (global) data

Aoibheann Brady, Jonathan Rougier, Yann Ziegler, Bramha Dutt Vishwakarma, Sam Royston, Stephen Chuter, Richard Westaway, and Jonathan Bamber

Modelling spatio-temporal data on a large scale presents a number of obstacles for statisticians and environmental scientists. Issues such as computational complexity, combining point and areal data, separation of sources into their component processes, and the handling of both large volumes of data in some areas and sparse data in others must be considered. We discuss methods to overcome such challenges within a Bayesian hierarchical modelling framework using INLA.

In particular, we illustrate the approach using the example of source separation of geophysical signals on both a continental and a global scale. In such a setting, data tend to be available at both a local and an areal level. We propose a novel approach for integrating such sources together using the INLA-SPDE method, which is normally reserved for point-level data. Additionally, the geophysical processes involved are both spatial (time-invariant) and spatio-temporal in nature. Separation of such processes into physically sensible components requires careful modelling and consideration of priors (such as physical model outputs where data are sparse), which will be discussed. We also consider methods to overcome the computational costs of modelling on such a large scale, from efficient mesh design, to thinning/aggregating of data, to considering alternative approaches for inference. This holistic approach to the modelling of large-scale data ensures that spatial and spatio-temporal processes can be sensibly separated into their component parts, without being prohibitively expensive to model.

How to cite: Brady, A., Rougier, J., Ziegler, Y., Vishwakarma, B. D., Royston, S., Chuter, S., Westaway, R., and Bamber, J.: Overcoming challenges in spatio-temporal modelling of large-scale (global) data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13357, https://doi.org/10.5194/egusphere-egu21-13357, 2021.

EGU21-1255 | vPICO presentations | ESSI1.6

Scientific workflow scheduling based on data transformation graph for remote sensing application

Zhuojing Tian, Zhenchun Huang, Yinong Zhang, Yanwei Zhao, En Fu, and Shuying Wang

As the amount of data and computation in scientific workflow applications continues to grow, distributed and heterogeneous computing infrastructures such as inter-cloud environments provide this type of application with a great number of computing resources to meet the corresponding needs. In the inter-cloud environment, how to effectively map tasks to cloud service providers to meet QoS (quality of service) constraints based on user requirements has become an important research direction. Remote sensing applications need to process terabytes of data in each run; however, frequent and huge data transmissions across clouds create major performance bottlenecks for execution and seriously affect QoS constraints such as makespan and cost. Using a data transformation graph (DTG) to study the data transfer process of a global drought detection application, a specific optimization strategy is obtained based on the characteristics of the application and environment. Accordingly, an inter-cloud workflow scheduling method based on a genetic algorithm is proposed; under the condition of satisfying the user's QoS constraints, the makespan and the cost can be minimized. The experimental results show that, compared with the standard genetic algorithm, a random algorithm, and a round-robin algorithm, the optimized genetic algorithm can greatly improve the scheduling performance of data- and computation-intensive scientific workflows such as remote sensing applications and reduce the impact of performance bottlenecks.

Keywords: scientific workflow scheduling; inter-cloud environment; remote sensing application; data transformation graph
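The following toy sketch illustrates the kind of fitness function such a genetic algorithm can optimize: task-to-cloud assignments are rejected when a deadline (a stand-in QoS constraint) is violated and otherwise scored by a weighted sum of makespan and cost. All speeds, prices, and GA settings are invented for illustration, and unlike the paper's DTG-based method, this sketch ignores inter-cloud data transfer entirely.

```python
import random

SPEED = {"cloudA": 1.0, "cloudB": 1.6, "cloudC": 0.8}  # relative compute speed (invented)
PRICE = {"cloudA": 1.0, "cloudB": 2.5, "cloudC": 0.6}  # cost per work unit (invented)
TASKS = [4.0, 2.0, 7.0, 3.0, 5.0]                      # work units per task
DEADLINE, ALPHA = 12.0, 0.5                            # QoS deadline and objective weight

def fitness(assign):
    """Weighted makespan/cost objective; infeasible schedules (deadline
    violated) are rejected outright. Data-transfer time is ignored here."""
    finish = dict.fromkeys(SPEED, 0.0)
    cost = 0.0
    for work, cloud in zip(TASKS, assign):
        finish[cloud] += work / SPEED[cloud]
        cost += work * PRICE[cloud]
    makespan = max(finish.values())
    return float("inf") if makespan > DEADLINE else ALPHA * makespan + (1 - ALPHA) * cost

random.seed(0)
clouds = list(SPEED)
pop = [[random.choice(clouds) for _ in TASKS] for _ in range(40)]
for _ in range(100):                       # plain generational GA loop
    pop.sort(key=fitness)
    parents, children = pop[:20], []
    for _ in range(20):
        a, b = random.sample(parents, 2)
        cut = random.randrange(len(TASKS))
        child = a[:cut] + b[cut:]          # one-point crossover
        if random.random() < 0.2:          # mutation: reassign one task
            child[random.randrange(len(TASKS))] = random.choice(clouds)
        children.append(child)
    pop = parents + children
best = min(pop, key=fitness)
print(best, fitness(best))
```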

How to cite: tian, Z., huang, Z., zhang, Y., zhao, Y., fu, E., and wang, S.: Scientific workflow scheduling based on data transformation graph for remote sensing application, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1255, https://doi.org/10.5194/egusphere-egu21-1255, 2021.

EGU21-14423 | vPICO presentations | ESSI1.6

Biclustering for uncovering spatial-temporal patterns in telecom data

Nastasija Grujić, Sanja Brdar, Olivera Novović, Nikola Obrenović, Miro Govedarica, and Vladimir Crnojević

Understanding human dynamics is of crucial importance for managing human activities for sustainable development. According to the United Nations, 68% of people will live in cities by 2050. Therefore, it is important to understand human footprints in order to develop policies that will improve life in urban and suburban areas. Our study aims at detecting spatial-temporal activity patterns from mobile phone data provided by a telecom service provider. To be more precise, we used the activity data set, which contains the amount of sent/received SMS messages and calls, as well as internet usage, per radio base station in defined time stamps. The case study focuses on the capital city of Serbia, Belgrade, which has nearly 2 million inhabitants; the analysis covers February 2020. We applied the biclustering (spectral co-clustering) algorithm to the telecom data to detect locations in the city that behave similarly in specific time windows. Biclustering is a data mining technique used for finding homogeneous submatrices among the rows and columns of a matrix, widely applied in text mining and gene expression data analysis. Although there are no examples in the literature of the algorithm being used on location-based data for urban applications, we saw its potential due to its ability to detect clusters in a more refined way, during specific periods of time, that could not otherwise be detected with a global clustering approach. To prepare the data for the algorithm, we normalized each type of activity (SMS/call in/out and internet activity) and aggregated the total activity on each antenna per hour. We transformed the data into a matrix, where the rows represent the antennas and the columns the hours. The algorithm was applied to each day separately. The average number of discovered biclusters was five, usually corresponding to regular activities, such as work, home, commuting, and free time, but also to the city's nightlife. Our results confirmed that urban spaces are a function of space and time. They revealed different functionalities of the urban and suburban parts of the city. We observed the presence of patterned behavior across the analyzed days. The type of day dictated the spatial-temporal activities that occurred. We distinguished different types of days, such as working days (Monday to Thursday), Fridays, weekends, and holidays. These findings show the promising potential of the biclustering algorithm and could be utilized by policymakers for precisely detecting activity clusters across space and time that correspond to specific functions of the city.
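A minimal sketch of this antennas-by-hours matrix setup with scikit-learn's spectral co-clustering; the matrix dimensions and the fixed number of biclusters are illustrative (the study found around five per day).

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Hypothetical activity matrix: rows = antennas, columns = hours of one day,
# entries = normalized total activity (SMS + calls + internet) per antenna-hour.
rng = np.random.default_rng(0)
activity = rng.random((300, 24))

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(activity)
antenna_groups = model.row_labels_   # which activity pattern each antenna follows
hour_groups = model.column_labels_   # which time window each hour belongs to
```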

How to cite: Grujić, N., Brdar, S., Novović, O., Obrenović, N., Govedarica, M., and Crnojević, V.: Biclustering for uncovering spatial-temporal patterns in telecom data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14423, https://doi.org/10.5194/egusphere-egu21-14423, 2021.

EGU21-9749 | vPICO presentations | ESSI1.6

Cluster analysis in the studying of stress relation in the Vrancea-zone

Lili Czirok, Lukács Kuslits, and Katalin Gribovszki

The SE Carpathians show significant geodynamic activity due to the ongoing subduction process. The strong seismicity in the Vrancea zone is its most important indicator. The focal area of these seismic events is relatively small, around 80 × 100 km, and the distribution of their locations is quite dense.

The authors have carried out cluster analyses of the focal mechanism solutions estimated from local and teleseismic measurements, together with stress inversions, to support the recent and previously published studies in this region. They have applied different pre-existing clustering methods, e.g. HDBSCAN (hierarchical density-based spatial clustering of applications with noise) and agglomerative hierarchical analysis, considering the geographical coordinates, focal depths, and parameters of the focal mechanism solutions of the seismic events used. Moreover, they have attempted to develop a fully automated algorithm for the classification of the earthquakes used in the estimations. This algorithm does not require the setting of hyper-parameters; thus, the effect of subjectivity can be reduced significantly and the running time can also be decreased. In all cases, the resulting stress tensors are in close agreement with previously presented results.
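For reference, a minimal sketch of the HDBSCAN step on a feature matrix of event attributes, using the hdbscan Python package. The feature set, scaling, and min_cluster_size are illustrative, and min_cluster_size is exactly the kind of hyper-parameter the authors' fully automated algorithm aims to avoid.

```python
import numpy as np
import hdbscan

# Hypothetical feature matrix: one row per event with standardized longitude,
# latitude, focal depth, and focal-mechanism angles (strike, dip, rake).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))

clusterer = hdbscan.HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(X)  # -1 marks events flagged as noise
print(f"{labels.max() + 1} clusters, {(labels == -1).sum()} noise events")
```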

How to cite: Czirok, L., Kuslits, L., and Gribovszki, K.: Cluster analysis in the studying of stress relation in the Vrancea-zone, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9749, https://doi.org/10.5194/egusphere-egu21-9749, 2021.

EGU21-12324 | vPICO presentations | ESSI1.6 | Highlight

AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series

Christina Pacher, Irene Schicker, Rosmarie DeWit, and Claudia Plant

Both clustering and outlier detection play an important role in meteorology. With clustering, large sets of data points, such as numerical weather prediction (NWP) model data or observation sites, are separated into groups based on the characteristics found in the data, grouping similar data points in a cluster. Clustering also enables the detection of outliers in the data. The resulting clusters are useful in many ways, such as atmospheric pattern recognition (e.g. clustering NWP ensemble predictions to estimate the likelihood of predicted weather patterns), climate applications (grouping point observations for climate pattern recognition), forecasting (e.g. data pool enhancement using data from similar sites for forecasting applications), urban meteorology, air quality, renewable energy systems, and hydrological applications.
 
Typically, one does not know in advance how many clusters or groups are present in the data. However, for algorithms such as K-means, one needs to define how many clusters one wants as an outcome. With the proposed novel algorithm AWT, a modified combination of several well-known clustering algorithms, this is not needed. It chooses the number of clusters automatically, based on a user-defined threshold parameter. Furthermore, the algorithm can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size.
Similar to the classical BIRCH algorithm, our method AWT works on a multi-resolution data structure, an Aggregated Wavelet Tree, that is suitable for representing multivariate time series. In contrast to BIRCH, the user does not need to specify the number of clusters K, as that is difficult in our application. Instead, AWT relies on a single threshold parameter for clustering and outlier detection. This threshold corresponds to the highest resolution of the tree. Points that are not in any cluster with respect to the threshold are naturally flagged as outliers.
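AWT itself is built on the wavelet tree, but the role of its single threshold can be illustrated with a much simpler leader-style clustering sketch: a point joins the first cluster centre within the threshold, otherwise it founds a new cluster, and clusters that stay singletons are flagged as outliers. This is an analogy for the threshold semantics only, not the AWT algorithm.

```python
import numpy as np

def threshold_cluster(points, threshold):
    """Leader-style threshold clustering: assign each point to the nearest
    existing centre within `threshold`, or make it a new centre; clusters
    that remain singletons are treated as outliers."""
    centres, members = [], []
    for p in points:
        d = [np.linalg.norm(p - c) for c in centres]
        if d and min(d) <= threshold:
            members[int(np.argmin(d))].append(p)
        else:
            centres.append(p)
            members.append([p])
    is_outlier = [len(m) == 1 for m in members]
    return centres, members, is_outlier

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2)), [[10, 10]]])
centres, members, outliers = threshold_cluster(pts, threshold=1.0)
print(len(centres), "clusters;", sum(outliers), "outlier(s)")
```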
 
With the recently increasing usage of non-traditional data sources, such as private smart-home weather stations, in NWP models and other forecasting applications, outlier detection and clustering methods are useful for pre-processing and filtering these rather novel data sources. Especially in urban areas, changes in the surface energy balance caused by urbanization result in temperatures generally being higher in cities than in the surrounding areas. In order to capture the spatial features of this effect, data with high spatial resolution are necessary. Here, privately owned smart-home weather stations are useful, as often only a limited number of official observation sites exist. However, to be able to use these data, they need to be pre-processed.
  
In this work we apply our novel algorithm AWT to crowdsourced data from the city of Vienna. We demonstrate the skill of the algorithm in outlier detection and filtering as well as in clustering the data, and we evaluate it against commonly used algorithms. Furthermore, we show how the algorithm could be used in renewable energy applications.

How to cite: Pacher, C., Schicker, I., DeWit, R., and Plant, C.: AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12324, https://doi.org/10.5194/egusphere-egu21-12324, 2021.

EGU21-12734 | vPICO presentations | ESSI1.6

Statistical downscaling of wind speed time series data based on topographic variables

Wenxuan Hu, Yvonne Scholz, Madhura Yeligeti, Lüder von Bremen, and Marion Schroedter-Homscheidt

Renewable energy sources such as wind energy play a crucial role in most climate change mitigation scenarios because of their ability to significantly reduce energy-related carbon emissions. In order to understand and design future energy systems, detailed modeling of renewable energy sources is important. To make energy system modelling possible at all variability scales of local weather conditions, renewable energy source information with high resolution in both space and time is required.

Nowadays, the renewable energy resource data most widely used in the energy modeling community are reanalysis data such as ERA5, COSMO-REA6, and MERRA-2. Taking wind speed as an example, reanalysis data can provide long-term, spatially resolved wind information at any desired height in a physically consistent way. However, their spatial resolution is coarse. In order to obtain data with a fine spatial resolution, this paper proposes a statistical downscaling method for wind speed based on reanalysis data, observation data, and the local topography.

While most statistical wind downscaling studies have focused on obtaining site-specific data or on downscaling probability density functions, this paper focuses on downscaling one-year hourly wind speed time series for Europe to 0.00833° × 0.00833° (approximately 1 km × 1 km) resolution. Various studies have shown that the local topography influences wind speed. The topographic structure in this study is described by two metrics: the TPI, a topographic position index that compares the elevation of each cell to the mean elevation of the neighboring area, and Sx, a slope-based, direction-dependent parameter that describes the topography in the upwind direction. The observation data used in this study are MeteoSwiss measurements, which provide hourly wind speed time series at the station heights. For each weather station with observation data, biases described by the local terrain features are introduced to minimize the root mean square (RMS) error and the Kolmogorov-Smirnov D (KSD) statistic between the corrected and the observed wind speeds. These biases are then assigned to grid points with the same terrain types as the weather station, which enables downscaling of the wind speed for the whole of Europe.
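A minimal sketch of the TPI metric on a gridded elevation model, assuming a simple square moving-window neighborhood (the window size is illustrative; Sx, being direction-dependent, requires a more involved computation and is omitted):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def tpi(elevation, size=11):
    """Topographic position index: cell elevation minus the mean elevation of
    its size-by-size neighborhood; positive on ridges, negative in valleys."""
    return elevation - uniform_filter(elevation, size=size)

# Hypothetical DEM tile
rng = np.random.default_rng(0)
dem = rng.normal(500.0, 50.0, size=(400, 400))
ridge_cells = tpi(dem, size=11) > 0  # candidate ridge cells, where the bias correction matters most
```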

The results show that this downscaling method can improve the RMS error and KSD statistic for both ERA5 and COSMO-REA6, especially at mountain ridges, which indicates that it can not only decrease the bias but also provide a better match to the observed wind speed distributions.

How to cite: Hu, W., Scholz, Y., Yeligeti, M., von Bremen, L., and Schroedter-Homscheidt, M.: Statistical downscaling of wind speed time series data based on topographic variables, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12734, https://doi.org/10.5194/egusphere-egu21-12734, 2021.

EGU21-13210 | vPICO presentations | ESSI1.6

Spatio-temporal clustering methodologies for point-event natural hazards

Uldis Zandovskis, Bruce D. Malamud, and Davide Pigoli

Natural hazards are inherently spatio-temporal processes. Spatio-temporal clustering methodologies applied to natural hazard data can help distinguish clustering patterns that not only identify point-event-dense regions and time periods, but also provide insight into the hazardous process. Here we review spatio-temporal clustering methodologies applicable to point-event datasets representative of natural hazards, and we evaluate their performance using both synthetic and real-life data. We first present a systematic overview of major spatio-temporal clustering methodologies used in the literature, which include clustering procedures that are (i) global (providing a single quantitative measure of the degree of clustering in the dataset) and (ii) local (i.e. assigning individual point events to a cluster). A total of seven methodologies from these two groups are applied to real-world (lightning) and synthetic datasets. For (i) global procedures, we explore the Knox, Mantel, and Jacquez k-NN tests and spatio-temporal K-functions, and for (ii) local procedures we consider the spatio-temporal scan statistic, kernel density estimation, and the density-based clustering method OPTICS. The dataset of 7021 lightning strikes is from 1 and 2 July 2015 over the UK, when a severe three-storm system crossed the region, with a different convective mode producing each of the storms. The synthetic datasets are representative of various topologies of point-event natural hazard data with a moving source. We introduce a two-source model with input parameters related to the physical properties of the source. Each source has a set number of point events, an initiation point in space and time, a movement speed, a direction, an inter-event time distribution, and a spatial spread distribution. In addition to a base model of two identical moving sources with a set temporal separation, we produce four different topologies of the data by incrementally varying the speed parameter of the source, the spatial spread parameters, the direction and initiation points, and the angle of the two sources. With these five synthetic datasets representative of various two-source models, we evaluate the performance of the methodologies. The performance is assessed based on the ability of each methodology to separate the point events produced by the two sources and on the sensitivity of these results to changes in the model input parameters. We further discuss the benefits of combining global and local clustering procedures in the analyses, as global clustering procedures give an initial understanding of the spatial and temporal scales over which clustering is present in the data. This information then helps to inform and limit the choice of input parameters for the local clustering procedures.
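As an example of the global procedures listed above, here is a small sketch of the Knox test with Monte Carlo significance, counting event pairs that are close in both space and time; the distance and time thresholds and the synthetic events are illustrative.

```python
import numpy as np

def knox_statistic(xy, t, delta_km, tau_h):
    """Knox statistic: the number of event pairs closer than delta_km in
    space AND tau_h in time. Significance is assessed by re-computing the
    statistic under random permutations of the event times."""
    n = len(t)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (np.linalg.norm(xy[i] - xy[j]) < delta_km
                    and abs(t[i] - t[j]) < tau_h):
                count += 1
    return count

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(200, 2))  # event locations in km (toy data)
t = rng.uniform(0, 48, size=200)         # event times in hours (toy data)
obs = knox_statistic(xy, t, delta_km=10, tau_h=3)
perm = [knox_statistic(xy, rng.permutation(t), 10, 3) for _ in range(99)]
p_value = (1 + sum(p >= obs for p in perm)) / 100  # Monte Carlo p-value
print(obs, p_value)
```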

How to cite: Zandovskis, U., Malamud, B. D., and Pigoli, D.: Spatio-temporal clustering methodologies for point-event natural hazards, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13210, https://doi.org/10.5194/egusphere-egu21-13210, 2021.

EGU21-7162 | vPICO presentations | ESSI1.6

Wildfire susceptibility assessment: evaluation of the performance of different machine learning algorithms

Andrea Trucchia, Sara Isnardi, Mirko D'Andrea, Guido Biondi, Paolo Fiorucci, and Marj Tonini

Wildfires constitute a complex environmental disaster triggered by several interacting natural and human factors that can affect biodiversity, species composition, and ecosystems, but also human lives, regional economies, and environmental health. Wildfires have therefore become a focus of forestry and ecological research and are receiving considerable attention in forest management. Current advances in automated learning and simulation methods, like machine learning (ML) algorithms, have recently aroused great interest in wildfire risk assessment and mapping. This quantitative evaluation is carried out by taking into account two factors: the location and spatial extension of past wildfire events, and the geo-environmental and anthropogenic predisposing factors that favored their ignition and spreading. When dealing with risk assessment and predictive mapping for natural phenomena, it is crucial to ascertain the reliability and validity of the collected data, as well as the prediction capability of the obtained results. In a previous study (Tonini et al. 2020), the authors applied Random Forest (RF) to elaborate a wildfire susceptibility map for the Liguria region (Italy). In the present study, we address the following outstanding issues, which were still unsolved: (1) the vegetation map included a class labeled "burned area" that masked the true burned vegetation; (2) the implemented RF-based model gave good results, but it needed to be compared with other ML-based approaches; (3) to test the predictive capabilities of the model, the last three years of observations were taken, but these are not fully representative of the different wildfire regimes characterizing non-consecutive years. By improving the analyses, the following results were achieved. (1) The class "burned areas" was reclassified based on expert knowledge, and the type of vegetation correctly assigned. This allowed correctly estimating the relative importance of each vegetation class belonging to this variable. (2) Two additional ML-based approaches, namely Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM), were tested besides RF, and the performance of each model was assessed, as well as the resulting variable ranking and the predicted outputs. This allowed comparing the three ML-based approaches and evaluating the pros and cons of each one. (3) The training and testing datasets were selected by extracting the yearly observations based on a clustering procedure, allowing us to account for the temporal variability of the burning seasons. As a result, our models can on average perform better predictions in different situations, taking into consideration years experiencing more or fewer wildfires than usual. The three ML-based models (RF, SVM and MLP) were finally validated by means of two metrics: (i) the Area Under the ROC Curve, selecting the validation dataset by using a 5-fold cross-validation procedure; and (ii) the RMS errors, computed by evaluating the difference between the predicted probability outputs and the presence/absence of an observed event in the testing dataset.
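A compact sketch of the validation scheme described in point (i), comparing the three model families by cross-validated AUC with scikit-learn; the synthetic data stand in for the real geo-environmental and anthropogenic predictors, and the model hyper-parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for the real predictors (vegetation, topography, anthropogenic layers)
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.8], random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # 5-fold CV AUC
    print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```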

Bibliography:

Tonini, M.; D'Andrea, M.; Biondi, G.; Degli Esposti, S.; Trucchia, A.; Fiorucci, P. A Machine Learning-Based Approach for Wildfire Susceptibility Mapping. The Case Study of the Liguria Region in Italy. Geosciences 2020, 10, 105. https://doi.org/10.3390/geosciences10030105

How to cite: Trucchia, A., Isnardi, S., D'Andrea, M., Biondi, G., Fiorucci, P., and Tonini, M.: Wildfire susceptibility assessment: evaluation of the performance of different machine learning algorithms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7162, https://doi.org/10.5194/egusphere-egu21-7162, 2021.

EGU21-13713 | vPICO presentations | ESSI1.6

RapidAI4EO: Advancing the State-of-the-Art in Continuous Land Monitoring

Giovanni Marchisio, Patrick Helber, Benjamin Bischke, Tim Davis, and Annett Wania

New catalogues of nearly daily or even intraday temporal data will soon dominate the global archives. However, there has been little exploration of artificial intelligence (AI) techniques to leverage the high cadence that is already achievable through the fusion of multiscale, multimodal sensors. Under the sponsorship of the European Union's Horizon 2020 programme, RapidAI4EO will establish the foundations for the next generation of Copernicus Land Monitoring Service (CLMS) products. The focus is on the CORINE Land Cover programme, the flagship of CLMS.

Specific objectives of the project are to: 1) explore and stimulate the development of new spatiotemporal monitoring applications based on the latest advances in AI and deep learning (DL); 2) demonstrate the fusion of Copernicus high-resolution satellite imagery and third-party very high-resolution imagery; and 3) provide intensified monitoring of land use and land cover, and land use change, at a much higher level of detail and temporal cadence than is possible today.

Our strategy is two-fold. The first aspect involves developing vastly improved DL architectures to model the phenomenology inherent in high cadence observations, with a focus on disentangling phenology from structural change. The second involves providing critical training data to drive advancement in the Copernicus community and ecosystem well beyond the lifetime of this project. To this end we will create the most complete and dense spatiotemporal training sets ever, combining Sentinel-2 with daily, harmonized, cloud-free, gap-filled, multispectral 3 m time series resulting from the fusion of open satellite data with Planet imagery at as many as 500,000 patch locations over Europe. The daily time series will span the entire year 2018, to coincide with the latest release of CORINE. We plan to open-source these datasets for the benefit of the entire remote sensing community.

This talk focuses on the description of the datasets, whose inspiration comes from the recently released EuroSAT (Helber et al., 2019) and BigEarthNet (Sumbul et al., 2019) corpora. The new corpora will look at the intersection of CORINE 2018 with all the countries in the EU, balancing relative country surface with relative LULC distribution and, most notably, adding the daily high resolution time series at all locations for the year 2018. Annotations will be based on the CORINE ontology. The higher spatial resolution will support modeling of more LC classes, while the added temporal dimension should enable disambiguation of land covers across diverse climate zones, as well as an improved understanding of land use.
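As a rough illustration of what one sample in such a corpus could look like, the sketch below lays out a year of daily, gap-filled multispectral observations for a single patch as a (time, band, y, x) array; the patch size, band names and shapes are assumptions for illustration, not the project's actual specification.

```python
# A minimal sketch of a dense spatiotemporal training patch, assuming xarray.
import numpy as np
import pandas as pd
import xarray as xr

days = pd.date_range("2018-01-01", "2018-12-31", freq="D")
bands = ["blue", "green", "red", "nir"]          # assumed band set
patch = xr.DataArray(
    np.zeros((len(days), len(bands), 64, 64), dtype=np.float32),  # placeholder reflectances
    dims=("time", "band", "y", "x"),
    coords={"time": days, "band": bands},
    name="reflectance",
)
# e.g. a per-band annual phenology curve averaged over the patch:
phenology = patch.mean(dim=("y", "x"))
print(phenology.shape)  # (365, 4)
```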

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004356.

How to cite: Marchisio, G., Helber, P., Bischke, B., Davis, T., and Wania, A.: RapidAI4EO: Advancing the State-of-the-Art in Continuous Land Monitoring, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13713, https://doi.org/10.5194/egusphere-egu21-13713, 2021.

EGU21-16051 | vPICO presentations | ESSI1.6

Land-use change effects on biodiversity through mechanistic simulations: A case study with South-Asian mammals

Andre P. Silva, Filip Thorn, Damaris Zurell, and Juliano Cabral

Land-use change remains the main driver of biodiversity loss, and fragmentation and habitat loss are expected to lead to further population declines and species losses. We apply a recently developed R package for a spatially-explicit mechanistic simulation model (RangeShiftR), which incorporates habitat suitability as well as demographic and dispersal processes, to understand the temporal effects of land-use change (Land-use harmonization scenarios for the 1900-2100 period) on the abundance and richness of mammalian species in South Asia. We then compare land-use scenarios with and without protected areas to understand whether current spatial conservation strategies are able to sustain viable populations independently of the land-use scenario followed. Our approach is innovative in assessing how land-use scenarios can influence animal populations through the underlying ecological processes.

How to cite: P. Silva, A., Thorn, F., Zurell, D., and Cabral, J.: Land-use change effects on biodiversity through mechanistic simulations: A case study with South-Asian mammals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16051, https://doi.org/10.5194/egusphere-egu21-16051, 2021.

EGU21-13903 | vPICO presentations | ESSI1.6

Fostering International Collaboration Through a Unified Verification, Validation, and Diagnostics Framework - METplus 

Tara Jensen, Marion Mittermaier, Paul Kucera, and Barbara Brown

Verification and validation activities are critical for the success of modeling and prediction efforts at organizations around the world.  Having reproducible results via a consistent framework is equally important for model developers and users alike.  The Model Evaluation Tools (MET) was developed over a decade ago and expanded to the METplus framework with a view towards providing a consistent platform delivering reproducible results.   

The METplus system is an umbrella verification, validation and diagnostic framework used by thousands of users from both US and international organizations. These tools are designed to be highly flexible to allow for quick adaptation to meet additional evaluation and diagnostic needs. A suite of Python wrappers has been implemented to facilitate a quick set-up and implementation of the system, and to enhance the pre-existing plotting capabilities. Recently, several organizations within the National Oceanic and Atmospheric Administration (NOAA), the United States Department of Defense (DOD), and international partnerships such as the Unified Model (UM) Partnership led by the Met Office have adopted the tools both operationally and for research purposes. Many of these organizations are also now contributing to METplus development, leading to a more robust and dynamic framework for the entire earth system modeling community to use.

This presentation will provide an update on the current status of METplus and how it is being used across multiple scales and applications. It will highlight examples of METplus applied to verification and validation efforts throughout the international community, addressing a range of temporal scales (hourly forecasts to subseasonal-to-seasonal) and spatial scales (convection-allowing to mesoscale, regional to global, tropical to cryosphere to space).

How to cite: Jensen, T., Mittermaier, M., Kucera, P., and Brown, B.: Fostering International Collaboration Through a Unified Verification, Validation, and Diagnostics Framework - METplus , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13903, https://doi.org/10.5194/egusphere-egu21-13903, 2021.

EGU21-4468 | vPICO presentations | ESSI1.6

Free Evaluation System Framework (Freva) - New Features and Development

Etor E. Lucio-Eceiza, Christopher Kadow, Martin Bergemann, Mahesh Ramadoss, Sebastian Illing, Oliver Kunst, Thomas Schartner, Jens Grieger, Mareike Schuster, Andy Richling, Ingo Kirchner, Henning Rust, Philipp Sommer, Ulrich Cubasch, Uwe Ulbrich, Hannes Thiemann, and Thomas Ludwig

The Free Evaluation System Framework (Freva - freva.met.fu-berlin.de , xces.dkrz.de , www-regiklim.dkrz.de - https://github.com/FREVA-CLINT/Freva) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high performance computers (HPC) to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, where all its features are accessible via shell and web environments. Freva indexes different data projects into one common search environment by storing the metadata information of the self-describing model, reanalysis and observational data sets in a database. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). This metadata system, with its advanced but easy-to-handle search tool, supports users, developers and their plugins in retrieving the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Facilitating the provision and usage of tools and climate data automatically increases the number of scientists working with the data sets and identifying discrepancies. Plugins can also integrate their results, e.g. post-processed data, into the user’s database. This allows, for example, post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between the plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via the shell or web system. Plugged-in tools therefore benefit from transparency and reproducibility. Furthermore, the system suggests existing results already produced by other users, saving CPU hours, I/O, disk space and time. An integrated web shell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gate to the research projects on an HPC. Freva efficiently frames the interaction between different technologies, thus improving Earth system modeling science. New features and aspects of further development and collaboration are discussed.

How to cite: Lucio-Eceiza, E. E., Kadow, C., Bergemann, M., Ramadoss, M., Illing, S., Kunst, O., Schartner, T., Grieger, J., Schuster, M., Richling, A., Kirchner, I., Rust, H., Sommer, P., Cubasch, U., Ulbrich, U., Thiemann, H., and Ludwig, T.: Free Evaluation System Framework (Freva) - New Features and Development, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4468, https://doi.org/10.5194/egusphere-egu21-4468, 2021.

EGU21-4918 | vPICO presentations | ESSI1.6

Novel assessment of model relative humidity with satellite probabilistic estimates

Chloé Radice, Hélène Brogniez, Pierre-Emmanuel Kirstetter, and Philippe Chambon

Remote sensing data are often used to assess model forecasts on multiple scales, generally by confronting past simulations with observations. This paper introduces a novel probabilistic method that evaluates tropical atmospheric relative humidity (RH) profiles simulated by ARPEGE, the global numerical model for weather forecasting, with respect to probability distributions of finer-scale satellite observations.

The reference RH is taken from the SAPHIR microwave sounder onboard the Megha-Tropiques satellite, in operation since 2011. ARPEGE simulates the RH field every 6 hours on a 0.25° grid over 18 vertical levels ranging between 950 hPa and 100 hPa. The reference probabilistic RH field is retrieved from brightness temperatures measured by SAPHIR, with a footprint resolution ranging from 10 km (nadir) to 23 km (edge of swath), on 6 vertical layers ranging from 950 hPa to 100 hPa. Footprint-scale RH values are aggregated (convolved) over the spatial and temporal scale of comparison to match the model resolution and summarize the patterns over a significant period. Comparison results will be shown over the April-May-June 2018 period for two configurations of the ARPEGE model (two parametrization schemes for convection). The probabilistic comparison is discussed with respect to a classical deterministic comparison of RH values.

This probabilistic approach keeps all the sub-grid information and, by looking at the distribution as a whole, avoids the classical deterministic simplification of working with a single “best” estimate. The method allows a finer assessment by working on a case-by-case basis and enabling the characterisation of specific situations. It provides added value by accounting for additional information in the evaluation of the simulated field, especially for model simulations that are close to the traditional mean.
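As a generic illustration of scoring a single simulated value against a distribution of finer-scale retrievals (not the authors' exact metric), the sketch below computes where the simulation sits in the observed distribution and an empirical CRPS for one grid cell.

```python
# A minimal sketch, assuming numpy and synthetic stand-in retrievals.
import numpy as np

def rank_and_crps(obs_samples, model_value):
    """obs_samples: RH retrievals within one grid cell; model_value: simulated RH."""
    obs = np.asarray(obs_samples, dtype=float)
    rank = (obs < model_value).mean()  # position of the simulation in the observed CDF
    # Empirical CRPS: E|X - m| - 0.5 * E|X - X'|
    crps = np.abs(obs - model_value).mean() - 0.5 * np.abs(
        obs[:, None] - obs[None, :]
    ).mean()
    return rank, crps

rng = np.random.default_rng(0)
obs = rng.normal(55.0, 8.0, size=200)   # placeholder SAPHIR-like RH samples (%)
print(rank_and_crps(obs, model_value=60.0))
```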

How to cite: Radice, C., Brogniez, H., Kirstetter, P.-E., and Chambon, P.: Novel assessment of model relative humidity with satellite probabilistic estimates, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4918, https://doi.org/10.5194/egusphere-egu21-4918, 2021.

EGU21-3476 | vPICO presentations | ESSI1.6

Recent developments on the Earth System Model Evaluation Tool

Bouwe Andela, Fakhereh Alidoost, Lukas Brunner, Jaro Camphuijsen, Bas Crezee, Niels Drost, Bettina Gier, Birgit Hassler, Peter Kalverla, Axel Lauer, Saskia Loosveldt-Tomas, Ruth Lorenz, Valeriu Predoi, Mattia Righi, Manuel Schlund, Stef Smeets, Javier Vegas-Regidor, Jost Von Hardenberg, Katja Weigel, and Klaus Zimmermann

The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models such as those participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2020, www.esmvaltool.org) features a brand new design composed of a core that finds and processes data according to a ‘recipe’ and an extensive collection of ready-to-use recipes and associated diagnostic codes for reproducing results from published papers. Development and discussion of the tool (mostly) takes place in public on https://github.com/esmvalgroup and anyone with an interest in climate model evaluation is welcome to join there.


Since the initial release of version 2 in the summer of 2020, many improvements have been made to the tool. It is now more user-friendly, with extensive documentation available on docs.esmvaltool.org and a step-by-step online tutorial. Regular releases, currently planned three times a year, ensure that recent contributions become available quickly while still ensuring a high level of quality control. The tool can be installed from conda, but portable Docker and Singularity containers are also available.


Recent new features include a more user-friendly command-line interface, citation information per figure including CMIP6 data citation using ES-DOC, more and faster preprocessor functions that require less memory, automatic corrections for a larger number of CMIP6 datasets, support for more observational and reanalysis datasets, and more recipes and diagnostics.


The tool is now also more reliable, with improved automated testing through more unit tests for the core, as well as a recipe testing service running at DKRZ for testing the scientific recipes and diagnostics that are bundled into the tool. The community maintaining and developing the tool is growing, making the project less dependent on individual contributors. There are now technical and scientific review teams that review new contributions for technical quality and scientific correctness and relevance respectively, two new principal investigators for generating a larger support base in the community, and a newly created user engagement team that is taking care of improving the overall user experience.

How to cite: Andela, B., Alidoost, F., Brunner, L., Camphuijsen, J., Crezee, B., Drost, N., Gier, B., Hassler, B., Kalverla, P., Lauer, A., Loosveldt-Tomas, S., Lorenz, R., Predoi, V., Righi, M., Schlund, M., Smeets, S., Vegas-Regidor, J., Von Hardenberg, J., Weigel, K., and Zimmermann, K.: Recent developments on the Earth System Model Evaluation Tool, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3476, https://doi.org/10.5194/egusphere-egu21-3476, 2021.

EGU21-4805 | vPICO presentations | ESSI1.6

Bringing ESMValTool to the Jupyter Lab

Peter C. Kalverla, Stef Smeets, Niels Drost, Bouwe Andela, Fakhereh Alidoost, and Jaro Camphuijsen

Ease of use can easily become a limiting factor to scientific quality and progress. In order to verify and build upon previous results, the ability to effortlessly access and process increasing data volumes is crucial.

To level the playing field for all researchers, a shared infrastructure had to be developed. In Europe, this effort is coordinated mainly through the IS-ENES projects. The current infrastructure provides access to the data as well as compute resources. This leaves the tools to easily work with the data as the main obstacle for a smooth scientific process. Interestingly, not the scarcity of tools, but rather their abundance can lead to diverging workflows that hamper reproducibility.

The Earth System Model Evaluation Tool (ESMValTool) was originally developed as a command-line tool for the routine evaluation of important analytics workflows. The tool encourages some degree of standardization by factoring out common operations, while allowing for custom analytics of the pre-processed data. All scripts are bundled with the tool; over time, this has grown into a library of so-called ‘recipes’.

In the EUCP project, we are now developing a Python API for the ESMValTool. This allows for interactive exploration, modification, and execution of existing recipes, as well as creation of new analytics. Concomitantly, partners in IS-ENES3 are making their infrastructure accessible through JupyterLab. Through the combination of these technologies, researchers can easily access the data and compute, but also the workflows or methods used by their colleagues - all through the web browser. During the vEGU, we will show how this extended infrastructure can be used to easily reproduce, and build upon, previous results.
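As a sketch of the kind of interactive use this enables, the snippet below loads and runs an existing recipe from a notebook. The API was experimental at the time of this abstract, so the module and function names shown here (from esmvalcore.experimental) are assumptions that may have changed.

```python
# A minimal sketch of interactive recipe execution, assuming the
# experimental ESMValCore Python API.
from esmvalcore.experimental import get_recipe

recipe = get_recipe("examples/recipe_python.yml")  # load a bundled recipe
print(recipe)            # inspect documentation, datasets and diagnostics
output = recipe.run()    # preprocess the data and run the diagnostic scripts
print(output)            # handles to the produced figures and data files
```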

How to cite: Kalverla, P. C., Smeets, S., Drost, N., Andela, B., Alidoost, F., and Camphuijsen, J.: Bringing ESMValTool to the Jupyter Lab, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4805, https://doi.org/10.5194/egusphere-egu21-4805, 2021.

EGU21-7724 | vPICO presentations | ESSI1.6

New scientific diagnostics in the ESMValTool – an overview

Lisa Bock, Birgit Hassler, and Axel Lauer and the ESMValTool Development Team

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating the analysis of many different ESM components, providing well-documented source code and the scientific background of the implemented diagnostics and metrics, and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community that continuously improves the tool, supported by multiple national and European projects. The latest major release (v2.0) of the ESMValTool was officially introduced in August 2020 as a large community effort, and several additional smaller releases have followed since then.

The diagnostic part of the ESMValTool includes a large collection of standard “recipes” for reproducing peer-reviewed analyses of many variables across ESM compartments, including the atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean state, trends, variability, important processes and phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation, some are also based on model-to-model comparisons. This presentation gives an overview of the latest scientific diagnostics and metrics added during the last year, including examples of applications of these diagnostics to CMIP6 model data.

How to cite: Bock, L., Hassler, B., and Lauer, A. and the ESMValTool Development Team: New scientific diagnostics in the ESMValTool – an overview, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7724, https://doi.org/10.5194/egusphere-egu21-7724, 2021.

EGU21-15681 | vPICO presentations | ESSI1.6

Model evaluation expectations of European ESM communities: results from a survey

Jerome Servonnat, Eric Guilyardi, Zofia Stott, Kim Serradell, Axel Lauer, Klaus Zimmerman, Fanny Adloff, Marie-Claire Greening, Remi Kazeroni, and Javier Vegas

Developing an Earth system model evaluation tool for a broad user community is a real challenge, as the potential users do not necessarily have the same needs or expectations. While many evaluation tasks across user communities include common steps, significant differences are also apparent, not least the investment by institutions and individuals in bespoke tools. A key question is whether there is sufficient common ground to pursue a community tool with broad appeal and application.

We present the main results of a survey carried out by Assimila for the H2020 IS-ENES3 project to review the model evaluation needs of European Earth System Modelling communities. Based on interviews with approximately 30 participants from several European institutions, the survey targeted a broad range of users, including model developers, model users, evaluation data providers, and infrastructure providers. The output of the study provides an analysis of requirements focusing on key technical, standards, and governance aspects.

The study used ESMValTool as a current benchmark for European evaluation tools. It is a community diagnostics and performance metrics tool for the evaluation of Earth System Models that allows for the comparison of single or multiple models, either against predecessor versions or against observations. The tool is being developed in such a way that additional analyses can be added. As a community effort open to both users and developers, it encourages the open exchange of diagnostic source code and evaluation results. It is currently used in Coupled Model Intercomparison Projects as well as for the development and testing of “new” models.

A key result of the survey is the widespread support for ESMValTool amongst users, developers, and even those who have taken or promote other approaches. The results of the survey identify priorities and opportunities in the further development of the ESMValTool to ensure long-term adoption of the tool by a broad community.

How to cite: Servonnat, J., Guilyardi, E., Stott, Z., Serradell, K., Lauer, A., Zimmerman, K., Adloff, F., Greening, M.-C., Kazeroni, R., and Vegas, J.: Model evaluation expectations of European ESM communities: results from a survey, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15681, https://doi.org/10.5194/egusphere-egu21-15681, 2021.

ESSI1.12 – Novel methods and applications of satellite and aerial imagery

EGU21-15661 | vPICO presentations | ESSI1.12

Simulating Marine Litter observations from space to support Operations Research

Stephen Emsley, Manuel Arias, Théodora Papadopoulou, and François-Régis Martin-Lauzer

A breadboard for end-to-end (E2E) Marine Litter Optical Performance Simulations (ML-OPSI) is being designed in the frame of the ESA Open Space Innovation Platform (OSIP) Campaign to support Earth Observation (EO) scientists with the design of computational experiments for Operations Research. The ML-OPSI breadboard will estimate the Marine Litter signal at Top-Of-Atmosphere (TOA) from a set of Bottom-Of-Atmosphere (BOA) scenarios representing the various case studies considered by the community (e.g., windrows, frontal areas, river mouths, sub-tropical gyres), coming from synthetic (computer-simulated) data or from real observations. It is a modular, pluggable and extensible framework, promoting re-use, and can be adapted to different missions, sensors and scenarios.

The breadboard consists of (a) the OPSI components for the simulation, i.e. the process of using a model to study the characteristics of the system by manipulating variables and studying the properties of the model, allowing an evaluation to optimise performance and make predictions about the real system; and (b) the Marine Litter model components for the detection of marine litter. It shall consider the changes caused in the water reflectance and properties by marine litter, exploiting gathered information on plastic polymers, different viewing geometries, and naturally occurring atmospheric conditions. The modules of the breadboard include a Scenario Builder Module (SB) with the maximum spatial resolution and the best possible modelling of the relevant physical properties, which for spectral sensors could include high spatial resolution and high spectral density/resolution BOA radiance simulations in the optical to SWIR bands; a Radiative Transfer Module (RTM) transforming water-leaving to TOA reflectance for varying atmospheric conditions and observational geometries; a Scene Generator Module (SGM), which could use Sentinel-2, Landsat, or PRISMA data as reference, or any other instrument as pertinent; and a Performance Assessment Module (PAM) for ML detection that takes into account the variability of the atmosphere, the sunlight & skylight at BOA, the sea-surface roughness with trains of wind waves & swells, sea spray (whitecaps), air bubbles in the mixed layer, and marine litter dynamics, as well as instrumental noise, to assess marine litter detection feasibility.
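A modular, pluggable design of this kind could be organised as a chain of interchangeable processing stages. The sketch below is purely hypothetical: all class and method names are illustrative stand-ins, not the project's actual API.

```python
# A hypothetical sketch of a pluggable SB -> RTM -> SGM -> PAM pipeline.
from dataclasses import dataclass

@dataclass
class Scene:
    reflectance: object  # BOA/TOA radiance or reflectance cube (placeholder)
    meta: dict           # geometry, atmosphere, litter scenario, ...

class Module:
    def process(self, scene: Scene) -> Scene:
        raise NotImplementedError

class ScenarioBuilder(Module):        # SB: build the BOA marine-litter scenario
    def process(self, scene): return scene

class RadiativeTransfer(Module):      # RTM: propagate BOA -> TOA
    def process(self, scene): return scene

class SceneGenerator(Module):         # SGM: resample to a target sensor grid
    def process(self, scene): return scene

class PerformanceAssessment(Module):  # PAM: score litter detectability
    def process(self, scene): return scene

def run_pipeline(scene, modules):
    for m in modules:                 # modules are swappable per mission/sensor
        scene = m.process(scene)
    return scene

result = run_pipeline(Scene(None, {}), [ScenarioBuilder(), RadiativeTransfer(),
                                        SceneGenerator(), PerformanceAssessment()])
```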

Marine Litter scenarios of reference shall be built based on in-situ campaigns, to reflect the true littering conditions in each case, both in spatial distribution and composition. The breadboard shall be validated over artificial targets at sea in field campaigns as relevant. This might include spectral measurements from ASD, on-field radiometers, and cameras on UAVs, concomitant with Copernicus Sentinel-2 acquisitions. Combined, they can be used to estimate the atmospheric contribution and assess the performance of the tested processing chain.

This activity contributes to the “Remote Sensing of Marine Litter and Debris” IOCCG task force.

How to cite: Emsley, S., Arias, M., Papadopoulou, T., and Martin-Lauzer, F.-R.: Simulating Marine Litter observations from space to support Operations Research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15661, https://doi.org/10.5194/egusphere-egu21-15661, 2021.

EGU21-16478 | vPICO presentations | ESSI1.12

Synergistic use of SMOS and Sentinel-3 for retrieving spatiotemporally estimates of surface soil moisture and evaporative fraction

Maria Piles, Miriam Pablos Hernandez, Mercè Vall-llossera, Gerard Portal, Ionut Sandric, George P. Petropoulos, and Dionisis Hristopulos

Earth Observation (EO) makes it possible to obtain information on key parameters characterizing interactions among Earth’s system components, such as evaporative fraction (EF) and surface soil moisture (SSM). Notably, techniques utilizing EO data of land surface temperature (Ts) and vegetation index (VI) have shown promise in this regard. The present study describes an implementation of a downscaling method that combines the soil moisture product from SMOS with the Fractional Vegetation Cover provided by ESA’s Sentinel-3 platform.
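For context, the Ts/VI "triangle" idea underlying such methods can be sketched as follows; this is a deliberately simplified illustration with crude edge fitting, not the study's actual downscaling implementation, and the percentile-based edges are assumptions.

```python
# A minimal sketch of triangle-style EF estimation from a Ts/VI scatter.
import numpy as np

def triangle_ef(ts, fvc, n_bins=20):
    """ts: land surface temperature; fvc: fractional vegetation cover (0-1)."""
    ef = np.full_like(ts, np.nan, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for i in range(n_bins):
        m = (fvc >= edges[i]) & (fvc < edges[i + 1])
        if m.sum() < 10:
            continue
        t_dry, t_wet = np.nanpercentile(ts[m], [99, 1])  # dry/wet edges per VI bin
        ef[m] = np.clip((t_dry - ts[m]) / (t_dry - t_wet), 0.0, 1.0)
    return ef

rng = np.random.default_rng(1)
fvc = rng.uniform(0, 1, 5000)
ts = 320 - 15 * fvc + rng.normal(0, 3, 5000)  # placeholder Ts field (K)
print(np.nanmean(triangle_ef(ts, fvc)))
```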

The applicability of the investigated technique is demonstrated for a period of two years (2017-2018) using in-situ data acquired from five CarboEurope sites and from all the sites available in the REMEDHUS soil moisture monitoring network, representing a variety of climatic, topographic and environmental conditions. Predicted parameters were compared against co-orbital ground measurements acquired from several European sites belonging to the CarboEurope ground observational network.

Results indicated a close agreement between all the inverted parameters and the corresponding in-situ data. SSM maps predicted from the “triangle” method showed a small bias, but a large scatter. The results of this study provide strong supportive evidence of the potential value of the methodology investigated herein for accurately deriving estimates of key parameters characterising land surface interactions that can meet the needs of fine-scale hydrological applications. Moreover, the applicability of the presented approach demonstrates the added value of the synergy between ESA’s operational products acquired from different satellite sensors, in this case SMOS & Sentinel-3. As the approach is not tied to any particular sensor, it can also be implemented with technologically advanced EO sensors launched recently or planned for launch.

In the present work, Dr Petropoulos’ participation received funding from the European Union’s Horizon 2020 research and innovation programme ENViSIoN under the Marie Skłodowska-Curie grant agreement No 752094.

How to cite: Piles, M., Pablos Hernandez, M., Vall-llossera, M., Portal, G., Sandric, I., Petropoulos, G. P., and Hristopulos, D.: Synergistic use of SMOS and Sentinel-3 for retrieving spatiotemporally estimates of surface soil moisture and evaporative fraction, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16478, https://doi.org/10.5194/egusphere-egu21-16478, 2021.

EGU21-10783 | vPICO presentations | ESSI1.12

Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image

M. Chen, J. Wu, and F. Tian

Automatically extracting buildings from remote sensing images (RSI) plays important roles in urban planning, population estimation, disaster emergency response, etc. With the development of deep learning technology, convolutional neural networks (CNN) with better performance than traditional methods have been widely used in extracting buildings from remote sensing imagery, but some problems remain. First, the low-level features extracted by the shallow layers and the abstract features extracted by the deep layers of the artificial neural network cannot be fully fused, which often makes building extraction inaccurate, especially for buildings with complex structures, irregular shapes and small sizes. Second, a network has so many parameters to train that it occupies substantial computing resources and consumes considerable time in the training process. By analyzing the structure of the CNN, we found that the abstract features extracted by deep layers with low geospatial resolution contain more semantic information. These abstract features are conducive to determining the category of pixels but are not sensitive to the boundaries of buildings. Since the stride of the convolution kernels and the pooling operations reduce the geospatial resolution of the feature maps, this paper proposes a simple and effective strategy to alleviate the above two bottlenecks: reduce the stride of the convolution kernels contained in one of the layers and reduce the number of convolution kernels. This strategy was applied to the DeepLabv3+ network and evaluated on both the WHU Building Dataset and the Massachusetts Building Dataset. Compared with the original DeepLabv3+ network, the results showed better performance: on the WHU Building Dataset, the Intersection over Union (IoU) increased by 1.4% and the F1 score by 0.9%; on the Massachusetts Building Dataset, the IoU increased by 3.31% and the F1 score by 2.3%.
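The effect of the strategy on feature-map geometry can be illustrated in isolation: halving a convolution's stride doubles the spatial resolution of its output, while using fewer kernels offsets the extra computation. The layer sizes below are illustrative, not the paper's exact configuration.

```python
# A minimal PyTorch sketch of the stride-reduction strategy.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)            # feature maps from earlier layers

original = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)
modified = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)  # stride 2 -> 1,
                                                                    # 512 -> 256 kernels
print(original(x).shape)  # torch.Size([1, 512, 32, 32]) - coarser feature maps
print(modified(x).shape)  # torch.Size([1, 256, 64, 64]) - boundary detail preserved
```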

How to cite: Chen, M., Wu, J., and Tian, F.: Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10783, https://doi.org/10.5194/egusphere-egu21-10783, 2021.

EGU21-3740 | vPICO presentations | ESSI1.12

Deep learning for extracting water body from Sentinel-2 MSI imagery

Shuren Chou

Deep learning has a good capacity for hierarchical feature learning from unlabeled remote sensing images. In this study, the simple linear iterative clustering (SLIC) method was improved to segment the image into good-quality superpixels. Then, a convolutional neural network (CNN) was used to extract water bodies from Sentinel-2 MSI data. In the proposed framework, the improved SLIC method obtains correct water body boundaries by optimizing the initial clustering centre, designing a dynamic distance measure, and expanding the search space. In addition, unlike traditional water body extraction methods, it can achieve multi-level water body detection. Experimental results showed that this method had higher detection accuracy and robustness than other methods. This study was thus able to extract water bodies from remotely sensed images with deep learning and to conduct an accuracy assessment.
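A baseline segment-then-classify flow can be sketched with scikit-image's standard SLIC (the study's modifications to SLIC are not reproduced here): superpixels first, then simple per-segment features standing in for CNN input.

```python
# A minimal sketch, assuming scikit-image's stock SLIC and a sample image.
import numpy as np
from skimage.segmentation import slic
from skimage import data

img = data.astronaut()                        # placeholder for a Sentinel-2 RGB subset
segments = slic(img, n_segments=500, compactness=10, start_label=1)
print(segments.max(), "superpixels")

# Per-superpixel mean colour as a trivial stand-in for CNN input features
feats = np.array([img[segments == s].mean(axis=0)
                  for s in range(1, segments.max() + 1)])
print(feats.shape)
```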

How to cite: Chou, S.: Deep learning for extracting water body from Sentinel-2 MSI imagery, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3740, https://doi.org/10.5194/egusphere-egu21-3740, 2021.

EGU21-14869 | vPICO presentations | ESSI1.12

A new MODIS-Landsat fusion method to reconstruct Landsat NDVI time-series data

Xiaofang Ling and Ruyin Cao

The Normalized Difference Vegetation Index (NDVI) data provided by the Landsat satellites constitute a rich historical archive with a spatial resolution of 30 m. However, Landsat NDVI time-series data are quite discontinuous due to the 16-day revisit cycle, cloud contamination and other factors. Spatiotemporal data fusion technology has been proposed to reconstruct continuous Landsat NDVI time-series data by blending MODIS data with Landsat data. Although a number of spatiotemporal fusion algorithms have been developed during the past decade, most of the existing algorithms ignore the effective use of partially cloud-contaminated images. In this study, we present a new spatiotemporal fusion method that employs the cloud-free pixels in partially cloud-contaminated images to improve the performance of MODIS-Landsat data fusion by Correcting the inconsistency between MODIS and Landsat data in Spatiotemporal DAta Fusion (called CSDAF). We tested the new method at three sites covered by different vegetation types: deciduous forests in the Shennongjia Forestry District of China (SNJ), evergreen forests in Southeast Asia (SEA), and irrigated farmland in the Coleambally irrigated area of Australia (CIA). Two experiments were designed. In experiment I, we first simulated different cloud coverages in cloud-free Landsat images and then used both CSDAF and the recently developed IFSDAF method to restore these “missing” pixels for quantitative assessment. Results showed that CSDAF performed better than IFSDAF, achieving smaller average Root Mean Square Error (RMSE) values (0.0767 vs. 0.1116) and larger average Structural SIMilarity index (SSIM) values (0.8169 vs. 0.7180). In experiment II, we simulated the scenario of inconsistency between MODIS and Landsat by adding different levels of noise to the MODIS and Landsat data. Results showed that CSDAF was able to reduce, to some extent, the influence of the inconsistency between MODIS and Landsat data on MODIS-Landsat data fusion. Moreover, CSDAF is simple and can be implemented on Google Earth Engine. We expect that CSDAF can be used to reconstruct Landsat NDVI time-series data at regional and continental scales.
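The quantitative assessment used in experiment I can be sketched generically: RMSE and SSIM between a fusion-predicted image and the withheld cloud-free original. The data below are synthetic placeholders.

```python
# A minimal sketch of RMSE/SSIM evaluation, assuming scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 1.0, (200, 200))                # placeholder "true" NDVI
predicted = truth + rng.normal(0.0, 0.05, truth.shape)   # placeholder fused NDVI

rmse = np.sqrt(np.mean((predicted - truth) ** 2))
ssim = structural_similarity(truth, predicted, data_range=1.0)
print(f"RMSE={rmse:.4f}, SSIM={ssim:.4f}")
```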

How to cite: Ling, X. and Cao, R.: A new MODIS-Landsat fusion method to reconstruct Landsat NDVI time-series data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14869, https://doi.org/10.5194/egusphere-egu21-14869, 2021.

EGU21-10374 | vPICO presentations | ESSI1.12

Dynamic evaluation of small urban green areas at local level using GEOBIA

Ana-Maria Popa, Diana Andreea Onose, Ionut Cosmin Sandric, Simona Raluca Gradinaru, and Athanasios Alexandru Gavrilidis

Urban green infrastructure provides various benefits known as ecosystem services: regulating, cultural, provisioning and supporting services. Among these benefits are decreased air temperature, increased humidity and mitigation of the urban heat island (regulating services); human-nature relations (cultural services); improved air quality and carbon sequestration (provisioning services); and photosynthesis, nutrient and water cycling (supporting services). The high intensity of the urbanization process over the last decades, coupled with weak legislative frameworks, resulted both in large areas affected by urban sprawl and in the densification of the existing urban fabric. Both phenomena generated a loss of open spaces, especially green areas. In the context of the sustainable urbanization promoted by the HABITAT Agenda, knowledge of the distribution, size and quality of urban green areas is a priority. The aim of this study is to identify small urban green areas at the local level at different time moments for a dynamic evaluation. We focused on small urban green areas since they are scarcely analysed, even though their importance for urban quality of life is continuously increasing given the urbanization process. We used satellite imagery acquired by Planet satellite constellations, with a spatial resolution of 3.7 m and daily coverage, for extracting green areas. The images were processed using Geographic Object-Based Image Analysis (GEOBIA) techniques implemented in Esri ArcGIS Pro. The spatial analysis we performed generated information about the distribution, surface, quality (based on NDVI) and dynamics of small urban green areas. The results are connected with the local-level development of the urban areas we analysed, but also with the population’s consumption pattern for leisure services, housing, transport and other public utilities. The analysis can serve as a complementary method for extracting green areas at the urban level and can support data collection for calculating urban sustainability indicators.
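The NDVI-based quality screening mentioned above can be sketched as follows; the band order and the 0.3 threshold are assumptions for illustration, not the study's calibrated values.

```python
# A minimal sketch of NDVI computation and green-area masking.
import numpy as np

red = np.random.rand(512, 512).astype(np.float32)   # placeholder PlanetScope red band
nir = np.random.rand(512, 512).astype(np.float32)   # placeholder PlanetScope NIR band

ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
green_mask = ndvi > 0.3                  # candidate vegetated (green-area) pixels
print(f"{green_mask.mean():.1%} of pixels flagged as green")
```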

How to cite: Popa, A.-M., Onose, D. A., Sandric, I. C., Gradinaru, S. R., and Gavrilidis, A. A.: Dynamic evaluation of small urban green areas at local level using GEOBIA, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10374, https://doi.org/10.5194/egusphere-egu21-10374, 2021.

EGU21-12722 | vPICO presentations | ESSI1.12

SemantiX: a cross-sensor semantic EO data cube to open and leverage AVHRR time-series and essential climate variables with scientists and the public 

Hannah Augustin, Martin Sudmanns, Helga Weber, Andrea Baraldi, Stefan Wunderle, Christoph Neuhaus, Steffen Reichel, Lucas van der Meer, Philipp Hummer, and Dirk Tiede

Long time series of essential climate variables (ECVs) derived from satellite data are key to climate research. SemantiX is a research project to establish, complement and expand Advanced Very High Resolution Radiometer (AVHRR) time series using Copernicus Sentinel-3 A/B imagery, making them and derived ECVs accessible through a semantic Earth observation (EO) data cube. The Remote Sensing Research Group at the University of Bern holds one of the longest European time series of AVHRR imagery (1981-present). Data cube technologies are a game changer for how EO imagery is stored, accessed, and processed. They also establish reproducible analytical environments for queries and information production and are able to better represent multi-dimensional systems. A semantic EO data cube is a concept recently coined by researchers at the University of Salzburg, referring to a spatio-temporal data cube containing EO data, where for each observation at least one nominal (i.e., categorical) interpretation is available and can be queried in the same instance (Augustin et al. 2019). Offering analysis-ready data (i.e., calibrated and orthorectified AVHRR Level 1c data) in a data cube along with semantic enrichment reduces barriers to conducting spatial analysis through time based on user-defined AOIs.

This contribution presents a semantic EO data cube containing selected ECV time series (i.e., snow cover extent, lake surface water temperature, vegetation dynamics) derived from AVHRR imagery (1981-2019), a temporal and spatial subset of AVHRR Level 1c imagery (updated after Hüsler et al. 2011) from 2016 until 2019, and, for the latter, semantic enrichment derived using the Satellite Image Automatic Mapper (SIAM). SIAM applies a fully automated, spectral rule-based routine based on a physical model to assign spectral profiles to colour names with known semantic associations; no user parameters are required, and the result is application-independent (Baraldi et al. 2010). Existing probabilistic cloud masks (Musial et al. 2014) generated by the Remote Sensing Research Group at the University of Bern are also included as additional data-derived information to support spatio-temporal semantic queries. This implementation is a foundational step towards the overall objective of combining climate-relevant AVHRR time series with Sentinel-3 imagery for the Austrian-Swiss alpine region, a European region that is currently experiencing serious changes due to climate change and will continue to face challenges well into the future.
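A semantic query against such a cube can be sketched hypothetically: each pixel carries a categorical colour-name label alongside the raw bands, so "snow-like over the AOI through time" becomes a simple mask-and-aggregate operation. All names and class codes below are assumptions, not SIAM's actual vocabulary.

```python
# A hypothetical sketch of a spatio-temporal semantic query, assuming xarray.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2016-01-01", periods=48, freq="MS")
labels = xr.DataArray(
    np.random.randint(0, 5, (len(times), 100, 100)),   # placeholder SIAM-like classes
    dims=("time", "y", "x"), coords={"time": times},
)
SNOW_LIKE = 3                                          # assumed class code
snow_fraction = (labels == SNOW_LIKE).mean(dim=("y", "x"))  # per-date snow fraction
print(snow_fraction.to_series().head())
```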

Going forward, this semantic EO data cube will be linked to a mobile citizen science smartphone application. For the first time, scientists in disciplines unrelated to remote sensing, students, and interested members of the public will have direct, location-based access to these long EO data time series and derived information. SemantiX runs from August 2020 to 2022, funded by the Austrian Research Promotion Agency (FFG) under the Austrian Space Applications Programme (ASAP 16) (project #878939) in collaboration with the Swiss Space Office (SSO).

How to cite: Augustin, H., Sudmanns, M., Weber, H., Baraldi, A., Wunderle, S., Neuhaus, C., Reichel, S., van der Meer, L., Hummer, P., and Tiede, D.: SemantiX: a cross-sensor semantic EO data cube to open and leverage AVHRR time-series and essential climate variables with scientists and the public , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12722, https://doi.org/10.5194/egusphere-egu21-12722, 2021.

EGU21-10882 | vPICO presentations | ESSI1.12

Using Sentinel-5P time-series products for Nitrogen Dioxide (NO2) Spatio-Temporal Analysis over Europe During the Coronavirus Pandemic Lockdown

Marina Vîrghileanu, Ionuț Săvulescu, Bogdan-Andrei Mihai, Constantin Nistor, and Robert Dobre

Nitrogen dioxide (NO2) is one of the main air quality pollutants of concern in many urban and industrial areas worldwide. Emitted by fossil fuel burning activities, mainly road traffic, NO2 pollution is responsible for the degradation of population health and for the formation of secondary pollutants such as nitric acid and ozone. In 2017, almost 20 countries in the European region exceeded the NO2 annual limit values imposed by European Commission Directive 2008/50/EC (EEA, 2019). Therefore, monitoring and regulating NO2 pollution is a necessary task to help decision makers search for sustainable solutions to improve environmental quality and population health. In this study, we propose a comparative analysis of the tropospheric NO2 column density spatial configuration over Europe between similar periods of 2019 and 2020, based on ESA Copernicus Sentinel-5P products. Our results highlight the NO2 pollution dynamics over the abrupt transition from normal conditions to the COVID-19 outbreak context, characterized by a short-term decrease in traffic intensity and industrial activity, a situation also reflected in national-level statistics on COVID-19 cases and economic indicators. The validation approach shows a high correlation between TROPOMI-derived data and independent data from ground-based observations, with encouraging R2 values ranging between 0.5 and 0.75 at different locations.
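The validation step can be sketched generically as a regression of satellite retrievals against co-located ground observations; the values below are synthetic placeholders.

```python
# A minimal sketch of the R2 validation, assuming scipy.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
ground = rng.uniform(5, 60, 100)                    # placeholder station NO2 values
tropomi = 0.8 * ground + rng.normal(0, 8, 100)      # placeholder satellite retrievals

fit = linregress(ground, tropomi)
print(f"R^2 = {fit.rvalue**2:.2f}")                 # the study reports 0.5-0.75
```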

How to cite: Vîrghileanu, M., Săvulescu, I., Mihai, B.-A., Nistor, C., and Dobre, R.: Using Sentinel-5P time-series products for Nitrogen Dioxide (NO2) Spatio-Temporal Analysis over Europe During the Coronavirus Pandemic Lockdown, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10882, https://doi.org/10.5194/egusphere-egu21-10882, 2021.

EGU21-14320 | vPICO presentations | ESSI1.12

Improving the Quality Assessment band in Landsat cloud images for the application of cloud removal  

Boli Yang, Yan Feng, and Ruyin Cao

Cloud contamination is a serious obstacle to the application of Landsat data. Thick clouds can completely block land surface information and lead to missing values. The reconstruction of missing values in a Landsat cloud image requires a cloud and cloud shadow mask. In this study, we raise the issue that the quality of the quality assessment (QA) band in current Landsat products cannot meet the requirements of thick-cloud removal. To address this issue, we developed a new method (called Auto-PCP) to preprocess the original QA band, with the ultimate objective of improving the performance of cloud removal on Landsat cloud images. We tested the new method at four test sites and compared cloud-removed images generated using three different QA bands: the original QA band, the QA band modified by a dilation of two pixels around cloud and cloud shadow edges, and the QA band processed by Auto-PCP (“QA_Auto-PCP”). Experimental results, from both actual and simulated Landsat cloud images, show that QA_Auto-PCP achieved the best visual assessment of the cloud-removed images, and had the smallest RMSE values and the largest Structural SIMilarity index (SSIM) values. QA_Auto-PCP improves the performance of cloud removal because the new method substantially decreases omission errors of clouds and shadows in the original QA band while not increasing commission errors. Moreover, Auto-PCP is easy to implement and uses the same data as cloud removal, without additional image collections. We expect that Auto-PCP can further popularize cloud removal and advance the application of Landsat data.
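Working with a Landsat QA band can be sketched as bit decoding plus mask dilation. The bit positions below follow the Collection 2 QA_PIXEL convention (bit 3 cloud, bit 4 cloud shadow) and are an assumption to be checked against the product in use; the dilation mirrors the two-pixel baseline the new method is compared against, not Auto-PCP itself.

```python
# A minimal sketch of QA-band decoding and the dilated baseline mask.
import numpy as np
from scipy.ndimage import binary_dilation

qa = np.random.randint(0, 2**16, (100, 100), dtype=np.uint16)  # placeholder QA band

cloud = ((qa >> 3) & 1).astype(bool)     # assumed cloud bit (Collection 2)
shadow = ((qa >> 4) & 1).astype(bool)    # assumed cloud-shadow bit
mask = cloud | shadow

dilated = binary_dilation(mask, iterations=2)   # the "+2 pixels" baseline mask
print(mask.mean(), dilated.mean())
```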


Keywords: Cloud detection, Cloud shadows, Cloud simulation, Cloud removal, MODTRAN

How to cite: Yang, B., Feng, Y., and Cao, R.: Improving the Quality Assessment band in Landsat cloud images for the application of cloud removal  , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14320, https://doi.org/10.5194/egusphere-egu21-14320, 2021.

EGU21-10892 | vPICO presentations | ESSI1.12

Spatiotemporal tomography based on scattered multiangular signals and its use for resolving evolving clouds using moving platforms

R. Ronen, Y. Y. Schechner, and E. Eytan

The climate is strongly affected by interaction with clouds. To reduce major errors in climate predictions, this interaction requires a much finer understanding of cloud physics than current knowledge provides. Current knowledge is based on empirical remote sensing data that are analyzed under the assumption that the atmosphere and clouds are made of very broad and uniform layers. To help overcome this problem, 3D scattering computed tomography (CT) has been suggested as a way to study clouds.

CT is a powerful way to recover the inner structure of three dimensional (3D) volumetric heterogeneous objects. CT has extensive use in many research and operational domains. Aside from its common usage in medicine, CT is used for sensing geophysical terrestrial structures, atmospheric pollution and fluid dynamics. CT requires imaging from multiple directions and in nearly all CT approaches, the object is considered static during image acquisition. However, in many cases, the object changes while multi-view images are acquired sequentially. Thus, an effort has been invested to expand 3D CT to four-dimensional (4D) spatiotemporal CT. This effort has been directed at linear CT modalities. Since linear CT is computationally easier to handle, it has been a popular method for medical imaging. However, these linear CT modalities do not apply to clouds: clouds constitute a scattering medium, and therefore radiative transfer is non-linear in the clouds’ content.

This work focuses on the challenge of 4D scattering CT of clouds. Scattering CT of clouds requires high-resolution multi-view images from space. There are spaceborne and high-altitude systems that may provide such data, for example AirMSPI, MAIA, HARP and AirHARP. An additional planned system is the CloudCT formation, funded by the ERC. However, these systems are costly. Deploying them in large numbers to simultaneously acquire images of the same clouds from many angles can be impractical. Therefore, the platforms are planned to move above the clouds: a sequence of images is taken, in order to span and sample a wide angular breadth. However, the clouds evolve while the angular span is sampled.

We pose conditions under which this task can be performed. These regard temporal sampling and angular breadth, in relation to the correlation time of the evolving cloud. Then, we generalize scattering CT. The generalization seeks spatiotemporal recovery of the cloud extinction field in high resolution (10m), using data taken by a small number of moving cameras. We present an optimization-based method to reach this, and then demonstrate the method both in rigorous simulations and on real data.

How to cite: Ronen, R., Schechner, Y. Y., and Eytan, E.: Spatiotemporal tomography based on scattered multiangular signals and its use for resolving evolving clouds using moving platforms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10892, https://doi.org/10.5194/egusphere-egu21-10892, 2021.

EGU21-12831 | vPICO presentations | ESSI1.12

The potential of monitoring traffic conditions up to 15 times a day using sub-meter resolution EO images

Refiz Duro, Georg Neubauer, and Alexandra-Ioana Bojor

Urbanization and the trend of people moving to cities often lead to problematic traffic conditions, which can be very challenging for traffic management. Poor traffic conditions can hamper the flow of people and goods, negatively affecting businesses through delays and the inability to estimate travel times and plan accordingly, as well as the environment and the health of the population due to increased fuel consumption and the subsequent air pollution. Many cities have policies and rules to manage traffic, ranging from standard traffic lights to more dynamic and adaptable solutions involving in-road sensors or cameras that actively modify the duration of traffic lights, or even more sophisticated IoT solutions to monitor and manage conditions on a city-wide scale. Core to these technologies and to decision-making processes is the availability of reliable data on traffic conditions, ideally in real time. Many cities are thus still coping with a lack of good spatial and temporal data coverage, as many of these solutions require not only changes to the infrastructure but also large investments.

One approach is to exploit the current and forthcoming advancements made available by Earth Observation (EO) satellite technologies. The biggest advantage is EO’s great spatial coverage, ranging from a few km² to 100 km² per image at a spatial resolution down to 0.3 m, allowing for quick, city-spanning data collection. Furthermore, the availability of imaging sensors covering specific bands allows the constituent information within an image to be separated and leveraged.

In this respect, we present the findings of our work on multispectral image sets collected on three occasions in 2019 by the very high resolution WorldView-3 satellite. We apply a combination of machine learning and PCA methods to detect vehicles and derive their kinematic properties (e.g., movement, direction, speed), which is only possible with satellites whose specific design allows for short time lags between imaging in different spectral bands. As these data essentially constitute a time series, we will discuss how the results presented fully apply to the forthcoming WorldView-Legion constellation of satellites, which will provide up to 15 revisits per day, and thus near-real-time traffic monitoring and assessment of its impact on the environment.
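The band time-lag principle can be illustrated with a toy example (not the authors' actual detector): a moving vehicle occupies slightly different positions in bands acquired milliseconds apart, so PCA across co-registered bands concentrates that disagreement in the trailing components; real detection adds an ML classifier on top.

```python
# A simplified sketch of inter-band motion signatures via PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
h, w, n_bands = 120, 120, 4
bands = np.repeat(rng.uniform(0.1, 0.3, (h, w, 1)), n_bands, axis=2)  # static scene
for b in range(n_bands):          # bright "vehicle" shifted by 1 px per band
    bands[60, 40 + b, b] = 1.0

pca = PCA(n_components=n_bands)
comps = pca.fit_transform(bands.reshape(-1, n_bands)).reshape(h, w, n_bands)
motion = np.abs(comps[..., 1:]).sum(axis=2)       # residual inter-band disagreement
print(np.unravel_index(motion.argmax(), motion.shape))  # near the moving target
```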

How to cite: Duro, R., Neubauer, G., and Bojor, A.-I.: The potential of monitoring traffic conditions up to 15 times a day using sub-meter resolution EO images, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12831, https://doi.org/10.5194/egusphere-egu21-12831, 2021.

EGU21-13727 | vPICO presentations | ESSI1.12

ICEComb − A New Software Tool for Satellite Laser Altimetry Data Processing and Visualisation

Bruno Silva, Luiz Guerreiro Lopes, and Pedro Campos

Processing, handling and visualising the large data volumes produced by satellite altimetry missions is a challenging task. A reference tool for the visualisation of satellite laser altimetry data is the OpenAltimetry platform, which provides altimetry-specific data from the Ice, Cloud, and land Elevation Satellite (ICESat) and ICESat-2 satellite missions through a web-based interactive interface. However, by focusing only on altimetry data, that tool leaves out access to much other equally important information existing in the data products of both missions.

The main objective of the work reported here was the development of a new web-based tool, called ICEComb, that offers end users the ability to access all the available data from both satellite missions, visualise and interact with them on a geographic map, store the data records locally, and process and explore data in an efficient, detailed and meaningful way, thus providing an easy-to-use software environment for satellite laser altimetry data analysis and interpretation.

The proposed tool is intended mainly for researchers and scientists working with ICESat and ICESat-2 data, offering users a ready-to-use system to rapidly access the raw collected data in a visually engaging way, without requiring prior understanding of the format, structure and parameters of the data products. In addition, the architecture of the ICEComb tool was developed with possible future expansion in mind, for which well-documented and standard languages were used in its implementation. This makes it possible, e.g., to extend its applicability to data from other satellite laser altimetry missions and to integrate models that can be coupled with ICESat and ICESat-2 data, thus expanding and enriching the context of studies carried out with such data.

The use of the ICEComb tool is illustrated and demonstrated by its application to ICESat/GLAS measurements over Lake Mai-Ndombe, a large and shallow freshwater lake located within the Ngiri-Tumba-Maindombe area, one of the largest Ramsar wetlands of international importance, situated in the Cuvette Centrale region of the Congo Basin.

Keywords: Laser altimetry, ICESat/GLAS, software tool design, data visualization, Congo Basin.

Acknowledgement. This work was partially supported by the Portuguese Foundation for Science and Technology (FCT) through LARSyS − FCT Pluriannual funding 2020−2023.

How to cite: Silva, B., Guerreiro Lopes, L., and Campos, P.: ICEComb − A New Software Tool for Satellite Laser Altimetry Data Processing and Visualisation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13727, https://doi.org/10.5194/egusphere-egu21-13727, 2021.

EGU21-13762 | vPICO presentations | ESSI1.12

Landsat thermal infrared to detect sub-canopy riparian flooding

Emanuel Storey, Witold Krajewski, and Efthymios Nikolopoulos

Satellite-based flood detection can enhance understanding of risk to humans and infrastructure, geomorphic processes, and ecological effects. Such application of optical satellite imagery has been mostly limited to the detection of water exposed to the sky, as plant canopies tend to obstruct the visibility of water at short electromagnetic wavelengths. This case study evaluates the utility of multi-temporal thermal infrared observations from Landsat 8 as a basis for detecting sub-canopy fluvial inundation that results in ambient temperature change.

We selected three flood events of 2016 and 2019 along sections of the Mississippi, Cedar, and Wapsipinicon Rivers located in Iowa, Minnesota, and Wisconsin, United States.  Classification of sub-canopy water involved logical, threshold-exceedance criteria to capture thermal decline within channel-adjacent vegetated zones.  Open water extent in the floods was mapped based on short-wave infrared thresholds determined parametrically from baseline (non-flooded) observations.  Map accuracy was evaluated using higher-resolution (0.5–5.0 m) synchronic optical imagery.
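The threshold-exceedance logic can be sketched as follows: flag vegetated, channel-adjacent pixels whose temperature dropped sharply relative to a non-flooded baseline. The -5 K threshold and the synthetic scene are assumptions, not the study's calibrated values.

```python
# A minimal sketch of sub-canopy flood flagging via thermal decline.
import numpy as np

rng = np.random.default_rng(0)
t_baseline = rng.normal(300.0, 1.5, (300, 300))     # non-flooded thermal scene (K)
t_flood = t_baseline + rng.normal(0.0, 0.5, (300, 300))
t_flood[100:140, 50:250] -= 7.0                     # simulated cooling under canopy

vegetated = np.ones((300, 300), dtype=bool)         # placeholder canopy mask
near_channel = np.ones((300, 300), dtype=bool)      # placeholder channel buffer

subcanopy_flood = vegetated & near_channel & ((t_flood - t_baseline) < -5.0)
print(f"{subcanopy_flood.sum()} pixels flagged as sub-canopy inundation")
```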

Results demonstrate improved ability to detect sub-canopy inundation when thermal infrared change is incorporated: sub-canopy flood class accuracy was comparable to that of open water in previous studies.  The multi-temporal open-water mapping technique yielded high accuracy as compared to similar studies.  This research highlights the utility of Landsat thermal infrared data for monitoring riparian inundation and for validating other remotely sensed and simulated flood maps.

How to cite: Storey, E., Krajewski, W., and Nikolopoulos, E.: Landsat thermal infrared to detect sub-canopy riparian flooding, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13762, https://doi.org/10.5194/egusphere-egu21-13762, 2021.

EGU21-7932 | vPICO presentations | ESSI1.12

Use of multispectral remote sensing data to map magnetite bodies in the Bushveld Complex, South Africa: a case study of Roossenekal, Limpopo

M. Twala, J. Roberts, and C. Munghemezulu

The use of remote sensing in mineral detection and lithological mapping has become a generally accepted augmentative tool in exploration. With the advent of multispectral sensors (e.g. ASTER, Landsat, Sentinel and PlanetScope) having suitable wavelength coverage and bands in the Shortwave Infrared (SWIR) and Thermal Infrared (TIR) regions, these sensors have become increasingly efficient at routine lithological discrimination and mineral potential mapping. It is with this paradigm in mind that this project sought to evaluate and discuss the detection and mapping of vanadium-bearing magnetite, found in discordant bodies and magnetite layers, on the Eastern Limb of the Bushveld Complex. The Bushveld Complex hosts the world’s largest resource of high-grade primary vanadium in magnetitite layers, so the wide distribution of magnetite, its economic importance, and its potential as an indicator of many important geological processes warranted its delineation.


The detection and mapping of the vanadium-bearing magnetite was evaluated using both specialized traditional and advanced machine learning algorithms. Prior to this study, few studies had looked at the detection and exploration of magnetite using remote sensing, despite remote sensing tools having been regularly applied to diverse aspects of the geosciences. Maximum Likelihood, Minimum Distance to Means, Artificial Neural Network and Support Vector Machine classification algorithms were assessed for their respective abilities to detect and map magnetite using PlanetScope data in ENVI, QGIS and Python. For each classification algorithm, a thematic landcover map was obtained and the accuracy assessed using an error matrix, depicting the user's and producer's accuracies, as well as kappa statistics.
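The accuracy assessment can be sketched generically: an error matrix with per-class user's and producer's accuracies plus Cohen's kappa, computed from validation labels. The labels below are synthetic placeholders.

```python
# A minimal sketch of the error-matrix assessment, assuming scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

rng = np.random.default_rng(0)
reference = rng.integers(0, 4, 1000)                    # validation classes
predicted = np.where(rng.random(1000) < 0.85, reference,
                     rng.integers(0, 4, 1000))          # placeholder classifier output

cm = confusion_matrix(reference, predicted)
producers = np.diag(cm) / cm.sum(axis=1)   # per-class accuracy w.r.t. the reference
users = np.diag(cm) / cm.sum(axis=0)       # per-class accuracy w.r.t. the predictions
print(cm, producers, users, sep="\n")
print("kappa =", cohen_kappa_score(reference, predicted))
```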
The Maximum Likelihood Classifier significantly outperformed the other techniques, achieving an overall classification accuracy of 84.58% and an overall kappa value of 0.79. Magnetite was accurately discriminated from the other thematic landcover classes with a user’s accuracy of 76.41% and a producer’s accuracy of 88.66%. The erroneous classification of some mining-activity pixels as magnetite, observed with Maximum Likelihood, was common to all the classification algorithms. The overall results of this study illustrate that remote sensing techniques are effective instruments for geological mapping and mineral investigation, especially for iron-oxide mineralization in the Eastern Limb of the Bushveld Complex.
How to cite: Twala, M., Roberts, J., and Munghemezulu, C.: Use of multispectral remote sensing data to map magnetite bodies in the Bushveld Complex, South Africa: a case study of Roossenekal, Limpopo., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7932, https://doi.org/10.5194/egusphere-egu21-7932, 2021.

EGU21-7895 | vPICO presentations | ESSI1.12

Application of optical and radar data for lineaments mapping in Kerdous inlier of the Anti Atlas belt, Morocco

Amine Jellouli, Abderrazak El Harti, Zakaria Adiri, Mohcine Chakouri, Jaouad El Hachimi, and El Mostafa Bachaoui

Lineament mapping is an important step in lithological and hydrothermal alteration mapping. It is an efficient research task that can form part of structural investigations and the identification of mineral ore deposits. The availability of optical as well as radar remote sensing data, such as Landsat 8 OLI, Terra ASTER and ALOS PALSAR data, allows lineament mapping at regional and national scales. The accuracy of the obtained results depends strongly on the spatial and spectral resolution of the data. The aim of this study was to compare Landsat 8 OLI, Terra ASTER and radar ALOS PALSAR satellite data for automatic and manual lineament extraction. The LINE module of the PCI Geomatica software was applied to the PC1 OLI, PC3 ASTER and HH and HV polarization images to extract geological lineaments automatically. Manual extraction was achieved using the RGB color composite of the directionally filtered images, N–S (0°), NE–SW (45°) and E–W (90°), of the OLI panchromatic band 8. The lineaments obtained from automatic and manual extraction were compared against the faults and photo-geological lineaments digitized from the existing geological map of the study area. The lineaments extracted from the PC1 OLI and ALOS PALSAR polarization images showed the best correlation with the faults and photo-geological lineaments. The results indicate that the HH and HV polarizations of the ALOS PALSAR radar data, with 1499 and 1507 extracted lineaments respectively, were the most effective for structural lineament mapping, followed by the PC1 OLI image with 1057 lineaments.
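
The directional-filter step lends itself to a short sketch; the 3x3 kernels below are one common choice of compass edge filters and merely stand in for PCI Geomatica's directional filters, while the input array is a placeholder for the OLI panchromatic band:

```python
import numpy as np
from scipy.ndimage import convolve

pan = np.random.rand(512, 512)   # stand-in for OLI panchromatic band 8

# Simple 3x3 directional difference kernels (illustrative; naming conventions
# for filter direction vs. enhanced edge direction vary between packages).
k_ns = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], float)     # 0 deg
k_nesw = np.array([[0, -1, -1], [1, 0, -1], [1, 1, 0]], float)   # 45 deg
k_ew = k_ns.T                                                    # 90 deg

def stretch(a):
    """Linear stretch to [0, 1] for display as an RGB composite."""
    return (a - a.min()) / (a.max() - a.min())

rgb = np.dstack([stretch(np.abs(convolve(pan, k)))
                 for k in (k_ns, k_nesw, k_ew)])
```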

Keywords: Remote Sensing, OLI, ALOS PALSAR, ASTER, Kerdous Inlier, Anti Atlas

How to cite: Jellouli, A., El Harti, A., Adiri, Z., Chakouri, M., El Hachimi, J., and Bachaoui, E. M.: Application of optical and radar data for lineaments mapping in Kerdous inlier of the Anti Atlas belt, Morocco, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7895, https://doi.org/10.5194/egusphere-egu21-7895, 2021.

EGU21-6438 | vPICO presentations | ESSI1.12

Improving the Classification Accuracy of Fragmented Cropland by using an Advanced Classification Algorithm

Dr. Shreedevi Moharana, Dr. BVNP Kambhammettu, Mr. Syam Chintala, Ms. Arjangi Sandhya Rani, and Dr. Ram Avtar

Fragmented croplands and marginal landholdings complicate land-use classification, as they host diverse cropping and management practices. Implementing crop classification algorithms in such settings is difficult and tends to produce results of lower accuracy. Optical imagery is often contaminated by cloud cover and fails to detect the phenological as well as structural changes occurring during crop growth; this is very common under Indian climatic conditions, where acquiring a temporal sequence of satellite images over the monsoon cropping period is a very challenging task. Therefore, the present study applies a novel crop classification algorithm that utilizes the temporal patterns of synthetic aperture radar (SAR) datasets from Sentinel-1 to map land use in an agricultural system that is fragmented, small-scale and heterogeneous in nature. We used different polarizations of Sentinel-1 data to develop the temporal patterns of the different crops grown in a semi-arid region of India. An advanced classification algorithm, time-weighted dynamic time warping (TWDTW), was then employed to classify the cropland with higher accuracy (a sketch of the distance computation is given below). Pixel-based image analysis was carried out and tested for its applicability to cropland mapping. In-situ datasets collected at the study site were used to validate the classification outputs. The pixel-based TWDTW method achieved an overall accuracy of 63% and a kappa coefficient of 0.58. The findings confirm that the pixel-based TWDTW algorithm has the potential to delineate croplands subjected to varying irrigation treatments and management practices using Sentinel-1 datasets.
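
A minimal sketch of the time-weighted DTW distance, using the logistic time weight of Maus et al. (2016); the parameter values and sequences are illustrative, and the abstract does not specify the exact weighting used:

```python
import numpy as np

def twdtw(x, tx, y, ty, alpha=0.1, beta=100.0):
    """Time-weighted DTW distance between two irregularly sampled series.

    x, y   : feature sequences (e.g. Sentinel-1 VH backscatter)
    tx, ty : acquisition times in days; a logistic penalty discourages
             matching observations far apart in the season.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dt = abs(tx[i - 1] - ty[j - 1])
            w = 1.0 / (1.0 + np.exp(-alpha * (dt - beta)))  # logistic time weight
            cost = abs(x[i - 1] - y[j - 1]) + w
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A pixel is assigned the crop whose temporal pattern minimizes this distance.
```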

Keywords: crop classification, landuse, image analysis, Sentinel-1, TWDTW

How to cite: Moharana, Dr. S., Kambhammettu, Dr. B., Chintala, Mr. S., Sandhya Rani, Ms. A., and Avtar, Dr. R.: Improving the Classification Accuracy of Fragmented Cropland by using an Advanced Classification Algorithm, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6438, https://doi.org/10.5194/egusphere-egu21-6438, 2021.

ESSI1.19 – Stability and Accuracy of Earth satellite measurements through calibration and validation

EGU21-1327 | vPICO presentations | ESSI1.19

Community geometric standards for remote sensing products

Guoqing (Gary) Lin, Robert Wolfe, Bin Tan, and Jaime Nickeson

We have developed a set of geometric standards for assessing Earth-observing data products derived from space-borne remote sensors.  We have worked with the European Space Agency (ESA) Earthnet Data Assessment Pilot (EDAP) project to provide a set of guidelines to assess geometric performance in data products from commercial electro-optical remote sensors aboard satellites such as those from Planet Labs. The guidelines, or standards, are based on the performance of a few NASA-procured sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors, the Visible Infrared Imaging Radiometer Suite (VIIRS) sensors and the Advanced Baseline Imager (ABI) sensors. The standards include sensor spatial response, absolute positional accuracy, and band-to-band co-registration. They are tiered into “basic”, “intermediate” and “goal” criteria. These are important geometric factors affecting the scientific use of remote sensing data products. We also discuss possible approaches to achieving the highest goal in the geometric performance standards.

How to cite: Lin, G., Wolfe, R., Tan, B., and Nickeson, J.: Community geometric standards for remote sensing products, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1327, https://doi.org/10.5194/egusphere-egu21-1327, 2021.

EGU21-7919 | vPICO presentations | ESSI1.19

Calibration and Validation of Infrared Sounders with Moon and Mercury

Martin Burgdorf, Stefan A. Buehler, Viju John, Thomas Müller, and Marc Prange

Serendipitous observations of airless bodies of the inner solar system provide a unique means for calibrating instruments on meteorological research satellites, because the physical properties of their surfaces change very little, even on large time scales. We investigated how certain instrumental effects can be characterised with observations of the Moon and Mercury. For this we identified and analysed intrusions of the Moon in the deep-space views of HIRS/2, /3, and /4 (High-resolution Infrared Sounder) on various satellites in polar orbits, as well as some images obtained with SEVIRI (Spinning Enhanced Visible Infra-Red Imager) on MSG-3 and -4 (Meteosat Second Generation) in which Mercury stood close to the Earth in the rectangular field of view.

A full-disk, infrared Moon model was developed that describes how the lunar flux density depends on phase angle and wavelength. It is particularly helpful for inter-calibration, checks of the photometric consistency of the sounding channels, and the calculation of an upper limit on the non-linearity of the shortwave channels of HIRS. In addition, we used the Moon to determine the co-registration of the different spectral channels.

Studies of the channel alignment are also presented for SEVIRI, whose angular resolution is about a hundred times finer than that of HIRS. As we wanted to check the image quality of this instrument with a quasi-point source as well, we replaced the Moon with Mercury here. We found the typical smearing of the point spread function in the scan direction and occasionally a nearby ghost image, three to four times fainter than the main image of the planet. Both effects cause additional uncertainties in the photometric calibration.

How to cite: Burgdorf, M., Buehler, S. A., John, V., Müller, T., and Prange, M.: Calibration and Validation of Infrared Sounders with Moon and Mercury, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7919, https://doi.org/10.5194/egusphere-egu21-7919, 2021.

EGU21-8481 | vPICO presentations | ESSI1.19

Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) Newly Released V5.2 Validation of Ozone and Water Vapor Data

Susan Kizer, David Flittner, Marilee Roell, Robert Damadeo, Carrie Roller, Dale Hurst, Emrys Hall, Allen Jordan, Patrick Cullis, Bryan Johnson, and Richard Querel

The Stratospheric Aerosol and Gas Experiment III (SAGE III) instrument installed on the International Space Station (ISS) has completed over three and a half years of data collection and production of science data products. SAGE III/ISS is a solar and lunar occultation instrument that scans the light from the Sun and Moon through the limb of the Earth’s atmosphere to produce vertical profiles of aerosol, ozone, water vapor, and other trace gases. It continues the legacy of previous SAGE instruments dating back to the 1970s to provide data continuity of stratospheric constituents critical for assessing trends in the ozone layer. This presentation shows the validation results of comparing SAGE III/ISS ozone and water vapor vertical profiles from the newly released v5.2 science product with in situ and satellite data.

How to cite: Kizer, S., Flittner, D., Roell, M., Damadeo, R., Roller, C., Hurst, D., Hall, E., Jordan, A., Cullis, P., Johnson, B., and Querel, R.: Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) Newly Released V5.2 Validation of Ozone and Water Vapor Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8481, https://doi.org/10.5194/egusphere-egu21-8481, 2021.

EGU21-8702 | vPICO presentations | ESSI1.19

Atmospheric Correction Inter-comparison eXercise: the second implementation 

Georgia Doxani, Eric F. Vermote, Sergii Skakun, Ferran Gascon, and Jean-Claude Roger

The atmospheric correction inter-comparison exercise (ACIX) is an international initiative to benchmark various state-of-the-art atmospheric correction (AC) processors. The first inter-comparison exercise was initiated in 2016 as a collaboration between the European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) in the frame of the CEOS WGCV (Committee on Earth Observation Satellites, Working Group on Calibration & Validation). The evolution of the participating processors and the increasing interest of the AC community in repeating and improving such an experiment stimulated the continuation of ACIX with a second implementation (ACIX-II). In particular, 12 AC developer teams from Europe and the USA participated in ACIX-II over land sites. In this presentation the benchmarking protocol, i.e. test sites, input data, inter-comparison metrics, etc., will be briefly described and some representative results of ACIX-II will be presented. The inter-comparison outputs varied depending on the sensors, products and sites, demonstrating the strengths and weaknesses of the corresponding processors. In continuation of the ACIX-I achievements, the outcomes of the second exercise are expected to provide an enhanced standardised approach to inter-compare AC processing products, i.e. Aerosol Optical Thickness (AOT), Water Vapour (WV) and Surface Reflectance (SR), and to quantitatively assess their quality where in situ measurements are available.

How to cite: Doxani, G., Vermote, E. F., Skakun, S., Gascon, F., and Roger, J.-C.: Atmospheric Correction Inter-comparison eXercise: the second implementation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8702, https://doi.org/10.5194/egusphere-egu21-8702, 2021.

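EGU21-8731 | vPICO presentations | ESSI1.19

Satellite Calibration/Validation and Related Activities Carried out through NASA/ESA Joint Program Planning Group Subgroup

Jack Kaye and Malcolm Davidson
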
The NASA/ESA Joint Program Planning Group (JPPG) subgroup on satellite calibration/validation was created to facilitate coordinated efforts between ESA, NASA, and their respective investigator communities to enhance calibration and/or validation activities for current and/or future satellite missions. The cooperation enabled through this activity includes airborne campaigns, use of surface-based measurements, and satellite-to-satellite intercomparisons. Numerous examples of such activities exist over the ten years of the JPPG. In this talk, examples of calibration/validation focused activities, accomplishments, and future plans will be presented. A particular focus will be on how the COVID-19 pandemic has affected field work planned for 2020 and 2021.  The JPPG subgroup also includes joint European-US studies of satellite results that integrate the results of both parties’ observational capabilities, and the status of those activities will be presented as well.

How to cite: Kaye, J. and Davidson, M.: Satellite Calibration/Validation and Related Activities Carried out through NASA/ESA Joint Program Planning Group Subgroup, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8731, https://doi.org/10.5194/egusphere-egu21-8731, 2021.

EGU21-9030 | vPICO presentations | ESSI1.19

Characterization of the in-flight spectral response function of Geostationary Environment Monitoring Spectrometer (GEMS) retrieved using observed solar irradiance

Mina Kang, Myoung-Hwan Ahn, Dai Ho Ko, Jhoon Kim, Dennis Nicks, Mijin Eo, Yeeun Lee, Kyung-Jung Moon, and Dong-Won Lee

The successful launch of the Geostationary Environment Monitoring Spectrometer (GEMS) onboard the Geostationary Korea Multipurpose Satellite 2B (GK-2B) opens up a new possibility to provide daily air quality information on trace gases and aerosols over East Asia with high spatiotemporal resolution. As part of the major efforts to calibrate and validate the performance of GEMS, accurate characterization of the spectral response functions (SRFs) is critical. The characteristics of the preflight SRFs, examined in terms of shape, width, skewness, and kurtosis, vary smoothly along both the spectral and spatial directions thanks to the highly symmetrical optical system of GEMS. While the preflight SRFs were determined with high accuracy, the in-flight SRFs may have changed during the harsh launch process and may change during operations over the mission lifetime. Thus, it is important to verify the in-flight SRFs after launch and to continue monitoring their variability over time to assure reliable trace gas retrievals. Here, we retrieve the in-flight SRFs over the full spectral and spatial domain of GEMS by spectrally fitting the observed daily solar measurements against a high-resolution solar reference spectrum. A variety of analytic model functions, including a hybrid of Gaussian and flat-topped functions, an asymmetric super-Gaussian, and a Voigt function, are tested to determine the function best representing the GEMS SRF. The SRFs retrieved from early solar irradiances measured during the in-orbit tests agree well with the preflight SRFs, indicating that no significant change occurred during the launch process. Continuous monitoring of the in-flight SRF is planned, using daily solar irradiances to investigate the temporal variation along the spectral and spatial directions. The detailed results of the in-flight SRF retrieval are to be presented.
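
As an illustration of one candidate model function, the sketch below fits an asymmetric super-Gaussian to a sampled SRF with scipy; the parametrisation and sample values are hypothetical, and the operational retrieval instead fits SRF-convolved solar reference spectra to the measured irradiance:

```python
import numpy as np
from scipy.optimize import curve_fit

def super_gaussian(wl, wl0, w_left, w_right, k):
    """Asymmetric super-Gaussian: flat-topped for k > 2, width differing
    on either side of the centre wavelength wl0."""
    w = np.where(wl < wl0, w_left, w_right)
    return np.exp(-np.abs(2.0 * (wl - wl0) / w) ** k)

# Hypothetical sampled SRF around a 430 nm channel centre, with noise.
rng = np.random.default_rng(3)
wl = np.linspace(429.0, 431.0, 81)
srf = super_gaussian(wl, 430.0, 0.6, 0.7, 3.0) + rng.normal(0.0, 0.005, wl.size)

popt, _ = curve_fit(super_gaussian, wl, srf, p0=(430.0, 0.5, 0.5, 2.5))
print(dict(zip(("wl0", "w_left", "w_right", "k"), popt)))
```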

How to cite: Kang, M., Ahn, M.-H., Ko, D. H., Kim, J., Nicks, D., Eo, M., Lee, Y., Moon, K.-J., and Lee, D.-W.: Characterization of the in-flight spectral response function of Geostationary Environment Monitoring Spectrometer (GEMS) retrieved using observed solar irradiance, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9030, https://doi.org/10.5194/egusphere-egu21-9030, 2021.

EGU21-9360 | vPICO presentations | ESSI1.19

Characterization of GEMS level 1B based on inter-comparison using the visible channel of AMI

Yeeun Lee, Myoung-Hwan Ahn, Mijin Eo, Mina Kang, Kyung-jung Moon, Dai-Ho Ko, Jhoon Kim, and Dong-won Lee

The Geostationary Korean Multi-Purpose Satellite (GK-2) program, consisting of GK-2A and GK-2B, provides consistent monitoring information for the Asia-Pacific region, including the Korean peninsula. The Geostationary Environment Monitoring Spectrometer (GEMS) onboard GK-2B in particular provides information on atmospheric composition and aerosol properties, retrieved from the calibrated radiance (Level 1B) with high spectral resolution in 300–500 nm. GEMS started its extended validation measurements after the in-orbit test (IOT) in October, following the launch of the satellite in February 2020. One of the issues found during the IOT is that GEMS shows a spatial dependence of the measured solar irradiance along the north-south direction, although the solar irradiance itself has no such dependency. The dependence must therefore originate from the optical system or from the solar diffuser placed in front of the scan mirror. To clarify its root cause, we utilize an inter-comparison of Earth measurements between GEMS and the Advanced Meteorological Imager (AMI), a multi-channel imager onboard GK-2A for meteorological monitoring. As the spectral range of GEMS fully covers the spectral response function (SRF) of the AMI visible channel with a central wavelength of 470 nm, spectral matching is properly done by convolving the SRF with the hyperspectral data of GEMS. Taking advantage of the fact that the positions of GK-2A and GK-2B are maintained within a 0.5-degree square box centered at 128.2°E, a match-up dataset for the inter-comparison is prepared by temporal and spatial collocation. To reduce spatio-temporal mismatch and increase the signal-to-noise ratio, a zonal mean is applied to the collocated data. Results show that the north-south dependence occurs in the comparison of reflectance, the ratio between the Earth radiance and the solar irradiance, but not in the comparison of radiance. This indicates that the dependence arises from the characteristics of the solar diffuser, not from the optical system. It is further deduced that the dependence of the diffuser transmittance on the solar azimuth angle, which was not characterized during the pre-flight ground test, is the main cause of the north-south dependency.
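
The band-simulation step, convolving each GEMS spectrum with the AMI SRF and then averaging zonally, can be sketched as follows; the spectral grid, Gaussian SRF stand-in and data values are placeholders for the real instrument tables:

```python
import numpy as np

# Placeholder GEMS-like spectral grid (nm) and a Gaussian stand-in for the
# AMI 470 nm channel SRF; real SRF tables would be used in practice.
wl = np.arange(300.0, 500.0, 0.6)
srf = np.exp(-0.5 * ((wl - 470.0) / 10.0) ** 2)

rng = np.random.default_rng(4)
radiance = rng.random((100, wl.size))        # (n_pixels, n_wavelengths)

# SRF-weighted spectral average simulates the imager channel from GEMS spectra.
ami_like = radiance @ srf / srf.sum()

# Zonal (latitude-band) mean of collocated pixels suppresses mismatch noise.
lat_band = np.repeat(np.arange(10), 10)
zonal_mean = np.array([ami_like[lat_band == b].mean() for b in range(10)])
print(zonal_mean)
```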

How to cite: Lee, Y., Ahn, M.-H., Eo, M., Kang, M., Moon, K., Ko, D.-H., Kim, J., and Lee, D.: Characterization of GEMS level 1B based on inter-comparison using the visible channel of AMI, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9360, https://doi.org/10.5194/egusphere-egu21-9360, 2021.

EGU21-10066 | vPICO presentations | ESSI1.19

LIME: the Lunar Irradiance Model of the European Space Agency

Sarah Taylor, Stefan Adriaensen, Carlos Toledano, África Barreto, Emma Woolliams, and Marc Bouvet

Absolute calibration of Earth observation sensors is key to ensuring long term stability and interoperability, essential for long term global climate records and forecasts. The Moon provides a photometrically stable calibration source, within the range of the Earth radiometric levels, and is free from atmospheric interference. However, to use this ideal calibration source, one must model the variation of its disk integrated irradiance resulting from changes in Sun-Earth-Moon geometries.

LIME, the Lunar Irradiance Model of the European Space Agency, is a new lunar irradiance model developed from ground-based observations acquired using a lunar photometer operating from the Izaña Atmospheric Observatory and Teide Peak, Tenerife. Approximately 300 lunar observations acquired between March 2018 and October 2020 currently contribute to the model, which builds on the widely-used ROLO (Robotic Lunar Observatory) model.

This presentation will outline the strategy used to derive LIME. First, the instrument was calibrated traceably to SI and characterised to determine its thermal sensitivity and its linearity over the wide dynamic range required. Second, the instrument was installed at the observatory, and nightly observations over a two-hour time window were extrapolated to provide top-of-atmosphere lunar irradiance using the Langley plot method. Third, these observations were combined to derive the model. Each of these steps includes a metrologically rigorous uncertainty analysis.
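
A toy version of the Langley extrapolation used in the second step, assuming a single-band photometer signal following Beer-Lambert attenuation; the signal level, optical depth and noise are invented for illustration:

```python
import numpy as np

# ln V = ln V0 - tau * m: regressing the log signal on airmass m and
# extrapolating to m = 0 yields the top-of-atmosphere signal V0.
rng = np.random.default_rng(1)
m = np.linspace(1.2, 3.0, 40)                 # airmass over a two-hour window
tau_true, v0_true = 0.12, 2.5e4
v = v0_true * np.exp(-tau_true * m + rng.normal(0.0, 0.002, m.size))

slope, intercept = np.polyfit(m, np.log(v), 1)
v0_toa, tau = np.exp(intercept), -slope       # TOA signal and optical depth
```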

Comparisons to several EO sensors will be presented, including Proba-V, Pleiades and Sentinel-3A and -3B, as well as a comparison to GIRO, the GSICS implementation of the ROLO model. Initial results indicate LIME predicts 3–5% higher disk-integrated lunar irradiance than the GIRO/ROLO model for the visible and near-infrared channels. The model has an expanded (k = 2) absolute radiometric uncertainty of ~2%, and it is expected that planned observations until at least 2024 will further constrain the model in subsequent updates.

How to cite: Taylor, S., Adriaensen, S., Toledano, C., Barreto, Á., Woolliams, E., and Bouvet, M.: LIME: the Lunar Irradiance Model of the European Space Agency, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10066, https://doi.org/10.5194/egusphere-egu21-10066, 2021.

EGU21-13302 | vPICO presentations | ESSI1.19

New and improved data from the Pandonia Global Network for satellite validation

Alexander Cede, Martin Tiefengraber, Manuel Gebetsberger, Michel Van Roozendael, Henk Eskes, Christophe Lerot, Diego Loyola, Nicolas Theys, Isabelle De Smedt, Nader Abuhassan, Thomas Hanisco, Angelika Dehn, Jonas Von Bismarck, Stefano Casadio, Luke Valin, and Barry Lefer

The worldwide operating Pandonia Global Network (PGN) measures atmospheric trace gases at high temporal resolution for the purposes of air quality monitoring and satellite validation. It is an activity carried out jointly by NASA and ESA as part of their “Joint Program Planning Group Subgroup” on calibration, validation and field activities, with additional collaboration from other institutions, most notably a strongly growing participation of the US Environmental Protection Agency (EPA). The more than 50 official PGN instruments are homogeneously calibrated and their data are centrally processed in real time. Since 2019, total NO2 column amounts from the PGN have been uploaded daily to the ESA Atmospheric Validation Data Centre (EVDC), where they are used for operational validation of Sentinel-5P (S5P) retrievals. During 2020, a new processor version 1.8 was developed, which produces improved total NO2 column amounts as well as the following new PGN products: total columns of O3, SO2 and HCHO based on direct-sun observations, and tropospheric columns, surface concentrations and tropospheric profiles of NO2 and HCHO based on sky observations. In this presentation we show first examples of comparisons of the new PGN products with S5P data. Compared to the total NO2 columns from the previous processor version 1.7, the 1.8 data use better estimations of the effective NO2 temperature and the air mass factor. The effect of this improvement on the comparison with S5P retrievals is shown for some remote and high-altitude PGN sites. The new PGN total O3 column algorithm also retrieves the effective O3 temperature, which is a rather unique feature for ground-based direct-sun retrievals. This allows us to analyze whether potential differences from satellite O3 columns might be influenced by the O3 temperature. Including the O3 temperature in the spectral fitting has also allowed the retrieval of accurate total SO2 columns. This PGN data product is of particular interest for satellite validation, as ground-based total SO2 column amounts are hardly measured by other instrumentation. An initial comparison of the PGN SO2 columns with S5P retrievals at selected PGN sites around the world is shown. PGN total HCHO columns from direct-sun measurements are now possible for those PGN instruments in which the hardware parts made of Delrin, which outgasses HCHO, have been replaced by Nylon pieces. An initial comparison to HCHO retrievals from S5P is shown for locations with these upgraded instruments. Another new feature of the 1.8 PGN data is that they come with comprehensive uncertainty estimates, separated in the output files into independent, structured, common and total uncertainty.
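
For readers unfamiliar with the air-mass-factor step, the sketch below converts a slant column to a vertical column using a purely geometric direct-sun AMF; this is a simplification for illustration, not the PGN processor's refined, temperature-dependent treatment:

```python
import numpy as np

def vertical_column(slant_column, sza_deg):
    """VCD = SCD / AMF, with a geometric direct-sun AMF of sec(SZA)."""
    amf = 1.0 / np.cos(np.radians(sza_deg))
    return slant_column / amf

# e.g. a 4.1e16 molec/cm2 slant column at 60 deg SZA -> ~2.05e16 molec/cm2
print(vertical_column(4.1e16, 60.0))
```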

How to cite: Cede, A., Tiefengraber, M., Gebetsberger, M., Van Roozendael, M., Eskes, H., Lerot, C., Loyola, D., Theys, N., De Smedt, I., Abuhassan, N., Hanisco, T., Dehn, A., Von Bismarck, J., Casadio, S., Valin, L., and Lefer, B.: New and improved data from the Pandonia Global Network for satellite validation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13302, https://doi.org/10.5194/egusphere-egu21-13302, 2021.

EGU21-14222 | vPICO presentations | ESSI1.19

Air-LUSI: Supporting Advancement of the Moon as a Reference for Earth Observations from Space

Kevin Turpie, Steven Brown, John Woodward, Thomas Stone, Andrew Gadsden, Steven Grantham, Thomas Larason, Stephen Maxwell, Andrew Cataford, and Andrew Newton

To monitor global environments from space, satellites must be calibrated accurately and consistently across time, missions and instruments.  This requires the use of a stable, common reference that is continuously accessible to Earth-observing satellites, whether they make up series of missions spanning long periods of time or comprise constellations acquiring many simultaneous observations across the planet.  The Moon can serve well as such a common reference.  Its surface reflectance is stable to within one part in 10⁸.  Its radiant output is theorized to change repeatably and very predictably with viewing and illumination geometry.  In addition, it has a radiant flux more comparable to the Earth’s surface than the Sun and can be viewed directly by the instrument.  Currently, to predict the lunar irradiance for a given illumination and viewing geometry, the United States Geological Survey (USGS) has developed the Robotic Lunar Observatory (ROLO) model of exo-atmospheric lunar spectral irradiance. The USGS ROLO model represents the current most precise knowledge of lunar spectral irradiance and is used frequently as a relative calibration standard by space-borne Earth-observing sensors.  Current knowledge of the Moon's spectral irradiance is thought to be limited to 5–10% uncertainty.  However, monitoring changing Earth environments calls for an absolute lunar reference with higher accuracy.

The development of the ROLO model and subsequent attempts to better characterize the lunar spectral irradiance cycle were based on observations made from the Earth's surface.  This requires applying corrections to remove the effects of the atmosphere, which limits the accuracy.  The Airborne LUnar Spectral Irradiance (Air-LUSI) system was developed to make highly accurate, SI-traceable measurements of lunar spectral irradiance from NASA’s ER-2 aircraft flying at 21 km, above 95% of the atmosphere.  To that end, the Air-LUSI system employs an autonomous, robotic telescope system that tracks the Moon in flight and a stable spectrometer housed in an enclosure providing a robustly controlled environment.  During November 2019, the Air-LUSI system was demonstrated with flights on five consecutive nights, acquiring observations of the Moon at lunar phases of 10°, 21°, 34°, 46°, and 59°.  Air-LUSI is now ready for operational use.  This paper provides an overview of this new capability and how it, along with other efforts underway, can help transform how we monitor the Earth from space.

How to cite: Turpie, K., Brown, S., Woodward, J., Stone, T., Gadsden, A., Grantham, S., Larason, T., Maxwell, S., Cataford, A., and Newton, A.: Air-LUSI: Supporting Advancement of the Moon as a Reference for Earth Observations from Space, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14222, https://doi.org/10.5194/egusphere-egu21-14222, 2021.

EGU21-14656 | vPICO presentations | ESSI1.19

Toward a Climate and Calibration Observatory in space: NASA CLARREO Pathfinder and ESA TRUTHS

Nigel Fox, Yolanda Shea, Thorsten Fehr, Fleming Gary, Constantine Lukashin, Peter Pilewskie, John Remedios, and Paul Smith

The number, range and criticality of applications of Earth-viewing optical sensors are increasing rapidly, driven not only by national and international space agencies but also by the launch of commercial constellations such as those of Planet, with the concept of Analysis Ready Data (ARD) reducing the skill needed to utilise the data.  However, no one organisation can provide all the tools necessary, and the need for a coordinated, holistic Earth-observing system has never been greater. Achieving this vision has led to international initiatives coordinated by bodies such as the Committee on Earth Observation Satellites (CEOS) and the Global Space-based Inter-Calibration System (GSICS) of WMO to establish strategies to facilitate interoperability and the understanding and removal of bias through post-launch calibration and validation.

In parallel, the societal challenge resulting from climate change has been a major stimulus for significantly improved accuracy and trust of satellite data. Instrumental biases and uncertainty must be sufficiently small to minimise the multi-decadal timescales needed to detect small trends and attribute their cause, enabling them to become unequivocally accepted as evidence. 

Although there have been many advances in the pre-flight SI-traceable calibration of optical sensors in the last decade, unpredictable degradation in performance caused by both the launch and the operational environment remains a major difficulty.  Even with on-board calibration systems, uncertainties of less than a few percent are rarely achieved and maintained, and the evidential link to SI traceability is weak. For many climate observations the target uncertainty needs to be improved ten-fold.

However, this decade will hopefully see the launch of two missions providing spectrally resolved observations of the Earth at optical wavelengths that aim to change this paradigm: CLARREO Pathfinder on the International Space Station from NASA [1] and TRUTHS from ESA [2].  Both payloads are explicitly designed to achieve uncertainties close to those of an ideal observing system, commensurate with the needs of climate, with robust SI traceability evidenced in space.  Not only can they make high-accuracy, climate-quality observations of the Earth, and in the case of TRUTHS also of the Sun, but they will also transfer their SI-traceable uncertainty to other sensors.  In this way they create the concept of a ‘metrology laboratory in space’, providing a ‘gold standard’ reference to anchor and improve the calibration of other sensors. The two missions achieve their traceability in orbit through differing methods but will use synergistic approaches for establishing in-flight cross-calibrations.  This paper will describe these strategies and illustrate the benefit through examples where improved accuracy has the most impact on the Earth-observing system.

The complementarity and international value of these missions have ensured a strong partnership during the early development phases of the full CLARREO mission and of the NPL-conceived TRUTHS. Following a proposal by the UK Space Agency and its subsequent adoption into the ESA EarthWatch programme, this partnership has been further strengthened with the ESA team, with a vision that together the two missions can lay the foundation of a framework for a future sustainable international climate and calibration observatory, to the benefit of the global Earth-observing community.

References

[1]  https://clarreo-pathfinder.larc.nasa.gov/

[2] https://www.npl.co.uk/earth-observation/truths

How to cite: Fox, N., Shea, Y., Fehr, T., Gary, F., Lukashin, C., Pilewskie, P., Remedios, J., and Smith, P.: Toward a Climate and Calibration Observatory in space: NASA CLARREO Pathfinder and ESA TRUTHS, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14656, https://doi.org/10.5194/egusphere-egu21-14656, 2021.

EGU21-15144 | vPICO presentations | ESSI1.19

The Joint ESA-NASA Tropical Campaign Activity – Aeolus Calibration/Validation and Science in the Tropics

Thorsten Fehr, Gail Skofronick-Jackson, Vassilis Amiridis, Jonas von Bismarck, Shuyi Chen, Cyrille Flamant, Rob Koopman, Christian Lemmerz, Griša Močnik, Tommaso Parrinello, Aaron Piña, and Anne Grete Straume

The Tropics cover around 40% of the globe and are home to approximately 40% of the world's population. However, numerical weather prediction (NWP) for this region remains challenging due to the lack of basic observations and an incomplete understanding of atmospheric processes, which also affects extratropical storm development. As a result, the largest impact of ESA's Aeolus satellite observations on NWP is expected in the Tropics, where only a very limited number of wind profile observations can be performed from the ground.

An especially important case relating to the predictability of tropical weather systems is the outflow of Saharan dust, its interaction with cloud microphysics and its overall impact on the development of tropical storms over the Atlantic Ocean. The region off the coast of West Africa uniquely allows the study of the Saharan Aerosol Layer, the African Easterly Waves and Jets, the Tropical Easterly Jet, as well as the deep convection in the ITCZ, and their relation to the formation of convective systems and the transport of dust.

Together with international partners, ESA and NASA are currently implementing a joint Tropical campaign from July to August 2021, based in Cape Verde. The campaign objective is to provide information for the validation and preparation of the ESA missions Aeolus and EarthCARE, respectively, as well as to support a range of related science objectives: the interactions of African Easterly and other tropical waves with the mean flow and dust, and their impact on the development of convective systems; the structure and variability of the marine boundary layer in relation to the initiation and lifecycle of convective cloud systems within and across the ITCZ; and the impact of wind, aerosol, cloud, and precipitation effects on long-range dust transport and air quality over the western Atlantic.

The campaign comprises a unique combination of strong airborne and ground-based elements collocated on Cape Verde. The airborne component, with wind and aerosol lidars, cloud radars, in-situ instrumentation and additional observations, includes the NASA DC-8 with science activities coordinated by the University of Washington, the German DLR Falcon-20, the French Safire Falcon-20 with activities led by LATMOS, and the Slovenian Aerovizija Advantic WT-10 light aircraft in cooperation with the University of Nova Gorica. The ground-based component, led by the National Observatory of Athens, is a collaboration of more than 25 European teams providing in-situ and remote sensing aerosol and cloud measurements with a wide range of lidar, radar and radiometer systems, as well as drone observations by the Cyprus Institute.

In preparation for the field campaign, the NASA and ESA management and science teams are collaborating closely through regular coordination meetings, in particular to coordinate the shift of the activity by one year due to the COVID-19 pandemic. The time gained has been used to further consolidate the planning, notably through a dry-run campaign organized by NASA with European participation, in which six virtual flights were conducted in July 2020.

This paper will present a summary of the campaign preparation activities and the consolidated plan for the 2021 Tropical campaign.

How to cite: Fehr, T., Skofronick-Jackson, G., Amiridis, V., von Bismarck, J., Chen, S., Flamant, C., Koopman, R., Lemmerz, C., Močnik, G., Parrinello, T., Piña, A., and Straume, A. G.: The Joint ESA-NASA Tropical Campaign Activity – Aeolus Calibration/Validation and Science in the Tropics, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15144, https://doi.org/10.5194/egusphere-egu21-15144, 2021.

EGU21-15161 | vPICO presentations | ESSI1.19

Pi-MEP Salinity – an ESA-NASA Platform for sustained satellite surface salinity validation 

Roberto Sabia, Sebastien Guimbard, Nicolas Reul, Tony Lee, Julian Schanze, Nadya Vinogradova, David Le Vine, Fred Bingham, Fabrice Collard, Klaus Scipal, and Henri Laur

The Pilot Mission Exploitation Platform (Pi-MEP) for Salinity (www.salinity-pimep.org) has been released operationally in 2019 to the broad oceanographic community, in order to foster satellite sea surface salinity validation and exploitation activities.

Specifically, the Platform aims at enhancing salinity validation by allowing systematic inter-comparison of various EO datasets with a broad suite of in-situ data, and at enabling oceanographic process studies by capitalizing on salinity data in synergy with additional spaceborne estimates.
Although Pi-MEP was originally conceived as an ESA initiative to widen the uptake of the Soil Moisture and Ocean Salinity (SMOS) mission data over the ocean, a project partnership with NASA was devised soon after the operational deployment, and an official collaboration was endorsed within the ESA-NASA Joint Program Planning Group (JPPG).
The Salinity Pi-MEP has therefore become a reference hub for SMOS, SMAP and Aquarius satellite salinity missions, which are assessed in synergy with additional thematic datasets (e.g., precipitation, evaporation, currents, sea level anomalies, ocean color, sea surface temperature). 

Match-up databases of satellite/in situ (such as Argo, TSG, moorings, drifters) data and corresponding validation reports at different spatiotemporal scales are systematically generated; furthermore, recently developed dedicated tools allow data visualization, metrics computation and user-driven feature extraction.
The Platform is also meant to monitor salinity in selected oceanographic “case studies”, ranging from river plume monitoring to SSS characterization in challenging regions, such as high latitudes or semi-enclosed basins.
The two Agencies are currently collaborating to widen the Platform's features on several technical aspects, ranging from a triple-collocation software implementation to a sustained exploitation of data from the SPURS-1/2 campaigns. In this context, an upgrade of the satellite/in-situ match-up methodology has recently been agreed, resulting in a redefinition of the validation criteria that will subsequently be implemented in the Platform.
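
Triple collocation can be sketched compactly in its covariance form, estimating the error variances of three collocated salinity estimates with independent errors (e.g. SMOS, SMAP and Argo); the inputs below are synthetic, and the Platform's actual implementation may differ:

```python
import numpy as np

def triple_collocation(x, y, z):
    """Covariance-based triple collocation: error variances of three
    collocated estimates of the same signal with independent errors."""
    c = np.cov(np.vstack([x, y, z]))
    ex2 = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    ey2 = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    ez2 = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return ex2, ey2, ez2

# Synthetic demo: one true SSS signal observed by three noisy systems.
rng = np.random.default_rng(2)
truth = 35.0 + 0.5 * rng.standard_normal(10_000)
x = truth + 0.20 * rng.standard_normal(truth.size)
y = truth + 0.30 * rng.standard_normal(truth.size)
z = truth + 0.10 * rng.standard_normal(truth.size)
print(triple_collocation(x, y, z))   # ~ (0.04, 0.09, 0.01)
```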
A further synthesis of the three satellites' salinity algorithms, models and auxiliary data handling is at the core of the ESA Climate Change Initiative (CCI) on Salinity and of further ESA-NASA collaboration.

How to cite: Sabia, R., Guimbard, S., Reul, N., Lee, T., Schanze, J., Vinogradova, N., Le Vine, D., Bingham, F., Collard, F., Scipal, K., and Laur, H.: Pi-MEP Salinity – an ESA-NASA Platform for sustained satellite surface salinity validation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15161, https://doi.org/10.5194/egusphere-egu21-15161, 2021.

EGU21-15166 | vPICO presentations | ESSI1.19

NASA-ESA Cooperation on the SBG and CHIME Hyperspectral Satellite Missions: a roadmap for the joint Working Group on Cal/Val activities

Valentina Boccia, Jennifer Adams, Kurtis J. Thome, Kevin R. Turpie, Raymond Kokaly, Marc Bouvet, Robert O. Green, and Michael Rast

Imaging spectroscopy has been identified by ESA, NASA and other international space agencies as key to addressing a number of the most important scientific and environmental management objectives. To implement the critical EU and related policies for the management of natural resources, assets and benefits, and to achieve the objectives outlined by NASA’s Decadal Survey in ecosystem science, hydrology and geology, high-fidelity imaging spectroscopy data with global coverage and high spatial resolution are required. As such, ESA’s CHIME (Copernicus Hyperspectral Imaging Mission for the Environment) and NASA’s SBG (Surface Biology and Geology) satellite missions aim to provide imaging spectroscopy data with global coverage at regular intervals of time and with high spatial resolution.

However, the scientific and applied objectives motivate more spatial coverage and more rapid revisit than any one agency’s observing system can provide. With the development of SBG and CHIME, the mid-to-late 2020s will see more global coverage spectroscopic observing systems, whereby these challenging needs can be more fully met by a multi-mission and multi-Agency synergetic approach, rather than by any single observing system.

Therefore, an ESA-NASA cooperation on imaging spectroscopy space missions was seen as a priority for collaboration, specifically given the complementarity of mission objectives and measurement targets of the SBG and CHIME. Such cooperation is now being formalized as part of the ESA-NASA Joint Program Planning Group activities.

Among these activities, calibration and validation (Cal/Val) are fundamental for imaging spectroscopy while the satellites are in orbit and operating. They determine the quality and integrity of the data provided by the spectrometers and become even more crucial when data from different satellites, carrying different imaging sensors, are used worldwide in a complementary and synergetic manner, as will be the case for CHIME and SBG data. Indeed, Cal/Val activities not only have enormous downstream impacts on the accuracy and reliability of the products, but also facilitate cross-calibration and interoperability among several imaging spectrometers, supporting their synergistic use. Accordingly, within the context of this cooperation, a Working Group (WG) on Calibration/Validation has been set up, aiming to establish a roadmap for future SBG-CHIME coordination activities and collaborative studies.

This contribution aims to outline the key areas of cooperation between SBG and CHIME in terms of Calibration and Validation, and present the establishment of a roadmap between the two missions, focusing on the following topics:

  • Establishing an end-to-end cal/val strategy for seamless data products across missions, including transfer standards;
  • Measurement Networks and commonly recognised Cal/Val reference sites;
  • Status of atmospheric radiative transfer and atmospheric–correction procedures;
  • Standardisation and Quality Control of reference data sets;
  • Definition and implementation of joint airborne spectroscopy campaigns, such as the executed 2018 and planned 2021 campaigns, to simulate both missions and exercise the capabilities needed for eventual interoperability (incl. data collection, calibration, data product production);
  • Continuous validation throughout the lifetime of products;
  • Identifying other opportunities for efficiency and success through cooperation on calibration and validation, downlink capabilities and shared algorithms (e.g. compression and on-board data reduction).

How to cite: Boccia, V., Adams, J., Thome, K. J., Turpie, K. R., Kokaly, R., Bouvet, M., Green, R. O., and Rast, M.: NASA-ESA Cooperation on the SBG and CHIME Hyperspectral Satellite Missions: a roadmap for the joint Working Group on Cal/Val activities, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15166, https://doi.org/10.5194/egusphere-egu21-15166, 2021.

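EGU21-15501 | vPICO presentations | ESSI1.19

Reference Data and Methods for Validation of Very High Resolution Optical Data Within ESA / EDAP Project

Sébastien Saunier
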
In this paper, the author describes the methodologies developed for the validation of Very High-Resolution (VHR) optical missions within the Earthnet Data Assessment Pilot (EDAP) framework.  The use of surface-based, drone, airborne, and/or space-based observations to build a calibration reference plays a fundamental role in the validation process. A rigorous validation process must compare mission data products with independent reference data suitable for the satellite measurements. As a consequence, one background activity within EDAP is the collection and consolidation of reference data of various natures, depending on the validation methodology.

The validation methodologies are conventionally divided into three categories: validation of the measurement, of the geometry and of the image quality. The validation of the measurement requires an absolute calibration reference. The latter is built up using either in situ measurements collected at RadCalNet [1] stations or space-based observations performed with “gold” missions (Sentinel-2, Landsat-8) over Pseudo-Invariant Calibration Sites (PICS). For the geometric validation, several test sites have been set up. A test site is equipped with data from different reference sources. A given test site is not usable for every assessment; its usability depends on the validation metrics and the specifications of the sensor, particularly the spatial resolution and image acquisition geometry. Some existing geometric sites are equipped with Ground Control Point (GCP) sets surveyed using Global Navigation Satellite System (GNSS) devices in the field.  In some cases, the GCP set supports the refinement of drone-acquired imagery to produce a raster reference, subsequently used to validate the internal geometry of images under assessment. In addition, a limiting factor in the use of VHR optical ortho-rectified data is the accuracy of the Digital Surface Model (DSM) / Digital Terrain Model (DTM). In order to separate errors due to terrain elevation from errors due to the sensor itself, some test sites are also equipped with very accurate Light Detection and Ranging (LIDAR) data.

The validation of image quality addresses all aspects related to spatial resolution and is strongly linked to both the measurement and the geometry. The image quality assessments are performed with both qualitative and quantitative approaches. The quantitative approach relies on the analysis of images of artificial ground targets and leads to estimates of the Modulation Transfer Function (MTF) together with additional image quality parameters such as the Signal-to-Noise Ratio (SNR). The qualitative approach assesses the interpretability of input images and leads to a rating scale [2] strongly related to the sensor's Ground Resolution Distance (GRD). This visual inspection task requires a database including very detailed images of man-made objects. Within EDAP, this database is considered a reference.

[1] https://www.radcalnet.org

[2] https://fas.org/irp/imint/niirs.htm

How to cite: Saunier, S.: Reference Data and Methods for Validation of Very High Resolution Optical Data Within ESA / EDAP Project, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15501, https://doi.org/10.5194/egusphere-egu21-15501, 2021.

ESSI2.2 – Find, access, share and use data across the globe: Infrastructure solutions for Earth System Sciences

EGU21-8458 | vPICO presentations | ESSI2.2

The ICOS Carbon Portal as example of a  FAIR community data repository supporting scientific workflows

Alex Vermeulen, Margareta Hellström, Oleg Mirzov, Ute Karstens, Claudio D'Onofrio, and Harry Lankreijer

The Integrated Carbon Observation System (ICOS) provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on atmospheric greenhouse gas (GHG) composition, greenhouse gas exchange fluxes measured by eddy covariance, and CO2 partial pressure at water surfaces. The ICOS observational data feed into a wide area of science that covers, for example, plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, earth, climate and soil science, and combinations of these in multi-disciplinary projects.
As ICOS is committed to providing all data and methods in an open and transparent way as free data, a dedicated system is needed to secure the long-term archiving and availability of the data, together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use it, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible to enable full trust in the observations and the derived data products.

In this presentation we will introduce the ICOS operational data repository, named ICOS Carbon Portal, which is based on the linked open data approach. All metadata is modelled in an ontology coded in OWL and held in an RDF triple store that is available through an open SPARQL endpoint. The repository supports versioning and collections, and models provenance through a simplified PROV-O ontology. All data objects are ingested under strict control for the identified data types, on provision of correct and sufficient (provenance) metadata, data format and data integrity. All data, including raw data, are stored in the long-term trusted repository B2SAFE with two replicas. On top of the triple store and SPARQL endpoint we have built a series of services, APIs and graphical interfaces that allow machine-to-machine and user interaction with the data and metadata. Examples are a full faceted search with a connected data cart and download facility, previews of higher-level data products (time series of point observations and spatial data), and cloud computing services like eddy covariance data processing and on-demand atmospheric footprint calculations, all connected to the observational data from ICOS. Another interesting development is the community support for scientific workflows using Jupyter notebook services that connect to our repository through a dedicated Python library for direct metadata and data access.
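
As a minimal sketch of machine access to such a SPARQL endpoint; the endpoint URL is the one published by the Carbon Portal and should be verified, and the query is deliberately generic since the actual ICOS ontology terms belong in the portal documentation:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint URL as published by the ICOS Carbon Portal; verify before use.
sparql = SPARQLWrapper("https://meta.icos-cp.eu/sparql")
sparql.setReturnFormat(JSON)

# Generic triple-pattern query; real queries would use ICOS ontology terms.
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")

results = sparql.query().convert()
for b in results["results"]["bindings"]:
    print(b["s"]["value"], b["p"]["value"], b["o"]["value"])
```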

How to cite: Vermeulen, A., Hellström, M., Mirzov, O., Karstens, U., D'Onofrio, C., and Lankreijer, H.: The ICOS Carbon Portal as example of a  FAIR community data repository supporting scientific workflows, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8458, https://doi.org/10.5194/egusphere-egu21-8458, 2021.

EGU21-15394 | vPICO presentations | ESSI2.2

EPOS-Norway Portal

Jan Michalek, Kuvvet Atakan, Christian Rønnevik, Helga Indrøy, Lars Ottemøller, Øyvind Natvik, Tor Langeland, Ove Daae Lampe, Gro Fonnes, Jeremy Cook, Jon Magnus Christensen, Ulf Baadshaug, Halfdan Pascal Kierulf, Bjørn-Ove Grøtan, Odleiv Olesen, John Dehls, and Valerie Maupin

The European Plate Observing System (EPOS) is a European project building a pan-European infrastructure for accessing solid Earth science data, now governed by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN Infrastructure Programme, Project no. 245763) is a Norwegian project funded by the Research Council of Norway. The aim of the Norwegian EPOS e‑infrastructure is to integrate data from the seismological and geodetic networks, as well as the data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions provide data: the University of Bergen (UIB), the Norwegian Mapping Authority (NMA), the Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open-access, interactive tool allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Currently, access is provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway, and new datasets are planned to be integrated in the future. The EPOS-N Portal can access remote datasets via web services like FDSNWS for seismological data and OGC services for geological and geophysical data (e.g. WMS). Standalone datasets are available through preloaded data files. Users can also simply add another WMS server or upload their own dataset for visualization and comparison with other datasets. This portal provides a unique way (the first of its kind in Norway) to explore various geoscientific datasets in one common interface. One of the key aspects is the quick, simultaneous visual inspection of data from various disciplines and the testing of scientific or geohazard-related hypotheses. One such example is the spatio-temporal correlation of earthquakes (1980 until now) with existing critical infrastructure (e.g. pipelines), geological structures, submarine landslides or unstable slopes.

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets, and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element in the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.

How to cite: Michalek, J., Atakan, K., Rønnevik, C., Indrøy, H., Ottemøller, L., Natvik, Ø., Langeland, T., Lampe, O. D., Fonnes, G., Cook, J., Christensen, J. M., Baadshaug, U., Kierulf, H. P., Grøtan, B.-O., Olesen, O., Dehls, J., and Maupin, V.: EPOS-Norway Portal, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15394, https://doi.org/10.5194/egusphere-egu21-15394, 2021.

EGU21-15205 | vPICO presentations | ESSI2.2

SIOS Data Management System: distributed data system for Earth System Science

Dariusz Ignatiuk, Øystein Godøy, Lara Ferrighi, Inger Jennings, Christiane Hübner, Shridhar Jawak, and Heikki Lihavainen

The Svalbard Integrated Arctic Earth Observing System (SIOS) is an international consortium that develops and maintains a regional observing system in Svalbard and the associated waters. SIOS brings together the existing infrastructure and data of its members into a multidisciplinary network dedicated to answering Earth System Science (ESS) questions related to global change. The Observing System is built around “SIOS core data”, long-term data series collected by SIOS partners. The SIOS Data Management System (SDMS) is dedicated to harvesting information on historical and current datasets from collaborating thematic and institutional data centres and making it available to users. A central data access portal is linked to the data repositories maintained by SIOS partners, which manage and distribute datasets and their associated metadata. The integrity of the information and the harmonisation of the data rest on internationally accepted protocols: interoperability of data, standardised documentation of data through the use of metadata, and standardised interfaces between data systems through the discovery of metadata. By these means, SDMS is working towards FAIR data compliance (making data findable, accessible, interoperable and reusable), among other initiatives through the H2020-funded ENVRI-FAIR project (http://envri.eu/envri-fair/).

How to cite: Ignatiuk, D., Godøy, Ø., Ferrighi, L., Jennings, I., Hübner, C., Jawak, S., and Lihavainen, H.: SIOS Data Management System: distributed data system for Earth System Science, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15205, https://doi.org/10.5194/egusphere-egu21-15205, 2021.

EGU21-9400 | vPICO presentations | ESSI2.2

The brokering framework empowering WMO Hydrological Observing System (WHOS)

Enrico Boldrini, Paolo Mazzetti, Fabrizio Papeschi, Roberto Roncella, Mattia Santoro, Massimiliano Olivieri, Stefano Nativi, Silvano Pecora, Igor Chernov, and Claudio Caponi

The WMO Commission for Hydrology (CHy) is realizing the WMO Hydrological Observing System (WHOS), a software (and human) framework aimed at improving the sharing of hydrological data and knowledge worldwide.

National Hydrological Services (NHSs) already share archived and near-real-time data collected in each country on the web, using disparate publication services. WHOS leverages the Discovery and Access Broker (DAB) technology, developed and operated in its cloud infrastructure by CNR-IIA, to realize WHOS-broker, a key component of the WHOS architecture. WHOS-broker is in charge of harmonizing the available, heterogeneous metadata, data and services, making the already published information more accessible to scientists (e.g. modelers), decision makers and the general public worldwide.

WHOS-broker supports many service interfaces and APIs that hydrological application builders can already leverage, for example OGC SOS, OGC CSW, OGC WMS, ESRI Feature Service, CUAHSI WaterOneFlow, DAB REST API, USGS RDB, OAI-PMH/WIGOS and THREDDS. New APIs and service protocols are continuously added to support new applications, as WHOS-broker is a modular and flexible framework designed to enable interoperability and to maintain it as standards change and evolve over time.
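
As a minimal illustration of one of the interfaces listed above, the following sketch sends an OGC SOS GetCapabilities request; the endpoint URL is a placeholder, not the operational WHOS-broker address.

    import requests

    SOS_ENDPOINT = "https://example.org/whos/sos"  # placeholder, not the operational endpoint

    params = {
        "service": "SOS",
        "request": "GetCapabilities",
        "AcceptVersions": "2.0.0",
    }
    response = requests.get(SOS_ENDPOINT, params=params, timeout=30)
    response.raise_for_status()
    print(response.text[:500])  # XML capabilities: observation offerings, procedures, ...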

Three target programmes have already benefited from WHOS:

  • La Plata river basin: hydro and meteo data from Argentina, Bolivia, Brazil, Paraguay and Uruguay are harmonized and shared by WHOS-broker to the benefit of different applications; one of them is the Plata Basin Hydrometeorological Forecasting and Early Warning System (the PROHMSAT-Plata model, developed by HRC with experts from the five countries), based on CUAHSI WaterOneFlow.
  • Arctic-HYCOS: hydro data from Canada, Finland, Greenland, Iceland, Norway, Russia and the United States are harmonized and shared by WHOS-broker to the benefit of different applications; one of them is the WMO HydroHub Arctic portal, based on ESRI technologies.
  • Dominican Republic: hydro and meteo data of the Dominican Republic published by different originators are being harmonized by WHOS-broker to the benefit of different applications; one of them is the Met Data Explorer application developed by BYU, based on the THREDDS catalog service.

The three programmes should act as a driving force for more to follow, by demonstrating possible applications that can be built on top of WHOS.

The public launch of the official WHOS homepage at WMO, expected by mid-2021, will include:

  • A dedicated web portal based on Water Data Explorer application developed by BYU
  • Results from the three programmes
  • Detailed information on how to access WHOS data by using one of the many WHOS-broker service interfaces
  • An online training course for data providers interested in WHOS
  • The WHOS Hydro Ontology, leveraged by WHOS-broker both to semantically augment user queries and to harmonize results (e.g. in the case of synonyms of the same concept in different languages).

How to cite: Boldrini, E., Mazzetti, P., Papeschi, F., Roncella, R., Santoro, M., Olivieri, M., Nativi, S., Pecora, S., Chernov, I., and Caponi, C.: The brokering framework empowering WMO Hydrological Observing System (WHOS), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9400, https://doi.org/10.5194/egusphere-egu21-9400, 2021.

EGU21-15148 | vPICO presentations | ESSI2.2

Teams Win: The European Datacube Federation

Peter Baumann

Collaboration requires some minimum of common understanding; in the case of Earth data this means, in particular, common principles making data interchangeable, comparable, and combinable. Open standards help here, in the case of Big Earth Data specifically the OGC/ISO Coverages standard. This unifying framework establishes common ground, in particular for regular and irregular spatio-temporal datacubes. Services grounded in such a common understanding have proven more uniform to access and handle, implementing a principle of "minimal surprise" for users visiting different portals while using their favourite clients. Data combination and fusion benefit from canonical metadata allowing automatic alignment, e.g., between 2D DEMs, 3D satellite image time series, 4D atmospheric data, etc.

The EarthServer datacube federation is showing the way towards unleashing the full potential of pixels for supporting the UN Sustainable Development Goals, local governance, and also businesses. EarthServer is an open, free, transparent, and democratic network of data centers offering dozens of Petabytes of critical data varieties, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on open OGC standards, an ecosystem of data has been established that is available to users as a single pool, without the need for any coding skills (such as Python). A specific unique capability is location transparency: clients can fire their query against any of the members, and the federation nodes will figure out the optimal work distribution irrespective of data location.

The underlying datacube engine, rasdaman, enables all datacube access, analytics, and federation. Query evaluation is optimized automatically by applying highly efficient, intelligent, rule-based methods in homogeneous and heterogeneous mashups, up to satellite on-board deployments as done in the ORBiDANSe project. Users perceive one single, common information space accessible through a wide spectrum of open-source and proprietary clients.
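
To illustrate the query style involved, the sketch below sends a WCPS (OGC Web Coverage Processing Service) expression to a rasdaman endpoint via its WCS interface; the endpoint, datacube name and time coordinate are illustrative assumptions, not actual federation holdings.

    import requests

    ENDPOINT = "https://example.org/rasdaman/ows"  # placeholder federation node

    # WCPS: slice a hypothetical datacube at one time step and encode it as PNG
    wcps_query = 'for $c in (SomeDatacube) return encode($c[ansi("2020-07-01")], "image/png")'

    response = requests.get(ENDPOINT, params={
        "service": "WCS", "version": "2.0.1",
        "request": "ProcessCoverages", "query": wcps_query,
    }, timeout=120)
    response.raise_for_status()
    with open("slice.png", "wb") as f:
        f.write(response.content)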

In our talk we present technology, services, and governance of this unique line-up of data centers. A demo will show distributed datacube fusion live.

 

How to cite: Baumann, P.: Teams Win: The European Datacube Federation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15148, https://doi.org/10.5194/egusphere-egu21-15148, 2021.

EGU21-23 | vPICO presentations | ESSI2.2

Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets

Carlo Lacagnina, Ge Peng, Robert R. Downs, Hampapuram Ramapriyan, Ivana Ivanova, David F. Moroni, Yaxing Wei, Lucy Bastin, Nancy A. Ritchey, Gilles Larnicol, Lesley A. Wyborn, Chung-Lin Shie, Ted Habermann, Anette Ganske, Sarah M. Champion, Mingfang Wu, Irina Bastrakova, Dave Jones, and Gary Berg-Cross

The knowledge of data quality, and of the quality of the associated information including metadata, is critical for data use and reuse. Assessment of data and metadata quality is key to ensuring that credible information is available, to establishing a foundation of trust between the data provider and various downstream users, and to demonstrating compliance with requirements established by funders and federal policies.

Data quality information should be consistently curated, traceable, and adequately documented to provide sufficient evidence to guide users to address their specific needs. The quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR).

Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily comparable datasets and quality metrics. To enable interoperability across systems and tools, data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help to improve the practices of the various stakeholders who contribute to the collection, curation, and dissemination of data.

This presentation outlines a global community effort to develop international guidelines to curate data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product.

How to cite: Lacagnina, C., Peng, G., Downs, R. R., Ramapriyan, H., Ivanova, I., Moroni, D. F., Wei, Y., Bastin, L., Ritchey, N. A., Larnicol, G., Wyborn, L. A., Shie, C.-L., Habermann, T., Ganske, A., Champion, S. M., Wu, M., Bastrakova, I., Jones, D., and Berg-Cross, G.: Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-23, https://doi.org/10.5194/egusphere-egu21-23, 2021.

EGU21-10547 | vPICO presentations | ESSI2.2

Data flow, harmonization, and quality control

Brenner Silva, Philipp Fischer, Sebastian Immoor, Rudolf Denkmann, Marion Maturilli, Philipp Weidinger, Steven Rehmcke, Tobias Düde, Norbert Anselm, Peter Gerchow, Antonie Haas, Christian Schäfer-Neth, Angela Schäfer, Stephan Frickenhaus, and Roland Koppe and the Computing and Data Centre of the Alfred-Wegener-Institute

Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services arrange data by their purpose, level of integration, and governance. For instance, registered data of uniform measurements fulfil the goal of publication but do not necessarily flow into an integrated data system. The data repository provides a first, high level of integration that strongly depends on the standardization of incoming data. One example is the Observation to Archive and Analysis (O2A) framework that is operational and continuously developed at the Alfred-Wegener-Institute, Bremerhaven. A data repository is one of the components of the O2A framework, and much of its functionality depends on the standardization of the incoming data. In this context, we focus on the development of a modular approach to provide standardization and quality control for the monitoring of near-real-time data. Two modules are under development: the driver module, which transforms different tabular data into a common format, and the quality control module, which runs quality tests on the ingested data. Both modules rely on the sensor operator and on the data scientist, two actors that interact with the two ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc). We demonstrate the driver and quality control modules in the data flow within Digital Earth showcases that also connect repositories and federated databases to the end-user. The end-user is the scientist, who is closely involved in the development to ensure applicability. The result is the proven benefit of harmonized data and metadata from multiple sources, easy integration, and rapid assessment of the ingested data. Further, we discuss concepts and current developments aiming at enhanced monitoring and scientific workflows.
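
The following is a minimal illustrative sketch, not the operational O2A code, of the kind of range test such a quality control module can run on ingested near-real-time data.

    import pandas as pd

    def range_test(series: pd.Series, valid_min: float, valid_max: float) -> pd.Series:
        """Return a QC flag per sample: 1 = good, 4 = outside the valid range."""
        flags = pd.Series(1, index=series.index)
        flags[(series < valid_min) | (series > valid_max)] = 4
        return flags

    # Hypothetical sensor readings; -999.0 mimics a typical fill/error value
    obs = pd.Series([3.2, 3.4, -999.0, 3.5], name="water_temperature")
    print(range_test(obs, valid_min=-2.0, valid_max=35.0))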

How to cite: Silva, B., Fischer, P., Immoor, S., Denkmann, R., Maturilli, M., Weidinger, P., Rehmcke, S., Düde, T., Anselm, N., Gerchow, P., Haas, A., Schäfer-Neth, C., Schäfer, A., Frickenhaus, S., and Koppe, R. and the Computing and Data Centre of the Alfred-Wegener-Institute: Data flow, harmonization, and quality control, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10547, https://doi.org/10.5194/egusphere-egu21-10547, 2021.

EGU21-5663 | vPICO presentations | ESSI2.2

Improved FAIR Data Publication Quality in Specialized Environmental Data Portals

Ionut Iosifescu Enescu, Gian-Kasper Plattner, Lucia Espona Pernas, Dominik Haas-Artho, and Rebecca Buchholz

Environmental research data from the Swiss Federal Research Institute WSL, an Institute of the ETH Domain, is published through the environmental data portal EnviDat (https://www.envidat.ch). EnviDat actively implements the FAIR (Findability, Accessibility, Interoperability and Reusability) principles and offers guidance and support to researchers throughout the research data publication process.

WSL strives to increase the fraction of environmental data easily available for reuse in the public domain. At the same time, WSL facilitates the publication of high-quality environmental research datasets by providing an appropriate infrastructure and a formal publication process, and by assigning Digital Object Identifiers (DOIs) and appropriate citation information.

Within EnviDat, we conceptualize and implement data publishing workflows that include automatic validation, interactive quality checks, and iterative improvement of metadata quality. The data publication workflow encompasses a number of steps, starting from the request for a DOI, through an approval process with a double-checking principle, to the submission of the metadata record to DataCite for the final data publication. This workflow can be viewed as a decentralized peer-review and quality improvement process for safeguarding the quality of published environmental datasets. The workflow is being further developed and refined together with partner institutions within the ETH Domain.

We have defined and implemented additional features in EnviDat, such as (i) in-depth tracing of data provenance through related datasets; (ii) the ability to augment published research data with additional resources which support open science such as model codes and software; and (iii) a DataCRediT mechanism designed for specifying data authorship (Collection, Validation, Curation, Software, Publication, Supervision).

We foresee that these developments will help to further improve approaches targeted at modern documentation and exchange of scientific information. This is timely given the increasing expectations that institutions and researchers have towards capabilities of research data portals and repositories in the environmental domain.

How to cite: Iosifescu Enescu, I., Plattner, G.-K., Espona Pernas, L., Haas-Artho, D., and Buchholz, R.: Improved FAIR Data Publication Quality in Specialized Environmental Data Portals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5663, https://doi.org/10.5194/egusphere-egu21-5663, 2021.

EGU21-2139 | vPICO presentations | ESSI2.2

EASYDAB (Earth System Data Branding) for FAIR and Open Data

Anette Ganske, Amandine Kaiser, Angelina Kraft, Daniel Heydebreck, Andrea Lammert, and Hannes Thiemann

As in many scientific disciplines, there are a variety of activities in Earth system sciences that address the important aspects of good research data management. What has not been sufficiently investigated and dealt with so far is the easy discoverability and re-use of quality-checked data. This aspect is taken up by the EASYDAB label.

EASYDAB1 is a branding, currently under development, for FAIR and open data from the Earth System Sciences. It can be adopted by institutions running a data repository that stores Earth System Science data. EASYDAB is always connected to a research data publication with DataCite DOIs. Data published under EASYDAB are characterized by high maturity, extensive metadata and compliance with a comprehensive, discipline-specific standard. For these datasets, the EASYDAB logo is added to the landing page of the data repository. Thereby, repositories can indicate their efforts to publish data with high maturity.

The first standard made for EASYDAB is the ATMODAT standard2, which has been developed within the AtMoDat3 project (Atmospheric Model Data). It incorporates concrete recommendations and requirements related to the maturity, publication and enhanced FAIRness of atmospheric model data. The requirements concern rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of the ATMODAT standard and should hold and present discipline-specific metadata at simulation and variable level.

The ATMODAT standard includes checklists for the data producer and the data curator so that compliance with the standard can easily be verified by both sides. To facilitate automatic checking of netCDF file headers, a checker program will also be provided and published with a DOI. Moreover, a checker for compliance with the DOI metadata requirements will be developed and made openly available.

The integration of standards from other disciplines in the Earth System Sciences, such as oceanography, into EASYDAB is helpful and desirable to improve the re-use of reviewed, high-quality data. 

1 www.easydab.de

2 https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0

3 www.atmodat.de

How to cite: Ganske, A., Kaiser, A., Kraft, A., Heydebreck, D., Lammert, A., and Thiemann, H.: EASYDAB (Earth System Data Branding) for FAIR and Open Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2139, https://doi.org/10.5194/egusphere-egu21-2139, 2021.

EGU21-5965 | vPICO presentations | ESSI2.2

PalMod-II Data Management Plan: A FAIR-inspired conceptual framework for data simulation, inter-comparison, sharing and publication  

Swati Gehlot, Karsten Peters-von Gehlen, and Andrea Lammert

Large-scale transient climate simulations and their intercomparison with paleo data within the German initiative PalMod (www.palmod.de, currently in phase II) provide a prime example of applying a Data Management Plan (DMP) to conceptualise data workflows within and outside a large multidisciplinary project. PalMod-II data products include the output of three state-of-the-art climate models with various coupling complexities and spatial resolutions, simulating the climate of the past 130,000 years. In addition to the long time series of model data, a comprehensive compilation of paleo-observation data (including a model-observation-comparison toolbox; Baudouin et al., 2021, EGU-CL1.2) is envisaged for validation.

Owing to the enormous amount of data coming from models and observations, produced and handled by different groups of scientists spread across various institutions, a dedicated DMP, maintained as a living document, provides a data-workflow framework for the exchange and sharing of data within and outside the PalMod community. The DMP covers the data life cycle within the project, from generation (data formats and standards), through analysis (intercomparison of models and observations), publication (usage, licences) and dissemination (standardised, via ESGF), to archiving after the project lifetime. As an actively and continually updated document, the DMP defines the ownership of and responsibility for the data subsets of the various working groups, along with their data sharing/reuse regulations, in order to ensure sustained progress towards the project goals.

This contribution discusses the current status and challenges of the DMP for PalMod-II, which covers the details of the data produced within the various working groups, the project-wide workflow strategy for sharing and exchanging data, and the definition of a PalMod-II variable list for standard ESGF publication. The FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles play a central role and are proposed for the entire life cycle of PalMod-II data products (model and proxy paleo data) for sharing/reuse during and after the project lifetime.



How to cite: Gehlot, S., Peters-von Gehlen, K., and Lammert, A.: PalMod-II Data Management Plan: A FAIR-inspired conceptual framework for data simulation, inter-comparison, sharing and publication  , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5965, https://doi.org/10.5194/egusphere-egu21-5965, 2021.

EGU21-8144 | vPICO presentations | ESSI2.2

A Standard for the FAIR publication of Atmospheric Model Data developed by the AtMoDat Project

Andrea Lammert, Anette Ganske, Amandine Kaiser, and Angelina Kraft

Due to the increasing amount of data produced in science, concepts for data reusability are of immense importance. One aspect is the publication of data in a way that ensures that it is findable, accessible, interoperable and reusable (FAIR1 principles). However, putting these principles into practice often causes significant difficulties for researchers. Some repositories therefore accept datasets described only with the minimum metadata required for DOI allocation. Unfortunately, this does not contain enough information to conform to the FAIR principles; much research data cannot be reused despite having a DOI. In contrast, other repositories aid the researchers by providing advice and strictly controlling the data and their metadata. To simplify the process of defining the required amount of metadata and of controlling the data and metadata, the AtMoDat2 (Atmospheric Model Data) project developed a detailed standard for the FAIR publication of atmospheric model data.

For this purpose, we have developed a concept for the “ideal” description of atmospheric model data. A prerequisite is data publication with a DataCite DOI. The ATMODAT standard3 was developed to implement this concept. The standard defines the data format as netCDF, mandatory metadata (for the DOI, the landing page and the data header), and the naming conventions used in climate research, the Climate and Forecast conventions (CF-conventions4). However, many variable names used in urban climate research, for example, are not part of the CF-conventions. For these, standard names have to be defined together with the community, and their inclusion in the CF-conventions list has to be requested. Furthermore, we have developed and published Python routines which allow data producers as well as repositories to check model output data against the standard.
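
The sketch below conveys the idea behind such a check, verifying that mandatory global attributes are present in a netCDF header; the attribute list is an illustrative subset, and the code is a stand-in, not the published checker routines themselves.

    from netCDF4 import Dataset

    MANDATORY_GLOBAL_ATTRS = ["title", "institution", "source", "Conventions"]  # illustrative subset

    def missing_global_attributes(path: str) -> list:
        """Return the mandatory global attributes absent from the file header."""
        with Dataset(path) as nc:
            present = set(nc.ncattrs())
        return [attr for attr in MANDATORY_GLOBAL_ATTRS if attr not in present]

    print(missing_global_attributes("model_output.nc"))  # hypothetical file name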

The ATMODAT standard will first be applied by the project partners at the two participating universities (the Universities of Hamburg and Leipzig). Here, climate model data are processed with a post-processor in preparation for publication. Subsequently, the files, including the metadata specified for the DataCite metadata schema, will be published by the World Data Center for Climate5 (WDCC). Data fulfilling the ATMODAT standard will be marked on the landing page by a special EASYDAB6 (Earth System Data Branding) logo. EASYDAB is a branding, currently under development, for FAIR and open data from the Earth System Sciences. The logo indicates to future data users that the dataset is a verified dataset that can easily be reused. The standardization of the data and the further steps are easily transferable to data from other disciplines.

1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18 

2 https://www.atmodat.de/

3 https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0

4 https://cfconventions.org/

5 https://cera-www.dkrz.de/WDCC/ui/cerasearch/

6 https://www.easydab.de/

 

How to cite: Lammert, A., Ganske, A., Kaiser, A., and Kraft, A.: A Standard for the FAIR publication of Atmospheric Model Data developed by the AtMoDat Project, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8144, https://doi.org/10.5194/egusphere-egu21-8144, 2021.

EGU21-13155 | vPICO presentations | ESSI2.2

The I-ADOPT Interoperability Framework: a proposal for FAIRer observable property descriptions

Barbara Magagna, Gwenaelle Moncoiffe, Maria Stoica, Anusuriya Devaraju, Alison Pamment, Sirko Schindler, and Robert Huber

Global environmental challenges like climate change, pollution, and biodiversity loss are complex. To understand environmental patterns and processes and address these challenges, scientists require observations of natural phenomena at various temporal and spatial scales and across many domains. The research infrastructures and scientific communities involved in these activities often follow their own data management practices, which inevitably leads to a high degree of variability and incompatibility between approaches. Consequently, a variety of metadata standards and vocabularies have been proposed to describe observations and are actively used in different communities. However, this diversity of approaches now causes severe issues regarding interoperability across datasets and hampers their exploitation as a common data source.

Projects like ENVRI-FAIR, FAIRsFAIR and FAIRplus are addressing this difficulty by working on the full integration of services across research infrastructures based on the FAIR Guiding Principles, supporting the EOSC vision of an open research culture. Beyond these projects, we need collaboration and community consensus across domains to build a common framework for representing observable properties. The Research Data Alliance InteroperAble Descriptions of Observable Property Terminology Working Group (RDA I-ADOPT WG) was formed in October 2019 to address this need. Its membership covers an international representation of terminology users and terminology providers, including terminology developers, scientists, and data centre managers. The group’s overall objective is to deliver a common interoperability framework for observable property variables within its 18-month work plan. Starting with the collection of user stories from research scientists, terminology managers, and data managers or aggregators, we drafted a set of technical and content-related requirements. A survey of terminology resources and annotation practices provided us with information about almost one hundred terminologies, a subset of which was then analysed to identify existing conceptualisation practices, commonalities, gaps, and overlaps. This was then used to derive a conceptual framework to support their alignment.

In this presentation, we will introduce the I-ADOPT Interoperability Framework, highlighting its semantic components. These represent the building blocks for specific ontology design patterns addressing different use cases and varying degrees of complexity in describing observed properties. We will demonstrate the proposed design patterns using a number of essential climate and essential biodiversity variables. We will also show examples of how the I-ADOPT framework will support interoperability between existing representations. This work will provide the semantic foundation for the development of more user-friendly data annotation tools capable of suggesting appropriate FAIR terminologies for observable properties.
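
As a flavour of what such a decomposition looks like, the following sketch breaks one variable into I-ADOPT-style components; the component names follow the framework's published terminology, while the concrete decomposition of this variable is our own illustration.

    # Illustrative decomposition of an observable property into I-ADOPT-style components
    variable = {
        "label": "concentration of chlorophyll-a per unit volume of sea water",
        "property": "concentration",
        "object_of_interest": "chlorophyll-a",
        "matrix": "sea water",
        "constraints": ["per unit volume"],
    }
    print(variable)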

How to cite: Magagna, B., Moncoiffe, G., Stoica, M., Devaraju, A., Pamment, A., Schindler, S., and Huber, R.: The I-ADOPT Interoperability Framework: a proposal for FAIRer observable property descriptions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13155, https://doi.org/10.5194/egusphere-egu21-13155, 2021.

EGU21-15922 | vPICO presentations | ESSI2.2

F-UJI : An Automated Tool for the Assessment and Improvement of the FAIRness of Research Data

Robert Huber and Anusuriya Devaraju

Making research data FAIR (Findable, Accessible, Interoperable, and Reusable) is critical to maximizing its impact. However, since the FAIR principles are designed as guidelines and do not specify implementation rules, it is difficult to verify how these principles are put into practice. Therefore, metrics and associated tools need to be developed to enable the assessment of the FAIR compliance of services and datasets. Such practical solutions are important for many stakeholders to assess the quality of data-related services. They are important for selecting such services, but can also be used to iteratively improve data offerings, e.g., as part of FAIR advisory processes. With the increasing number of published datasets and the need to test them repeatedly, there is a growing body of literature that recognizes the importance of automated FAIR assessment tools. Our goal is to contribute to this area of FAIR through the development of an open-source tool called F-UJI. F-UJI supports the programmatic FAIR assessment of research data based on a set of core metrics against which the implementation of the FAIR principles can be assessed. This paper presents the development and application of F-UJI and the underlying metrics. For each of the metrics, we have designed and implemented practical tests based on existing standards and best practices for research data. The tests are important to our expanded understanding of how to test FAIR metrics in practice, which has not been fully addressed in previous work on FAIR data assessment. We demonstrate the use of the tool by assessing several multidisciplinary datasets from selected trusted digital repositories, followed by recommendations for improving the FAIRness of these datasets. We summarize the experience and lessons learned from the development and testing.
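
A programmatic assessment request might look as follows; the port, path and payload follow the general pattern of F-UJI's REST interface but should be treated as assumptions to be checked against a deployed instance's OpenAPI description (a deployment may also require authentication).

    import requests

    FUJI_API = "http://localhost:1071/fuji/api/v1/evaluate"  # assumed local deployment

    payload = {"object_identifier": "https://doi.org/10.xxxx/example"}  # placeholder identifier
    response = requests.post(FUJI_API, json=payload, timeout=120)
    response.raise_for_status()
    print(response.json().get("summary", {}))  # aggregated scores per FAIR principle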

How to cite: Huber, R. and Devaraju, A.: F-UJI : An Automated Tool for the Assessment and Improvement of the FAIRness of Research Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15922, https://doi.org/10.5194/egusphere-egu21-15922, 2021.

EGU21-2886 | vPICO presentations | ESSI2.2

CMIP6 data documentation and citation in IPCC's Sixth Assessment Report (AR6)

Martina Stockhause, Robin Matthews, Anna Pirani, Anne Marie Treguier, and Ozge Yelekci

The amount of work and resources invested by the modelling centers to provide the CMIP6 (Coupled Model Intercomparison Project Phase 6) experiments and climate projection datasets is huge, and it is therefore extremely important that the teams receive proper credit for their work. The Citation Service makes CMIP6 data citable with DOI references for the evolving CMIP6 model data published in the Earth System Grid Federation (ESGF). The Citation Service, a new piece of the CMIP6 infrastructure, was developed upon request from the CMIP Panel.

CMIP6 provides new global climate model data assessed in the IPCC's (Intergovernmental Panel on Climate Change) Sixth Assessment Report (AR6). Led by the Technical Support Unit of IPCC Working Group I (WGI TSU), the IPCC Task Group on Data Support for Climate Change Assessment (TG-Data) developed FAIR data guidelines for implementation by the TSUs of the three IPCC WGs and the IPCC Data Distribution Centre (DDC) partners. A central part of the FAIR data guidelines is the documentation and citation of data used in the report.

The contribution will show how CMIP6 data usage is documented in the IPCC WGI AR6 from three angles: the technical implementation, the collection of CMIP6 data usage information from the IPCC authors, and a report user's perspective.
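
One concrete mechanism behind DOI-based data citation is content negotiation on doi.org, sketched below for a placeholder DOI (replace it with a real CMIP6 dataset DOI before running).

    import requests

    doi_url = "https://doi.org/10.xxxx/placeholder"  # not a real DOI; will not resolve as-is

    response = requests.get(doi_url, headers={"Accept": "text/x-bibliography"}, timeout=60)
    response.raise_for_status()
    print(response.text)  # a formatted citation string for the dataset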

 

Links:

  • CMIP6 Citation Service: http://cmip6cite.wdc-climate.de
  • CMIP6: https://pcmdi.llnl.gov/CMIP6/
  • IPCC AR6: https://www.ipcc.ch/assessment-report/ar6/
  • IPCC AR6 WGI report: https://www.ipcc.ch/report/sixth-assessment-report-working-group-i/
  • IPCC TG-Data: https://www.ipcc.ch/data/

How to cite: Stockhause, M., Matthews, R., Pirani, A., Treguier, A. M., and Yelekci, O.: CMIP6 data documentation and citation in IPCC's Sixth Assessment Report (AR6), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2886, https://doi.org/10.5194/egusphere-egu21-2886, 2021.

EGU21-5492 | vPICO presentations | ESSI2.2

Raiders of the Lost Code: Preserving the MOSS Codebase - Significance, Status, Challenges and Opportunities 

Peter Löwe, Māris Nartišs, and Carl N Reed

We report on the current status of the software repository of the Map Overlay and Statistical System (MOSS) and on upcoming actions to ensure long-term preservation of the codebase as a historic geospatial source. MOSS is the earliest known open source Geographic Information System (GIS). Active development of the vector-based interactive GIS by the U.S. Department of the Interior began in 1977 on a CDC mainframe computer located at Colorado State University. Development continued until 1985, with MOSS being ported to multiple platforms, including DG-AOS, UNIX, VMS and Microsoft DOS. Many geospatial programming techniques and functionalities were first implemented in MOSS, including a fully interactive user interface and integrated vector and raster processing. The public availability of the WWW in the early 1990s sparked the growth of new open source GIS projects, which led to the formation of the Open Source Geospatial Foundation (OSGeo). The goal of OSGeo is to support and promote the collaborative development of open geospatial technologies and data, including best practices for project management and repositories for codebases. From its start, OSGeo recognised MOSS as the original forerunner project. After the decline of active use of MOSS in the 1990s, the U.S. Bureau of Land Management (BLM) continued to provide the open source MOSS codebase on an FTP server, which allowed use, analysis and reference by URL. This service was discontinued at some point before 2018, which was eventually discovered through a broken URL. This led to a global search-and-rescue effort among the OSGeo communities to track down remaining offline copies of the codebase. In mid-2020 a surviving copy of the MOSS codebase was discovered at the University of Latvia; it is temporarily preserved at the German Institute of Economic Research (DIW Berlin). OSGeo has agreed to make MOSS the first OSGeo Heritage Project to ensure long-term preservation in an OSGeo code repository. This is a significant first step towards enabling MOSS-related research based on the FAIR (Findable, Accessible, Interoperable, Reusable) paradigm. Follow-up actions will be required to enable scientific citation and credit through persistent identifiers for code and persons, such as Digital Object Identifiers (DOIs) and Open Researcher and Contributor iDs (ORCID iDs), within the OSGeo repository environment. This will advance the OSGeo portfolio of best practices for other open geospatial projects as well.

 

How to cite: Löwe, P., Nartišs, M., and Reed, C. N.: Raiders of the Lost Code: Preserving the MOSS Codebase - Significance, Status, Challenges and Opportunities , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5492, https://doi.org/10.5194/egusphere-egu21-5492, 2021.

EGU21-8294 | vPICO presentations | ESSI2.2

Enabling “LiDAR data processing” as a service in a Jupyter environment 

Spiros Koulouzis, Yifang Shi, Yuandou Wan, Riccardo Bianchi, Daniel Kissling, and Zhiming Zhao

Airborne Laser Scanning (ALS) data derived from Light Detection And Ranging (LiDAR) technology allow the construction of Essential Biodiversity Variables (EBVs) of ecosystem structure at high resolution at landscape, national and regional scales. Researchers nowadays often process such data and rapidly prototype using scripting languages like R or Python, and they share their experiments via scripts or, more recently, via notebook environments such as Jupyter. To scale experiments to large data volumes, extra data sources, or new models, researchers often employ cloud infrastructures to enhance notebooks (e.g. JupyterHub) or execute the experiments as a distributed workflow. In many cases, a researcher has to encapsulate subsets of the code (namely, cells in Jupyter) from the notebook as components to be included in the workflow. However, it is usually time-consuming and a burden for the researcher to encapsulate those components for a workflow system's specific interface, and the Findability, Accessibility, Interoperability and Reusability (FAIR) of those components is often limited. We aim to enable the public cloud processing of massive amounts of ALS data across countries and regions and to make the retrieval and uptake of such EBV data products of ecosystem structure easily available to a wide scientific community and stakeholders.

 

We propose and develop a tool called FAIR-Cells that can be integrated into the JupyterLab environment as an extension to help scientists and researchers improve the FAIRness of their code. It can encapsulate user-selected cells of code as standardized RESTful API services, and it allows users to containerize such Jupyter code cells and to publish them as reusable components via community repositories.
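
Conceptually, encapsulating a notebook cell as a RESTful service amounts to something like the hand-written Flask sketch below; FAIR-Cells automates this step, and the code shown is an illustrative stand-in, not the extension's generated output.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/canopy_height", methods=["POST"])
    def canopy_height():
        # Body of a former notebook cell: derive a toy metric from posted points
        points = request.get_json()["points"]  # expected as [[x, y, z], ...]
        heights = [p[2] for p in points]
        return jsonify({"canopy_height_range": max(heights) - min(heights)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)  # the container would expose this port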

 

We demonstrate the features of FAIR-Cells using an application from the ecology domain. Ecologists currently process various point cloud datasets derived from LiDAR to extract metrics that capture the vertical and horizontal structure of vegetation. A new open-source software package called ‘Laserchicken’ allows the processing of country-wide LiDAR datasets in a local environment (e.g. the Dutch national ICT infrastructure called SURF). However, users have to use the Laserchicken application as a whole to process the LiDAR data, and the capacity of the given infrastructure limits the volume of data. In this work, we will first demonstrate how a user can apply the FAIR-Cells extension to interactively create RESTful services for the components of the Laserchicken software in a Jupyter environment, to automate the encapsulation of those services as Docker containers, and to publish the services in a community catalogue (e.g. LifeWatch) via its API (based on GeoNetwork). We will then demonstrate how those containers can be assembled into a workflow (e.g. using the Common Workflow Language) and deployed on a cloud environment (offered by the EOSC early adopter programme for ENVRI-FAIR) to process a much bigger dataset than is possible in a local environment. The demonstration results suggest that our technical roadmap can achieve FAIRness and good parallelism on large, distributed data volumes when executing Jupyter-based code.

How to cite: Koulouzis, S., Shi, Y., Wan, Y., Bianchi, R., Kissling, D., and Zhao, Z.: Enabling “LiDAR data processing” as a service in a Jupyter environment , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8294, https://doi.org/10.5194/egusphere-egu21-8294, 2021.

EGU21-8418 | vPICO presentations | ESSI2.2

An online service for analysing ozone trends within EOSC-synergy

Tobias Kerzenmacher, Valentin Kozlov, Borja Sanchis, Ugur Cayoglu, Marcus Hardt, and Peter Braesicke

The European Open Science Cloud-Synergy (EOSC-Synergy) project delivers services that expand the use of EOSC. One of these services, O3as, is being developed for scientists using chemistry-climate models to determine time series and, eventually, ozone trends for potential use in the quadrennial Global Assessment of Ozone Depletion, which will be published in 2022. A unified approach from a service like ours, which analyses results from a large number of different climate models, helps to harmonise the calculation of ozone trends efficiently and consistently. With O3as, publication-quality figures can be reproduced quickly and coherently. This is done via a web application where users configure their queries to perform simple analyses. These queries are passed to the O3as service via an O3as REST API call. There, the O3as service processes the query and accesses the reduced dataset. To create a reduced dataset, regular tasks are executed on a high-performance computer (HPC) to copy the primary data and perform data preparation (e.g. data reduction, standardisation and parameter unification). O3as uses EGI Check-in (OIDC) to identify users and grant access to certain functionalities of the service, udocker (a tool to run Docker containers in multi-user space without root privileges) to perform data reduction in the HPC environment, and the Universitat Politècnica de València (UPV) Infrastructure Manager to provision service resources (Kubernetes).
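
Such a REST API call could take the following shape; the endpoint URL and parameter names below are placeholders for illustration, not the service's documented interface.

    import requests

    O3AS_API = "https://example.org/o3as/api/v1/data"  # placeholder URL (assumption)

    params = {
        "model": "SOME-CCM",   # hypothetical model identifier
        "begin": 1980, "end": 2020,
        "lat_min": -10, "lat_max": 10,
    }
    response = requests.get(O3AS_API, params=params, timeout=60)
    response.raise_for_status()
    print(response.json())  # time series ready for plotting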

How to cite: Kerzenmacher, T., Kozlov, V., Sanchis, B., Cayoglu, U., Hardt, M., and Braesicke, P.: An online service for analysing ozone trends within EOSC-synergy, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8418, https://doi.org/10.5194/egusphere-egu21-8418, 2021.

EGU21-15669 | vPICO presentations | ESSI2.2

Applying VocPrez to operational semantic repositories: the NVS experience

Alexandra Kokkinaki, Quyen Luong, Christopher Thompson, Nicholas Car, and Gwenaelle Moncoiffe

The Natural Environment Research Council’s (NERC) Vocabulary Server (NVS1) has been serving the marine and wider community with controlled vocabularies for over a decade. NVS provides access to standardised lists of terms which are used for data mark-up, facilitating interoperability and discovery in the marine and associated Earth science domains. The NVS controlled vocabularies are published as Linked Data on the web using the data model of the Simple Knowledge Organisation System (SKOS). They can also be accessed as web services (RESTful, SOAP) or through a SPARQL endpoint. NVS is an operational semantic repository which underpins data systems like SeaDataNet, the pan-European infrastructure for marine data management, and is embedded in SeaDataNet-specific tools like MIKADO. Its services are constantly monitored by the SeaDataNet Argo monitoring system, guaranteeing reliability and availability. In this presentation we will discuss the challenges we encountered while enhancing an operational semantic repository like NVS with VocPrez, a read-only web delivery system for SKOS-formulated RDF vocabularies. We will also present our approach to implementing CI/CD delivery and the added value of VocPrez to NVS in terms of FAIRness. Finally, we will discuss the lessons learnt during the lifecycle of this development.

VocPrez2 is an open-source, pure-Python application that reads vocabularies from one or more sources and presents them online (HTTP) in several different ways: as human-readable web pages, using simple HTML templates for the different SKOS objects, and as machine-readable RDF or other formats, using mapping code. The different information-model views supported by VocPrez are defined by profiles, that is, by formal specifications. VocPrez supports both different profiles and different formats (media types) for each profile.

VocPrez enhanced the publication of NVS both for human users and for machines. Humans accessing NVS are presented with a new look and feel that is more user-friendly, providing filtering of collections, concepts and thesauri, and sorting of results using different options. For machine-to-machine communication, VocPrez presents NVS content in machine-readable formats which Internet clients can request directly using the Content Negotiation by Profile standard3. The profiles and formats available are also listed on an “Alternate Profiles” web page which is automatically generated per resource, thus allowing the discovery of options. As a result, human or machine end users can access NVS collections, thesauri and concepts according to different information models, such as DCAT, NVS’ own vocabulary model or pure SKOS, and in different serializations, like JSON-LD or Turtle, using content negotiation.
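
A minimal machine-to-machine request then looks like the sketch below, which asks for a Turtle serialization of a SKOS collection; the collection URI is an example, and profile selection would additionally use the Accept-Profile header defined by Content Negotiation by Profile.

    import requests

    uri = "http://vocab.nerc.ac.uk/collection/P02/current/"  # example NVS collection

    response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=60)
    response.raise_for_status()
    print(response.text[:400])  # SKOS concepts serialized as Turtle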

1 http://vocab.nerc.ac.uk/

2 https://github.com/RDFLib/VocPrez

3 https://www.w3.org/TR/dx-prof-conneg/

How to cite: Kokkinaki, A., Luong, Q., Thompson, C., Car, N., and Moncoiffe, G.: Applying VocPrez to operational semantic repositories: the NVS experience, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15669, https://doi.org/10.5194/egusphere-egu21-15669, 2021.

EGU21-8903 | vPICO presentations | ESSI2.2

Generic concepts for organising data management in research projects

Ivonne Anders, Swati Gehlot, Andrea Lammert, and Karsten Peters-von Gehlen

For some years now, research data management has been an increasingly important part of scientific projects, regardless of the number of topics or subjects, researchers or institutions involved. The bigger the project, the greater the data organization and data management requirements in order to assure the best outcome of the project. Despite this, projects rarely have clear structures or responsibilities for data management. The importance of clearly defining data management, and of budgeting for it, is often underestimated or neglected. Only a scarce number of reports and documentations explaining the research data management of particular projects and detailing best-practice examples can be found in the current literature. Additionally, these are often mixed up with general project management topics. Furthermore, such examples are usually focused on the specific issues of the projects described, and thus transferring (or generally applying) the methods provided is very difficult.

This contribution presents generic concepts of research data management, with an effort to separate them from general project management tasks. Project size, the diversity of topics and the researchers involved play an important role in shaping data management and in determining which data management methods can add value to the outcome of a project. We especially focus on different organisation types, including roles and responsibilities for data management in projects of different sizes. Additionally, we show how and when training should be included, and how important agreements within a project are.

How to cite: Anders, I., Gehlot, S., Lammert, A., and Peters-von Gehlen, K.: Generic concepts for organising data management in research projects, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8903, https://doi.org/10.5194/egusphere-egu21-8903, 2021.

ESSI2.6 – SMART Monitoring and Integrated Data Exploration of the Earth System

EGU21-8361 | vPICO presentations | ESSI2.6

Ensuring data trustworthiness within SMART Monitoring of environmental processes

Uta Koedel, Peter Dietrich, Philipp Fischer, and Claudia Schuetze

The term SMART Monitoring was also defined by the project Digital Earth (DE), a central joint project of eight Helmholtz centers in Earth and Environment. SMART Monitoring in the sense of DE means that measured environmental parameters and values need to be specific/scalable, measurable/modular, accepted/adaptive, relevant/robust, and trackable/transferable (SMART) for sustainable data use and improved real data acquisition. SMART Monitoring can be defined as a reliable monitoring approach with machine-learning- and artificial-intelligence-(A.I.)-supported procedures for an “as automated as possible” data flow from individual sensors to databases. SMART Monitoring tools must include various standardized data flows within the entire data lifecycle, e.g. specific sensor solutions, novel approaches for sampling designs, and defined standardized metadata descriptions. One of the essential components of SMART Monitoring workflows is enhancing metadata with comprehensive information on data quality. At the same time, SMART Monitoring must be highly modular and adaptive to apply to different monitoring approaches and disciplines in the sciences.

In SMART Monitoring, data quality is crucial, not only with respect to data FAIRness: it is essential to ensure data reliability and representativeness. Hence, comprehensively documented data quality is required to enable meaningful data selection for specific data blending, integration, and joint interpretation. Data integration from different sources is a prerequisite for the parameterization and validation of predictive tools and models. This demonstrates the importance of implementing the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for sustainable data management (Wilkinson et al. 2016). So far, the FAIR data principles do not include a detailed description of data quality and do not cover content-related quality aspects. Even though data may be FAIR in terms of availability, they are not necessarily “good” in terms of accuracy and precision. Unfortunately, there is still considerable confusion in science about the definition of good or trustworthy data.

An assessment of data quality and data origin is essential to preclude inaccurate, incomplete, or even unsatisfactory data analysis when applying, e.g., machine learning methods, and to avoid poorly derived, misleading or incorrect conclusions. The terms trustworthiness and representativeness summarise all aspects related to these issues. The central pillars of trustworthiness/representativeness are validity, provenience/provenance, and reliability, which are fundamental features in assessing any data collection or processing step for transparent research. For all kinds of secondary data usage and analysis, a detailed description and assessment of reliability and validity involve an appraisal of the applied data collection methods.

The presentation will give illustrative examples to show the importance of evaluating and describing data trustworthiness and representativeness, allowing scientists to find appropriate tools and methods for FAIR data handling and more accurate data interpretation.

How to cite: Koedel, U., Dietrich, P., Fischer, P., and Schuetze, C.: Ensuring data trustworthiness within SMART Monitoring of environmental processes, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8361, https://doi.org/10.5194/egusphere-egu21-8361, 2021.

EGU21-7747 | vPICO presentations | ESSI2.6

Towards a definition of Essential Mountain Climate Variables

James Thornton, Elisa Palazzi, Nicholas Pepin, Paolo Cristofanelli, Richard Essery, Sven Kotlarski, Gregory Giuliani, Yaniss Guigoz, Aino Kulonen, Xiaofeng Li, David Pritchard, Hayley Fowler, Christophe Randin, Maria Shahgedanova, Martin Steinbacher, Marc Zebisch, and Carolina Adler

Numerous applications, including generating future predictions via numerical modelling, establishing appropriate policy instruments, and effectively tracking progress against them, require the multitude of complex processes and interactions operating in rapidly changing mountainous environmental systems to be well monitored and understood. At present, however, not only are the available environmental data pertaining to mountains often severely limited, but interdisciplinary consensus regarding which variables should be considered absolute observation priorities is also lacking. In this context, the concept of so-called Essential Mountain Climate Variables (EMCVs) is introduced as a potential means of identifying critical observation priorities and thereby ameliorating the situation. Following a brief overview of the most critical aspects of ongoing and expected future climate-driven change in various key mountain system components (i.e. the atmosphere, cryosphere, biosphere and hydrosphere), a preliminary list of corresponding potential EMCVs – ranked according to perceived importance – is proposed. Interestingly, several of these variables do not currently feature amongst the globally relevant Essential Climate Variables (ECVs) curated by GCOS, suggesting that this mountain-specific approach is indeed well justified. Thereafter, both established and emerging possibilities to measure, generate, and apply EMCVs are summarised. Finally, future activities that must be undertaken if the concept is eventually to be formalized and widely applied are recommended.

How to cite: Thornton, J., Palazzi, E., Pepin, N., Cristofanelli, P., Essery, R., Kotlarski, S., Giuliani, G., Guigoz, Y., Kulonen, A., Li, X., Pritchard, D., Fowler, H., Randin, C., Shahgedanova, M., Steinbacher, M., Zebisch, M., and Adler, C.: Towards a definition of Essential Mountain Climate Variables, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7747, https://doi.org/10.5194/egusphere-egu21-7747, 2021.

EGU21-3951 | vPICO presentations | ESSI2.6

A web-based visual-analytics tool for ad-hoc campaign planning in terrestrial hydrology

Erik Nixdorf, Daniel Eggert, Peter Morstein, Thomas Kalbacher, and Doris Dransch

A deeper understanding of the Earth system as a whole and its interacting sub-systems depends, perhaps more than ever, not only on accurate mathematical approximations of the physical processes but also on the availability of environmental data across temporal and spatial scales. Even though advanced numerical simulations and satellite-based remote sensing, in conjunction with sophisticated algorithms such as machine learning tools, can provide 4D environmental datasets, local and mesoscale measurements continue to be the backbone of many disciplines such as hydrology. Considering the limitations of human and technical resources, monitoring strategies for these types of measurements should be well designed to increase the information gain they provide. One helpful set of tools to address these tasks is visual-analytical data exploration frameworks that integrate qualified multi-parameter data from different sources and tailor well-established computational and visual methods to explore and analyze them. In this context, we developed a SMART Monitoring workflow to determine the most suitable time and location for event-driven, ad-hoc monitoring in hydrology, using soil moisture measurements as our target variable.

The SMART Monitoring workflow consists of three main steps. The first is the identification of the region of interest, either via user selection or via a recommendation based on spatial environmental parameters provided by the user. Statistical filters and different color schemes can be applied to highlight potentially relevant regions. In the second step, time-dependent environmental parameters (e.g., rainfall and soil moisture estimates of the recent past, weather predictions from numerical weather models, and swath forecasts from Earth observation satellites) for those relevant regions can be evaluated to identify suitable time frames for the planned monitoring campaign. Lastly, a detailed assessment of the region of interest is conducted by applying filter and weight functions in combination with multiple linear regressions on selected input parameters. Depending on the measurement objective (e.g., highest/lowest values, highest/lowest change), the most suitable areas for monitoring will subsequently be visually highlighted. Based on the common road network, an efficient route for a corresponding monitoring campaign can be derived for the identified regions of interest and directly visualized in the visual-analytical environment.
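
A minimal sketch of the weighting idea, under assumed inputs and weights (not the tool's actual algorithm), could look as follows.

    import numpy as np

    rng = np.random.default_rng(42)
    rainfall = rng.random((100, 100))       # stand-ins for normalized gridded inputs
    soil_moisture = rng.random((100, 100))
    slope = rng.random((100, 100))

    weights = {"rainfall": 0.5, "soil_moisture": 0.3, "slope": -0.2}  # illustrative weights
    score = (weights["rainfall"] * rainfall
             + weights["soil_moisture"] * soil_moisture
             + weights["slope"] * slope)

    mask = score > np.quantile(score, 0.9)  # top decile = candidate monitoring areas
    print("candidate cells:", int(mask.sum()))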

How to cite: Nixdorf, E., Eggert, D., Morstein, P., Kalbacher, T., and Dransch, D.: A web-based visual-analytics tool for ad-hoc campaign planning in terrestrial hydrology, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3951, https://doi.org/10.5194/egusphere-egu21-3951, 2021.

EGU21-15623 | vPICO presentations | ESSI2.6 | Highlight

The Digital Earth Viewer: A new visualization approach for geospatial time series data

Valentin Buck, Flemming Stäbler, Everardo Gonzalez, and Jens Greinert

The study of the Earth's systems depends on a large number of observations from heterogeneous sources, which are usually scattered across time and space and are tightly intercorrelated. The understanding of these systems depends on the ability to access diverse data types and to contextualize them in a global setting suitable for their exploration. While the collection of environmental data has seen an enormous increase over the last couple of decades, the development of the software solutions necessary to integrate observations across disciplines seems to be lagging behind. To deal with this issue, we developed the Digital Earth Viewer: a new program to access, combine, and display geospatial data from multiple sources over time.

Choosing a new approach, the software displays space in true 3D and treats time and time ranges as true dimensions. This allows users to navigate observations across spatio-temporal scales and combine data sources with each other as well as with meta-properties such as quality flags. In this way, the Digital Earth Viewer supports the generation of insight from data and the identification of observational gaps across compartments.

Developed as a hybrid application, it may be used both in situ, as a local installation to explore and contextualize new data, and in a hosted context to present curated data to a wider audience.

In this work, we present this software to the community, show its strengths and weaknesses, give insight into the development process, and talk about extending and adapting the software to custom use cases.

How to cite: Buck, V., Stäbler, F., Gonzalez, E., and Greinert, J.: The Digital Earth Viewer: A new visualization approach for geospatial time series data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15623, https://doi.org/10.5194/egusphere-egu21-15623, 2021.

EGU21-14621 | vPICO presentations | ESSI2.6

The use of ERDDAP in a self-monitoring and nowcast hazard alerting coastal flood system

Louise Darroch, Thomas Gardner, Margaret Yelland, Christopher Cardwell, Emma Slater, Elizabeth Bradshaw, Justin Buck, Robert Jennings, Andrew Hale, and Jennifer Brown

In the UK, £150bn of assets and 4 million people are at risk from coastal flooding. With reductions in public funding, rising sea levels and changing storm conditions, cost-effective and accurate early-warning flood forecasting systems are required. However, the numerical tools currently used to estimate wave overtopping are based on tank experiments and on very limited previous field measurements of total overtopping volumes only. Furthermore, setting tolerable hazard thresholds in flood forecasting models requires site-specific information on wave overtopping during storms of varying severity.

The National Oceanography Centre (NOC) is currently developing a new nowcast wave overtopping alert system that can be deployed in site-specific coastal settings to detect potentially dangerous flood conditions in near real-time (NRT) while validating operational forecasting services. At its core, it utilises a prototype overtopping sensor and an instance of the National Oceanic and Atmospheric Administration's ERDDAP data server in a self-monitoring and alerting control system. In-situ detection will be performed by WireWall, a novel capacitance wire sensor that measures at the high frequencies (400 Hz) required to obtain the distribution of overtopping volume and horizontal velocity on a wave-by-wave basis. The sensor includes on-board data processing and two-way telemetry to enable automation and control. The telemetry posts regular health summaries and high-resolution (1 s) hazard data (produced by the on-board processing) over the standard internet protocol (HTTPS) to an open ERDDAP server, so data are freely available via an application programming interface (API) alongside other NRT and delayed-mode global coastal ocean and weather information for further data exploration. ERDDAP allows NRT hazard data to be accessed by statistical algorithms and visual applications, and alerts are also fed to message queues (RabbitMQ) that can be monitored by external systems. Combined, this will enable automated health monitoring and sensor operation, and offer the potential for downstream hazard management tools (such as navigation systems and transport management systems) to ingest the nowcast wave overtopping hazard data. To integrate the data with wider systems and different disciplines, the ERDDAP data sets will be enriched with common and well-structured metadata. Data provenance, controlled vocabularies, quality control and attribution information embedded in the data workflow are fundamental to ensuring user trust in the data and any products generated, while enhancing FAIR data principles.

The new nowcast wave overtopping alert system will be tested in 2021 during field deployments of multiple WireWall systems at two high energy coastal sites in the UK. Such data are crucial for validating operational flood forecast services as well as protecting local communities and minimising transport service disruptions. The addition of SMART monitoring optimises sensor maintenance and operation, reducing the costs associated with teams travelling to the site. Using ERDDAP embedded with well-structured metadata enables machines to access multiple flood parameters through a single point that abstracts users from the complexities associated with the source data, offering the potential for further data exploration through modelling or techniques such as machine learning. 
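A minimal sketch of the alerting path described above, under stated assumptions: near-real-time hazard values are polled from a hypothetical ERDDAP dataset over HTTPS and, if a threshold is exceeded, an alert message is published to a RabbitMQ queue via the pika client. The server URL, dataset and variable names, threshold, and the assumed JSON layout are placeholders, not those of the NOC system.

```python
# Hedged sketch: poll NRT hazard data from an ERDDAP endpoint over HTTPS and
# push an alert to a RabbitMQ queue. URL, dataset, variables and threshold are
# hypothetical placeholders.
import json
import requests
import pika

ERDDAP_URL = "https://example.org/erddap/tabledap/wirewall_hazard.json"  # hypothetical
resp = requests.get(ERDDAP_URL + '?time,overtop_volume&orderByMax("time")')
latest = resp.json()["table"]["rows"][-1]   # assumed ERDDAP JSON table layout
timestamp, volume = latest[0], latest[1]

THRESHOLD = 50.0  # hypothetical tolerable overtopping volume
if volume > THRESHOLD:
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="overtopping_alerts", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="overtopping_alerts",
        body=json.dumps({"time": timestamp, "volume": volume, "level": "warning"}),
    )
    conn.close()
```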

How to cite: Darroch, L., Gardner, T., Yelland, M., Cardwell, C., Slater, E., Bradshaw, E., Buck, J., Jennings, R., Hale, A., and Brown, J.: The use of ERDDAP in a self-monitoring and nowcast hazard alerting coastal flood system, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14621, https://doi.org/10.5194/egusphere-egu21-14621, 2021.

Soil moisture is a crucial variable in the Earth's critical zone. It depends on multiple factors such as climate, topographic conditions and soil characteristics, and it affects energy and water fluxes across the land-atmosphere interface; it is therefore highly important for terrestrial ecosystems, ecosystem management and agriculture. The accurate mapping of soil moisture across time and space is challenging but highly desirable.

One option is to deploy ground-based moisture sensors at the point scale and to interpolate and/or map the measurements into space. We have developed a data-driven approach to map the soil moisture in a reference area from point measurements at specific time points and the covariates location, topographic conditions and soil characteristics. We tested the mapping capacity of two machine-learning algorithms (Random Forest and Neural Networks) and compared these with Ordinary Kriging as a standard method. Our questions were: 1) How accurate are the machine-learning methods for soil moisture mapping? 2) Which covariates are most important? 3) How does mapping accuracy vary with data density and temporal resolution?

We used soil moisture data from the TERENO experimental sites Wüstebach and Rollesbroich located in western Germany. These small catchments are equipped with a dense network of soil moisture sensors using time domain reflectometry (TDR) that has been operated since 2010 (Bogena et al., 2010; Zacharias et al., 2011). From this, we created 2700 point-based soil moisture data sets at specific time points, specific depths and for various numbers of sensor locations. We then merged these data sets with sampled data on soil texture and chemical composition (Qu et al., 2016; Gottselig et al., 2017) as well as remotely sensed terrain data. These time-stamp-specific point-based soil moisture measurements were mapped using Ordinary Kriging (OK), Random Forest (RF) and Neural Networks (ANN), with combinations of the soil and terrain attributes as well as geometric distances between sensor locations as covariates. Each model was trained (80% subset) and tested (20% subset) on the point-based data sets.

In general, average model accuracy across the methods and individual data set types (depth, number of sensor locations, temporal averaging) was relatively low, with R² values of approximately 0.2-0.5. This originates from the high variability of soil moisture. Surprisingly, models using only the spatial structure of the domain (distances between sensors as covariates) already yielded an R² of approximately 0.45. Adding further covariates such as soil and terrain attributes did not substantially improve the accuracy of these models. In comparison, using only terrain attributes as covariates for RF and ANN yielded an accuracy (R²) of 0.25-0.27. The trained models were then used to map soil moisture onto the entire study area. This resulted in maps with interesting patterns that differed between the individual methods, even when using the same covariate types.

Finally, it can be concluded that for the spatial interpolation of soil moisture, the Random Forest algorithm using distances between sensor locations as covariates is a promising alternative to Ordinary Kriging in terms of accuracy and simplicity.
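
A minimal sketch of the distance-as-covariates idea, with synthetic data standing in for the TERENO measurements (not the study's exact setup): a Random Forest is trained on the geometric distances from each sensor to all sensor locations and evaluated on a 20% test subset.

```python
# Sketch: Random Forest interpolation of soil moisture using distances between
# sensor locations as covariates. All data are synthetic.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
xy = rng.uniform(0, 1000, size=(100, 2))                              # sensor coordinates (m)
sm = 0.25 + 0.1 * np.sin(xy[:, 0] / 300) + rng.normal(0, 0.03, 100)   # soil moisture (-)

X = cdist(xy, xy)   # covariates: distances from every sensor to all locations
X_tr, X_te, y_tr, y_te = train_test_split(X, sm, test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R2 on the 20% test subset:", rf.score(X_te, y_te))
```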

How to cite: Boog, J. and Kalbacher, T.: Point to Space: Data-driven Soil Moisture Spatial Mapping using Machine-Learning for Small Catchments in Western Germany, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16308, https://doi.org/10.5194/egusphere-egu21-16308, 2021.

EGU21-9290 | vPICO presentations | ESSI2.6 | Highlight

Modeling methane from the North Sea region with ICON-ART

Christian Scharun, Roland Ruhnke, Michael Weimer, and Peter Braesicke

Methane (CH4) is the second most important greenhouse gas after CO2 affecting global warming. Various sources (e.g. fossil fuel production, agriculture and waste, biomass burning and natural wetlands) and sinks (the reaction with the OH radical as the main sink contributes to tropospheric ozone production) determine the methane budget. Due to its long lifetime in the atmosphere, methane can be transported over long distances.

Disused and active offshore platforms can emit methane, in amounts that are difficult to quantify. In addition, explorations of the sea floor in the North Sea showed a release of methane near the boreholes of both oil- and gas-producing platforms. The basis of this study is the established emission database EDGAR (Emission Database for Global Atmospheric Research), an inventory that includes methane emission fluxes in the North Sea region. While the methane emission fluxes in the EDGAR inventory match the platform locations for most of the oil platforms, almost all of the gas platform sources are missing from the database. We develop a method for estimating the missing emission sources based on the EDGAR inventory and the known locations of gas platforms, which will be inserted into the model as additional point sources.

In this study the global model ICON-ART (ICOsahedral Nonhydrostatic model - Aerosols and Reactive Trace gases) is used. ART is an online-coupled model extension for ICON that includes chemical gases and aerosols. One aim of the model is the simulation of interactions between the trace substances and the state of the atmosphere by coupling the spatiotemporal evolution of tracers with atmospheric processes. ICON-ART sensitivity simulations are performed with inserted and adjusted sources to assess their influence on the methane and OH-radical distribution on regional (North Sea) and global scales.
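
A minimal sketch of the point-source insertion idea under stated assumptions: the grid resolution, platform coordinates and flux values below are hypothetical, and the actual method derives the missing fluxes from the EDGAR inventory rather than prescribing them.

```python
# Sketch: add methane fluxes for known gas-platform locations to a gridded,
# EDGAR-like emission field. Grid, platform list and fluxes are hypothetical.
import numpy as np

lats = np.arange(50.0, 62.0, 0.1)   # North Sea region, 0.1 deg grid (assumed)
lons = np.arange(-4.0, 10.0, 0.1)
emissions = np.zeros((lats.size, lons.size))   # CH4 flux field (EDGAR-like)

# Hypothetical gas platforms missing from the inventory: (lat, lon, flux)
platforms = [(58.4, 1.7, 2.5e-9), (56.9, 2.2, 1.1e-9)]

for lat, lon, flux in platforms:
    i = int(round((lat - lats[0]) / 0.1))
    j = int(round((lon - lons[0]) / 0.1))
    emissions[i, j] += flux   # insert the estimated source into its grid cell
```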

How to cite: Scharun, C., Ruhnke, R., Weimer, M., and Braesicke, P.: Modeling methane from the North Sea region with ICON-ART, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9290, https://doi.org/10.5194/egusphere-egu21-9290, 2021.

ESSI2.10 – Joint JAXA-ESA Session on the Mutual Cooperation Using Synthetic Aperture Radar Satellites in Earth Science and Applications

EGU21-7100 | vPICO presentations | ESSI2.10 | Highlight

Synergetic use of L- and C-band SAR data in Earth Sciences – The JAXA-ESA mutual cooperation

Julia Kubanek, Malcolm Davidson, Maurice Borgeaud, Shin-ichi Sobue, and Takeo Tadono

Within the “Cooperation for the Use of Synthetic Aperture Radar Satellites in Earth Science and Applications”, the Japanese Aerospace Exploration Agency (JAXA) and the European Space Agency (ESA) agreed to mutually share C-band data from ESA’s Sentinel-1 mission and L-band data from JAXA’s ALOS-2 PALSAR-2 mission over selected test sites. Applications include wetland monitoring, hurricanes, sea ice, snow water equivalent and surface deformation.

The aim of the collaboration is to develop a better understanding of the benefits of combining L- and C-band data over various areas and for the different thematic applications. The findings of the different European, Japanese and international projects will help to develop future SAR satellite missions, such as JAXA’s ALOS-4, and ESA’s Copernicus mission ROSE-L and Sentinel-1 Next Generation.

This presentation will give an overview of the ongoing ESA-JAXA cooperation and will show highlights and first results of the different test sites and applications.

How to cite: Kubanek, J., Davidson, M., Borgeaud, M., Sobue, S., and Tadono, T.: Synergetic use of L- and C-band SAR data in Earth Sciences – The JAXA-ESA mutual cooperation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7100, https://doi.org/10.5194/egusphere-egu21-7100, 2021.

EGU21-534 | vPICO presentations | ESSI2.10 | Highlight

Japan's L-SAR missions

Shin-ichi Sobue, Takeo Tadono, Satoko Miura, Akiko Noda, Takeshi Motooka, and Masato Ohki

The Japan Aerospace Exploration Agency (JAXA) launched its first L-band SAR mission, the Japanese Earth Resources Satellite (JERS-1), in 1992. Though the design life of JERS-1 was 2 years, the satellite obtained observational data for more than 6 years before the mission ended in 1998. Following JERS-1, the Advanced Land Observing Satellite (ALOS) was launched in 2006. ALOS was equipped with three sensors: the Phased Array type L-band SAR (PALSAR), the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM), and the Advanced Visible and Near Infrared Radiometer type 2 (AVNIR-2). ALOS observation data have been used in various areas, including disaster mitigation (observing regions damaged by earthquakes, tsunamis, or typhoons), forest monitoring, natural environment maintenance, agriculture, and the compilation of a 1:25,000 topographical map. When the Great East Japan Earthquake hit Japan in 2011, ALOS took some 400 images over the disaster-stricken areas to provide information to all parties concerned.

The technologies developed for ALOS were carried forward to the second Advanced Land Observing Satellite, ALOS-2, which was successfully launched on 24 May 2014. The mission sensor of ALOS-2 is the Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2), a state-of-the-art L-band SAR system. Since the successful completion of its initial checkout after launch, ALOS-2 has contributed to a large number of emergency observations of natural disasters, not only in Japan but also worldwide. Furthermore, based on the Basic Observation Scenario (BOS) of ALOS-2, 10 m global map data and data in other modes are routinely collected and archived. This paper describes the results of ALOS-2 operation in its nominal operation phase and outlines future ALOS-series missions, especially ALOS-4, scheduled for launch in JFY2022.

How to cite: Sobue, S., Tadono, T., Miura, S., Noda, A., Motooka, T., and Ohki, M.: Japan's L-SAR missions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-534, https://doi.org/10.5194/egusphere-egu21-534, 2021.

EGU21-8660 | vPICO presentations | ESSI2.10

Combining Sentinel-1 and ALOS-2 observations for soil moisture retrieval

Anna Balenzano, Giuseppe Satalino, Francesco Lovergine, Davide Palmisano, Francesco Mattia, Michele Rinaldi, and Carsten Montzka

One of the limitations of presently available Synthetic Aperture Radar (SAR) surface soil moisture (SSM) products is their moderate temporal resolution (e.g., 3-4 days), which is not optimal for several applications, as most user requirements point to a temporal resolution of 1-2 days or less. A possible path to tackle this issue is to coordinate multi-mission SAR acquisitions, with a view to the future Copernicus Sentinel-1 (C&D and Next Generation) and L-band Radar Observation System for Europe (ROSE-L) missions.

In this respect, the recent agreement between the Japanese (JAXA) and European (ESA) Space Agencies on the use of SAR satellites in Earth science and applications provides a framework to develop and validate multi-frequency and multi-platform SAR SSM products. In 2019 and 2020, to support insights into the interoperability between C- and L-band SAR observations for SSM retrieval, systematic Sentinel-1 and ALOS-2 acquisitions were gathered over the TERENO (Terrestrial Environmental Observatories) Selhausen (Germany) and Apulian Tavoliere (Italy) cal/val sites. Both sites are well documented and equipped with hydrologic networks.

The objective of this study is to investigate the integration of multi-frequency SAR measurements for a consistent and harmonized SSM retrieval through the error characterization of a combined C- and L-band SSM product. To this end, time series of Sentinel-1 IW and ALOS-2 FBD data acquired over the two sites will be analysed. The short time change detection (STCD) algorithm, developed, implemented and recently assessed on Sentinel-1 data [e.g., Balenzano et al., 2020; Mattia et al., 2020], will be tailored to the ALOS-2 data. The time series of SAR SSM maps from each SAR system will then be derived separately and aggregated into an interleaved SSM product. Furthermore, it will be compared against in situ SSM data systematically acquired by the ground stations deployed at both sites. The study will assess the interleaved SSM product and evaluate the homogeneity in quality of the C- and L-band SAR SSM maps.
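
A minimal sketch of the interleaving step with synthetic dates and values: the two single-sensor SSM retrieval series are simply concatenated and time-sorted into one combined product.

```python
# Sketch: interleave two single-sensor SSM time series into one product.
# Dates and values are synthetic.
import pandas as pd

s1 = pd.Series([0.21, 0.25, 0.19],
               index=pd.to_datetime(["2020-05-01", "2020-05-07", "2020-05-13"]),
               name="ssm")   # Sentinel-1 C-band SSM retrievals
a2 = pd.Series([0.23, 0.18],
               index=pd.to_datetime(["2020-05-04", "2020-05-10"]),
               name="ssm")   # ALOS-2 L-band SSM retrievals

interleaved = pd.concat([s1, a2]).sort_index()   # combined C-/L-band product
print(interleaved)
```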

References

Balenzano, A., et al., “Sentinel-1 soil moisture at 1 km resolution: a validation study”, submitted to Remote Sensing of Environment (2020).

Mattia, F., A. Balenzano, G. Satalino, F. Lovergine, A. Loew, et al., “ESA SEOM Land project on Exploitation of Sentinel-1 for Surface Soil Moisture Retrieval at High Resolution,” final report, contract number 4000118762/16/I-NB, 2020.

How to cite: Balenzano, A., Satalino, G., Lovergine, F., Palmisano, D., Mattia, F., Rinaldi, M., and Montzka, C.: Combining Sentinel-1 and ALOS-2 observations for soil moisture retrieval, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8660, https://doi.org/10.5194/egusphere-egu21-8660, 2021.

With the upcoming L-band Synthetic Aperture Radar (SAR) satellite mission Radar Observing System for Europe at L-band (ROSE-L) and its combination with existing C-band satellite missions such as Sentinel-1, multi-frequency SAR observations with high temporal and spatial resolution will become available. To investigate their potential for estimating soil and plant parameters, the SARSense campaign was conducted between June and August 2019 at the agricultural test site Selhausen in Germany. Here, we introduce a new publicly available, extensive SAR dataset and present a first analysis of C- and L-band co- and cross-polarized backscattering signals regarding their sensitivity to soil and plant parameters. The analysis includes C- and L-band airborne recordings as well as Sentinel-1 and ALOS-2 acquisitions, accompanied by in-situ soil moisture measurements and plant samplings. In addition, soil moisture was measured using cosmic-ray neutron sensing, and unmanned aerial system (UAS) based multispectral and temperature measurements were taken during the campaign period.

A first analysis of the dataset revealed that, due to misalignments of corner reflectors during the SAR acquisitions, the temporal consistency of the airborne SAR data is not given. A scene-based, spatial analysis of backscatter behaviour was therefore conducted for the airborne SAR data, while the spaceborne SAR data enabled the analysis of temporal changes in backscatter behaviour. Focusing on root crops with radial canopy structure (sugar beet and potato) and cereal crops with elongated canopy structure (wheat, barley), the lowest correlations are observed between the backscattering signal and soil moisture, with R² values below 0.35 at C-band and below 0.36 at L-band. Higher correlations are observed for vegetation water content, with R² values ranging between 0.12 and 0.64 at C-band and between 0.06 and 0.64 at L-band. Regarding plant height, higher correlations with R² up to 0.55 are seen at C-band, compared to R² up to 0.36 at L-band. Looking at the individual agricultural crops in more detail, in almost all cases the backscatter signals of C- and L-band contain a different amount of information about the soil and plant parameters, indicating that a multi-frequency approach is needed to disentangle soil and plant contributions to the signal and to identify specific scattering mechanisms related to the crop type, especially regarding the different characteristics of root crops and cereals.
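
As an illustration of the kind of sensitivity analysis reported above, the following sketch computes R² between backscatter and vegetation water content per band; all values are synthetic, not the campaign data.

```python
# Sketch: R² between backscatter and a plant parameter, per frequency band.
# All values are synthetic stand-ins for the campaign measurements.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
vwc = rng.uniform(0.5, 4.0, 50)                       # vegetation water content (kg m-2)
sigma0_c = -14 + 2.0 * vwc + rng.normal(0, 1.5, 50)   # C-band cross-pol backscatter (dB)
sigma0_l = -20 + 1.2 * vwc + rng.normal(0, 2.0, 50)   # L-band cross-pol backscatter (dB)

for band, sigma0 in [("C", sigma0_c), ("L", sigma0_l)]:
    r = linregress(vwc, sigma0)
    print(f"{band}-band: R2 = {r.rvalue**2:.2f}")
```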

How to cite: Mengen, D. and the SARSense Campaign Team: The SARSense campaign: A dataset for comparing C- and L-band SAR backscattering behaviour to changes of soil and plant parameters in agricultural areas, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1351, https://doi.org/10.5194/egusphere-egu21-1351, 2021.

Emission and backscattering at different frequencies have varied responses to soil physical processes (e.g., moisture redistribution, freeze-thaw) and to vegetation growth and senescence. Combining active and passive microwave multi-frequency signals may provide complementary information, which can be used to better retrieve soil moisture, vegetation biomass and water content for ecological applications. To this purpose, the Community Land Active Passive Microwave Radiative Transfer Modelling Platform (CLAP) was adopted in this study to simulate both emission (TB) and backscatter (σ0). CLAP is built on the TorVergata model for modelling vegetation scattering, and on an air-to-soil transition model (ATS), accounting for surface dielectric roughness, integrated with the Advanced Integral Equation Model (AIEM) for modelling soil surface scattering. The accuracy of CLAP was assessed with both ground-based and spaceborne measurements, the former from the microwave radiometer/scatterometer observatory deployed at the Maqu site on an alpine meadow on the Tibetan Plateau. Specifically, for the passive case, simulated TB (emissivity multiplied by effective temperature) was compared to ground-based ELBARA-III L-band observations, as well as to C-band Advanced Microwave Scanning Radiometer 2 (AMSR2) and L-band Soil Moisture Active Passive (SMAP) observations. For the active case, simulated σ0 was compared to ground-based scatterometer C- and L-band observations, and to C-band Sentinel-1 and L-band Phased Array type L-band Synthetic Aperture Radar 2 (PALSAR-2) observations. This study is expected to contribute to improving soil moisture retrieval accuracy for dedicated microwave sensor configurations.
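
CLAP itself couples the TorVergata and ATS/AIEM models; as a much simpler, generic illustration of how soil and vegetation terms combine in passive microwave emission modelling, the following sketches the widely used zeroth-order tau-omega model with illustrative parameter values (this is not CLAP's formulation).

```python
# Sketch of the zeroth-order tau-omega emission model (illustrative only).
import numpy as np

def tau_omega_tb(e_soil, t_soil, t_canopy, tau, omega, theta_deg):
    """Brightness temperature (K) over a vegetated surface, tau-omega model."""
    gamma = np.exp(-tau / np.cos(np.radians(theta_deg)))   # canopy transmissivity
    r_soil = 1.0 - e_soil                                  # soil reflectivity
    tb_soil = e_soil * t_soil * gamma                      # attenuated soil emission
    tb_veg = t_canopy * (1 - omega) * (1 - gamma) * (1 + r_soil * gamma)
    return tb_soil + tb_veg

# Example: L-band-like setting for a moist alpine meadow (illustrative values)
print(tau_omega_tb(e_soil=0.7, t_soil=285.0, t_canopy=284.0,
                   tau=0.12, omega=0.05, theta_deg=40.0))
```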

How to cite: Zhao, H., Zeng, Y., Su, B., and Hofste, J.: Modelling of Microwave Multi-Frequency Emission and Backscatter by a Community Land Active Passive Microwave Radiative Transfer Modelling Platform (CLAP), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6659, https://doi.org/10.5194/egusphere-egu21-6659, 2021.

EGU21-16539 | vPICO presentations | ESSI2.10

MAFIS-Multiple Actors Forest Information System: EO+AI+ODC for scalable forest monitoring services

Marcello Maranesi, Matteo Picchiani, Chiara Clementini, Fabio Salbitano, Marco Marchetti, Fabio Del Frate, Gherardo Chirici, Jaro Hofierka, Remo Bertani, Pietro Maroè, Julia Kubanek, and Stefano Ferretti

The Mission of GMATICS is to offer systematic monitoring services based on Earth Observation data, Artificial Intelligence techniques and Open Data Cube architectures.

After the development of two initial services, GMATICS is now focusing on forest monitoring through the ESA funded project MAFIS-Multiple Actors Forest Information System.

What is MAFIS? MAFIS performs a systematic monitoring of forests in natural environments as well as of forests and green areas in urban environments.

What is MAFIS composed of? For the natural environment, we use time series of multi-mission satellite data (multispectral, SAR, hyperspectral, and VHR) and in-situ surveys, while for forests and green areas in urban environments we also use other geospatial data from aerial orthophotos, LIDAR sensing, drone surveys and specialized in-situ measurements. All kinds of data are organized within an Open Data Cube architecture and are processed and integrated using various AI techniques. We also use a forest growth model exploiting extensive meteorological data, and we make the MAFIS service accessible through a Web-GIS platform, enabling customer access from desktop and mobile devices.

What are the MAFIS outputs? A set of information layers suitable for different potential users: main tree species classification, identification of forest clear-cuts and selective cuttings, detection of disturbances due to forest fires, diseases or windstorms, estimation of Above Ground Biomass (AGB) gain and losses, detailed urban and peri-urban green area assessment for planning purpose, estimation and spatial assessment of various ecosystems services (carbon sequestration, pollutant removal, thermal comfort, pollen risks, etc.), monitoring of tree status for maintenance actions identification and prioritization.

Who are MAFIS potential users? Ministries of agriculture and environment, Local Administrations, wood-chain industry, municipalities, architects and urban planners, tree care and nursery companies, multiutility companies, International Organizations, Universities and Research Centres (forests, ecology, architecture).

How to cite: Maranesi, M., Picchiani, M., Clementini, C., Salbitano, F., Marchetti, M., Del Frate, F., Chirici, G., Hofierka, J., Bertani, R., Maroè, P., Kubanek, J., and Ferretti, S.: MAFIS-Multiple Actors Forest Information System: EO+AI+ODC for scalable forest monitoring services, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16539, https://doi.org/10.5194/egusphere-egu21-16539, 2021.

EGU21-4452 | vPICO presentations | ESSI2.10

Flood monitoring in remote areas: integration of multi-frequency SAR data

Alberto Refice, Annarita D'Addabbo, Marco Chini, and Marina Zingaro

The monitoring of inundation phenomena over vegetated areas through synthetic aperture radar (SAR) data can be improved through an integrated analysis of different spectral bands. The combination of data with different penetration depths beneath the vegetation canopy can help determine the response of flooded areas with distinct types of vegetation cover to the microwave signal. This is especially useful in cases, which actually constitute the majority, where ground data are scarce or unavailable.

The present study concerns the application of multi-temporal, multi-frequency, and multi-polarization SAR images, specifically data from the Sentinel-1 and PALSAR-2 SAR sensors, operating in C-band, VV polarization, and L-band, HH and HV polarizations, respectively, in synergy with globally available land cover data, to improve flood mapping in densely vegetated areas such as the Zambezi-Shire basin, Mozambique [1], characterized by wetlands, open and closed forest, cropland, grassland (herbaceous and shrubs), and a few urban areas.

We show how the combination of various data processing techniques and the simultaneous availability of data at different frequencies and polarizations can help to monitor floodwater evolution over various land cover classes. They also enable the detection of different scattering mechanisms, such as the double-bounce interaction of vegetation stems and trunks with the underlying floodwater, providing valuable information about the distribution of flooded areas among the different ground cover types present on the site.
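
As a toy illustration of such multi-frequency decision logic (not the algorithm used in the study), the following sketch labels pixels using illustrative thresholds: low C-band VV suggests open water, while enhanced L-band HH suggests double-bounce scattering from flooded vegetation.

```python
# Sketch: per-pixel flood labelling from combined C- and L-band backscatter.
# Thresholds and values are illustrative, not those of the study.
import numpy as np

c_vv = np.array([-21.0, -8.5, -12.0])   # C-band VV backscatter (dB), synthetic
l_hh = np.array([-18.0, -3.0, -10.0])   # L-band HH backscatter (dB), synthetic

flood_class = np.full(c_vv.shape, "dry", dtype=object)
flood_class[c_vv < -18.0] = "open water"                             # smooth water surface
flood_class[(c_vv >= -18.0) & (l_hh > -5.0)] = "flooded vegetation"  # double bounce
print(flood_class)
```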

Studies of this kind are expected to become increasingly important as the availability of multi-frequency data from SAR satellite constellations grows, thanks to initiatives such as the EU Copernicus program's L-band satellite mission ROSE-L [2], and to their tight integration with Sentinel-1 as well as with other national constellations such as ALOS-2 or SAOCOM.

References

[1] Refice, A.; Zingaro, M.; D’Addabbo, A.; Chini, M. Integrating C- and L-Band SAR Imagery for Detailed Flood Monitoring of Remote Vegetated Areas. Water 2020, 12, 2745, doi:10.3390/w12102745.

[2] Pierdicca, N.; Davidson, M.; Chini, M.; Dierking, W.; Djavidnia, S.; Haarpaintner, J.; Hajduch, G.; Laurin, G.V.; Lavalle, M.; López-Martínez, C.; et al. The Copernicus L-band SAR mission ROSE-L (Radar Observing System for Europe). In Active and Passive Microwave Remote Sensing for Environmental Monitoring III; SPIE: Washington, DC, USA, 2019; Volume 11154, p. 13.

How to cite: Refice, A., D'Addabbo, A., Chini, M., and Zingaro, M.: Flood monitoring in remote areas: integration of multi-frequency SAR data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4452, https://doi.org/10.5194/egusphere-egu21-4452, 2021.

EGU21-7831 | vPICO presentations | ESSI2.10

Detecting rice inundation status for water saving and methane emission mitigation measures using Sentinel-1 & ALOS-2/PALSAR-2 Data

Hironori Arai, Thuy Le Toan, Wataru Takeuchi, Kei Oyoshi, Hoa Phan, Lam Dao Nguyen, Tamon Fumoto, and Kazuyuki Inubushi

Approximately 90% of the world's total paddy area and annual rice production are concentrated in monsoon Asia, which has no more land or water resources for further expansion of cultivation. Most rice grows under lowland conditions, where production currently faces freshwater scarcity due to sea-water intrusion, accelerated by sea-level rise and land subsidence, and due to decreasing freshwater supply caused by upstream dam construction. Since rice production also requires large amounts of water (3,000-5,000 L kg-1 rice), water-saving irrigation practices (e.g., Alternate Wetting and Drying, AWD) are desirable in this region to reduce water demand sustainably, and the irrigation status needs to be evaluated for decision making on sustainable food security. In addition to the significance of AWD's role as an adaptation to drought risks, AWD also has the potential to act as an important mitigation measure by reducing methane emission from paddy soils. This function is very important since rice cropping is responsible for approximately 11% of global anthropogenic CH4 emissions, and rice has the highest greenhouse gas intensity among the main food crops.

In order to implement AWD in Asian rice paddies as a mitigation measure based on a carbon pricing scheme, it is important to evaluate the spatial distribution of AWD paddy fields in the target region. For the detection of AWD fields versus continuously flooded fields, it is essential to develop methods using EO data to detect soil inundation under rice plants at various growth stages. In this study, ALOS-2/PALSAR-2 and Sentinel-1 data were used to combine the penetration capacity of L-band SAR data with the capacity of C-band data to monitor rice growth status at high temporal resolution.

The study was conducted in triple rice cropping systems in the Vietnamese Mekong delta (5 sites: Thot Not in Can Tho city; Chau Thanh, Cho Moi, Thoai Son and Tri Ton in An Giang Province), where an AWD field campaign was conducted from 2012 to 2017. EO data consisted of ALOS-2/PALSAR-2 acquisitions every 14 days in 2017/2018 over An Giang Province in high-resolution observation mode (3-6 m resolution), and ScanSAR observation mode (25-100 m resolution) acquisitions every 42 days over the Mekong delta.

As a result of the classification using the dual-polarization ALOS-2/PALSAR-2 data, soil inundation status could be detected during various rice growth stages. To evaluate rice productivity and GHG emissions from rice fields, we developed a simulation system based on the DeNitrification-Decomposition (DNDC) model, which can assimilate the PALSAR-2 inundation maps together with ground-observed GHG flux and rice growth data on a pixel basis. For spatial extension, the rice map and the rice calendar (sowing date, rice growth status) required as inputs by DNDC are provided by the GeoRice project, based on the use of Sentinel-1 6-day time series. This paper presents the performance of multi-sensor data fusion for realizing sustainable agricultural management by mitigating GHG emissions while maintaining or improving regional freshwater use efficiency for stable food production under climate change pressure.
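
A minimal sketch, with synthetic flags, of how per-date inundation classifications could be aggregated to separate AWD from continuously flooded fields; the threshold and the 14-day flag series are hypothetical, not the study's classifier output.

```python
# Sketch: separate AWD from continuous flooding once a per-date soil
# inundation flag per field is available. Flags and threshold are synthetic.
import numpy as np

# One row per field, one column per 14-day acquisition (True = inundated)
inundated = np.array([
    [True, True, True, True, True, True],     # continuously flooded field
    [True, False, True, False, True, False],  # alternate wetting and drying
])

flooded_fraction = inundated.mean(axis=1)
practice = np.where(flooded_fraction > 0.9, "continuous flooding", "AWD")
print(list(zip(flooded_fraction, practice)))
```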

How to cite: Arai, H., Le Toan, T., Takeuchi, W., Oyoshi, K., Phan, H., Nguyen, L. D., Fumoto, T., and Inubushi, K.: Detecting rice inundation status for water saving and methane emission mitigation measures using Sentinel-1 & ALOS-2/PALSAR-2 Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7831, https://doi.org/10.5194/egusphere-egu21-7831, 2021.

The split-spectrum method (SSM) can largely isolate and correct for the ionospheric contribution in L-band interferometric synthetic aperture radar (InSAR). The standard SSM is performed under the assumption of only the first-order ionospheric dispersive effect, which is proportional to the total electron content (TEC). It is also known that during extreme atmospheric events, originating either in the ionosphere or in the troposphere, other dispersive effects do exist and can potentially provide new insights into the dynamics of the atmosphere, but there have been few reports of such signals detected by InSAR.

We apply L-band InSAR to heavy rain cases and examine the applicability and limitations of the standard SSM. Since no deformation-causing events such as earthquakes took place, the non-dispersive component is apparently attributable to the large amount of water vapor associated with the heavy rain, whereas there are spotty anomalies in the dispersive component that are closely correlated with the heavy rain area. The ionosonde data and the Global Navigation Satellite System (GNSS) rate of total electron content index (ROTI) map both show few anomalies during the heavy rain, which suggests few ionospheric disturbances. Therefore, we interpret that the spotty anomalies in the dispersive component of the standard SSM during heavy rain originate not in the ionosphere but in the troposphere. Of the two physical mechanisms we consider, a runaway electron avalanche and scattering due to rain, a comparison with observations from a ground-based lightning detection network and rain gauge data leads us to conclude that the rain-scattering interpretation is spatiotemporally favorable.

We further propose a formulation to examine whether a dispersive phase other than the first-order TEC effect is present and apply it to the heavy rain cases as well as to two extreme ionospheric sporadic-E events. Our formulation successfully isolates the presence of another dispersive phase during heavy rain that is positively correlated with the local rain rate. Furthermore, our formulation is also able to detect the occurrence of higher-order ionospheric effects during the sporadic-E cases.
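
For reference, the standard first-order split-spectrum decomposition (cf. Gomba et al., 2016) can be sketched as follows; the sub-band frequencies and phases below are synthetic scalars, whereas in practice the inputs are unwrapped sub-band interferograms.

```python
# Sketch of the standard split-spectrum decomposition: the phases of two range
# sub-band interferograms are split into a dispersive (ionospheric, ~1/f) and
# a non-dispersive part. All input values are synthetic.
import numpy as np

f0 = 1.2575e9                 # L-band centre frequency (Hz), ALOS-2-like
fl, fh = 1.2435e9, 1.2715e9   # lower/upper sub-band centre frequencies (assumed)

phi_l, phi_h = 12.31, 12.55   # unwrapped sub-band phases (rad), synthetic

phi_iono = (fl * fh) / (f0 * (fh**2 - fl**2)) * (phi_l * fh - phi_h * fl)
phi_nondisp = f0 / (fh**2 - fl**2) * (phi_h * fh - phi_l * fl)

print(f"dispersive: {phi_iono:.3f} rad, non-dispersive: {phi_nondisp:.3f} rad")
```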

How to cite: Setiawan, N. and Furuya, M.: Tropospheric Dispersive Phase Anomalies during Heavy Rain Detected by L-band InSAR and Their Interpretation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13944, https://doi.org/10.5194/egusphere-egu21-13944, 2021.

This research is related to the JAXA 6th Research Announcement for the Advanced Land Observing Satellite-2 (ALOS-2) project "Improved Sea Ice Parameter Estimation with L-Band SAR (ISIPELS)". In the study, ALOS-2/PALSAR-2 dual-polarized Horizontal-transmit-Horizontal-receive/Horizontal-transmit-Vertical-receive (HH/HV) ScanSAR mode L-band Synthetic Aperture Radar (SAR) imagery over an Arctic study area was evaluated for its suitability for operational sea ice monitoring. The SAR data, consisting of about 140 HH/HV ScanSAR ALOS-2/PALSAR-2 images, were acquired during the winter of 2017. These L-band SAR data were studied for the estimation of different sea ice parameters: sea ice concentration, thickness, type, and drift. Some comparisons with nearly coincident C-band data over the same study area have also been made. The results indicate that L-band SAR data from ALOS-2/PALSAR-2 are very useful for estimating the studied sea ice parameters and are equally good as or better than the conventional operational dual-polarized C-band SAR satellite data.

How to cite: Karvonen, J.: ALOS-2/PALSAR-2 dual-polarized L-band data for sea ice parameter estimation and sea ice classification, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3920, https://doi.org/10.5194/egusphere-egu21-3920, 2021.

EGU21-2892 | vPICO presentations | ESSI2.10 | Highlight

High spatial and temporal resolution L- and C-band Synthetic Aperture Radar data analysis from the yearlong MOSAiC expedition

Malin Johansson, Suman Singha, Gunnar Spreen, Stephen Howell, Shin-ichi Sobue, and Malcolm Davidson

During the yearlong MOSAiC expedition (2019-2020), R/V Polarstern drifted with the sea ice through the Arctic Ocean, with the goal of continually monitoring changes in the coupled ocean-ice-atmosphere system throughout the seasons. A substantial number of synthetic aperture radar (SAR) satellite images overlapping the campaign were collected. Here, we investigate the change in polarimetric features over sea ice from freeze-up to the advanced melt season, using fully polarimetric L-band images from the ALOS-2 PALSAR-2 and fully polarimetric C-band images from the RADARSAT-2 satellite SAR sensors.

Three different sea ice types are investigated: young ice, level first-year ice, and deformed first- and second-year ice. Areas of deformed and level sea ice were observed in the vicinity of R/V Polarstern, and these areas are included whenever possible in the yearlong time series.

Comparing the different sea ice types, we observe that during the freezing season there is a larger difference in the co-polarization channels between smooth and deformed ice at L-band than at C-band. Similar to earlier findings, we observe larger differences between young ice and deformed ice backscatter values in the L-band data than in the C-band data. Moreover, throughout the year the HV backscatter values show larger differences between level and deformed sea ice at L-band than at C-band. The L-band data variability is significantly smaller for level sea ice than for deformed sea ice, and this variability is also smaller than that observed in the overlapping C-band data. L-band data could thus be more suitable for reliably separating deformed from level sea ice areas.

Within the L-band images, a noticeable shift towards higher backscatter values in the early melt season compared to the freezing season is observed for all polarimetric channels, though no such strong trend is found in the C-band data. The change in backscatter values is first noticeable in the C-band images and is later followed by a change in the L-band images, probably caused by the different penetration depths and volume scattering sensitivities of the two frequencies. This change also results in a smaller backscatter variability.

The polarization difference (PD; VV-HH on a linear scale) shows a seasonal dependency for smooth and deformed sea ice in the L-band data, whereas no such trend is observed in the C-band data. For the L-band data, the PD variability of all ice classes was reasonably small during the freezing season, with a significant shift towards larger variability during the early melt season, although the mean PD values remained similar during this period. Once temperatures rose above 0°C, however, both the variability and the mean values increased significantly.
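
A minimal sketch of the PD computation with synthetic values: backscatter given in dB is converted to the linear scale before differencing.

```python
# Sketch: polarization difference (PD) = VV - HH on a linear scale,
# from backscatter given in dB. Values are synthetic.
import numpy as np

def db_to_linear(db):
    return 10.0 ** (db / 10.0)

sigma_vv_db = np.array([-14.0, -10.5])   # e.g. level ice, deformed ice (synthetic)
sigma_hh_db = np.array([-16.0, -11.0])

pd_linear = db_to_linear(sigma_vv_db) - db_to_linear(sigma_hh_db)
print(pd_linear)
```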

Overall, our results demonstrate that C- and L-band data are complementary to one another and that, through their slightly different dependencies on season and sea ice type, a combination of the two frequencies can aid improved sea ice classification. The availability of a high spatial and temporal resolution dataset combined with in-situ information ensures that seasonal changes can be fully explored.

How to cite: Johansson, M., Singha, S., Spreen, G., Howell, S., Sobue, S., and Davidson, M.: High spatial and temporal resolution L- and C-band Synthetic Aperture Radar data analysis from the yearlong MOSAiC expedition, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2892, https://doi.org/10.5194/egusphere-egu21-2892, 2021.

Synthetic aperture radar (SAR) has become an essential component of ocean remote sensing due to its high sensitivity to sea surface dynamics and its high spatial resolution. ALOS-2 SAR data are underutilized for ocean surface wind and current retrieval. Although the primary goals of the ALOS-2 mission are focused on land applications, the extension of the satellite scenes over coastal areas offers an opportunity for ocean applications. The underutilization of ALOS-2 data is mainly due to the fact that at low radar frequencies, e.g. L-band, the sensitivity of the radar scattering coefficient to wind speed and the sensitivity of the Doppler frequency shift to sea surface velocity are lower than at higher frequencies, e.g. C- and X-band. It is also due to the fact that most ALOS-2 images are acquired in HH or HV polarization, while VV polarization is often preferred in ocean applications due to its higher signal-to-noise ratio.

The wind speed is retrieved from Sentinel-1 and ALOS-2 using the existing empirical C- and L-band geophysical model functions. For Sentinel-1, the Doppler frequency shift provided in the OCN product is used. For ALOS-2, the Doppler frequency shift is estimated from the single look complex data using the pulse-pair processing method. The estimated Doppler shift is converted to surface radial velocity, and the velocity is calibrated using land as a reference. The estimated L-band Doppler shift and surface velocity are compared to the C-band Doppler shift provided in the Sentinel-1 OCN product. Due to the difference in the local time of ascending node of the two satellites (about 6 hours at the equator), a direct pixel-by-pixel comparison is not possible, i.e. the wind and surface current cannot be assumed constant over such a large time difference. Thus, the wind retrieved from each sensor is compared separately to model data and in-situ observations.
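
A hedged sketch of pulse-pair Doppler centroid estimation on a synthetic azimuth line: the Doppler shift is obtained from the phase of the lag-one autocorrelation of consecutive azimuth samples, scaled by the PRF. The PRF and signal below are invented for illustration.

```python
# Sketch: pulse-pair Doppler centroid estimation from SLC azimuth samples.
# The PRF and the synthetic signal are illustrative values.
import numpy as np

prf = 2000.0                   # pulse repetition frequency (Hz), assumed
rng = np.random.default_rng(0)
fd_true = 60.0                 # synthetic Doppler shift (Hz)
n = 1024
slc = np.exp(2j * np.pi * fd_true * np.arange(n) / prf)      # azimuth line
slc += 0.3 * (rng.normal(size=n) + 1j * rng.normal(size=n))  # additive noise

r1 = np.mean(slc[1:] * np.conj(slc[:-1]))   # lag-one autocorrelation
fd_est = prf * np.angle(r1) / (2 * np.pi)   # pulse-pair Doppler estimate (Hz)
print(f"estimated Doppler shift: {fd_est:.1f} Hz")
```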

In this paper, the quality of the wind speed retrieved from L-band SAR (ALOS-2) in coastal areas is assessed and compared to that from C-band SAR (Sentinel-1). In addition, the feasibility of surface current retrieval from the L-band Doppler frequency shift is investigated and likewise compared to Sentinel-1. Examples will be shown and discussed. This opens an opportunity for synergy between L-band and C-band SAR missions to increase the spatial and temporal coverage, which is one of the main limitations of SAR applications in ocean remote sensing.

How to cite: Elyouncha, A. and Eriksson, L. E. B.: Assessment of the sea surface wind and current retrieval from ALOS-2 and Sentinel-1 SAR data over coastal areas, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7490, https://doi.org/10.5194/egusphere-egu21-7490, 2021.

EGU21-3916 | vPICO presentations | ESSI2.10 | Highlight

Use of L- and C-Band SAR Satellites for Sea Ice and Iceberg Monitoring (LC-ICE)

Wolfgang Dierking and Malcolm Davidson

In support of ESA's Mission Advisory Group for ROSE-L (Radar Observing System for Europe at L-band), a project team consisting of members of operational ice services, the International Ice Charting Working Group, the International Ice Patrol, and groups from universities and research institutes is investigating the benefits of using data from L-band SAR in addition to C-band SAR imagery for separating different sea ice classes and detecting icebergs. The tasks are: (1) a critical assessment of the current state of the art in sea ice monitoring and iceberg detection, (2) matching C- and L-band SAR images acquired with temporal gaps of several hours, (3) tests and assessments of the practical use of L-band images in operational mapping services, and (4) comparison of the classification accuracies that can be achieved at C-band, at L-band, and with a combination of both, based on the results of automated segmentation and classification algorithms. Based on the suggestions of operational ice centers, data have been collected since April 2019 over six test sites in the Northern Hemisphere: Fram Strait, Belgica Bank, the northern and southern parts of Greenland, Baffin Bay and the Labrador Sea. The SAR images are acquired by Sentinel-1 in Extra Wide and Interferometric Wide Swath modes, by RADARSAT-2 in ScanSAR mode, and by ALOS-2 PALSAR-2 in Wide Beam and Fine Beam modes. The PALSAR-2 data are provided through the 2019-2022 mutual cooperation project between ESA and JAXA on using SAR data in Earth sciences and applications. The presentation - with contributions from project partners - will focus on the conclusions of the literature review, the assessments by the operational ice services of the gain they find in using L-band SAR images supplementary to routinely analyzed C-band imagery, and preliminary results of automated classification.

How to cite: Dierking, W. and Davidson, M.: Use of L- and C-Band SAR Satellites for Sea Ice and Iceberg Monitoring (LC-ICE), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3916, https://doi.org/10.5194/egusphere-egu21-3916, 2021.

EGU21-11066 | vPICO presentations | ESSI2.10

Snow Water Equivalent retrieval using L- & C-band InSAR.

Jorge Jorge Ruiz, Juha Lemmetyinen, Anna Kontu, and Jouni Pulliainen

Interferometric Synthetic Aperture Radar (InSAR) imagery is a promising technique for retrieving Snow Water Equivalent (SWE). It exploits the dependence of the interferometric phase on the amount and density of snow in the radar signal path, which leads to a quasi-linear relation between phase and SWE (Guneriussen et al., 2001; Leinss et al., 2015). Here, we analyze time series of Sentinel-1 and ALOS-2 interferometric image pairs collected over a test site in Sodankylä, northern Finland, during the winter of 2019-2020. The satellite imagery is complemented by tower-based SAR observations using SodSAR (Sodankylä SAR), a 1-10 GHz fully polarimetric SAR instrument. Typical satellite revisit times (7 and 14 days) are compared with the 12-hour temporal resolution provided by SodSAR. Interferometric pairs from the three sensors are generated, and the interferograms are used to estimate the increase in SWE between image acquisitions. The retrieved SWE is compared with measurements from an in-situ SWE scale as well as with manual ground observations made in the area. Coherence conservation and its relation to various meteorological events are also analyzed.
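
A minimal sketch of the phase-to-SWE conversion, assuming the commonly quoted dry-snow approximation of Guneriussen et al. (2001), Δφ = -2k(1.59 + θ^2.5)·ΔSWE; the wavelengths, incidence angle and phase values below are illustrative, not the study's retrieval.

```python
# Sketch: change in SWE from an unwrapped repeat-pass phase change, using the
# dry-snow approximation of Guneriussen et al. (2001). Values are synthetic.
import numpy as np

def dswe_from_phase(dphi_rad, wavelength_m, inc_angle_deg):
    """Change in SWE (m) from an unwrapped repeat-pass phase change (rad)."""
    k = 2 * np.pi / wavelength_m
    theta = np.radians(inc_angle_deg)
    return -dphi_rad / (2 * k * (1.59 + theta**2.5))

# Example: Sentinel-1 C-band (5.55 cm) vs ALOS-2 L-band (22.9 cm), same phase
for name, lam in [("C-band", 0.0555), ("L-band", 0.229)]:
    dphi = -2.0   # synthetic unwrapped phase change (rad)
    print(name, f"dSWE = {dswe_from_phase(dphi, lam, 40.0) * 100:.2f} cm")
```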

How to cite: Jorge Ruiz, J., Lemmetyinen, J., Kontu, A., and Pulliainen, J.: Snow Water Equivalent retrieval using L- & C-band InSAR., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11066, https://doi.org/10.5194/egusphere-egu21-11066, 2021.

Millions of lakes and ponds occupy large areas of the Arctic discontinuous and continuous permafrost zones. During most of the year, the surfaces of these lakes remain covered by a thick layer of ice. Synthetic Aperture Radar (SAR) data have been shown to be useful for studying the ice on Arctic lakes, especially for monitoring lake ice phenology and the grounding state of the ice (ice frozen to the lakebed versus floating lake ice). Significant backscatter is often observed from the floating ice regime at C-band due to scattering at a rough ice-water interface.

Recent research has revealed features of anomalously low backscatter in Sentinel-1 C-band SAR imagery on some of the West Siberian lakes that likely belong to the floating ice regime. These anomalies are characterized by prominent shapes and sizes and seem to expand throughout late winter and/or spring. It is currently assumed that some of these features are related to strong emissions of natural gas (methane from hydrocarbon reservoirs), making it important to assess their origin in detail and understand the associated mechanisms. However, in-situ data are still missing.

Here, we assess the potential of the combined use of C-band Sentinel-1 (freely available) and L-band ALOS PALSAR-2 data (available through JAXA PI agreement #3068002) to study the backscatter anomalies. We highlight the differences between the observed backscatter from the two sensors with respect to different surface types (ground-fast lake ice, floating lake ice and anomalies) and investigate backscatter differences between frozen and melting conditions. Further, polarimetric classification is performed on L-band PALSAR-2 imagery, which reveals differences in scattering mechanisms between anomalies and floating lake ice.

How to cite: Pointner, G. and Bartsch, A.: The potential of using Sentinel-1 and ALOS PALSAR-2 data for characterizing West Siberian lake ice backscatter anomalies, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11931, https://doi.org/10.5194/egusphere-egu21-11931, 2021.

EGU21-12736 | vPICO presentations | ESSI2.10

Comparison of L- and C-Band SAR data in the Saar Mining District, Germany

Andre C. Kalia, Volker Spreckels, and Thomas Lege

The interferometric use of Synthetic Aperture Radar data from L-band and C-band plays an important role in the monitoring of land surface deformations, as former evaluations have proven [1]. Meanwhile, several multi-sensor ground stations are available, equipped with bi-directional artificial corner reflectors (CR) and permanent GNSS stations, attached to fine-leveling baselines. The long wavelength of L-band SAR missions like ALOS-2 (λ = 22.9 cm) provides highly coherent interferograms, but large-sized CR are required, e.g. for absolute motion calibration. SAR missions with shorter wavelengths, like the C-band Sentinel-1 mission (λ = 5.6 cm), provide in general less coherent interferograms, but a smaller CR size is sufficient. In order to assess the capabilities of L- and C-band SAR data, the impulse response function will be calculated at the corner-reflector sites and the coherence will be estimated in rural areas of the Saar test site.
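
A minimal sketch of the coherence estimate mentioned above: the normalised complex cross-correlation of two co-registered SLC image patches, here computed on synthetic arrays.

```python
# Sketch: interferometric coherence as the magnitude of the normalised complex
# cross-correlation over an estimation window. Arrays are synthetic.
import numpy as np

rng = np.random.default_rng(3)
s1 = rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64))
s2 = 0.8 * s1 + 0.2 * (rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64)))

num = np.abs(np.sum(s1 * np.conj(s2)))
den = np.sqrt(np.sum(np.abs(s1) ** 2) * np.sum(np.abs(s2) ** 2))
print(f"coherence estimate: {num / den:.2f}")
```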

The test site is located in the Saar-Lorraine coal basin at the French-German border, a present-day post-mining district with highly urbanized settlements as well as large stretches of forested and rural areas. The area is characterized by century-long active deep mining – mainly for hard coal – including extensive groundwater management measures. Here, active coal mining started in the 18th century and ended in 2006 (Lorraine) and 2012 (Saar) [2]. Meanwhile, some of the underground mines have been progressively flooded. As a consequence, surface uplift occurred and is expected to continue in the near future [3]. For a 12 by 14 km area in the Saar district, dense and highly accurate leveling campaigns have been performed bi-annually since 2013. Thus, besides good knowledge of the subsurface geology and mining activities, precise in-situ measurements of the ground motion are also available. The recent and ongoing surface deformations will be monitored using multiple methods, including a network of CR at multi-sensor ground stations [4] and publicly accessible Persistent Scatterer Interferometry datasets from the Sentinel-1 based Ground Motion Service Germany [5].

In late 2020, the first ALOS-2 acquisitions of the Saar area from the ESA-JAXA cooperation were made available to the authors. The ALOS-2 data are evaluated and placed in relation to Sentinel-1 acquisitions. Finally, an outlook on the possible complementary use of geodetic and C- and L-band SAR data in the Saar district as well as in other mining areas in Germany is given.

[1] Wegmueller et al. 2005: Monitoring of mining induced surface deformation using L-band SAR interferometry. IGARSS 2005; DOI: 10.1109/IGARSS.2005.1526447

[2] Corbel et al. 2017: Coal mine flooding in the Lorraine-Saar basin: experience from the French mines. IMWA 2017. https://www.imwa.info/docs/imwa_2017/IMWA2017_Corbel_161.pdf

[3] Heitfeld-Schetelig 2016: Gutachten zu den Bodenbewegungen im Rahmen des stufenweisen Grubenwasseranstiegs in den Wasserprovinzen Reden und Duhamel. http://www.bid.rag.de/bid/PDFs/SA//GWA_Reden_Duhamel/3_IHS_Bodenbewegungen/IHS_Saar_Gelaendehebungen_WH_Reden_Duhamel_2016_04_20.pdf

[4] Spreckels et al. 2020: GNSS, Nivellement und Radar – einheitliche Multisensor-Standorte als Referenzpunkte zur Überwachung von Bodenbewegungen. Geomonitoring 2020. DOI: 10.15488/9351

[5] BGR, 2021: https://bodenbewegungsdienst.bgr.de

How to cite: Kalia, A. C., Spreckels, V., and Lege, T.: Comparison of L- and C-Band SAR data in the Saar Mining District, Germany, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12736, https://doi.org/10.5194/egusphere-egu21-12736, 2021.

ESSI3.3 – The evolving Open and FAIR ecosystem for Solid Earth and Environmental sciences: challenges, opportunities, and other adventures

The major societal challenges—ensuring a sustainable planet and ecosystems, with food, energy, water, health, and quality of life provided equitably—depend on convergent science grounded in the Earth and space sciences and on broadly open, shared, and trusted (e.g., FAIR) data. Such data already provide enormous benefits (e.g., weather prediction; hazard avoidance and mitigation; precision navigation). Beyond being needed for these solutions, the integrity of and trust in science, and thus in the solutions, follow directly from open FAIR data. But many barriers hinder widespread practice and adoption. A number of concerned stakeholders are working on the technology and practices needed for FAIR workflows, and thanks to these efforts, the technical pieces for solutions are mostly in place. But a larger coordinated effort is needed, in particular around (i) supporting the infrastructure needed globally, and (ii) developing the research culture and practices needed for universal FAIR data. The first challenge includes recognizing that science is now international and thus an international FAIR data culture is essential. This requires greater and more urgent attention from the larger science stakeholders: societies, universities and research institutions, funders, and governments.

How to cite: Hanson, B.: The Imperative of Open, Shared, Trusted (FAIR) Scientific Data: Accelerating for the Future, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13323, https://doi.org/10.5194/egusphere-egu21-13323, 2021.

EGU21-8052 | vPICO presentations | ESSI3.3 | Highlight

Advancing the FAIRness and Openness of Earth system science in Europe

Andreas Petzold, Ari Asmi, Katrin Seemeyer, Angeliki Adamaki, Alex Vermeulen, Daniele Bailo, Keith Jeffery, Helen Glaves, Zhiming Zhao, Markus Stocker, and Margareta Hellström

Focused environmental research projects and continuously operating research infrastructures (RIs) designed for monitoring all subdomains of the Earth system contribute to global observing systems and serve as crucial information sources for environmental scientists in their quest to understand and interpret the complex Earth system. The EU-funded ENVRI-FAIR project [1] builds on the Environmental Research Infrastructure (ENVRI) community, which includes the principal European producers and providers of environmental research data and services.

ENVRI-FAIR targets the development and implementation of both technical frameworks and policy solutions that make subdomain boundaries irrelevant for environmental scientists and prepare Earth system science for the new Open Science paradigm. Cross-discipline harmonization and standardization activities, together with the implementation of joint data management and access structures at the RI level, facilitate the strategic coordination of observation systems required for truly interdisciplinary science. ENVRI-FAIR will ultimately create the open access ENVRI-Hub delivering environmental data and services provided by the contributing environmental RIs.

The architecture and functionalities of the ENVRI-Hub are driven by the applications, use cases and user needs, and will be based on three main pillars: (1) the ENVRI Knowledge Base as the human interface to the ENVRI ecosystem; (2) the ENVRI Catalogue as the machine-actionable interface to the ENVRI ecosystem; and (3) subdomain and cross-domain use cases as demonstrators for the capabilities of service provision among ENVRIs and across Science Clusters. The architecture is designed in anticipation of interoperation with the European Open Science Cloud (EOSC) and is intended to act as a key platform for users and developers planning to include ENVRI services in their workflows.

The ENVRI community objectives of sharing FAIRness experience, technologies and training as well as research products and services will be realized by means of the ENVRI-Hub. The architecture, design features, technology developments and associated policies will highlight this example of how ENVRI-FAIR is promoting FAIRness, openness and multidisciplinarity of an entire scientific area by joint developments and implementation efforts.

Acknowledgment: ENVRI-FAIR has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824068.

[1] Petzold, A., Asmi, A., Vermeulen, A., Pappalardo, G., Bailo, D., Schaap, D., Glaves, H. M., Bundke, U., and Zhao, Z.: ENVRI-FAIR - Interoperable environmental FAIR data and services for society, innovation and research, 15th IEEE International Conference on eScience 2019, 1-4, doi: http://doi.org/10.1109/eScience.2019.00038, 2019.

How to cite: Petzold, A., Asmi, A., Seemeyer, K., Adamaki, A., Vermeulen, A., Bailo, D., Jeffery, K., Glaves, H., Zhao, Z., Stocker, M., and Hellström, M.: Advancing the FAIRness and Openness of Earth system science in Europe, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8052, https://doi.org/10.5194/egusphere-egu21-8052, 2021.

EGU21-4737 | vPICO presentations | ESSI3.3

IPCC Data Distribution Centre: FAIR data from Climate Research to Mitigation Policy

Martin Juckes, Martina Stockhause, Robert S Chen, and Xiaoshi Xing

The Data Distribution Centre (DDC) of the Intergovernmental Panel on Climate Change provides a range of services to support the IPCC assessment process. The role of the DDC has evolved considerably since it was established in 1997, responding to the expanding range and complexity of the data products involved in the IPCC assessment process. The role of the IPCC assessments has also evolved: from considering whether anthropogenic climate change might have unwelcome consequences, and how those consequences would vary under different socio-economic scenarios, to reporting on the likely outcomes of different global policy options.

The DDC works both with datasets which underpin the key conclusions from the assessment and, increasingly, with data products generated by the scientists engaged in the assessment.

Applying FAIR data principles to data products being produced in the highly constrained context of the assessment process brings many challenges. Working with the Technical Support Units of the IPCC Working Groups and the IPCC Task Group, the IPCC DDC has helped to create a process that not only captures the information needed to document data products but also supports the consistent and clear description of figures and tables within the report.

How to cite: Juckes, M., Stockhause, M., Chen, R. S., and Xing, X.: IPCC Data Distribution Centre: FAIR data from Climate Research to Mitigation Policy, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4737, https://doi.org/10.5194/egusphere-egu21-4737, 2021.

EGU21-12560 | vPICO presentations | ESSI3.3

Applying FAIRness evaluation approaches to (meta)data preserved at the World Data Center for Climate (WDCC): results, lessons learned, recommendations

Karsten Peters-von Gehlen, Andrej Fast, Daniel Heydebreck, Heinke Höck, Amandine Kaiser, Andrea Lammert, and Hannes Thiemann

The perceived community demand for research data repositories to provide services ensuring that stored data comply with the FAIR principles requires transparent evaluation of such services. In previous work, the long term archiving service WDCC1 (World Data Centre for Climate) at DKRZ (German Climate Computing Center, Hamburg) underwent an even-handed self-assessment along the published FAIR principles and the results are published on the DKRZ homepage2.

Here, we present results of an overhaul of the previous WDCC FAIRness assessment, obtained by subjecting datasets archived in WDCC to a number of now-available objective FAIR assessment approaches, offered as questionnaires or fully-automated web applications3,4,5. In these approaches, FAIRness is assessed using so-called metrics or maturity indicators. The terminology is largely a choice of the test provider - e.g. the term 'metric' may be off-putting for some - but both yield quantitative results. First tests show that (meta)data archived in WDCC seem to attain a higher level of FAIRness when evaluated using questionnaires than when evaluated with fully-automated applications. Further work is needed to substantiate this finding.

We learn that while neither of the two evaluation approaches is ideal, both show merit. Questionnaires - answered by knowledgeable repository staff - capture domain- and repository-specific aspects of FAIRness, like the use of controlled vocabularies in the datasets, the granularity of archived datasets, reuse documentation or a clear assessment of local data access protocols. However, human-performed evaluation does not capture machine-actionability in terms of FAIR. This aspect is - naturally - very well assessed by automatic evaluation approaches, but the results strongly depend on the way the tests for FAIR metrics/maturity indicators are implemented. Moreover, automatic tests often only assess metadata FAIRness, lack domain-specific FAIRness indicators, or yield failed tests if a repository's technical properties, e.g. the specification of authentication procedures for data access, are not compatible with what an automatic procedure is built to test for.

Since WDCC has a more than 30-year history of preserving climate-science data with a focus on reusability by the community (and beyond), FAIRness evaluations based on human-actionable questionnaires accordingly show a high degree of FAIRness. We further learn that there is an urgent need for specifically-designed automatic FAIR testing approaches that take domain-specific data standards and structures into account. Especially the availability of FAIR metrics/maturity indicators for the atmospheric and climate sciences is very limited. We thus recommend compiling such indicators and will aim to contribute to this effort.
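
As an illustration of how such automated testing can be scripted, the sketch below submits a dataset identifier to F-UJI (footnote 5) via its REST API; the endpoint, credentials and example DOI are placeholders to be adapted to the deployed service, and the payload fields follow F-UJI's documented API at the time of writing:

```python
import requests

# Placeholder endpoint of a locally deployed F-UJI service; route and
# payload should be checked against the deployed version.
FUJI_URL = "http://localhost:1071/fuji/api/v1/evaluate"

def assess_fairness(identifier: str) -> dict:
    """Ask F-UJI to evaluate one persistent identifier; return its report."""
    payload = {"object_identifier": identifier, "use_datacite": True}
    response = requests.post(FUJI_URL, json=payload,
                             auth=("user", "password"),  # placeholder credentials
                             timeout=300)
    response.raise_for_status()
    return response.json()

# Placeholder DOI standing in for a WDCC dataset identifier:
report = assess_fairness("https://doi.org/10.1594/WDCC/EXAMPLE")
print(report.get("summary"))  # per-metric scores, comparable across datasets
```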

In our contribution, we specifically showcase strong as well as weak aspects of the WDCC service in terms of FAIRness, report on our measures to increase the domain-specific FAIRness of WDCC, and present recommendations for establishing FAIR indicators for (meta)data common to the Earth System Science community. We will make the results of our assessment openly available on the WDCC homepage and produce a corresponding open-access peer-reviewed publication.

 

References:

1https://cera-www.dkrz.de

2https://cera-www.dkrz.de/WDCC/ui/cerasearch/info?site=fairness

3https://www.rd-alliance.org/node/60731/outputs 

4https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/ 

5https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool

How to cite: Peters-von Gehlen, K., Fast, A., Heydebreck, D., Höck, H., Kaiser, A., Lammert, A., and Thiemann, H.: Applying FAIRness evaluation approaches to (meta)data preserved at the World Data Center for Climate (WDCC): results, lessons learned, recommendations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12560, https://doi.org/10.5194/egusphere-egu21-12560, 2021.

EGU21-14438 | vPICO presentations | ESSI3.3

Initiating FAIR geothermal data in Indonesia

Dasapta Erwin Irawan

One of the main keys to scientific development is data availability: data should not only be easy to discover and download, but also easy to reuse. Geothermal researchers, research institutions and industries are the three main stakeholders in fostering data sharing and data reuse. Very expensive deep-well datasets as well as advanced logging datasets are important not only for exploitation purposes but also for the wider community, e.g. for regional planning or common environmental analyses. Data sharing rests on the four FAIR principles. Principle 1, Findable: data are uploaded to an open repository with proper data documentation and a data schema. Principle 2, Accessible: access restrictions such as user IDs and passwords are removed for easy download; for data from commercial entities, embargoed data are permitted with a clear embargo duration and data request procedure. Principle 3, Interoperable: all data must be prepared in a manner that allows straightforward data exchange between platforms. Principle 4, Reusable: all data must be submitted in common conventional file formats, preferably text-based (e.g. `csv` or `txt`), so they can be analyzed using various software and hardware. That the geothermal industry is for-profit and capital-intensive gives it even more reason to embrace data sharing: it would be a good way to demonstrate its role in supporting society. Contributions from multiple stakeholders are the most essential part of science development. In the context of commercial industry, data sharing is a form of corporate social responsibility (CSR), which should not be defined only as giving out funding to support local communities.

Keywords: open data, FAIR data, data sharing 

 

How to cite: Irawan, D. E.: Initiating FAIR geothermal data in Indonesia, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14438, https://doi.org/10.5194/egusphere-egu21-14438, 2021.

EGU21-3922 | vPICO presentations | ESSI3.3

NFDI4Earth

Hannes Thiemann, Peter Bräsicke, Markus Reichstein, Claus Weiland, Dominik Hezel, Miguel Mahecha, and Lars Bernard

NFDI4Earth (www.nfdi4earth.de) is proposed as the consortium of the German NFDI (National Research Data Infrastructure) to address the digital needs of researchers in the Earth System Sciences (ESS). The NFDI4Earth consortium has been created in a bottom-up process and currently comprises 58 members from German universities, research institutions, infrastructure providers, public authorities and different research organizations.

The large number and diversity of observational, analytical and model datasets at very high spatial, temporal and thematic resolution confront the ESS with a strongly increasing amount of heterogeneous and inherently complex data. Earth system processes constantly change on various time scales and strongly influence each other. Describing and evaluating these processes urgently requires efficient workflows, extremely powerful data-analytic frameworks such as datacubes, and appropriate levels of harmonization of the related data services and their underlying standards. Research data are currently managed by an unstructured plethora of services that are scattered, heterogeneous and often only project-based, without a long-term perspective. Under the umbrella of NFDI4Earth, a variety of measures and services are bundled into a one-stop service framework. With a common approach to openness and FAIRness, they form a united, sustainable and coherent solution.

In addition to existing links between German and international partners in ESS, NFDI4Earth will establish itself as a single point of contact and the voice for German Earth system scientists in both existing and emerging networks and alliances. NFDI4Earth is for example already striving to establish linkages with federative e-infrastructures like the European Open Science Cloud (EOSC) at an early stage.

 

How to cite: Thiemann, H., Bräsicke, P., Reichstein, M., Weiland, C., Hezel, D., Mahecha, M., and Bernard, L.: NFDI4Earth, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3922, https://doi.org/10.5194/egusphere-egu21-3922, 2021.

EGU21-9401 | vPICO presentations | ESSI3.3

Harmonizing heterogeneous multi-proxy data from Arctic lake sediment records 

Gregor Pfalz, Bernhard Diekmann, Johann-Christoph Freytag, and Boris K. Biskaborn

Lake systems play a central role in broadening our knowledge about future trends in the Arctic, as their sediments store information on interactions between climate change, lake ontogeny, external abiotic sediment input, and biodiversity changes. In order to make reliable statements about future lake trajectories, we need sound multi-proxy data from different lakes across the Arctic. Various studies using data from repositories have already shown the effectiveness of multi-proxy, multi-site investigations (e.g., Kaufman et al., 2020; PAGES 2k Consortium, 2017). However, there are still datasets from past coring expeditions to Arctic lake systems that are neither included in any of these repositories nor subject to any particular standard. When working with such data from heterogeneous sources, we face the challenge of dealing with data of different format, type, and structure. It is therefore necessary to transform such data into a uniform format to ensure semantic and syntactic comparability. In this talk, we present an interdisciplinary approach for transforming research data from different lake sediment cores into a coherent framework. Our approach adapts methods from the database field, such as developing entity-relationship (ER) diagrams, to understand the conceptual structure of the data independently of the source. Based on this knowledge, we developed a conceptual data model that allows scientists to integrate heterogeneous data into a common database. During the talk, we present further steps to prepare datasets for multi-site statistical investigation. To test our approach, we compiled and transformed a collection of published and unpublished paleolimnological data of Arctic lake systems into our proposed format. Additionally, we show results from a comparative analysis of the acquired data, focusing on total organic carbon and bromine content. We conclude that our harmonized dataset enables numerical inter-proxy and inter-lake comparison despite strong initial heterogeneity.
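
As a simplified illustration of such a conceptual data model (the model developed in this work is considerably richer), the following sketch creates a minimal relational schema for multi-proxy lake sediment data; all table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect("arctic_lakes.db")
conn.executescript("""
CREATE TABLE lake (
    lake_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    latitude      REAL,             -- decimal degrees, WGS84
    longitude     REAL
);
CREATE TABLE core (
    core_id       INTEGER PRIMARY KEY,
    lake_id       INTEGER NOT NULL REFERENCES lake(lake_id),
    label         TEXT,
    water_depth_m REAL
);
CREATE TABLE measurement (
    core_id       INTEGER NOT NULL REFERENCES core(core_id),
    depth_cm      REAL NOT NULL,    -- depth below sediment surface
    proxy         TEXT NOT NULL,    -- e.g. 'TOC', 'Br'
    value         REAL,
    unit          TEXT              -- e.g. 'wt%', 'counts'
);
""")
conn.commit()
```

Separating lakes, cores and measurements in this way lets proxies of any type from any source be loaded into one queryable structure, which is the precondition for the inter-proxy and inter-lake comparisons described above.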

 

[1]   D. S. Kaufman et al., “A global database of Holocene paleotemperature records,” Sci. Data, vol. 7, no. 115, pp. 1–34, 2020.

[2]   PAGES 2k Consortium, “A global multiproxy database for temperature reconstructions of the Common Era,” Sci. Data, vol. 4, no. 170088, pp. 1–33, 2017.

How to cite: Pfalz, G., Diekmann, B., Freytag, J.-C., and Biskaborn, B. K.: Harmonizing heterogeneous multi-proxy data from Arctic lake sediment records , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9401, https://doi.org/10.5194/egusphere-egu21-9401, 2021.

EGU21-2546 | vPICO presentations | ESSI3.3

Managing Geophysics datasets: Challenges and perspectives from the UK Polar Data Centre

A. Fremand

Today, open data policies are better understood by scientists, and writing a data management plan is part of every Natural Environment Research Council (NERC) project submission. This also means that scientists expect more and more from their data publications and data requests: they want interactive maps, more complex data systems, and the ability to query data and publish them rapidly.

At the UK Polar Data Centre (PDC, https://www.bas.ac.uk/data/uk-pdc/), the datasets are very diverse, reflecting the multidisciplinary nature of polar science. Geophysics datasets include bathymetry, aerogravity, aeromagnetics and airborne radar depth soundings. Encouraging reuse and increasing the value of data are at the core of the PDC's mission. Data published by the PDC are used in a large variety of international scientific research projects. For instance, the significant datasets from seabed multibeam coverage of the Southern Ocean enable the British Antarctic Survey to be a major contributor to multiple projects such as the International Bathymetric Chart of the Southern Ocean (IBCSO) and Seabed 2030. The wide coverage of airborne radar echo sounding over Antarctica is crucial for the SCAR BEDMAP3 project, which aims to produce a new map of Antarctic ice thickness and bed topography for the international glaciology and geophysics community.

Over the last year, procedures to preserve, archive and distribute these data have been revised and updated to comply with the requirements of CoreTrustSeal. We are nevertheless still looking for new technologies, tools and open-source software that will help us bring interactivity to our datasets and meet the expectations of scientists.
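
One example of such open-source tooling is folium, which renders interactive Leaflet maps from Python; in the sketch below, the survey-line coordinates are invented placeholders rather than PDC data:

```python
import folium

# Minimal interactive web map of a (fictional) airborne survey line.
m = folium.Map(location=[-75.0, -25.0], zoom_start=3, tiles="OpenStreetMap")
flight_line = [(-74.5, -30.0), (-75.2, -27.5), (-76.0, -24.0)]  # placeholder track
folium.PolyLine(flight_line, tooltip="Airborne radar survey line").add_to(m)
m.save("survey_map.html")  # self-contained HTML, viewable in any browser
```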

How to cite: Fremand, A.: Managing Geophysics datasets: Challenges and perspectives from the UK Polar Data Centre, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2546, https://doi.org/10.5194/egusphere-egu21-2546, 2021.

EGU21-16054 | vPICO presentations | ESSI3.3

Advancing the Geosciences through Open Standards

Siri Jodha Khalsa

Data is the lifeblood of the geosciences. Furthermore, the acquisition, processing and interpretation of data all depend on established specifications describing the systems and procedures that were used in producing, describing and distributing that data. It can be said that technical standards underpin the entire scientific endeavour. This is becoming ever truer in the era of Big Data and Open, Transdisciplinary Science. It takes the dedicated efforts of many individuals to create a viable standard. This presentation will describe the experiences and status of standards development activities related to geoscience remote sensing technologies which are being carried out under the auspices of the IEEE Geoscience and Remote Sensing Society (GRSS).

A Standards Development Organization (SDO) exists to provide the environment, rules and governance necessary to facilitate the fair and equitable development of standards, and to assist in the distribution and maintenance of the resulting standards. The GRSS sponsors projects with the IEEE Standards Association (IEEE-SA), which, like other SDOs such as ISO and OGC, has well-defined policies and procedures that help ensure the openness and integrity of the standards development process. Each participant in a standards working group typically brings specific interests as a producer, consumer or regulator of a product, process or service. Creating an environment that makes it possible to find consensus among competing interests is a primary role of an SDO. I will share some of the insights gained from the six standards projects that the GRSS has initiated which involve hyperspectral imagers, the spectroscopy of soils, synthetic aperture radar, microwave radiometers, GNSS reflectometry, and radio frequency interference in protected geoscience bands.

How to cite: Khalsa, S. J.: Advancing the Geosciences through Open Standards, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16054, https://doi.org/10.5194/egusphere-egu21-16054, 2021.

EGU21-15783 | vPICO presentations | ESSI3.3

A Standards-based Data Catalogue integrating scientific, community-based and citizen science data across the Arctic

Torill Hamre, Finn Danielsen, Michael Køie Poulsen, and Frode Monsen

INTAROS is a Horizon 2020 research and innovation project developing an integrated Arctic Observation System by extending, improving, and unifying existing systems in the different regions of the Arctic. INTAROS integrates distributed repositories hosting data from ocean, atmosphere, cryosphere and land, including scientific, community-based monitoring (CBM) and citizen science (CS) data. Throughout the project, INTAROS has been working closely with several local communities and citizen science programs across the Arctic to develop strategies and methods for ingesting data into repositories, enabling the communities to maintain and share data. A number of these CBM and CS data collections have been registered in the INTAROS Data Catalogue. Some of these collections are hosted and sustained by large international programs such as PISUNA, eBird, Secchi Disk Study and GLOBE Observer. Registration in the INTAROS Data Catalogue contributes to making these important data collections better known to a wider community of users with a vested interest in the Arctic. It also enables the sharing of metadata through open standards for inclusion in other Arctic data systems. The catalogue is a key component in INTAROS, enabling users to search for data across the targeted spheres to assess their usefulness for particular applications and geographic areas. It is based on a world-leading system for data management, the Comprehensive Knowledge Archive Network (CKAN). With rich functionality offered out of the box, combined with a flexible extension mechanism, CKAN allows a fully functional data catalogue to be set up quickly. The CKAN open-source community offers numerous extensions that can be used as-is or adapted to implement customised functionality for specific user communities. To hold additional metadata elements requested by the partners, we modified the standard database schema of CKAN. The presentation will focus on the current capabilities and on plans for sustaining and enhancing the INTAROS Data Catalogue.
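
As a hint of what machine-level access to such a catalogue looks like, the sketch below queries a CKAN instance through CKAN's standard Action API; the base URL is a placeholder, while the `package_search` route is part of CKAN's documented API:

```python
import requests

# Placeholder catalogue host; /api/3/action/package_search is CKAN's
# documented free-text search endpoint.
CKAN_SEARCH = "https://catalogue.example.org/api/3/action/package_search"

params = {"q": "citizen science", "rows": 10}
result = requests.get(CKAN_SEARCH, params=params, timeout=30).json()

if result["success"]:
    for dataset in result["result"]["results"]:
        print(dataset["name"], "-", dataset.get("title", ""))
```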

How to cite: Hamre, T., Danielsen, F., Køie Poulsen, M., and Monsen, F.: A Standards-based Data Catalogue integrating scientific, community-based and citizen science data across the Arctic, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15783, https://doi.org/10.5194/egusphere-egu21-15783, 2021.

EGU21-14899 | vPICO presentations | ESSI3.3

Curating geosciences data in the Earth, Space and Environmental Sciences – new developments of GFZ Data Services

Florian Ott, Kirsten Elger, and Damian Ulbricht

GFZ Data Services is a domain repository for geosciences data that has been assigning digital object identifiers (DOIs) to data and scientific software since 2004. Hosted at the GFZ German Research Centre for Geosciences (GFZ), the repository focuses on the curation of long-tail data on the one hand and, on the other, provides DOI minting services for several global monitoring networks/observatories in geodesy and geophysics (e.g. INTERMAGNET; the IAG Services ICGEM, IGETS, IGS; GEOFON) and collaborative projects (TERENO, EnMAP, GRACE, CHAMP). Furthermore, as Allocating Agent for the IGSN, the globally unique persistent identifier for physical samples, GFZ provides IGSN minting services for physical samples.

GFZ Data Services increases the interoperability of long-tail data through (1) the provision of comprehensive domain-specific data description via standardised and machine-readable metadata with controlled domain vocabularies; (2) complementing the metadata with comprehensive and standardised technical data descriptions or reports; and (3) by embedding the research data in wider context by providing cross-references through Persistent Identifiers (DOI, IGSN, ORCID, Fundref) to related research products (text, data, software) and people or institutions involved.

Visibility of the data is established through registration of the metadata with DataCite and the dissemination of metadata via standard protocols. The DOI landing pages embed metadata in Schema.org markup to facilitate discovery through internet search engines like Google Dataset Search. In addition, we feed links between data and related research products into Scholix, which allows data publications and scholarly literature to be linked even when the data are published years after the article.
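
As an illustration of the Schema.org embedding described above, the sketch below builds a minimal schema.org/Dataset description as JSON-LD, of the kind a DOI landing page can embed; all values are invented placeholders rather than an actual GFZ record:

```python
import json

# All values below are placeholders, not an actual GFZ record.
dataset_jsonld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example geomagnetic observatory dataset",
    "identifier": "https://doi.org/10.5880/EXAMPLE",
    "creator": [{"@type": "Person", "name": "Jane Doe"}],
    "publisher": {"@type": "Organization", "name": "GFZ Data Services"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# A landing page embeds this inside:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(dataset_jsonld, indent=2))
```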

The new website of GFZ Data Services has evolved from a (purely) searchable data portal into an information point for data publications and data management. This includes information on metadata, data formats, the data publication workflow and FAQs, links to different versions of our metadata editor, and downloadable data description templates. Specific data publication guidance is complemented by more general information on data management, like a data management roadmap for PhD students, and links to the data catalogue of GFZ Data Services, the IGSN catalogue of GFZ and RI@GFZ, the data and research infrastructure search portal of GFZ.

Since October 2020, GFZ has been a DataCite member. This membership enables and promotes active participation in current and future technological and service-oriented developments related to the persistent identification of research outputs.

How to cite: Ott, F., Elger, K., and Ulbricht, D.: Curating geosciences data in the Earth, Space and Environmental Sciences – new developments of GFZ Data Services, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14899, https://doi.org/10.5194/egusphere-egu21-14899, 2021.

EGU21-13796 | vPICO presentations | ESSI3.3

Linking Domain Repositories to Build Cyberinfrastructure for Interdisciplinary Critical Zone Research

Jeffery S. Horsburgh, Kerstin Lehnert, and Jerad Bales

Critical Zone science studies the system of coupled chemical, biological, physical, and geological processes operating together across all scales to support life at the Earth's surface (Brantley et al., 2007). In 2020, the U.S. National Science Foundation funded 10 Critical Zone Collaborative Network awards. These 5-year projects will collaboratively work to answer scientific questions relevant to understanding processes in the Critical Zone, such as the effects of urbanization on Critical Zone processes; Critical Zone function in semi-arid landscapes and the role of dust in sustaining these ecosystems; processes in deep bedrock and their relationship to Critical Zone evolution; the recovery of the Critical Zone from disturbances such as fire and flooding; and changes in the coastal Critical Zone related to rising sea level. In order to support community data collection, access, and archival for the Critical Zone Network community, the development of new cyberinfrastructure (CI) is now underway that leverages prior investments in domain-specific data repositories that are already operational and delivers data services to established communities. The goal is to create the infrastructure required for managing, curating, disseminating, and preserving data from the new network of Critical Zone Cluster projects, along with legacy datasets from the existing Critical Zone Observatory Network, including digital management of physical samples. This CI will have a distributed architecture that links existing data facilities and services, including HydroShare, EarthChem, SESAR (System for Earth Sample Registration), and eventually other systems like OpenTopography as needed, via a central CZ Hub that provides tools and services for simplified data submission, integrated data discovery and access, and links to computational resources for data analysis and visualization in support of CZ synthesis efforts. Our goal is to make data, samples, and software collected by the CZ Network Cluster projects Findable, Accessible, Interoperable, and Reusable following the FAIR guiding principles for scientific data management and stewardship, by taking advantage of existing, FAIR-compliant, domain-specific data repositories. This collaboration among domain repositories to deliver integrated data services for an interdisciplinary science program will provide a template for future development of integrated interdisciplinary data services.

How to cite: Horsburgh, J. S., Lehnert, K., and Bales, J.: Linking Domain Repositories to Build Cyberinfrastructure for Interdisciplinary Critical Zone Research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13796, https://doi.org/10.5194/egusphere-egu21-13796, 2021.

EGU21-16356 | vPICO presentations | ESSI3.3

re3data COREF – Enhancing the re3data service as a community-driven and trustworthy resource for research data repositories and portals

Nina Weisweiler, Kirsten Elger, Robert Ulrich, Michael Witt, Lea Maria Ferguson, Maxi Kindling, Gabriele Kloska, Nguyen Thanh Binh, Rouven Schabinger, Dorothea Strecker, Margarita Trofimenko, and Paul Vierkant

re3data is the global registry for research data repositories. As of January 2021, the service lists over 2620 digital repositories across all scientific disciplines and provides an extensive description of repositories based on a detailed metadata schema (https://doi.org/10.2312/re3.008). A variety of funders, publishers, and scientific organizations around the world refer to re3data within their guidelines and policies, recommending the service to researchers looking for appropriate repositories for storage and discovery of research data. With over 750 entries the field of geosciences is one of the most strongly represented subject groups in the registry.

The re3data COREF project (Community Driven Open Reference for Research Data Repositories) started in January 2020 and receives funding from the German Research Foundation (DFG) for 36 months. Focusing on the current project, the presentation will outline the further professionalization of re3data and the provision of reliable and customisable descriptions of research data repositories. This includes updates and revisions of the metadata schema, the advancement of the technical infrastructure, and an enhanced overall (technical) service model concept to embed and connect the service within the research data landscape as a community-driven source and reference for trustworthy repositories.

In addition, outcomes from the first re3data COREF stakeholder survey and workshop, held in November 2020, will be presented, introducing diverse use cases of the re3data service and examples of the reuse of its metadata. The presentation will address how re3data currently interlinks with external parties and how more advanced options for easier and trustworthy integration of third-party information can be facilitated.
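
To illustrate the kind of metadata reuse mentioned above, the following sketch retrieves one repository description from re3data's public REST API; the API route follows re3data's documentation, while the repository identifier is a placeholder:

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder repository id; /api/v1/repository/{id} is the documented route.
API = "https://www.re3data.org/api/v1/repository/r3d100000000"

root = ET.fromstring(requests.get(API, timeout=30).text)

# First inspection: dump all non-empty elements; real reuse would map the
# fields of the re3data metadata schema onto the target system.
for elem in root.iter():
    if elem.text and elem.text.strip():
        print(elem.tag, ":", elem.text.strip())
```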

How to cite: Weisweiler, N., Elger, K., Ulrich, R., Witt, M., Ferguson, L. M., Kindling, M., Kloska, G., Binh, N. T., Schabinger, R., Strecker, D., Trofimenko, M., and Vierkant, P.: re3data COREF – Enhancing the re3data service as a community-driven and trustworthy resource for research data repositories and portals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16356, https://doi.org/10.5194/egusphere-egu21-16356, 2021.

EGU21-3648 | vPICO presentations | ESSI3.3

Citation and credit: The role of researchers, journals, and repositories to ensure data, software and samples are linked to publications with proper attribution.  

Shelley Stall, Helen Glaves, Brooks Hanson, Kerstin Lehnert, Erin Robinson, and Lesley Wyborn

The Earth, space, and environmental sciences have made significant progress in awareness and implementation of policy and practice around the sharing of data, software, and samples. Specifically, the Coalition for Publishing Data in the Earth and Space Sciences (https://copdess.org/) brings together data repositories and journals to discuss and address common challenges in support of more transparent and discoverable research and its supporting data. Since the inception of COPDESS in 2014 and the completion of the Enabling FAIR Data Project in 2019, work has continued on improving availability statements for data and software, as well as the corresponding citations.

As the broad research community continues to make progress around data and software management and sharing, COPDESS is focused on several key efforts. These include 1) supporting authors in identifying the most appropriate data repository for preservation, 2) validating that all manuscripts have data and software availability statements, 3) ensuring data and software citations are properly included and linked to the publication to support credit, 4) encouraging adoption of best practices. 

We will review the status of these current efforts around data and software sharing, the important role that repositories and researchers have to ensure that automated credit and attribution elements are in place, and the recent publications on software citation guidance from the FORCE11 Software Implementation Working Group.

How to cite: Stall, S., Glaves, H., Hanson, B., Lehnert, K., Robinson, E., and Wyborn, L.: Citation and credit: The role of researchers, journals, and repositories to ensure data, software and samples are linked to publications with proper attribution.  , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3648, https://doi.org/10.5194/egusphere-egu21-3648, 2021.

EGU21-1299 | vPICO presentations | ESSI3.3

FAIR, Open and Free does not mean no restrictions

Keith Jeffery

FAIR, open and free are rarely used correctly to describe access to assets. In fact, assets - expected to be or described as FAIR, open and free - are subject to many restrictions. The major ones are:

(1) Security: to protect the asset from unavailability and any process from corruption, related to curation.  Security breaches may be criminal.

(2) Privacy: to protect any personal data within or about the asset. The General Data Protection Regulation (GDPR) is highly relevant here and provides for severe penalties.

(3) Rights and licences: the asset may be subject to claimed rights (such as copyright, database right or even patents) and also to licensing, which may be more or less restrictive;

(4) Authorisation: within an Authentication, Authorisation, Accounting Infrastructure (AAAI), access to assets is permitted only for authenticated users in a given user role (owner, manager...), in appropriate modes (read, update...), possibly within a certain time period and subject to asset licensing;

(5) Terms and Conditions: the system controlling the assets may have associated terms and conditions of use including - but not restricted to - liability, user behaviour, use of cookies.

In EPOS we are drawing together all these aspects into an integrated, policy-driven set of mechanisms in the system, including rich metadata, policy and licence documents, informed consent at the user interface and an AAAI system based on the recommendations of AARC (https://aarc-project.eu/).
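
The following minimal sketch illustrates the kind of policy-driven authorisation decision such an AAAI makes (a generic illustration, not the EPOS implementation); the roles, modes and licence check are assumed examples:

```python
from dataclasses import dataclass

ROLE_MODES = {            # which access modes each role may use (assumed policy)
    "owner":   {"read", "update", "delete"},
    "manager": {"read", "update"},
    "user":    {"read"},
}

@dataclass
class Asset:
    identifier: str
    licence_allows_read: bool = True   # stand-in for a real licence check

def authorise(role: str, mode: str, asset: Asset) -> bool:
    """Grant access only if role, mode and licence all permit it."""
    if mode not in ROLE_MODES.get(role, set()):
        return False
    if mode == "read" and not asset.licence_allows_read:
        return False
    return True

print(authorise("user", "update", Asset("doi:10.xxxx/example")))   # False
print(authorise("manager", "read", Asset("doi:10.xxxx/example")))  # True
```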

 

How to cite: Jeffery, K.: FAIR, Open and Free does not mean no restrictions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1299, https://doi.org/10.5194/egusphere-egu21-1299, 2021.

EGU21-10817 | vPICO presentations | ESSI3.3

Why and how does CDGP limit access to some deep geothermal data

Mathieu Turlure, Marc Schaming, Jean Schmittbuhl, and Marc Grunberg

The Data Centre for Deep Geothermal Energy (CDGP - Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr) was launched in 2016 by the LabEx G-EAU-THERMIE PROFONDE (now ITI GeoT, https://iti-geot.unistra.fr/) to preserve, archive and distribute data acquired on geothermal sites in Alsace. Since the beginning of the project, specific procedures have been followed to meet international requirements for data management. In particular, the FAIR recommendations are used to distribute Findable, Accessible, Interoperable and Reusable data "As Open as Possible, as Closed as Necessary".

CDGP distributes data originating from academic institutions as well as from industrial partners. The former are open and disseminated without restriction, fulfilling Open Science requirements. The latter are less open, depending on the access restrictions set by the data owner. Up to now, industrial data may be open, restricted to academia, distributed case-by-case (after the owner's agreement), or closed. Metadata are fully open. The access rights are also pushed to the EPOS TCS-AH platform (https://tcs.ah-epos.eu).

CDGP has implemented an Authentication, Authorization and Accounting Infrastructure (AAAI) to handle the distribution rules. The user's business category is verified, at least for academics, before access is granted. Where possible, datasets are provided (or denied) automatically. If necessary, the user's request is forwarded to the provider, who can accept or refuse access. Reports listing the datasets distributed to users are sent to providers every six months. This AAAI is built to earn and keep the data providers' trust, as well as to publicize the data.

CDGP is trying to broaden the number of open datasets. There are questions about the access restrictions on some vintage industrial data from Soultz-sous-Forêts, since some of them were acquired with public European funding. Also, industrial data from the Vendenheim area, where several felt earthquakes occurred (2019, 2020), are currently not available but may become partly accessible, since some of the exploration was done for "scientific purposes" and expert studies are required to understand the induced seismicity.

How to cite: Turlure, M., Schaming, M., Schmittbuhl, J., and Grunberg, M.: Why and how does CDGP limit access to some deep geothermal data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10817, https://doi.org/10.5194/egusphere-egu21-10817, 2021.

ESSI3.7 – Free and Open Source Software (FOSS) and Cloud-based Technologies to Facilitate Collaborative Science

EGU21-1614 | vPICO presentations | ESSI3.7

A new distributed data analysis framework for better scientific collaborations

Philipp S. Sommer, Viktoria Wichert, Daniel Eggert, Tilman Dinter, Klaus Getzlaff, Andreas Lehmann, Christian Werner, Brenner Silva, Lennart Schmidt, and Angela Schäfer

A common challenge for projects with multiple participating research institutes is a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup in which project partners do not necessarily work at the same institute and do not have access to each other's resources.

We present the prototype of an application programming interface (API), developed in Python, that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows and their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes may then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. In the end, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed "close to the data", using the institutional infrastructure where the eligible data set is stored.
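
To make the pattern concrete, the sketch below emulates it with nothing but Python's standard-library XML-RPC modules; the actual framework and its API differ, and the method, data and addresses are invented placeholders:

```python
# --- provider side (runs at the institute hosting the data) ---
from xmlrpc.server import SimpleXMLRPCServer

def mean_of_series(station_id: str) -> float:
    """Placeholder analysis that runs 'close to the data'."""
    local_data = {"station-1": [3.2, 4.1, 5.0]}      # stand-in for real storage
    series = local_data[station_id]
    return sum(series) / len(series)

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(mean_of_series)
# server.serve_forever()  # uncomment to actually serve requests

# --- user side (any other institute) ---
# from xmlrpc.client import ServerProxy
# backend = ServerProxy("http://data-provider.example.org:8000")
# print(backend.mean_of_series("station-1"))  # ordinary call, remote execution
```

The point of the pattern is visible here: the remote user writes a plain function call, while the computation and the data never leave the provider's infrastructure.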

With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.

This framework has been developed in a joint effort of the DataHub and Digitial Earth initiatives within the Research Centers of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.  (Helmholtz Association of German Research Centres, HGF).

How to cite: Sommer, P. S., Wichert, V., Eggert, D., Dinter, T., Getzlaff, K., Lehmann, A., Werner, C., Silva, B., Schmidt, L., and Schäfer, A.: A new distributed data analysis framework for better scientific collaborations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1614, https://doi.org/10.5194/egusphere-egu21-1614, 2021.

EGU21-4432 | vPICO presentations | ESSI3.7

Assembling a geoscience information portal from pieces of the open source software jigsaw puzzle

Vincent Fazio, Carsten Friedrich, Rini Angreani, Pavel Golodoniuc, John Hille, Alex Hunt, LingBo Jiang, Jens Klump, Geoffrey Squire, Peter Warren, Ulrich Engelke, Stuart Woodman, and Sam Bradley

As open-source geospatial mapping toolkits and platforms continue to develop and mature, the developers of web portals using these solutions need to regularly review and re-evaluate their technology choices in order to stay up to date and provide the best possible experience and functionality to their users. We are currently undergoing such a refresh with our AuScope Discovery Portal, the Virtual Geophysics Laboratory, and the AuScope 3D Geological Models Portal. The task of deciding which solutions to utilise as part of the upgrade process is not to be underestimated. Our main evaluation criteria include the ability to support commonly used map layer formats and web service protocols, support for 3D display capabilities, community size and activity, ease of adding custom display and scientific workflow/processing widgets, the cost and benefits of integration with existing components, and maintainability into the future. We are beginning a journey to update and integrate our portals' functionality and will outline the decision process and conclusions of our investigations, as well as the detailed evaluation of web-based geospatial solutions against our functional and operational criteria.

How to cite: Fazio, V., Friedrich, C., Angreani, R., Golodoniuc, P., Hille, J., Hunt, A., Jiang, L., Klump, J., Squire, G., Warren, P., Engelke, U., Woodman, S., and Bradley, S.: Assembling a geoscience information portal from pieces of the open source software jigsaw puzzle, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4432, https://doi.org/10.5194/egusphere-egu21-4432, 2021.

EGU21-6164 | vPICO presentations | ESSI3.7

Inishell 2.0: Semantically driven automatic GUI generation for scientific models

Mathias Bavay, Michael Reisecker, Thomas Egger, and Daniela Korhammer

As numerical model developers, we have experienced first-hand how most users struggle with the configuration of models, leading to numerous support requests. Such issues are usually mitigated by offering a Graphical User Interface (GUI) that flattens the learning curve. However, this requires a significant investment from the model developer as well as a specific skill set, and it does not fit within the daily duties of model developers. As a consequence, when a GUI has been created - usually within a specific project and often relying on an intern - its maintenance either constitutes a major burden or is not performed. This also tends to limit the evolution of the numerical models themselves, since the model developers try to avoid having to change the GUI.

To circumvent this problem, we have developed Inishell [1], a C++/Qt application that generates a GUI on the fly from an XML description of the inputs required by the numerical model. This makes maintenance of the GUI very simple and enables users to easily obtain an up-to-date GUI for configuring the numerical model. The first version of this tool was written almost ten years ago and showed that the concept works very well for our own surface processes models. A full rewrite offering a more modern interface and extended capabilities is presented here.
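
To illustrate the general idea (with an invented XML format, not Inishell's actual schema), the following minimal Python/tkinter sketch builds input widgets on the fly from an XML description of model parameters:

```python
import tkinter as tk
import xml.etree.ElementTree as ET

# Hypothetical parameter description; only the concept matches Inishell.
XML = """
<parameters>
  <parameter key="STATION" type="string" label="Station ID"/>
  <parameter key="EXTRA_OUTPUT" type="checkbox" label="Enable extra output"/>
</parameters>
"""

root = tk.Tk()
widgets = {}
for i, p in enumerate(ET.fromstring(XML)):
    tk.Label(root, text=p.get("label")).grid(row=i, column=0, sticky="w")
    if p.get("type") == "checkbox":
        var = tk.BooleanVar()
        tk.Checkbutton(root, variable=var).grid(row=i, column=1)
    else:
        var = tk.StringVar()
        tk.Entry(root, textvariable=var).grid(row=i, column=1)
    widgets[p.get("key")] = var  # read back later to write the model's config file

# root.mainloop()  # uncomment to display the generated form
```

Because the GUI is derived from the XML at runtime, changing the model's inputs only requires editing the XML description, which is the maintenance advantage described above.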

 

[1] Bavay, M., Reisecker, M., Egger, T., and Korhammer, D., “Inishell 2.0: Semantically driven automatic GUI generation for scientific models”, Geosci. Model Dev. Discuss. [preprint], https://doi.org/10.5194/gmd-2020-339, in review, 2020.

How to cite: Bavay, M., Reisecker, M., Egger, T., and Korhammer, D.: Inishell 2.0: Semantically driven automatic GUI generation for scientific models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6164, https://doi.org/10.5194/egusphere-egu21-6164, 2021.

EGU21-9408 | vPICO presentations | ESSI3.7

Using open-source high resolution remote sensing data to determine the access to buildings in the context of passenger transport

A. Hahn, W. Frühling, and J. Schlüter

Routing on a road network requires geographical points on the network that best correspond to the addresses of the given origin and destination, here called snapping points. The technique used to determine such snapping points is also called offline map matching. Conventional routing machines use the shortest perpendicular distance from a building's centroid to the road network for this purpose. In some cases, however, this technique leads to suboptimal results, namely when the access to a building is not reachable from the road segment with the shortest perpendicular distance. We used open-source data (multispectral images, OpenStreetMap data, and Light Detection and Ranging (LiDAR) data) to perform a cost-distance analysis and determine the most likely access to buildings. For this, we assumed that the path to a building shows less vegetation cover and minimal terrain slope, and avoids building footprints. Our results are validated against a predetermined Ideal Snapping Area for different weightings of the vegetation, slope and building-footprint parameters. We also compared our results with a conventional routing machine (the Open Source Routing Machine, OSRM) that uses the perpendicular distance. The validation rate of our approach is up to 90%, depending on the weighting of the chosen parameters, whereas the conventional routing machine shows a validation rate of 81%. The optimized snapping points can be used to determine better stop locations in passenger transport and to improve services such as door-to-door transportation (e.g. demand-responsive transport).
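
A minimal sketch of the cost-distance idea, using scikit-image's least-cost-path routine on a synthetic cost surface; in the study, the surface would instead be derived from vegetation cover, terrain slope and building footprints:

```python
import numpy as np
from skimage.graph import route_through_array

rng = np.random.default_rng(42)
cost = rng.uniform(1.0, 2.0, size=(50, 50))   # base traversal cost (synthetic)
cost[20:30, 10:40] = 100.0                    # e.g. a building footprint to avoid

building_point = (25, 5)                      # grid indices (row, col), invented
road_point = (25, 45)

path, total_cost = route_through_array(cost, building_point, road_point,
                                       fully_connected=True, geometric=True)
# The cells where 'path' meets the road network suggest the most likely
# access side of the building, i.e. an improved snapping point.
print(len(path), "cells, total cost", round(total_cost, 1))
```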

How to cite: Hahn, A., Frühling, W., and Schlüter, J.: Using open-source high resolution remote sensing data to determine the access to buildings in the context of passenger transport, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9408, https://doi.org/10.5194/egusphere-egu21-9408, 2021.

EGU21-9602 | vPICO presentations | ESSI3.7

Analyzing large-scale Earth Observation data repositories made simple with OpenEO Platform

Edzer Pebesma, Patrick Griffiths, Christian Briese, Alexander Jacob, Anze Skerlevaj, Jeroen Dries, Gilberto Camara, and Matthias Mohr

The openEO API allows the analysis of large amounts of Earth Observation data using a high-level abstraction of data and processes. Rather than requiring the management of virtual machines and millions of imagery files, it allows users to create jobs that take a spatio-temporal section of an image collection (such as Sentinel-2 L2A) and treat it as a data cube. Processes iterate or aggregate over pixels, spatial areas, spectral bands, or time series, while working at arbitrary spatial resolution. This pattern, pioneered by Google Earth Engine™ (GEE), lets the user focus on the science rather than on data management. A minimal sketch of this style of analysis is shown below.
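
By way of illustration, the following sketch expresses an NDVI computation in the style of the openEO Python client; the backend URL, collection id and spatio-temporal extent are placeholders, and authentication is omitted:

```python
import openeo

connection = openeo.connect("https://openeo.example.org")  # placeholder backend

cube = connection.load_collection(
    "SENTINEL2_L2A",                                   # assumed collection id
    spatial_extent={"west": 7.0, "south": 46.0, "east": 7.5, "north": 46.5},
    temporal_extent=["2020-06-01", "2020-08-31"],
    bands=["B04", "B08"],
)
red, nir = cube.band("B04"), cube.band("B08")
ndvi = (nir - red) / (nir + red)                       # pixel-wise band math
summer_max = ndvi.reduce_dimension(dimension="t", reducer="max")
summer_max.download("ndvi_summer_max.tiff")            # executed close to the data
```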

The openEO H2020 project (2017-2020) has developed the API as well as an ecosystem of software around it, including clients (JavaScript, Python, R, QGIS, browser-based), back-ends that translate API calls into existing image analysis or GIS software or services (for Sentinel Hub, WCPS, Open Data Cube, GRASS GIS, GeoTrellis/GeoPySpark, and GEE) as well as a hub that allows querying and searching openEO providers for their capabilities and datasets. The project demonstrated this software in a number of use cases, where identical processing instructions were sent to different implementations, allowing comparison of returned results.

A follow-up, ESA-funded project, "openEO Platform", realizes the API and develops the software ecosystem into operational services and applications that are accessible to everyone, involve federated deployment (using the clouds managed by EODC, Terrascope, CreoDIAS and EuroDataCube), will provide payment models ("pay per compute job") conceived and implemented following the user community's needs, and will use the EOSC (European Open Science Cloud) marketplace for dissemination and authentication. A wide range of large-scale case studies will demonstrate the ability of the openEO Platform to scale to large data volumes. The case studies to be addressed include on-demand ARD generation for SAR and multi-spectral data; agricultural demonstrators like crop type and condition monitoring; forestry services like near-real-time forest damage assessment and canopy cover mapping; environmental hazard monitoring of floods and air pollution; and security applications in terms of vessel detection in the Mediterranean Sea.

While the landscape of cloud-based EO platforms and services has matured and diversified over the past decade, we believe there are strong advantages for scientists and government agencies in adopting the openEO approach. Beyond the absence of vendor/platform lock-in or EULAs, we note the abilities to (i) run arbitrary user code (e.g. written in R or Python) close to the data, (ii) carry out scientific computations on an entirely open-source software stack, (iii) integrate different platforms (e.g., different cloud providers offering different datasets), and (iv) help create and extend this software ecosystem. openEO uses the OpenAPI standard, aligns with modern OGC API standards, and uses STAC (the SpatioTemporal Asset Catalog) to describe image collections and image tiles.

How to cite: Pebesma, E., Griffiths, P., Briese, C., Jacob, A., Skerlevaj, A., Dries, J., Camara, G., and Mohr, M.: Analyzing large-scale Earth Observation data repositories made simple with OpenEO Platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9602, https://doi.org/10.5194/egusphere-egu21-9602, 2021.

EGU21-9632 | vPICO presentations | ESSI3.7

Easing and promoting the application of ML and AI in earth system sciences - introducing the KI:STE platform

Thomas Seidler, Norbert Schultz, Dr. Markus Quade, Christian Autermann, Dr. Benedikt Gräler, and PD Dr. Markus Abel

Earth system modeling is virtually impossible without dedicated data analysis. Typically, the data are big and, owing to the complexity of the system, adequate analysis tools lie in the domain of machine learning or artificial intelligence. However, Earth system specialists have expertise other than developing and deploying state-of-the-art code, which is needed to use modern software frameworks and computing resources efficiently. In addition, cloud and HPC infrastructure is frequently needed to run analyses on data of Tera- or even Petascale volume, with corresponding requirements on available RAM, GPU and CPU capacity.

Inside the KI:STE project (www.kiste-project.de), we extend the concepts of an existing project, the Mantik platform (www.mantik.ai), so that the handling of data and algorithms is facilitated for Earth system analyses, while technical challenges such as the scheduling and monitoring of training jobs and platform-specific configurations are abstracted away from the user.

The design principles are collaboration and reproducibility of algorithms, from the first data load to the deployment of a model on a cluster infrastructure. In addition to the executive part, where code is developed and deployed, the KI:STE project develops a learning platform where dedicated topics related to Earth system science are presented systematically and pedagogically.

In this presentation, we show the architecture and interfaces of the KI:STE platform together with a simple example.

How to cite: Seidler, T., Schultz, N., Quade, Dr. M., Autermann, C., Gräler, Dr. B., and Abel, P. Dr. M.: Easing and promoting the application of ML and AI in earth system sciences - introducing the KI:STE platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9632, https://doi.org/10.5194/egusphere-egu21-9632, 2021.

EGU21-12467 | vPICO presentations | ESSI3.7

Swissforages: the Free and Open-Source Borehole Data Management System

Milan Antonovic, Massimiliano Cannata, Nils Oesterling, and Sabine Brodhag

Most of the time, borehole data, particularly those collected in the past, take the form of static data reports that describe the stratigraphy and the related characteristics; these are generally available as paper documents or static files such as PDF (.pdf) or image (.ai) files. While very informative, these documents are not searchable, not interoperable and not easily reusable, since they require a non-negligible amount of time for data integration. Sometimes data are archived in databases. This certainly improves the findability and accessibility of the data, but it still does not address the interoperability requirement, so combining data from different sources remains a problematic task. To enable FAIR borehole data and to facilitate their management by different entities (public or private), Swisstopo (www.swisstopo.ch) has funded the development of a web application named Borehole Data Management System (BDMS) [1] that adopts the borehole data model [2] implemented by the Swiss Geological Survey. Since the first beta release (2019), several improvements have been implemented, leading to the latest official release (v1.0.2), available on www.swissforages.ch. The latest release's features include:

  • Borehole document storage
  • Interface customization
  • Improved access & authorization management
  • External WMS/WMTS background map support
  • User feedback form
  • Handling of personalized and versioned terms of service
  • Enhanced bulk data import
  • Minor enhancements and bug fixes

 

How to cite: Antonovic, M., Cannata, M., Oesterling, N., and Brodhag, S.: Swissforages: the Free and Open-Source Borehole Data Management System, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12467, https://doi.org/10.5194/egusphere-egu21-12467, 2021.

EGU21-4031 | vPICO presentations | ESSI3.7

New open source tools for MARSIS: providing access to SEG-Y data format for 3D analysis.

Giacomo Nodjoumi, Luca Guallini, Roberto Orosei, Luca Penasa, and Angelo Pio Rossi

The objective of this work is to present a new Free and Open-Source Software (FOSS) tool that reads data acquired by the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS) instrument, on board Mars Express (MEX) and orbiting Mars since 2005, and converts them to multiple data formats.

MARSIS is an orbital synthetic aperture radar sounder that operates at dual frequencies between 1.3 and 5.5 MHz, with wavelengths between 230 and 55 m, for subsurface sounding. The Experiment Data Record (EDR) and Reduced Data Record (RDR) datasets are available for download on public-access platforms such as the Planetary Science Archive of ESA and the PDS-NASA Orbital Data Explorer (ODE).

These datasets have been widely used in research focused on studying the subsurface of the Red Planet down to a depth of a few kilometres, especially on studying the ice caps and looking for subsurface ice and water deposits, producing relevant results (Lauro et al., 2020; Orosei et al., 2020).

The Python tool presented here is capable of reading the common data types used to distribute the MARSIS dataset and of converting them into multiple data formats. Users can interactively configure the data source, destination, pre-processing and type of outputs among:

  • GeoPackage: for GIS software; a single self-contained file containing a layer in which all parameters for each processed file are stored.
  • NumPy array dump: for fast reading and analysis of the original data for both frequencies.
  • PNG images: for fast inspection; created for each frequency and saved. Image pre-processing filters, such as denoising, standardization and normalization, can be selected by the user.
  • SEG-Y: for analysing data with seismic interpretation and processing software (see e.g. OpendTect); consists of one SEG-Y file per frequency.

The SEG-Y capability is the most relevant feature, since it is not present in any other FOSS tool; it gives researchers the possibility to visualize radargrams in advanced software specific to seismic interpretation and analysis, making it possible to interpret the data in a fully three-dimensional environment.
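
For illustration, a radargram held as a NumPy array can be written as an unstructured SEG-Y file with the open-source segyio library, as sketched below; this is a minimal stand-in under assumed dimensions, not the tool's actual implementation:

```python
import numpy as np
import segyio

# Placeholder radargram: 100 traces x 512 samples for one frequency band.
radargram = np.zeros((100, 512), dtype=np.float32)

spec = segyio.spec()
spec.format = 5                               # 4-byte IEEE float
spec.samples = list(range(radargram.shape[1]))
spec.tracecount = radargram.shape[0]          # unstructured file: traces only

with segyio.create("marsis_frequency1.sgy", spec) as f:
    for i, trace in enumerate(radargram):
        f.trace[i] = trace                    # one radar trace per SEG-Y trace

# The resulting file can then be loaded into seismic interpretation
# software such as OpendTect for three-dimensional analysis.
```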

This tool, available on Zenodo (Nodjoumi, 2021), has been developed entirely in Python 3, relying only on open-source libraries; it is compatible with the principal operating systems and has parallel processing capabilities, granting easy scalability and usability across a wide range of computing machines. It is also highly customizable, since it can be extended with additional processing steps before export or with new output types. An additional module to ingest data directly into PostgreSQL/PostGIS and a module to interact directly with the ACT-REACT interface of data platforms are under development.

Acknowledgments:

This study is within the Europlanet 2024 RI, and it has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871149. 

References:

Lauro, S. E. et al. (2020) ‘Multiple subglacial water bodies below the south pole of Mars unveiled by new MARSIS data’, doi: 10.1038/s41550-020-1200-6.

Nodjoumi, G. (2021) 'MARSIS-xDR-READER', doi: 10.5281/zenodo.4436199

Orosei, R. et al. (2020) ‘The global search for liquid water on mars from orbit: Current and future perspectives’, doi: 10.3390/life10080120.

How to cite: Nodjoumi, G., Guallini, L., Orosei, R., Penasa, L., and Rossi, A. P.: New open source tools for MARSIS: providing access to SEG-Y data format for 3D analysis., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4031, https://doi.org/10.5194/egusphere-egu21-4031, 2021.

EGU21-2053 | vPICO presentations | ESSI3.7

Development of an interactive Cloud-based seismic network modelling application on a common Geophysical Processing Toolkit platform

Pavel Golodoniuc, Januka Attanayake, Abraham Jones, and Samuel Bradley

Detecting and locating earthquakes relies on seismic events being recorded by a number of deployed seismometers. To detect earthquakes effectively and accurately, seismologists must design and install a network of seismometers that can capture small seismic events in the sub-surface.

A major challenge when deploying an array of seismometers (seismic array) is predicting the smallest earthquake that could be detected and located by that network. Varying the spacing and number of seismometers dramatically affects network sensitivity and location precision and is very important when researchers are investigating small-magnitude local earthquakes. For cost reasons, it is important to optimise network design before deploying seismometers in the field. In doing so, seismologists must accurately account for parameters such as station locations, site-specific noise levels, earthquake source parameters, seismic velocity and attenuation in the wave propagation medium, signal-to-noise ratios, and the minimum number of stations required to compute high-quality locations.

The AuScope AVRE Engage Program team has worked with researchers from the seismology team at the University of Melbourne to better understand their existing solution for optimising seismic array design: an analytical method called SENSI, developed by Tramelli et al. (2013) to design seismic networks, including the GipNet array deployed to monitor seismicity in the Gippsland region in Victoria, Australia. The underlying physics and mechanics of the method are straightforward and, when applied sensibly, can be used as a basis for the design of seismic networks anywhere in the world. Our engineers have built an application leveraging the previously developed Geophysical Processing Toolkit (GPT) as an application platform and harnessing the scalability of a cloud environment provided by the EASI Hub, which minimised the overall development time. The GPT application platform provided the groundwork for a web-based application interface and enabled interactive visualisations to facilitate human-computer interaction and experimentation.
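
The flavour of such a network-sensitivity calculation can be sketched as follows; this is a simplified stand-in for the SENSI formulation, using the IASPEI/Hutton & Boore (1987) local magnitude relation as an assumed amplitude model and invented station coordinates and noise levels:

```python
import numpy as np

# Station coordinates (km, local grid) and noise amplitudes (nm of ground
# displacement) -- all invented for the example.
stations = np.array([[0.0, 10.0], [8.0, -3.0], [-6.0, -7.0], [12.0, 6.0]])
noise_nm = np.array([2.0, 5.0, 1.0, 3.0])
SNR = 3.0            # required signal-to-noise ratio for a detection
MIN_STATIONS = 3     # stations needed to locate an event

def detectable_magnitude(epicentre, depth_km=5.0):
    """Smallest ML detectable by at least MIN_STATIONS stations."""
    r = np.hypot(np.linalg.norm(stations - epicentre, axis=1), depth_km)
    # Invert ML = log10(A) + 1.11 log10(r) + 0.00189 r - 2.09 (A in nm)
    # for the amplitude each station must see, A = SNR * noise:
    ml = np.log10(SNR * noise_nm) + 1.11 * np.log10(r) + 0.00189 * r - 2.09
    return np.sort(ml)[MIN_STATIONS - 1]

# Mapping this over a grid of candidate epicentres yields a sensitivity
# map that responds to station spacing and site noise, which is what
# network-design optimisation exploits.
print(round(detectable_magnitude(np.array([2.0, 2.0])), 2))
```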

How to cite: Golodoniuc, P., Attanayake, J., Jones, A., and Bradley, S.: Development of an interactive Cloud-based seismic network modelling application on a common Geophysical Processing Toolkit platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2053, https://doi.org/10.5194/egusphere-egu21-2053, 2021.

EGU21-12152 | vPICO presentations | ESSI3.7

New cloud-based tool to Dynamically Manage Urban Resilience: the Fresnel Platform for Greater Paris

Guillaume Drouen, Daniel Schertzer, and Ioulia Tchiguirinskaia

As cities are put under greater pressure from the threatened impacts of climate change, in particular the risk of heavier rainfall and flooding, there is a growing need to establish a hierarchical form of resilience in which critical infrastructures can become sustainable. The main difficulty is that geophysics and urban dynamics are strongly nonlinear, with an associated extreme variability over a wide range of space-time scales.

The polarimetric X-band radar at the ENPC campus (east of Paris) introduced a paradigm change in the prospects for environmental monitoring in Île-de-France. The radar has been operating since May 2015 and has several characteristics that make it of central importance for the environmental monitoring of the region.

Based on the radar data and other scientific measurement tools, the platform for Greater Paris was developed through participative co-creation, in scientific collaboration with the world-leading industrial company in water management. As data accessibility and a fast, reliable infrastructure were major requirements from the scientific community, the platform was built as a cloud-based solution. It provides weather scientists, as well as water managers, with a fast and stable platform accessible from a web browser on desktop and mobile displays.

Developed using free and open-source libraries, it is rooted in an integrated suite of modular components built on an asynchronous, event-driven JavaScript runtime environment. It includes a comprehensive database accessible in real time and also provides tools to analyse historical data on different temporal and geographic scales around Greater Paris.

The Fresnel SaaS (Software as a Service) cloud-based platform is an example of how modern IT tools can dynamically enhance urban resilience. Development is still in progress, driven by constant requests and feedback from the scientific and professional communities.

How to cite: Drouen, G., Schertzer, D., and Tchiguirinskaia, I.: New cloud-based tool to Dynamically Manage Urban Resilience: the Fresnel Platform for Greater Paris, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12152, https://doi.org/10.5194/egusphere-egu21-12152, 2021.

EGU21-14441 | vPICO presentations | ESSI3.7

Cloud-based Research Data Infrastructures Integrating In-Situ and Remote Sensing Data

Simon Jirka, Benedikt Gräler, Matthes Rieke, and Christian Autermann

For many scientific domains, such as hydrology, ocean sciences, geophysics and social sciences, geospatial observations are an important source of information. Scientists conduct extensive measurement campaigns or operate comprehensive monitoring networks to collect data that help to understand and model current and past states of complex environments. The variety of data underpinning research stretches from in-situ observations to remote sensing data (e.g., from the European Copernicus programme) and contributes to rapidly increasing volumes of geospatial data.

However, with the growing amount of available data, new challenges arise. Within our contribution, we will focus on two specific aspects: On the one hand, we will discuss the specific challenges which result from the large volumes of remote sensing data that have become available for answering scientific questions. For this purpose, we will share practical experiences with the use of cloud infrastructures such as the German platform CODE-DE and will discuss concepts that enable data processing close to the data stores. On the other hand, we will look into the question of interoperability in order to facilitate the integration and collaborative use of data from different sources. For this aspect, we will give special consideration to the currently emerging new generation of standards of the Open Geospatial Consortium (OGC) and will discuss how specifications such as the OGC API for Processes can help to provide flexible processing capabilities directly within Cloud-based research data infrastructures.
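
To make this concrete: in the emerging OGC API - Processes specification, a processing job is triggered by POSTing a JSON execution request to a process endpoint. The sketch below follows that pattern; the server URL, process identifier and input names are hypothetical placeholders, not a real service.

```python
import requests

# Hypothetical server and process identifier (placeholders, not a real service)
base = "https://example.org/ogcapi"
process_id = "ndvi-computation"

# OGC API - Processes style execution request; input names are illustrative
payload = {"inputs": {"collection": "sentinel-2-l2a",
                      "bbox": [5.5, 52.0, 6.5, 53.0]}}

resp = requests.post(
    f"{base}/processes/{process_id}/execution",
    json=payload,
    headers={"Prefer": "respond-async"},  # request asynchronous execution
)
resp.raise_for_status()
# For asynchronous jobs the server returns a job status resource to poll
print(resp.status_code, resp.headers.get("Location"))
```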

How to cite: Jirka, S., Gräler, B., Rieke, M., and Autermann, C.: Cloud-based Research Data Infrastructures Integrating In-Situ and Remote Sensing Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14441, https://doi.org/10.5194/egusphere-egu21-14441, 2021.

EGU21-15302 | vPICO presentations | ESSI3.7

OCRE: started the funding opportunities for the European research community for using OCRE’s procured Cloud and Earth Observation commercial services.

José Manuel Delgado Blasco, Antonio Romeo, David Heyns, Natassa Antoniou, and Rob Carrillo

The OCRE project, an H2020 project funded by the European Commission, aims to increase the uptake of Cloud and EO services by the European research community by making available EC funds of 9.5M euro, removing barriers to service discovery and providing services free at the point of use.

OCRE began awarding grants to EU research projects for the use of OCRE’s procured cloud commodity and EO services through open calls in 2019-2020. In 2021, additional open calls are foreseen, including for projects wishing to receive funds for using EO services procured by OCRE, as well as a permanent open call for individual researchers.

During 2020, OCRE also funded, through another open call, EU projects conducting COVID-19-related research; these were the first projects to start using the available commodity services. In addition, in 2020 the OCRE project closed its tender and awarded contracts to EU service providers for the provision of cloud and commodity services, and in early 2021 the Dynamic Purchasing System (DPS) for the procurement of EO services will be opened.

Additionally, during 2020 an External Advisory Board (EAB) was created to assist OCRE in the project awarding process. The EAB comprises recognized experts from different domains, providing OCRE with the balanced knowledge needed to ensure transparency and equality in such an important process.

This presentation will provide an overview of the possibilities offered by OCRE to researchers interested in boosting their activities using commercial cloud services.

How to cite: Delgado Blasco, J. M., Romeo, A., Heyns, D., Antoniou, N., and Carrillo, R.: OCRE: started the funding opportunities for the European research community for using OCRE’s procured Cloud and Earth Observation commercial services., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15302, https://doi.org/10.5194/egusphere-egu21-15302, 2021.

EGU21-2442 | vPICO presentations | ESSI3.7

Transfer Data from NetCDF on Hierarchical Storage to Zarr on Object Storage: CMIP6 Climate Data Use Case

Marco Kulüke, Fabian Wachsmann, Georg Leander Siemund, Hannes Thiemann, and Stephan Kindermann

This study provides guidance to data providers on how to transfer existing netCDF data from a hierarchical storage system into Zarr on an object storage system.

In recent years, object storage systems have become an alternative to traditional hierarchical file systems because they are easily scalable and offer faster data retrieval compared to hierarchical storage systems.

Earth system sciences, and climate science in particular, handle large amounts of data. These data are usually represented as multi-dimensional arrays and traditionally stored in netCDF format on hierarchical file systems. However, the current netCDF-4 format is not yet optimized for object storage systems: netCDF data transfers from an object store can only be conducted at the file level, which results in heavy download volumes. The Zarr format can mitigate this problem; by allowing direct access to individual chunks and metadata, it reduces data transfers and increases input/output speed in parallel computing environments.

As one of the largest climate data providers worldwide, the German Climate Computing Center (DKRZ) continuously works towards efficient ways to make data accessible to users. This use case shows the conversion and transfer of a subset of the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data archive from netCDF on the hierarchical file system into Zarr on the OpenStack object store, known as Swift, using the Zarr Python package. Finally, this study evaluates to what extent Zarr-formatted climate data on an object storage system are a meaningful addition to the existing high-performance computing environment of the DKRZ.
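
The conversion step described above can be sketched in a few lines with the xarray and Zarr Python packages. This is a minimal illustration, not the DKRZ pipeline: the file names are placeholders, and writing to Swift would substitute an fsspec object-store mapping for the local path.

```python
import xarray as xr

# Open a (placeholder) CMIP6 netCDF file from the hierarchical file system
ds = xr.open_dataset("tas_Amon_example_historical.nc")

# Re-chunk so Zarr chunks match typical access patterns (requires dask),
# e.g. ten years of monthly fields per chunk
ds = ds.chunk({"time": 120})

# Write a Zarr store; consolidated metadata lets clients fetch all metadata
# in a single request, which matters on object storage
ds.to_zarr("tas_Amon_example_historical.zarr", mode="w", consolidated=True)

# On OpenStack Swift, the local path would be replaced by an object-store
# mapping, e.g. one obtained via fsspec.get_mapper("swift://container/prefix")
```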

How to cite: Kulüke, M., Wachsmann, F., Siemund, G. L., Thiemann, H., and Kindermann, S.: Transfer Data from NetCDF on Hierarchical Storage to Zarr on Object Storage: CMIP6 Climate Data Use Case, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2442, https://doi.org/10.5194/egusphere-egu21-2442, 2021.

EGU21-3205 | vPICO presentations | ESSI3.7

SWIRRL API for provenance-aware and reproducible workspaces. The EPOS and IS-ENES approach.

Alessandro Spinuso, Friedrich Striewski, Ian van der Neut, Mats Veldhuizen, Tor Langeland, Christian Page, and Daniele Bailo

Modern interactive tools for data analysis and visualisation are designed to expose their functionalities as a service through the web. We present an open-source web API (SWIRRL) that allows Science Gateways to easily integrate such tools in their websites and re-purpose them for their users. The API, developed in the context of the ENVRIFair and IS-ENES3 EU projects, deals on behalf of the clients with the underlying complexity of allocating and managing resources within a target container orchestration platform on the cloud. By combining storage and third-party tools, such as JupyterLab and the Enlighten visualisation software, the API creates dedicated working sessions on demand. Thanks to the API’s staging workflows, SWIRRL sessions can be populated with data of interest collected from external data providers. The system is designed to offer customisation and reproducibility thanks to the recording of provenance, which is performed for each API method affecting the session. This is implemented by combining a PROV-Templates catalogue and a graph database, which are deployed as independent microservices. Notebooks can be customised with new or updated libraries, and the provenance of such changes is then exposed to users via the SWIRRL interactive JupyterLab extension. Here, users can control different types of reproducibility actions. For instance, they can restore the libraries and data used within the notebook in the past, as well as create snapshots of the running environment. This allows users to share and rebuild full Jupyter workspaces, including raw data and user-generated methods. Snapshots are stored in Git as Binder repositories, and are thereby compatible with mybinder.org. Finally, we will discuss how SWIRRL is and will be adopted by existing portals for climate analysis (Climate4Impact) and for Solid Earth Science (EPOS), where advanced data discovery capabilities are combined with customisable, recoverable and reproducible workspaces.

How to cite: Spinuso, A., Striewski, F., van der Neut, I., Veldhuizen, M., Langeland, T., Page, C., and Bailo, D.: SWIRRL API for provenance-aware and reproducible workspaces. The EPOS and IS-ENES approach., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3205, https://doi.org/10.5194/egusphere-egu21-3205, 2021.

EGU21-5489 | vPICO presentations | ESSI3.7

ESM-Tools Version 5.0: A modular infrastructure for stand-alone and coupled Earth System Modelling (ESM)

Dirk Barbi, Miguel Andrés-Martínez, Deniz Ural, Luisa Cristini, Paul Gierz, and Nadine Wieters

During the last two decades, modern societies have gradually understood the urge to tackle the climate change challenge, and consequently a growing number of national and international initiatives have been launched with the aim of better understanding the Earth system. In this context, Earth System Modelling (ESM) has rapidly expanded, leading to a large number of research groups targeting the many components of the system at different scales and with different levels of interaction between components. This has led to the development of an increasing number of models, couplings, versions tuned to address different scales or scenarios, and model-specific compilation and operating procedures. This operational complexity makes the implementation of multiple models excessively time-consuming, especially for less experienced modellers.

ESM-Tools is open-source modular software written in Python, aimed at overcoming many of the difficulties associated with the operation of ESMs. ESM-Tools allows for downloading, compiling and running a wide range of ESM models and coupled setups on the most important HPC facilities available in Germany. It currently supports multiple models for ocean, atmosphere, biogeochemistry, ice sheets, isostatic adjustment, hydrology, and land surface, as well as six ocean-atmosphere and two ice-sheet-ocean-atmosphere coupled setups, through two couplers (included modularly through ESM-Interface). The tools are coded in Python, while all the component and coupling information is contained in easy-to-read YAML files. The front-end user is required to provide only a short script written in YAML format, containing the experiment-specific definitions. This user-friendly interface makes ESM-Tools a convenient software package for training and educational purposes. Simultaneously, its modularity and the separation between component-specific information and tool scripts facilitate the implementation and maintenance of new components, couplings and versions. The ESM-Tools team of scientific programmers also provides user support, workshops and detailed documentation. ESM-Tools was developed within the framework of the project Advanced Earth System Model Capacity, supported by the Helmholtz Association, and has become one of the main pillars of the German infrastructure for climate modelling.

How to cite: Barbi, D., Andrés-Martínez, M., Ural, D., Cristini, L., Gierz, P., and Wieters, N.: ESM-Tools Version 5.0: A modular infrastructure for stand-alone and coupled Earth System Modelling (ESM), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5489, https://doi.org/10.5194/egusphere-egu21-5489, 2021.

EGU21-10831 | vPICO presentations | ESSI3.7

Coupled earth system modeling on heterogeneous HPC architectures with ParFlow in the Terrestrial Systems Modeling Platform

Jaro Hokkanen, Stefan Kollet, Jiri Kraus, Andreas Herten, Markus Hrywniak, and Dirk Pleiter

Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in high-performance computing. Implementations that simultaneously result in a good performance and developer productivity while keeping the codebase adaptable and well maintainable in the long-term are of high importance. ParFlow, a widely used hydrologic model, achieves these attributes by hiding the architecture-dependent code in preprocessor macros (ParFlow embedded Domain Specific Language, eDSL) and leveraging NVIDIA's Unified Memory technology for memory management. The implementation results in very good weak scaling with up to 26x speedup when using four NVIDIA A100 GPUs per node compared to using the available 48 CPU cores. Good weak scaling is observed using hundreds of nodes on the new JUWELS Booster system at the Jülich Supercomputing Centre, Germany. Furthermore, it is possible to couple ParFlow with other earth system compartment models such as land surface and atmospheric models using the OASIS-MCT coupler library, which handles the data exchange between the different models. The ParFlow GPU implementation is fully compatible with the coupled implementation with little changes to the source code. Moreover, coupled simulations offer interesting load-balancing opportunities for optimal usage of the existing resources. For example, running ParFlow on GPU nodes, and another application component on CPU-only nodes, or efficiently distributing the CPU and GPU resources of a single node between the different application components may result in the best usage of heterogeneous architectures.

How to cite: Hokkanen, J., Kollet, S., Kraus, J., Herten, A., Hrywniak, M., and Pleiter, D.: Coupled earth system modeling on heterogeneous HPC architectures with ParFlow in the Terrestrial Systems Modeling Platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10831, https://doi.org/10.5194/egusphere-egu21-10831, 2021.

EGU21-12209 | vPICO presentations | ESSI3.7

Accelerating I/O in ESMs using on demand filesystems

Stefan Versick, Thomas Fischer, Ole Kirner, Tobias Meisel, and Jörg Meyer

Earth System Models (ESMs) have become much more demanding over the last years. Modelled processes have grown more complex, and more and more processes are considered in the models. In addition, model resolutions have increased to improve the accuracy of predictions. This requires faster high-performance computers (HPC) and better I/O performance. One way to improve I/O performance is to use faster file systems. Last year we showed the impact of the ad-hoc file system on the performance of the ESM EMAC. An ad-hoc file system is a private parallel file system which is created on demand for an HPC job using the node-local storage devices, in our case solid-state disks (SSDs). It only exists during the runtime of the job; therefore, output data have to be moved to a permanent file system before the job has finished. Performance improvements are due to the use of SSDs in the case of small chunks of I/O or a high number of I/O operations per second. Another reason for a performance boost is that the running job can exclusively access the file system. To get a better overview of the cases in which ESMs benefit from ad-hoc file systems, we repeated our performance tests with further ESMs with different I/O strategies. In total, we have now analyzed EMAC (parallel netCDF), ICON2.5 (netCDF with asynchronous I/O), ICON2.6 (netCDF with the Climate Data Interface (CDI) library) and OpenGeoSys (parallel VTU).

How to cite: Versick, S., Fischer, T., Kirner, O., Meisel, T., and Meyer, J.: Accelerating I/O in ESMs using on demand filesystems, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12209, https://doi.org/10.5194/egusphere-egu21-12209, 2021.

EGU21-14517 | vPICO presentations | ESSI3.7

Making Cyclone Tracking accessible to end users for Climate Research and Applications

Christian Pagé, Maarten Plieger, Wim Som de Cerff, Alessandro Spinuso, Rosa Filgueira, Malcolm Atkinson, Chrysoula Themeli, Iraklis Klampanos, and Vangelis Karkaletsis

It is becoming urgent to anticipate and put in place climate impact and adaptation measures. During the past years, climate change effects have been producing adverse conditions in many parts of the world, with significant societal and financial impacts. Advanced analysis tools are needed to process ensembles of simulations of the future climate, in order to generate useful and tailored products for end users.

An example of a complex analysis tool used in climate research and adaptation studies is one that follows storm tracks. In the context of climate change, it is important to know how storm tracks will change in the future, in both their frequency and intensity. Storms can cause significant societal impacts; hence it is important to assess future patterns. Having access to this type of complex analysis tool is very useful, and integrating such tools with front-ends like the IS-ENES climate4impact (C4I) platform can enable their use by a larger number of researchers and end users.

Integrating this type of complex tool is not an easy task. It requires significant development effort, especially if one of the objectives is also to adhere to FAIR principles. The DARE Platform enables research developers to implement scientific workflows more rapidly. This work presents how such a complex analysis tool has been implemented so that it can be easily integrated with the C4I platform. The DARE Platform also provides easy access to e-infrastructure services like EUDAT B2DROP to store intermediate or final results, as well as powerful provenance-driven tools to help researchers manage their work and data.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements N°824084 and N°777413.

How to cite: Pagé, C., Plieger, M., Som de Cerff, W., Spinuso, A., Filgueira, R., Atkinson, M., Themeli, C., Klampanos, I., and Karkaletsis, V.: Making Cyclone Tracking accessible to end users for Climate Research and Applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14517, https://doi.org/10.5194/egusphere-egu21-14517, 2021.

EGU21-16185 | vPICO presentations | ESSI3.7

On combining GUI desktop GIS with computer clusters & cloud resources, the role of programming skills and the state of the art in GUI driven GIS HPC applications

S. M. Ernst

The Free and Open Source Software (FOSS) ecosystem around Geographic Information Systems (GIS) is currently seeing rapid growth, similar to FOSS ecosystems in other scientific disciplines. At the same time, the need for broad programming and software development skills appears to be becoming a common theme for potential (scientific) users. There is a rather clear boundary between what can be done with graphical user interface applications such as QGIS alone on the one hand, and contemporary software libraries on the other, if one actually has the required skillset to use the latter. Practical experience shows that more and more types of research require far more than rudimentary software development skills. Those can be hard to acquire and distract from the actual scientific work at hand. For instance, the installation, integration and deployment of much-desired software libraries from the field of high-performance computing (HPC), e.g. for general-purpose computing on graphics processing units (GPGPU) or computations on clusters or cloud resources, very often becomes an obstacle in its own right. Recent advances in packaging and deployment systems around popular programming language ecosystems such as Python enable a new kind of thinking, however. Desktop GUI applications can now be combined much more easily with the mentioned types of libraries, which drastically lowers the entry barrier to HPC applications and to the handling of large quantities of data. This work aims at providing an overview of the state of the art in this field and at showcasing possible techniques.
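
As a small illustration of what such a combination enables, the sketch below shows how a Python-based desktop GIS plugin could dispatch a batch of heavy computations to a cluster via the dask.distributed library; the scheduler address and the per-tile function are placeholders.

```python
from dask.distributed import Client

def classify_tile(tile_id: int) -> tuple:
    """Placeholder for a heavy per-tile computation, e.g. raster classification."""
    return tile_id, "done"

# Connect to a remote scheduler; a GUI plugin could do this behind a button.
# The address is a placeholder; Client() without arguments starts a local cluster.
client = Client("tcp://scheduler.example.org:8786")

# Fan a batch of tiles out across the cluster and gather the results
futures = client.map(classify_tile, range(100))
results = client.gather(futures)
print(len(results), "tiles processed")
```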

How to cite: Ernst, S. M.: On combining GUI desktop GIS with computer clusters & cloud resources, the role of programming skills and the state of the art in GUI driven GIS HPC applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16185, https://doi.org/10.5194/egusphere-egu21-16185, 2021.

EGU21-4895 | vPICO presentations | ESSI3.7

era5cli: the command line interface to ERA5 data

Stef Smeets, Jaro Camphuijsen, Niels Drost, Fakhereh Alidoost, Bouwe Andela, Berend Weel, Peter Kalverla, Ronald van Haren, Klaus Zimmermann, Jerom Aerts, and Rolf Hut

With the release of the ERA5 dataset, worldwide high-resolution reanalysis data became available with open access for public use. The Copernicus CDS (Climate Data Store) offers two options for accessing the data: a web interface and a Python API. Consequently, automated downloading of the data requires advanced knowledge of Python and a lot of work. To make this process easier, we developed era5cli.

The command line interface tool era5cli enables automated downloading of ERA5 data using a single command. All variables and options available in the CDS web form can be downloaded in an efficient way, and both the monthly and hourly datasets are supported. Besides automation, era5cli adds several useful functionalities to the download pipeline.

One of the key options in era5cli is to spread a single download command over multiple CDS requests, resulting in higher download speeds. Files can be saved in both GRIB and netCDF format with automatic, yet customizable, file names. The info command lists the correct names of the available variables and the pressure levels for 3D variables. For debugging and testing purposes, the dryrun option can be selected to return only the CDS request. An overview of all available options, including instructions on how to configure your CDS account, is available in our documentation. Recent developments include support for the ERA5 back extension and ERA5-Land. The source code for era5cli is available at https://github.com/eWaterCycle/era5cli.
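
For illustration, the calls below show the usage pattern just described, wrapped in Python's subprocess so they can be scripted. The flag spellings follow the era5cli documentation at the time of writing and should be treated as indicative; era5cli --help is authoritative for the installed version.

```python
import subprocess

# List the pressure-level names known to era5cli (per the info command)
subprocess.run(["era5cli", "info", "levels"], check=True)

# Hourly 500 hPa temperature for one year; --dryrun prints the CDS request
# instead of downloading (flags indicative; check era5cli --help)
subprocess.run(
    ["era5cli", "hourly",
     "--variables", "temperature",
     "--startyear", "2008", "--endyear", "2008",
     "--levels", "500",
     "--dryrun"],
    check=True,
)
```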

How to cite: Smeets, S., Camphuijsen, J., Drost, N., Alidoost, F., Andela, B., Weel, B., Kalverla, P., van Haren, R., Zimmermann, K., Aerts, J., and Hut, R.: era5cli: the command line interface to ERA5 data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4895, https://doi.org/10.5194/egusphere-egu21-4895, 2021.

EGU21-16484 | vPICO presentations | ESSI3.7

ECMWF's data archive and dissemination services migration to the Bologna Data Center. 

Sébastien Denvil, Manuel Fuentes, Matthew Manoussakis, Sebastien Villaume, Tiago Quintino, Simon Smart, and Baudouin Raoult

ECMWF is the European Centre for Medium-Range Weather Forecasts. We are both a research institute and a 24/7 operational service, producing global numerical weather predictions and other data for our Member and Co-operating States and the broader community. The Centre has one of the largest supercomputer facilities and meteorological data archives in the world.
 
ECMWF is about to migrate its 400+ PB of data to its new data centre in Bologna while continuing its operations. We will present and discuss the challenges and opportunities that this migration offers in terms of the evolution of operational practices.
The planning, the evolution, and the transition periods of the ECMWF Data Handling System migration to Bologna will be presented.
 
The migration must occur while preserving ECMWF’s product generation and archive services, ensuring appropriate levels of quality of service. The planning and testing of a continuity plan of operations for operational forecasts, member states’ time-critical suites, Copernicus suites (ERA5, CAMS, C3S seasonal and the like), and research suites will be presented. This continuity plan relies on the full identification and traceability of the data flows involved during critical operations. Indeed, it is not economically viable to keep the 400 PB online during the entire migration period.
 
A completely redesigned data services deployment and testing mechanism will be used in the Bologna Data Center. Automation will be paramount in this context, as all services need to be redeployed entirely from scratch. This journey will be presented, and challenges inherent to software-defined infrastructure and services will be discussed.

How to cite: Denvil, S., Fuentes, M., Manoussakis, M., Villaume, S., Quintino, T., Smart, S., and Raoult, B.: ECMWF's data archive and dissemination services migration to the Bologna Data Center. , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16484, https://doi.org/10.5194/egusphere-egu21-16484, 2021.

ESSI3.9 – Managing Geochemical Data from Field to Lab to Publication to Archive

EGU21-16420 | vPICO presentations | ESSI3.9

Managing Open and FAIR Data in Geochemistry: Where are we a decade after the Editors Roundtable?

Steven L Goldstein, Kerstin Lehnert, and Albrecht W Hofmann

The ultimate goal of research data management is to achieve the long-term utility and impact of data acquired by research projects. Proper data management ensures that all researchers can validate and replicate findings, and reuse data in the quest for new discoveries. Research data need to be open, consistently and comprehensively documented for meaningful evaluation and reuse following domain-specific guidelines, and available for reuse via public data repositories that make them Findable, persistently Accessible, Interoperable, and Reusable (FAIR).

In the early 2000’s, the development of geochemical databases such as GEOROC and PetDB underscored that the reporting and documenting practices of geochemical data in the scientific literature were inconsistent and incomplete. The original data could often not be recovered from the publications, and essential information about samples, analytical procedures, data reduction, and data uncertainties was missing, thus limiting meaningful reuse of the data and reproducibility of the scientific findings. To prevent such poor scientific practice from damaging the health of the entire discipline, we launched the Editors Roundtable in 2007, an initiative to bring together editors, publishers, and database providers to implement consistent publication practices for geochemical data. Recognizing that mainstream scientific journals were the most effective agents to rectify problems in data reporting and implement best practices, members of the Editors Roundtable created and signed a policy statement that laid out ‘Requirements for the Publication of Geochemical Data’ (Goldstein et al. 2014, http://dx.doi.org/10.1594/IEDA/100426). This presentation will examine the impact of this initial policy statement, assess the current status of best practices for geochemical data management, and explore what actions are still needed.

While the Editors Roundtable policy statement led to improved data reporting practices in some journals, and provided the basis for data submission policies and guidelines of the EarthChem Library (ECL), data reporting practices overall remained inconsistent and inadequate. Only with the formation of the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS, www.copdess.org), which extended the Editors Roundtable to include publishers and data facilities across the entire Earth and Space Sciences, along with the subsequent AGU project ‘Enabling FAIR Data’, has the implementation of new requirements by publishers, funders, and data repositories progressed and led to significant compliance with the FAIR Data Principles. Submission of geochemical data to open and FAIR repositories has increased substantially. Nevertheless, standard guidelines for documenting geochemical data and standard protocols for exchanging geochemical data among distributed data systems still need to be defined, and structures to govern such standards need to be identified by the global geochemistry community. Professional societies such as the Geochemical Society, the European Association of Geochemistry, and the International Association of GeoChemistry can and should take a leading role in this process.

How to cite: Goldstein, S. L., Lehnert, K., and Hofmann, A. W.: Managing Open and FAIR Data in Geochemistry: Where are we a decade after the Editors Roundtable?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16420, https://doi.org/10.5194/egusphere-egu21-16420, 2021.

EGU21-10569 | vPICO presentations | ESSI3.9

SciDataMover: Moving Geochemistry Data from the Lab through to Publication

Ryan Fraser, Samuel Boone, Alexander Prent, Jens Klump, and Guido Aben

The SciDataMover platform is a discipline- and scale-agnostic, lightweight, open-source Data Movement Platform that transfers data, coupled with metadata, from laboratories to shared workspaces and then to repositories. The SciDataMover Platform leverages lightweight existing technologies that have a demonstrated capacity to be sustainably managed and can be affordably maintained.

Despite significant investments in analytical instruments in Australian research laboratories relevant to earth sciences, and particularly geochemistry, there has been underinvestment in storage and in the efficient, lossless transfer of data from ‘Private’ lab instruments to ‘Collaboration’ domains, where researchers can analyse and share data, and then on to trusted ‘Publication’ domains, where researchers can persistently store the data that supports their scholarly publications.

SciDataMover is a FAIR data movement platform that enables data from instruments to move in a scalable and sustainable manner and comprises:

1) a data service to transfer data/metadata directly from instruments
2) collaboration areas to process, refine, standardise and share this data
3) a mechanism to transfer data supporting publications to a trusted repository (e.g., domain, institutional).

The Platform, being built from existing components, will enable researchers to have ready access to laboratory data when and where they need it, along with the ability to collaborate with colleagues even during a pandemic where physical distancing is required. The benefits of SciDataMover are long-term persistence of laboratory-generated data (at various stages, from minimally processed to final published form), greater collaboration efficiency and enhanced scientific reproducibility.

How to cite: Fraser, R., Boone, S., Prent, A., Klump, J., and Aben, G.: SciDataMover: Moving Geochemistry Data from the Lab through to Publication, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10569, https://doi.org/10.5194/egusphere-egu21-10569, 2021.

EGU21-13832 | vPICO presentations | ESSI3.9

Implementing the Sparrow laboratory data system in multiple subdomains of geochronology and geochemistry

Daven Quinn, Benjamin Linzmeier, Kurt Sundell, George Gehrels, Simon Goring, Shaun Marcott, Stephen Meyers, Shanan Peters, Jake Ross, Mark Schmitz, Bradley Singer, and John Williams

Data sharing between laboratories is critical for building repeatable, comparable, and robust geochronology and geochemistry workflows. Meanwhile, in the broader geosciences, there is an increasing need for standardized access to aggregated geochemical data tied to basic geological context. Such data can be used to enrich sample and geochemical data repositories (e.g., EarthChem, Geochron.org, publisher archives), align geochemical context with other datasets that capture global change (e.g., Neotoma, the Paleobiology Database), and calibrate digital Earth models (e.g., Macrostrat) against geochronology-driven assessments of geologic time.

A typical geochemical lab manages a large archive of interpreted data; standardizing and contributing data products to community-level archives entails significant manual work that is not usually undertaken. Furthermore, without widely accepted interchange formats, this effort must be repeated for each intended destination.

Sparrow (https://sparrow-data.org), in development by a consortium of geochronology labs, is a standardized system designed to support labs’ efforts to manage, contextualize, and share their geochemical data. The system augments existing analytical workflows with tools to manage metadata (e.g., projects, sample context, embargo status) and software interfaces for automated data exchange with community facilities. It is extensible for a wide variety of geochemical methods and analytical processes.

In this update, we will report on the implementation of Sparrow in the Arizona Laserchron Center detrital zircon facility, and how that lab is using the system to capture geological context across its data archive. We will review similar integrations underway with U-Pb, 40Ar/39Ar, SIMS, optically stimulated luminescence, thermochronology, and cosmogenic nuclide dating. We will also discuss preliminary efforts to aggregate the output of multiple chronometers to refine age calibrations for the Macrostrat stratigraphic model.

How to cite: Quinn, D., Linzmeier, B., Sundell, K., Gehrels, G., Goring, S., Marcott, S., Meyers, S., Peters, S., Ross, J., Schmitz, M., Singer, B., and Williams, J.: Implementing the Sparrow laboratory data system in multiple subdomains of geochronology and geochemistry, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13832, https://doi.org/10.5194/egusphere-egu21-13832, 2021.

EGU21-3595 | vPICO presentations | ESSI3.9

Advancing Data Curation and Archiving: an Application of Coding to Lab Management in the Geosciences

Tierney Latham, Catherine Beck, Bruce Wegter, and Ahra Wu

Increases in technology have rapidly advanced the capabilities and ubiquity of scientific instrumentation. Coupled with the demand for increased transparency and reproducibility in science, these advances have necessitated new systems of data management and archival practices. Laboratories are working to update their methods of data curation in line with these evolving best practices, moving data from often disorderly private domains to publicly available, collaborative platforms. At the Hamilton Isotope Laboratory (HIL) of Hamilton College, the isotope ratio mass spectrometer (IRMS) is utilized across STEM disciplines for a combination of student, faculty, and course-related research, including both internal and external users. With over 200 sets of analytical runs processed in the past five years, documenting instrument usage and archiving the data produced is crucial to maintaining a state-of-the-art facility. However, prior to this project, the HIL faced significant barriers to proper data curation, storage, and accessibility: a) data files were produced with variable format and nomenclature; b) data files were difficult to interpret without explanation from the lab technician; c) key metadata tying results to respective researchers and projects were missing; d) access to data was limited due to storage on an individual computer; and e) data curation was an intellectual responsibility and burden for the lab technician. Additionally, as the HIL is housed within an undergraduate institution, the high rate of turnover for lab groups created additional barriers to the preservation of long-term institutional knowledge, as students worked with the HIL for a year or less. These factors necessitated the establishment of new data management practices to ensure the accessibility and longevity of scientific data and metadata. In this project, 283 Excel files of previously recorded data generated by the HIL IRMS were modified and cleaned to prepare the data for submission to EarthChem, a public repository for geochemical data. Existing Excel files were manually manipulated, several original R code scripts were generated and employed, and procedures were established to backtrace projects and collect key metadata. Most critically, a new internal system of data collection was established with standardized nomenclature and framework. For future usage of the IRMS, data will be exported directly into a template compatible with EarthChem, thereby removing barriers for principal investigators (PIs) and research groups to archive their data in the public domain upon completion of their projects and publications.
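
The cleaning scripts in this project were written in R; purely as an illustration of the kind of batch standardization described, a Python/pandas sketch might look as follows, where the folder layout, column mapping and output template are hypothetical.

```python
from pathlib import Path
import pandas as pd

# Hypothetical mapping from inconsistent historical column names to one
# standardized nomenclature (illustrative, not the actual HIL scheme)
RENAME = {
    "d13C": "delta13C_permil",
    "δ13C": "delta13C_permil",
    "d15N": "delta15N_permil",
    "Sample": "sample_name",
}

frames = []
for path in sorted(Path("irms_runs").glob("*.xlsx")):  # hypothetical folder
    df = pd.read_excel(path)
    df = df.rename(columns=RENAME)
    df["source_file"] = path.name  # keep a metadata trail back to the run
    frames.append(df)

# Concatenate all runs and export in a single repository-ready template
combined = pd.concat(frames, ignore_index=True)
combined.to_csv("hil_irms_archive.csv", index=False)
```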

How to cite: Latham, T., Beck, C., Wegter, B., and Wu, A.: Advancing Data Curation and Archiving: an Application of Coding to Lab Management in the Geosciences, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3595, https://doi.org/10.5194/egusphere-egu21-3595, 2021.

EGU21-6147 | vPICO presentations | ESSI3.9

Petrological microscopy data workflow – an example from Cap de Creus, NE Spain

Richard Wessels, Thijmen Kok, Hans van Melick, and Martyn Drury

Publishing research data in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner is increasingly valued and nowadays often required by publishers and funders. Because experimental research data provide the backbone for scientific publications, it is important to publish this data as FAIRly as possible to enable reuse and citation of the data, thereby increasing the impact of research.

The structural geology group at Utrecht University is collaborating with the EarthCube-funded StraboSpot initiative to develop (meta)data schemas, templates and workflows, to support researchers in collecting and publishing petrological and microstructural data. This data will be made available in a FAIR manner through the EPOS (European Plate Observing System) data publication chain (https://epos-msl.uu.nl/).

The data workflow under development currently includes: a) collecting structural field (meta)data compliant with the StraboSpot protocols, b) creating thin sections oriented in three dimensions by applying a notch system (Tikoff et al., 2019), c) scanning and digitizing thin sections using a high-resolution scanner, d) automated mineralogy through EDS on a SEM, and e) high-resolution geochemistry using a microprobe. The purpose of this workflow is to be able to track geochemical and structural measurements and observations throughout the analytical process.

This workflow is applied to samples from the Cap de Creus region in northeast Spain. Located in the axial zone of the Pyrenees, the pre-Cambrian metasediments underwent HT-LP greenschist- to amphibolite-facies metamorphism, are intruded by pegmatitic bodies, and transected by greenschist-facies shear zones. Cap de Creus is a natural laboratory for studying the deformation history of the Pyrenees, and samples from the region are ideal to test and refine the data workflow. In particular, the geochemical data collected under this workflow is used as input for modelling the bulk rock composition using Perple_X.    

In the near future the workflow will be complemented by adding unique identifiers to the collected samples using IGSN (International Geo Sample Number), and by incorporating a StraboSpot-developed application for microscopy-based image correlation. This workflow will be refined and included in the broader correlative microscopy workflow that will be applied in the upcoming EXCITE project, an H2020-funded European collaboration of electron and X-ray microscopy facilities and researchers aimed at structural and chemical imaging of earth materials.

How to cite: Wessels, R., Kok, T., van Melick, H., and Drury, M.: Petrological microscopy data workflow – an example from Cap de Creus, NE Spain, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6147, https://doi.org/10.5194/egusphere-egu21-6147, 2021.

EGU21-16405 | vPICO presentations | ESSI3.9

Collecting geochemical data of deep formation fluids for Geothermal Fluid Atlas for Europe

Katrin Kieling, Simona Regenspurg, Károly Kovács, Zsombor Fekete, Alberto Sánchez Miravalles, Tamás Madarász, and Éva Hartai

Most problems in deep geothermal operations are related to the chemistry of the geothermal fluid, which can cause deleterious physical and chemical reactions such as degassing, mineral precipitation, or corrosion. However, data related to fluid properties are still scarce, largely as a consequence of the difficulty of determining these properties at in situ geothermal conditions, and of the fact that those data are scattered across countries and are often the “property” of commercial operators of geothermal power plants.

The EU H2020 project REFLECT aims to collect existing and new data on geothermal fluids across Europe through field measurements, detailed lab experiments simulating in situ conditions, and by calculations. These data will be implemented in case-specific predictive models simulating reactions at geothermal sites, as well as in a European geothermal Fluid Atlas.

To harmonize the metadata information for different fluid samples, REFLECT partners plan to register IGSNs (International Geo Sample Numbers) for fluid and reservoir rock samples collected and analysed within the project. The IGSN is a unique sample identifier, i.e. the equivalent of a DOI for publications. It was originally developed for drill cores and extended to various sample types, including fluid samples (seawater, river or lake water, hydrothermal fluids, porewater). Registering fluid and rock samples with an IGSN will help make the data accessible and re-usable even if the fluid sample itself is destroyed.

All data produced and collected within REFLECT form the basis of the European Geothermal Fluid Atlas, which will include query and filtering tools to explore the database with a GIS-based map visualization. The Atlas makes the data accessible to the geothermal community and the general public. The aim is to create a database that can easily be integrated into other databases, such that the Fluid Atlas can be an addition to already existing initiatives of geological data collection.

How to cite: Kieling, K., Regenspurg, S., Kovács, K., Fekete, Z., Sánchez Miravalles, A., Madarász, T., and Hartai, É.: Collecting geochemical data of deep formation fluids for Geothermal Fluid Atlas for Europe, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16405, https://doi.org/10.5194/egusphere-egu21-16405, 2021.

EGU21-14458 | vPICO presentations | ESSI3.9

Preserving High Value Legacy Collections for Future Research – The McNaughton Collection

Eleanore Blereau, Amanda Bellenger, and Brent McInnes

During his long career in ion probe geochemistry, Professor Neal McNaughton built up an impressive collection of samples. Professor McNaughton served as SHRIMP geochronologist for the Centre of Global Metallogeny at the University of Western Australia (1994-2005), the Western Australia Centre for Exploration Targeting (2005-2007), and the John de Laeter Centre (JdLC) at Curtin University (2007-2019), and upon his retirement he donated his collection of epoxy-mounted samples to the Geological Survey of Western Australia (GSWA). This collection of over 1000 mounts containing over 4000 samples is full of irreplaceable material, representing over 20 years of geochronological research and development on the SHRIMP II at the JdLC. The collection is a highly valuable resource for future geochemical and geochronological research; however, the entire collection lacked a digital footprint. When this project started, there was a distinct lack of a unified approach to geoscience metadata or a template for preserving such a collection. In a jointly funded effort by AuScope, GSWA and Curtin University, a digital sample catalogue of the collection with digitised materials was successfully created. We operated under the FAIR data principles and utilised International Geo Sample Numbers (IGSNs) as persistent identifiers to create the most impactful, accessible and visible product. The final catalogue, associated metadata and digital materials are now publicly available online on a number of digital platforms, such as Research Data Australia and GSWA’s GeoVIEW.WA, and the mounts can be borrowed from GSWA for future analysis. These efforts allowed the preservation of physical materials for future loans and analysis, as well as visibility in our digital age. We will outline the template and workflow used by this project, which can be adopted to preserve similarly high-value collections and used by current facilities, universities and researchers in their ongoing research, as well as insights for future efforts.

How to cite: Blereau, E., Bellenger, A., and McInnes, B.: Preserving High Value Legacy Collections for Future Research – The McNaughton Collection, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14458, https://doi.org/10.5194/egusphere-egu21-14458, 2021.

EGU21-15037 | vPICO presentations | ESSI3.9

How to turn kilos of mud into megabytes of data? 10 years of efforts in curating lake sediment cores and their associated results

Fabien Arnaud, Cécile Pignol, Bruno Galabertier, Xavier Crosta, Isabelle Billy, Elodie Godinho, Karim Bernardet, Pierre Sabatier, Anne-Lise Develle, Rosalie Bruel, Julien Penguen, Pascal Calvat, Pierre Stéphan, and Mathias Rouan

Here we present a series of connected efforts aimed at curating sediment cores and their related data. Far from being isolated, these efforts were conducted within structured national projects and led to the development of digital solutions and good practices in line with international standards and practices.

Our efforts aimed at ensuring FAIR-compatible practices (Plomp, 2020; Wilkinson et al., 2016) throughout the life cycle of sediment cores, from fieldwork to published data. We adopted a step-by-step, bottom-up strategy to formalize a dataflow mirroring our workflow. We hence created a fieldwork mobile application (CoreBook) to gather information during coring operations and inject it into the French national virtual core repository “Cyber-Carothèque Nationale” (CCN). At this stage, the allocation of an international persistent unique identifier was crucial, and we naturally chose the IGSN.

Beyond the traceability of samples, the curation of analysis data remains challenging. Most international repositories (e.g. NOAA palaeo-data, PANGAEA) have taken the problem from the top by offering facilities to display published datasets with persistent unique identifiers (DOIs). Yet those data are only a fraction of the gross amount of acquired data. Moreover, those repositories have very low requirements when it comes to the preservation and display of metadata, in particular analytical parameters, but also fieldwork data, which are essential for data reusability. Finally, these repositories do not provide a synoptic view of the several strata of analyses that have been conducted on the same core through different research programs and publications. A partial solution is proposed by the eLTER metadata standard DEIMS, which offers a discovery interface for rich metadata. In order to bridge the gap between generalist data repositories and sample display systems (such as CCN, but also IMLGS, to cite an international system), we developed a data repository and visualizer dedicated to the re-use of lake sediment cores, samples and sampling locations (ROZA, the Retro-Observatory of the Zone Atelier). This system is still a prototype but already opens interesting perspectives.

Finally, the digital evolution of science allows the worldwide diffusion of data processing freeware. In that framework, we developed serac, an open-source R package to establish radionuclide-based age models following the most common sedimentation hypotheses. By implementing within this R package the input of a rich metadata file that gathers links to IGSNs and other quality metadata, we link fieldwork metadata, the physical storage of the core and the analytical metadata. Indeed, serac also stores data processing procedures in a standardized way. We hence think that the development of such software could help spread good practices in data curation and favour the use of unique identifiers.
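
To give a flavour of what such an age-model package computes, the toy Python sketch below implements the constant rate of supply (CRS) logic for unsupported 210Pb, one of the common sedimentation hypotheses; serac itself is an R package with proper uncertainty handling, and the numbers below are invented.

```python
import numpy as np

# Toy CRS (constant rate of supply) 210Pb age model: a sketch of one of the
# sedimentation hypotheses implemented by serac (an R package); data invented.
LAMBDA = np.log(2) / 22.3  # 210Pb decay constant in 1/yr (half-life 22.3 yr)

# Unsupported 210Pb inventory per layer (Bq/m2), top of core first,
# and the depth of each layer base (cm)
layer_inventory = np.array([80.0, 60.0, 45.0, 30.0, 18.0, 9.0, 4.0])
depth_base = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])

total = layer_inventory.sum()
below = total - np.cumsum(layer_inventory)  # inventory below each layer base

# CRS age at each layer base: t = (1/lambda) * ln(I_total / I_below);
# ages diverge as the remaining inventory approaches zero (a CRS limitation)
ages = np.log(total / np.clip(below, 1e-9, None)) / LAMBDA

for z, t in zip(depth_base, ages):
    print(f"depth {z:5.1f} cm : age {t:7.1f} yr")
```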

By tackling all aspects of data creation and curation throughout a lake sediment core’s life cycle, we are now able to propose a theoretical model of data curation for this particular type of sample that could serve as the basis for further development of integrated data curation systems.

How to cite: Arnaud, F., Pignol, C., Galabertier, B., Crosta, X., Billy, I., Godinho, E., Bernardet, K., Sabatier, P., Develle, A.-L., Bruel, R., Penguen, J., Calvat, P., Stéphan, P., and Rouan, M.: How to turn kilos of mud into megabytes of data? 10 years of efforts in curating lake sediment cores and their associated results, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15037, https://doi.org/10.5194/egusphere-egu21-15037, 2021.

EGU21-16550 | vPICO presentations | ESSI3.9

AusGeochem and Big Data Analytics in Low-Temperature Thermochronology

Samuel Boone, Fabian Kohlmann, Moritz Theile, Wayne Noble, Barry Kohn, Stijn Glorie, Martin Danišík, and Renjie Zhou

The AuScope Geochemistry Network (AGN) and partner Lithodat Pty Ltd are developing AusGeochem, a novel cloud-based platform for Australian-produced geochemistry data from around the globe. The open platform will allow laboratories to upload, archive, disseminate and publish their datasets, as well as perform statistical analyses and data synthesis within the context of large volumes of publicly funded geochemical data. As part of this endeavour, representatives from four Australian low-temperature thermochronology laboratories (University of Melbourne, University of Adelaide, Curtin University and University of Queensland) are advising the AGN and Lithodat on the development of low-temperature thermochronology (LTT)-specific data models for the relational AusGeochem database and its international counterpart, LithoSurfer. These schemas will facilitate the structured archiving of a wide variety of thermochronology data, enabling geoscientists to readily perform LTT Big Data analytics and gain new insights into the thermo-tectonic evolution of Earth’s crust.

Adopting established international data reporting best practices, the LTT expert advisory group has designed database schemas for the fission track and (U-Th-Sm)/He methods, as well as for thermal history modelling results and metadata. In addition to recording the parameters required for LTT analyses, the schemas include fields for reference material results and error reporting, allowing AusGeochem users to independently perform QA/QC on data archived in the database. Development of scripts for the automated upload of data directly from analytical instruments into AusGeochem, using its open-source Application Programming Interface, is currently under way.
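
In spirit, such an upload script could resemble the Python sketch below; the endpoint, payload fields and authentication are entirely hypothetical illustrations, not the actual AusGeochem or LithoSurfer API.

```python
import requests

# Entirely hypothetical endpoint, schema and credential: illustration only,
# not the actual AusGeochem/LithoSurfer API
API = "https://ausgeochem.example.org/api"
TOKEN = "REPLACE_WITH_LAB_CREDENTIAL"

analysis = {
    "sampleIgsn": "XXAA000001",              # hypothetical IGSN
    "method": "apatite fission track",
    "results": {"centralAgeMa": 95.4, "centralAgeErrMa": 4.2},
    "referenceMaterial": {"name": "Durango apatite"},
}

resp = requests.post(
    f"{API}/analyses",
    json=analysis,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print("archived with id:", resp.json().get("id"))
```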

The advent of a LTT relational database heralds the beginning of a new era of Big Data analytics in the field of low-temperature thermochronology. By methodically archiving detailed LTT (meta-)data in structured schemas, intractably large datasets comprising 1000s of analyses produced by numerous laboratories can be readily interrogated in new and powerful ways. These include rapid derivation of inter-data relationships, facilitating on-the-fly age computation, statistical analysis and data visualisation. With the detailed LTT data stored in relational schemas, measurements can then be re-calculated and re-modelled using user-defined constants and kinetic algorithms. This enables analyses determined using different parameters to be equated and compared across regional- to global scales.

This novel tool will improve laboratories’ ability to manage and share their data in alignment with FAIR data principles, while enabling analysts to readily interrogate intractably large datasets in new and powerful ways.

How to cite: Boone, S., Kohlmann, F., Theile, M., Noble, W., Kohn, B., Glorie, S., Danišík, M., and Zhou, R.: AusGeochem and Big Data Analytics in Low-Temperature Thermochronology, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16550, https://doi.org/10.5194/egusphere-egu21-16550, 2021.

EGU21-3363 | vPICO presentations | ESSI3.9

Ordination analyses in sedimentology, geochemistry and paleoenvironment - current trends and recommendations 

Or Mordechay Bialik, Emilia Jarochowska, and Michal Grossowicz

Ordination is a family of multivariate exploratory data analysis methods. With the advent of high-throughput data acquisition protocols, community databases, and multiproxy studies, the use of ordination in Earth sciences has snowballed. As data management and analytical tools expand, this growing body of knowledge opens new possibilities for meta-analyses and data-mining across studies. This requires that analyses be chosen appropriately for the character of Earth science data, including pre-treatment consistent with the precision and accuracy of the variables, as well as appropriate documentation. To investigate the current situation in Earth sciences, we surveyed 174 ordination analyses in 163 publications in the fields of geochemistry, sedimentology and palaeoenvironmental reconstruction and monitoring. We focussed on studies using Principal Component Analysis (PCA), Non-Metric Multidimensional Scaling (NMDS) and Detrended Correspondence Analysis (DCA).

PCA was the most common type of analysis (84%), with the other two accounting for ca. 12% each. Of 128 uses of PCA, only 5 included a test for normality, and most of these tests were not applied or documented correctly. Common problems include: (1) not providing information on the dimensions of the analysed matrix (16% of cases); (2) using a larger number of variables than observations (24 cases); (3) not documenting the distance metric used in NMDS (55% of cases); and (4) lack of information on the software used (38% of cases). The majority (53%) of the surveyed studies did not provide the data used for analysis at all, and a further 35% provided data sets in a format that does not allow immediate, error-free reuse, e.g. as data tables directly in the article text or in PDF appendices. The “gold standard” of placing a curated data set in an open-access repository was followed in only 6 (3%) of the analyses. Among analyses which reported using code-based statistical environments such as R Software, SAS or SPSS, none provided the code that would allow reproducing the analyses.

Geochemical and Earth science data sets require expert knowledge that should support analytical decisions and interpretations. Data analysis skills attract students to Earth sciences study programmes and offer a viable research alternative when field- or lab-based work is limited. However, many study curricula and publishing processes have not yet endorsed this methodological progress, leading to situations where mentors, reviewers and editors cannot offer quality assurance for the use of ordination methods. We provide a review of solutions and annotated code for PCA, NMDS and DCA of geochemical data sets in the free R Software environment, encouraging the community to reuse and further develop a reproducible ordination workflow.
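As a taste of such a reproducible workflow, the sketch below (in Python with scikit-learn, for illustration; the annotated code we provide is written for the R environment) runs a PCA while making explicit the documentation steps flagged above: the matrix dimensions, the observations-versus-variables check, and the pre-treatment applied. The input file name is a placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder input: rows = samples/observations, columns = geochemical variables
X = np.loadtxt("geochem_data.csv", delimiter=",", skiprows=1)

n_obs, n_var = X.shape
print(f"Analysed matrix: {n_obs} observations x {n_var} variables")  # problem (1)
assert n_obs > n_var, "more variables than observations"             # problem (2)

X_std = StandardScaler().fit_transform(X)  # document the pre-treatment explicitly
pca = PCA().fit(X_std)
print("Explained variance ratios:", np.round(pca.explained_variance_ratio_[:3], 3))
```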

How to cite: Bialik, O. M., Jarochowska, E., and Grossowicz, M.: Ordination analyses in sedimentology, geochemistry and paleoenvironment - current trends and recommendations , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3363, https://doi.org/10.5194/egusphere-egu21-3363, 2021.

EGU21-10344 | vPICO presentations | ESSI3.9

Managing geochemical data within the U.S. Geological Survey: An overview of policies and approaches from the perspective of the Energy Resources Program

Justin Birdwell, Christina DeVera, Katherine French, Steve Groves, Gregory Gunther, Aaron Jubb, Toni Rozinek, Augusta Warden, and Michele Wolf

The mission of the U.S. Geological Survey (USGS) Energy Resources Program (ERP) is to provide unbiased scientific information to stakeholders by conducting and disseminating research into energy-related issues mandated by the Administration or Congress or guided by ERP and USGS leadership. USGS Fundamental Science Practices (FSP) form the foundation for these efforts, representing a set of consistent procedures, ethical requirements, and operational principles that direct how research activities are conducted to ensure the highest standard of scientific integrity and transparency. Policies created to meet the goals of FSP guide how work is performed and how resulting information products are curated through the development, review, and approval processes. Though FSP have been a core part of the USGS mission since its inception, several new policies related to data generation, management, and distribution have been developed and implemented over the last decade to make practices, particularly those involving laboratory-generated geochemical data, more standardized and consistent across the USGS’ different scientific mission areas.

The ERP has been at the forefront of implementing these policies, particularly those that relate to laboratory-based science. For example, a new USGS-wide Quality Management System (QMS) was initially rolled out in ERP laboratories. QMS quality assurance requirements for laboratories were developed to ensure the generation of data of known and documented quality and to support a culture of continuous improvement. QMS requirements include controls on sample receipt, login, and storage; documentation of data generation methods and standard operating procedures for sample preparation and analysis; and quality control procedures around equipment calibration and maintenance and data acceptance criteria. Many of the requirements are currently being met in the Petroleum Geochemistry Research Laboratory (PGRL) through the use of a laboratory information management system (LIMS), which provides a centralized storage location for data recording, reduction, review, and reporting. Samples processed by PGRL are identified from login to reporting by a unique lab-assigned number. Data are reviewed by the analyst, a secondary reviewer, and the laboratory manager before being accepted or, where issues were identified during analysis, flagged as qualified. A similar documentation approach is also applied to new research methods, experimental work, and modifications of existing processes.

Once reported to a submitter, geochemistry data are interpreted and incorporated into USGS reports and other outside publications that are tracked using a single information product data system (IPDS). IPDS facilitates management of the internal review and approval processes for USGS information products. For geochemistry studies, data releases containing machine-readable laboratory-generated results along with associated metadata documentation typically accompany publications and have their own review and approval process. Once generated, data releases are given unique digital object identifiers for citation and access persistence, stored in ScienceBase, a Trusted Digital Repository for USGS products, and made accessible through the USGS Science Data Catalog (https://data.usgs.gov). This collection of systems makes it possible for ERP personnel to collect, manage, and track geochemical data, facilitating the timely delivery of high-quality scientific publications and datasets to the public and supporting decision makers in managing domestic natural resources.

How to cite: Birdwell, J., DeVera, C., French, K., Groves, S., Gunther, G., Jubb, A., Rozinek, T., Warden, A., and Wolf, M.: Managing geochemical data within the U.S. Geological Survey: An overview of policies and approaches from the perspective of the Energy Resources Program, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10344, https://doi.org/10.5194/egusphere-egu21-10344, 2021.

EGU21-7876 | vPICO presentations | ESSI3.9

How to publish your data with the EPOS Multi-scale Laboratories

Geertje ter Maat, Otto Lange, and Martyn Drury and the EPOS TCS Multi-scale Laboratories Team

EPOS (the European Plate Observing System) is a pan-European e-infrastructure framework with the goal of improving and facilitating the access, use, and re-use of Solid Earth science data. The EPOS Thematic Core Service Multi-scale Laboratories (TCS MSL) represent a community of European Solid Earth sciences laboratories including high-temperature and high-pressure experimental facilities, electron microscopy, micro-beam analysis, analogue tectonic and geodynamic modelling, paleomagnetism, and analytical laboratories. 

Participants and collaborating laboratories from Belgium, Bulgaria, France, Germany, Italy, Norway, Portugal, Spain, Switzerland, The Netherlands, and the UK are already represented within the TCS MSL. Unaffiliated European Solid Earth sciences laboratories are welcome and encouraged to join the growing TCS MSL community.

Laboratory facilities are an integral part of Earth science research. The diversity of methods employed in such infrastructures reflects the multi-scale nature of the Earth system and is essential for understanding its evolution, for assessing geo-hazards, and for the sustainable exploitation of geo-resources.

Although experimental data from these laboratories often provide the backbone for scientific publications, such data are typically available only as images, graphs or tables in the text or as supplementary information to research articles. As a result, much of the collected data remains unpublished, unsearchable or even inaccessible, and is often preserved only in the short term.

The TCS MSL is committed to making Earth science laboratory data Findable, Accessible, Interoperable, and Reusable (FAIR). For this purpose, the TCS MSL encourages the community to share their data via DOI-referenced, citable data publications. To facilitate this and ensure the provision of rich metadata, we offer user-friendly tools, plus the necessary data management expertise, to support all aspects of data publishing for the benefit of individual lab researchers via partner repositories. Data published via the TCS MSL are described with sustainable metadata standards enriched with controlled vocabularies used in the geosciences. The resulting data publications are also exposed through a designated TCS MSL online portal that brings together DOI-referenced data publications from partner research data repositories (https://epos-msl.uu.nl/). Efforts have already been made to interconnect new data (metadata exchange) with existing databases such as MagIC (paleomagnetic data in EarthRef.org), and in the future we expect to expand and improve this practice with other repositories.
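Because each data publication carries a DOI, its metadata can also be retrieved programmatically. The sketch below (Python, using the public DataCite REST API; the DOI shown is a placeholder, not an actual MSL publication) illustrates this.

```python
import requests

doi = "10.xxxx/placeholder"  # placeholder; substitute the DOI of a data publication
resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
resp.raise_for_status()

attrs = resp.json()["data"]["attributes"]  # the DataCite metadata record
print(attrs["titles"][0]["title"])
print([creator["name"] for creator in attrs["creators"]])
print(attrs.get("publisher"), attrs.get("publicationYear"))
```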

How to cite: ter Maat, G., Lange, O., and Drury, M. and the EPOS TCS Multi-scale Laboratories Team: How to publish your data with the EPOS Multi-scale Laboratories, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7876, https://doi.org/10.5194/egusphere-egu21-7876, 2021.

EGU21-16459 | vPICO presentations | ESSI3.9

EarthChem Communities: Building Geochemical Data Best Practices with Researcher Engagement

Lucia Profeta, Kerstin Lehnert, Lulin Song, and Juan David Figueroa

Acquisition and use of geochemical data are pervasive in the Earth, Environmental and Planetary Sciences as they are fundamental to our understanding of past, present, and future processes in natural systems, from the interior of the Earth to its surface environments on land, in the oceans, and in the air, to the entire solar system. Accordingly, the range of research communities that generate and use geochemical data is quite extensive. Data practices and workflows for processing, reporting, sharing, and using data are numerous and distinct for different research communities. Furthermore, the type of data generated is highly diverse with respect to analyzed parameters, analyzed materials, analytical techniques and instrumentation, as well as volume, size, and format. This makes it difficult to define generally applicable best practices and standards for geochemical data that the entire range of geochemical data communities will adopt. While it is technically possible to describe and encode the large variety of geochemical measurements in a consistent, unifying way provided by the Observations and Measurements conceptual model (https://www.ogc.org/standards/om), communities need to build consensus around specifics in data formats, metadata, and vocabularies, and most importantly, they need to ‘own’ the best practices to ensure adoption. 

EarthChem is a data facility for geochemistry, funded by the US National Science Foundation since 2006, to develop and operate community-driven services that support the discovery, access, preservation, reusability, and interoperability of geochemical data. EarthChem has a long record of engaging with the global research community to develop and promote data best practices for geochemistry by, for example, initiating and helping to organize the Editors Roundtable (Goldstein et al. 2014, http://dx.doi.org/10.1594/IEDA/100426). In recent years, as researchers have become increasingly aware of the benefits and requirements of FAIR data management, EarthChem has supported research communities wanting to establish consistent data formats and rich metadata for better findability and reproducibility of specific data types acquired and used within these communities. EarthChem now works with community advisers to build consensus around data best practices, provide resources for researchers to comply with these best practices, and streamline data submission and data access for these communities. EarthChem provides Community web pages as spaces to explain community-specific best practices, offer downloadable data templates, and link to customized community portals for data submission and access. EarthChem is in the process of defining guidelines and policies that will ensure that the best practices and data templates promoted by an EarthChem Community are indeed community endorsed. By making sure that the community-specific best practices align with more general data standards such as the elements of the O&M conceptual data model or the use of globally unique identifiers for samples, EarthChem Communities can advance overarching data best practices and standards that will improve reusability of geochemical data and data exchange among distributed databases. Initial EarthChem Communities include Tephra, Clumped Isotopes, and Experimental Petrology. Additional communities such as GeoHealth and Laser Induced Breakdown Spectroscopy are currently in an exploratory stage.
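As a minimal sketch of what the O&M conceptual model provides, a single geochemical measurement can be encoded with the O&M core properties as follows (Python; the sample identifier, vocabulary terms and values are hypothetical):

```python
# One measurement expressed with the core properties of the OGC
# Observations and Measurements (O&M) conceptual model.
observation = {
    "featureOfInterest": "IGSN:XXXXXXXXX",     # the analysed sample (placeholder IGSN)
    "observedProperty": "SiO2 mass fraction",  # ideally a controlled-vocabulary term
    "procedure": "XRF, fused glass bead",      # analytical technique / lab protocol
    "phenomenonTime": "2021-03-02T10:15:00Z",  # when the property applied to the sample
    "resultTime": "2021-03-05T08:00:00Z",      # when the result became available
    "result": {"value": 47.3, "unit": "wt%"},
}
```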

How to cite: Profeta, L., Lehnert, K., Song, L., and Figueroa, J. D.: EarthChem Communities: Building Geochemical Data Best Practices with Researcher Engagement, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16459, https://doi.org/10.5194/egusphere-egu21-16459, 2021.

EGU21-2521 | vPICO presentations | ESSI3.9

Automation of (meta-)data workflows from field to data repository

Linda Baldewein, Ulrike Kleeberg, and Lars Möller

In the Earth and environmental sciences, data analyzed from field samples constitute a significant portion of all research data, often collected at significant cost and in a non-reproducible manner. If important metadata is not immediately secured and stored in the field, the quality and re-usability of the resulting data will be diminished.

At the Helmholtz Coastal Data Center (HCDC), a metadata and data workflow for biogeochemical data has been developed over the last couple of years to ensure the quality and richness of metadata and to make the final data product FAIR. It automates and standardizes the data transfer from the campaign planning stage, through sample collection in the field, analysis and quality control, to storage in databases and publication in repositories.

Prior to any sampling campaign, the scientists are equipped with a customized app on a tablet that enables them to record relevant metadata, such as the date and time of sampling, the scientists involved and the type of sample collected. Each sample and station already receives a unique identifier at this stage. The location is directly retrieved from a high-accuracy GNSS receiver connected to the tablet. This metadata is transmitted via mobile data transfer to the institution’s cloud storage.
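A minimal sketch of such a field record (Python; the field names are illustrative, not the HCDC schema) shows how a unique identifier, timestamp and GNSS position can be fixed at the moment of sampling:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FieldSample:
    """One sample record captured in the field; attribute names are illustrative."""
    sample_type: str
    scientists: list[str]
    latitude: float        # from the connected high-accuracy GNSS receiver
    longitude: float
    sample_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique ID at sampling
    sampled_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = FieldSample("surface water", ["L. Baldewein"], 53.881, 8.702)
print(record.sample_id, record.sampled_at)
```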

After the campaign, the metadata is quality checked by the field scientists and the data curator and stored in a relational database. Once the samples are analyzed in the lab, the data is imported into the database and connected to the corresponding metadata using a template. Data DOIs are registered for finalized datasets in close collaboration with the World Data Center PANGAEA. The data sets are discoverable through their DOIs as well as through the HCDC data portal and the API of the metadata catalogue service.

This workflow is well established within the institute, but is still in the process of being refined and becoming more sophisticated and FAIRer. For example, an automated assignment of International Geo Sample Numbers (IGSN) for all samples is currently being planned.

How to cite: Baldewein, L., Kleeberg, U., and Möller, L.: Automation of (meta-)data workflows from field to data repository, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2521, https://doi.org/10.5194/egusphere-egu21-2521, 2021.

EGU21-14628 | vPICO presentations | ESSI3.9

The AuScope Geochemistry Network and the AusGeochem geochemistry data platform

Alexander Prent, Hayden Dalton, Samuel Boone, Guillaume Florin, Yoann Greau, Brent McInnes, Andrew Gleadow, Suzanne O'Reilly, Barry Kohn, Erin Matchan, Olivier Alard, Tim Rawling, Fabian Kohlmann, Moritz Theile, and Wayne Noble

The AuScope Geochemistry Network (AGN, www.auscope.org.au/agn) was established in 2019 in response to a community-expressed desire for closer collaboration and coordination of activities between Australian geochemistry laboratories. Its aims include: (i) promotion of capital and operational investments in new, advanced geochemical infrastructure; (ii) supporting increased end-user access to laboratory facilities and research data; and (iii) fostering collaboration and professional development via online tools, training courses and workshops. Over the last six months, the AGN has coordinated a monthly webinar series to engage the geoscience community, promote FAIR data practices and foster new collaborations. These webinars were recorded for future use and can be found at: www.youtube.com/channel/UC0zzzc6_mrJEEdCS_G4HYgg.

A primary goal of the AGN is to make the network’s laboratory geochemistry data, from around the globe, discoverable and accessible via the development of an online data platform called AusGeochem (www.auscope.org.au/ausgeochem). Geochemical data models for SHRIMP U-Pb, Fission Track, U-Th/He, LA-ICP-MS U-Pb/Lu-Hf and Ar-Ar are being developed using international best practice and are informed by expert advisory groups consisting of members from various institutes and laboratories within Australia. AusGeochem is being designed to provide an online data service for analytical laboratories and researchers where sample and analytical data can be uploaded (privately) for processing, synthesis and secure dissemination to collaborators. Researcher data can be retained in a private space but studied within the context of other publicly available data. Researchers can also generate unique International Geo Sample Numbers (IGSNs) for their samples via a built-in link to the Australian Research Data Commons IGSN registry.

AusGeochem supports FAIR data practices by enabling researchers to include links to their AusGeochem-registered data in research publications, offering a potential opportunity for AusGeochem to become a trusted data repository.

How to cite: Prent, A., Dalton, H., Boone, S., Florin, G., Greau, Y., McInnes, B., Gleadow, A., O'Reilly, S., Kohn, B., Matchan, E., Alard, O., Rawling, T., Kohlmann, F., Theile, M., and Noble, W.: The AuScope Geochemistry Network and the AusGeochem geochemistry data platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14628, https://doi.org/10.5194/egusphere-egu21-14628, 2021.

EGU21-13940 | vPICO presentations | ESSI3.9

Linking data systems into a collaborative pipeline for geochemical data from field to archive

Kerstin Lehnert, Daven Quinn, Basil Tikoff, Douglas Walker, Sarah Ramdeen, Lucia Profeta, Shanan Peters, and Jonathan Pauli

Management of geochemical data needs to consider the sequence of phases in the lifecycle of these data from field to lab to publication to archive. It also needs to address the large variety of chemical properties measured; the wide range of materials that are analyzed; the different ways in which these materials may be prepared for analysis; the diversity of analytical techniques and instrumentation used to obtain analytical results; and the many ways used to calibrate and correct raw data, normalize them to standard reference materials, and otherwise treat them to obtain meaningful and comparable results. In order to extract knowledge from the data, they are then integrated and compared with other measurements; formatted for visualization, statistical analysis, or model generation; and finally cleaned and organized for publication and deposition in a data repository. Each phase in the geochemical data lifecycle has its specific workflows and metadata that need to be recorded to fully document the provenance of the data so that others can reproduce the results.

An increasing number of software tools are being developed to support the different phases of the geochemical data lifecycle. These include electronic field notebooks, digital lab books, and Jupyter notebooks for data analysis, as well as data submission forms and templates. These tools are mostly disconnected and often require manual transcription or copying and pasting of data and metadata from one tool to the other. In an ideal world, these tools would be connected so that field observations gathered in a digital field notebook, such as sample locations and sampling dates, could be seamlessly sent to an IGSN Allocating Agent to obtain a unique sample identifier and QR code with a single click. The sample metadata would be readily accessible to the lab data management system, which allows researchers to capture information about sample preparation and connects to the instrumentation to capture instrument settings and the raw data. The data would then be seamlessly accessed by data reduction software, visualized, and further compared to data from global databases that can be directly accessed. Ultimately, a few clicks would allow the user to format the data for publication and archiving.
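A sketch of the single-click registration step described above might look as follows (Python; the endpoint, payload fields and response shape are hypothetical, since each Allocating Agent defines its own API):

```python
import requests

# Field-notebook observation to be registered; all field names are hypothetical
sample = {
    "name": "Sample-042",
    "latitude": 43.07,
    "longitude": -89.40,
    "collection_date": "2021-04-19",
}

# Hypothetical IGSN Allocating Agent endpoint, for illustration only
resp = requests.post("https://igsn-agent.example.org/samples", json=sample, timeout=30)
resp.raise_for_status()

sample["igsn"] = resp.json()["igsn"]  # the persistent identifier now travels with the record
# Downstream, the lab data management system would key preparation steps,
# instrument settings and raw data to this IGSN.
```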

Several data systems that support different stages in the lifecycle of samples and sample-based geochemical data have now come together to explore the development of standardized interfaces and APIs and consistent data and metadata schemas to link their systems into an efficient pipeline for geochemical data from the field to the archive. These systems include StraboSpot (www.strabospot.org; data system for digital collection, storage, and sharing of both field and lab data), SESAR (www.geosamples.org; sample registry and allocating agent for IGSN), EarthChem (www.earthchem.org; publishers and repository for geochemical data), Sparrow (sparrow-data.org; data system to organize analytical data and track project- and sample-level metadata), IsoBank (isobank.org; repository for stable isotope data), and MacroStrat (macrostrat.org; collaborative platform for geological data exploration and integration).

How to cite: Lehnert, K., Quinn, D., Tikoff, B., Walker, D., Ramdeen, S., Profeta, L., Peters, S., and Pauli, J.: Linking data systems into a collaborative pipeline for geochemical data from field to archive, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13940, https://doi.org/10.5194/egusphere-egu21-13940, 2021.

ESSI4.2 – Innovations in Scientific Data Visualization

EGU21-7859 | vPICO presentations | ESSI4.2

Interactive 3-D visual analysis of ERA 5 data: improving diagnostic indices for Marine Cold Air Outbreaks

Marcel Meyer, Iuliia Polkova, and Marc Rautenhaus

We present the application of interactive 3-D visual analysis techniques using the open-source meteorological visualization framework Met.3D [1] for investigating ERA5 reanalysis data. Our focus lies on inspecting atmospheric conditions favoring the development of extreme weather events in the Arctic. Marine Cold Air Outbreaks (MCAOs) and Polar Lows (PLs) are analyzed with the aim of improving diagnostic indices for capturing extreme weather events in seasonal and climatological assessments. We adopt an integrated workflow starting with the interactive visual exploration of single MCAO and PL events, using an extended version of Met.3D, followed by the design and testing of new diagnostic indices in a climatological assessment. Our interactive visual exploration provides insights into the complex 3-D shape and dynamics of MCAOs and PLs. For instance, we reveal a slow wind eye of a PL that extends from the surface up into the stratosphere. Motivated by the interactive visual analysis of single cases of MCAOs, we design new diagnostic indices, which address shortcomings of previously used indices, by capturing the vertical extent of the lower-level static instability induced by MCAOs. The new indices are tested by comparison with observed PLs in the Barents and the Nordic Seas (as reported in the STARS data set). Results show that the new MCAO index introduced here has an important advantage compared with previously used MCAO indices: it is more successful in indicating the times and locations of PLs. We thus propose the new index for further analyses in seasonal climate predictions and climatological studies. The methods for interactive 3-D visual data analysis presented here are made freely available for public use as part of the open-source tool Met.3D. We thereby provide a generic tool that can be used for investigating atmospheric processes in ERA5 data by means of interactive 3-D visual data analysis. Met.3D can be used, for example, during an initial explorative phase of scientific workflows, as a complement to standard 2-D plots, and for detailed meteorological case-analyses in 3-D.
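For orientation, one widely used diagnostic formulation measures the potential-temperature difference between the surface and a lower-tropospheric level; the sketch below (Python/xarray; the file layout and ERA5 short names 'skt', 'sp' and 't' are assumptions, and our new indices extend such measures by accounting for the vertical extent of the instability) computes it:

```python
import xarray as xr

KAPPA = 0.286  # R/cp for dry air

# Hypothetical ERA5 extracts: 'skt' = skin temperature (K), 'sp' = surface
# pressure (Pa), 't' = temperature (K) on pressure levels (hPa)
sfc = xr.open_dataset("era5_surface.nc")
plev = xr.open_dataset("era5_pressure_levels.nc")

theta_skin = sfc["skt"] * (100000.0 / sfc["sp"]) ** KAPPA  # surface potential temperature
theta_850 = plev["t"].sel(level=850) * (1000.0 / 850.0) ** KAPPA

mcao_index = theta_skin - theta_850  # positive values flag lower-level static instability
```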


[1] http://met3d.wavestoweather.de, https://collaboration.cen.uni-hamburg.de/display/Met3D/

How to cite: Meyer, M., Polkova, I., and Rautenhaus, M.: Interactive 3-D visual analysis of ERA 5 data: improving diagnostic indices for Marine Cold Air Outbreaks, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7859, https://doi.org/10.5194/egusphere-egu21-7859, 2021.

EGU21-8801 | vPICO presentations | ESSI4.2

Visualising and experiencing geological flows in Virtual Reality

Emmanuel Delage, Benjamin Van Wyk de Vries, Meven Philippe, Susan Conway, Costanza Morino, Nelida Manrique Llerena, Rigoberto Aguilar Contreras, Yhon Soncco, Þorsteinn Sæmundsson, and Jón Kristinn Helgason

Resilience to natural hazards depends on a person's ability to envision an event and its consequences. While real-life experience is precious, experiencing a real event is rare, and sometimes fatal. Virtual reality thus provides a way to gain that experience more frequently and without the inconvenience of demise. Virtual reality can also enhance an event to make it more visible, as events often happen in bad weather, at night or at other inconvenient moments.

The 3DTeLC software (an output from an ERASMUS+ project, http://3dtelc.lmv.uca.fr/) can handle high-resolution 3D topographic models, and the user can study natural hazard phenomena with geological tools in virtual reality. Topography acquired from drone or aircraft surveys can thus be made more accessible to researchers, the public and stakeholders. In the virtual environment, a person can interact with the scene from a first-person, drone or plane point of view and can carry out geological interpretation at different visualization scales. Immersive and interactive visualization is an efficient communication tool (e.g. Tibaldi et al. 2019 – Bulletin of Volcanology, DOI: https://dx.doi.org/10.1007/s00445-020-01376-6).

We have taken the 3DTeLC workflow and integrated a 2.5D flow simulation programme (VOLCFLOW-C). The dynamic outputs from VOLCFLOW-C are superimposed into a single visualization using a new tool developed from scratch, which we call VRVOLC. This coupled visualization adds dynamic and realistic understanding of events like lahars, lava flows, landslides and pyroclastic flows. We present two examples: one developed on the digital terrain model of Chachani Volcano, Arequipa, Peru, to assist with flood and lahar visualisation (in conjunction with INGEMMET, UNESCO IGCP project 692 Geoheritage for Resilience and Cap 20-25 Clermont Risk), and another of an Icelandic debris slide that occurred in late 2014, possibly related to permafrost degradation (in conjunction with the ANR PERMOLARDS project).

We thank our 3DTeLC colleagues, without whom this would not have been possible, and acknowledge financial support for the PERMOLARDS project from the French National Research Agency (ANR-19-CE01-0010). This work is part of UNESCO IGCP 692 Geoheritage for Resilience.

How to cite: Delage, E., Van Wyk de Vries, B., Philippe, M., Conway, S., Morino, C., Manrique Llerena, N., Aguilar Contreras, R., Soncco, Y., Sæmundsson, Þ., and Kristinn Helgason, J.: Visualising and experiencing geological flows in Virtual Reality, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8801, https://doi.org/10.5194/egusphere-egu21-8801, 2021.

EGU21-10505 | vPICO presentations | ESSI4.2

Automatic Recognition of Power Line Harmonic Radiation Observed by the EFD On board the ZH-1 Satellite

ying Han, jing Yuan, qiao Wang, dehe Yang, and xuhui Sun

Power line harmonic radiation generated by human activities can be found in the vast amounts of data observed by the EFD on board the ZH-1 satellite. To study these human activities and to remove a non-negligible source of interference in the study of ionospheric precursors of earthquakes, the power line harmonics must be identified within this large data volume. Hence, a novel automatic power line recognition method is proposed. Firstly, we apply the Fourier transform to the EFD data to obtain the power spectral density (PSD). Secondly, harmonic radiation from power lines presents as one or more horizontal linear features in the PSD image, whose color is close to that of the background. To highlight the contrast between the lines and the background, we transform the PSD image from the RGB to the HSV color space and use the saturation component of the HSV space as the object image. To obtain the edge regions, we process the object image with the Canny edge detector. Finally, we use the Hough transform to detect the power lines within the edge regions. To evaluate the proposed method, an experiment was performed on a dataset of 100 PSD images, each including several interference lines; the results verify the effectiveness of the proposed method, with an accuracy of 86%.
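A condensed sketch of this pipeline in Python with OpenCV follows (the file name and thresholds are illustrative, not the exact parameters used in the experiment):

```python
import cv2
import numpy as np

img = cv2.imread("psd_image.png")            # PSD image rendered in RGB (placeholder file)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)   # RGB -> HSV color space
saturation = hsv[:, :, 1]                    # saturation component as the object image

edges = cv2.Canny(saturation, 50, 150)       # edge regions (thresholds illustrative)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=img.shape[1] // 3, maxLineGap=10)

# Keep near-horizontal detections, the signature of power line harmonics
if lines is not None:
    for x1, y1, x2, y2 in (line[0] for line in lines):
        if abs(y2 - y1) <= 2:
            print(f"candidate harmonic line at row ~{y1}")
```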

How to cite: Han, Y., Yuan, J., Wang, Q., Yang, D., and Sun, X.: Automatic Recognition of Power Line Harmonic Radiation Observed by the EFD On board the ZH-1 Satellite, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10505, https://doi.org/10.5194/egusphere-egu21-10505, 2021.

EGU21-11208 | vPICO presentations | ESSI4.2 | Highlight

The “Scientific colour map” Initiative: Version 7 and its new additions

Fabio Crameri, Grace Shephard, and Philip Heron
  • Does visualisation hinder scientific progress?
  • Is visualisation widely misused to tweak data?
  • Is visualisation intentionally used for social exclusion?
  • Is visualisation taken seriously by academic leaders?

Using scientifically derived colour palettes is a big step towards rendering such brutal questions obsolete. Their perceptual uniformity leaves no room to highlight artificial boundaries or hide real ones. Their perceptual order transfers data visually, effortlessly and without delay. Their friendliness to colour-vision-deficient readers leaves no one wondering. Their black-and-white readability leaves no printer accused of not being good enough. It is, indeed, the true nature of the data that is displayed to all viewers, in every way.

The “Scientific colour map” initiative (Crameri et al., 2020) provides free, citable colour palettes of all kinds for download for an extensive suite of software programs, a discussion around data types and colouring options, and a handy how-to guide for a professional use of colour combinations. Version 7 of the Scientific colour maps (Crameri, 2020) makes crucial new additions towards fairer and more effective science communication available to the science community.
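For instance, the palettes can be used from Python via the community-maintained cmcrameri package, one of many supported routes (the data here are random, purely for demonstration):

```python
import matplotlib.pyplot as plt
import numpy as np
from cmcrameri import cm  # pip install cmcrameri

data = np.random.default_rng(0).random((50, 50))
plt.imshow(data, cmap=cm.batlow)  # perceptually uniform, ordered, CVD- and greyscale-safe
plt.colorbar(label="value")
plt.show()
```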

Crameri, F., G.E. Shephard, and P.J. Heron (2020), The misuse of colour in science communication, Nature Communications, 11, 5444.

Crameri, F. (2020). Scientific colour maps. Zenodo. http://doi.org/10.5281/zenodo.1243862

How to cite: Crameri, F., Shephard, G., and Heron, P.: The “Scientific colour map” Initiative: Version 7 and its new additions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11208, https://doi.org/10.5194/egusphere-egu21-11208, 2021.

EGU21-11680 | vPICO presentations | ESSI4.2

Interactive visualization and topology-based analysis of large-scale time-varying remote-sensing data: challenges and opportunities

Riccardo Fellegara, Markus Flatken, Francesco De Zan, and Andreas Gerndt

Over the last few years, the amount of large and complex data in the public domain has increased enormously, and new challenges have arisen in the representation, analysis and visualization of such data. Considering the number of space missions that have provided and will provide remote sensing data, there is still a need for a system that can be deployed across several remote repositories while remaining accessible from a single client running on commodity hardware.

To tackle this challenge, at the DLR Institute for Software Technology we have designed a dual backend/frontend system enabling the interactive analysis and visualization of large-scale remote sensing data. The basis for all visualization and interaction approaches is CosmoScout VR, a visualization tool developed at DLR and publicly available on GitHub, which allows the visualization of complex planetary data and large simulation data in real time. Its dual counterpart is an MPI-based framework, called Viracocha, that enables the remote analysis of large data and makes efficient use of the network by sending compact, partial results to CosmoScout for interactive visualization as soon as they are computed.

A node-based interface is defined within the visualization tool, which lets a domain expert easily define customized pipelines for processing and visualizing the remote data. Each “node” of this interface is linked either to a feature extraction module defined in Viracocha or to a rendering module defined directly in CosmoScout. Because the interface is fully customizable by the user, multiple pipelines can be defined over the same dataset to further enhance the visual feedback for analysis purposes.

As an ongoing effort on top of these tools, we plan to define and implement novel strategies for EO data processing and visualization based on Topological Data Analysis (TDA). TDA is an emerging set of techniques for processing data according to its topological features. These include both the geometric information associated with a point and all the non-geometric scalar values, such as temperature and pressure, that can be captured during a monitoring mission. One of the major theories behind TDA is Discrete Morse Theory which, given a scalar function, is used to define a discrete gradient on that function, extract the critical points, identify the region of influence of each critical point, and so on. This strategy is parameter-free and enables a domain scientist to process large datasets without prior knowledge of the data.
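As a simplified stand-in for this Morse-theoretic machinery (Python; a full discrete Morse complex would also yield saddles and the connectivity between critical points), extrema of a scalar field can be located as follows:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

# Toy scalar field standing in for one remote-sensing variable (e.g. temperature)
y, x = np.mgrid[0:256, 0:256]
f = np.sin(x / 24.0) * np.cos(y / 18.0)

# Local extrema: points equal to the max/min over their 3x3 neighbourhood
maxima = f == maximum_filter(f, size=3)
minima = f == minimum_filter(f, size=3)
print(f"{maxima.sum()} maxima, {minima.sum()} minima")
```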

An interesting research question that will be investigated during this project is the correlation of changes in critical points at different time steps, and the identification of deformation (or change) across time in the original dataset.

How to cite: Fellegara, R., Flatken, M., De Zan, F., and Gerndt, A.: Interactive visualization and topology-based analysis of large-scale time-varying remote-sensing data: challenges and opportunities, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11680, https://doi.org/10.5194/egusphere-egu21-11680, 2021.

EGU21-12491 | vPICO presentations | ESSI4.2

Immersive Visualization of Ocean Data in a Game Engine

Felicia Brisc and Nuno Serra

Virtual Reality is expanding rapidly in many academic and industry areas as an important tool for representing 3D objects, while graphics hardware is becoming increasingly accessible. Conforming to these trends, we present an immersive VR environment created to help Earth scientists and other users visualize and study ocean simulation data and processes. Besides scientific exploration, we hope our environment will become a helpful tool in education and outreach. We combined a 1-year, 3-km MITgcm simulation with daily temporal resolution and a bathymetry digital elevation model in order to visualize the evolution of Northeast Atlantic eddies enclosed by warm and salty Mediterranean Water. Our approach leverages the advanced rendering algorithms of a game engine to enable users to move around freely, interactively play the simulation and observe the changes and evolution of eddies in real time.

How to cite: Brisc, F. and Serra, N.: Immersive Visualization of Ocean Data in a Game Engine, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12491, https://doi.org/10.5194/egusphere-egu21-12491, 2021.

EGU21-15612 | vPICO presentations | ESSI4.2

Visualization of uncertainty in air quality ensemble forecasts

Angelika Heil and Augustin Colette

EGU21-14761 | vPICO presentations | ESSI4.2

Visualizing Sedimentary Condensation, Dilution, and Erosion using Shiny Apps

Niklas Hohmann and Emilia Jarochowska

Fossil accumulations can be generated by (1) a high input of organism remains or (2) low sedimentation rates reducing the volume of sediment between individual fossils. This creates a paradox in which shell beds may form in environments with low biomass production. This effect of sedimentary condensation on fossil abundance is easy to understand; however, its implications are hard to grasp and visualize.

We present the shellbed condensator (https://stratigraphicpaleobiology.shinyapps.io/shellbed_condensator/), a web application that allows users to interactively visualize and animate the effects of sedimentary condensation and erosion on fossil abundance and on proxies recorded by the sedimentary record. It is an adaptation of the seminal computer simulation by Kidwell (1985). The application is written in R and uses the shiny package for the construction of the web interface and the DAIME package for the sedimentological model (Hohmann, 2021). It allows users to create stratigraphic expressions and age models for combinations of fossil input and sedimentation rates they define.
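The underlying model logic is compact; a minimal sketch (in Python for illustration and without the erosion case, whereas the app itself builds on shiny and DAIME in R) is:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)               # time (arbitrary units)
dt = t[1] - t[0]
sed_rate = 1.0 + 0.9 * np.sin(2 * np.pi * t)  # variable sediment input, always > 0
fossil_rate = np.full_like(t, 10.0)           # constant fossil input through time

depth = np.cumsum(sed_rate) * dt              # age model: stratigraphic height vs. time
fossil_density = fossil_rate / sed_rate       # fossils per unit sediment thickness

# Shell beds emerge where sedimentation is slow, despite constant biological input
print("peak/mean fossil density:", fossil_density.max() / fossil_density.mean())
```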

To assess the utility of shiny apps for teaching purposes, we examine student understanding of sedimentary condensation after unsupervised studying and after unsupervised usage of the app. Due to their strong visual and interactive components, shiny apps are a powerful and versatile tool for science communication, teaching, self-study, the visualization of large datasets, and the promotion of scientific findings.

 

How to cite: Hohmann, N. and Jarochowska, E.: Visualizing Sedimentary Condensation, Dilution, and Erosion using Shiny Apps, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14761, https://doi.org/10.5194/egusphere-egu21-14761, 2021.

EGU21-15802 | vPICO presentations | ESSI4.2

Collaborative Visualization of 3D Geological Models in Augmented Reality

Björn Wieczoreck

Fully understanding a complex 3D geological model (such as triangulated irregular networks or boundary representations) requires a largely hands-on approach. The user needs direct access to the model and a way to manipulate it in 3D space, e.g. through rotation, to find the appropriate and most useful perspectives. Indirect means of presentation, e.g. via animation, can only give the user a vague idea of the model and all its details, especially with the growing amount of data incorporated. Additionally, discussing such a model with colleagues is often restricted by the space in front of the monitor of the system running the modeling software. And while the accessibility of such models has been improved, e.g. through access via ordinary web browsers, new technologies such as VR and AR could open up novel and improved ways for users to experience and share them.

Although VR has found its way into the mainstream, especially for entertainment, it continues to be a relatively inaccessible technology. The high upfront cost, the need to isolate oneself from the surrounding environment, and the technical requirements involved detract from the end goal of improving the accessibility of 3D geological models. On the other hand, more and more common handheld devices such as smartphones and tablets support AR and thus lower the barrier of entry for a large number of people. To analyze the potential of AR for the presentation and discussion of 3D geological models, a mobile app has been developed.

Started as a prototype during a geoscience hackathon, the app has now been rewritten from scratch and uploaded to the iOS App Store. The immense potential already became apparent during the conceptualization phase of the features. The app itself allows users to download a number of 3D geological models to their device and explore them in AR. They can then share a model with up to seven other peers in the same room. This means that every user sees the model in the same space and in the same state: as soon as one user changes, e.g., the size or rotation of the model, the new state is synchronized with every connected peer. Discussion is aided by "pointing" and "highlighting" features to ensure that everyone is talking about the same part of the model. The models are either stored on the device or can be downloaded via the internet. For now, the models are supplied by GiGa infosystem's GST Web, but additional sources are being explored.

The delivery of the app with this basic feature set invites initial user feedback and allows for a better exploration of possible applications. For example, viable use cases can be found in academia, as an easier way to communicate 3D models to students; during conferences, as a presentation platform to give peers a guided tour of a model; or in modelling, where advanced features such as digital boreholes or cross-sections could help verify intermediate results.

How to cite: Wieczoreck, B.: Collaborative Visualization of 3D Geological Models in Augmented Reality, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15802, https://doi.org/10.5194/egusphere-egu21-15802, 2021.
