Content:

ESSI – Earth & Space Science Informatics

ESSI1.2 – Addressing Training Data Challenges to Accelerate Earth Science Machine Learning

EGU21-12065 | vPICO presentations | ESSI1.2

Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance

Benjamin Kellenberger, Devis Tuia, and Dan Morris

Ecological research, such as wildlife censuses, increasingly relies on data at the terabyte scale. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to annotate with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves extensive coding and expert knowledge.

In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. First, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability, up to the cloud. Second, it tightly integrates deep learning models into the annotation process through active learning [7], where models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.

Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.
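
To make the loop concrete, here is a minimal, self-contained active-learning sketch using scikit-learn on synthetic features; it is a toy stand-in rather than AIDE's implementation, and the feature dimensions, the batch size of 50, and the least-confidence criterion are arbitrary choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))              # stand-in for image features
y = (X[:, 0] > 0).astype(int)                # stand-in for species labels

labeled = list(rng.choice(len(X), size=20, replace=False))
unlabeled = [i for i in range(len(X)) if i not in labeled]
model = LogisticRegression()

for _ in range(5):                           # five annotation rounds
    model.fit(X[labeled], y[labeled])        # learn from user-provided labels
    probs = model.predict_proba(X[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)    # least-confident sampling
    batch = [unlabeled[i] for i in np.argsort(uncertainty)[-50:]]
    labeled += batch                         # in AIDE, humans label this batch
    unlabeled = [i for i in unlabeled if i not in batch]
```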

AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers evaluation of both user annotations and model predictions, access control, customisable model training, and more, all through the web browser.

AIDE is fully open source and available at https://github.com/microsoft/aerial_wildlife_detection.

 


How to cite: Kellenberger, B., Tuia, D., and Morris, D.: Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12065, https://doi.org/10.5194/egusphere-egu21-12065, 2021.

EGU21-6853 | vPICO presentations | ESSI1.2

Curator: A No-Code Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery

Rudy Venguswamy, Mike Levy, Anirudh Koul, Satyarth Praveen, Tarun Narayanan, Ajay Krishnan, Jenessa Peterson, Siddha Ganju, and Meher Kasam

Machine learning modeling for Earth events at NASA is often limited by the availability of labeled examples. For example, training classifiers for forest fires or oil spills from satellite imagery requires curating a massive and diverse dataset of example forest fires, a tedious multi-month effort that entails careful review of over 196.9 million square miles of data per day for 20 years. While such images might exist in abundance within 40 petabytes of unlabeled satellite data, finding these positive examples to include in a training dataset for a machine learning model is extremely time-consuming and requires researchers to "hunt" for positive examples, like finding a needle in a haystack.

We present a no-code open-source tool, Curator, whose goal is to minimize the amount of manual image labeling needed to achieve a state-of-the-art classifier. The pipeline, purpose-built to take advantage of the massive amount of unlabeled images, consists of (1) self-supervised training to convert unlabeled images into meaningful representations, (2) search-by-example to collect a seed set of images, and (3) human-in-the-loop active learning to iteratively ask for labels on uncertain examples and train on them.

In step 1, a model capable of representing unlabeled images meaningfully is trained with a self-supervised algorithm (such as SimCLR) on a random subset of the dataset (sized to conform to researchers’ specified “training budget”). Since real-world datasets are often imbalanced, which leads to suboptimal models, the initial model is used to generate embeddings for the entire dataset; images with equidistant embeddings are then sampled. This iterative training and resampling strategy improves both the balance of the training data and the model with every iteration. In step 2, researchers supply an example image of interest, and the embedding generated from this image is used to find other images whose embeddings lie near the reference image’s embedding in Euclidean space (hence similar-looking images to the query image). These proposed candidate images contain a higher density of positive examples and are annotated manually as a seed set. In step 3, the seed labels are used to train a classifier that identifies further candidate images for human inspection with active learning. In each classification training loop, candidate images for labeling are sampled from the larger unlabeled dataset based on the images the model is most uncertain about (p ≈ 0.5).
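
As a rough illustration of the search-by-example retrieval in step 2 (not Curator's actual backend), the nearest neighbours of a query embedding can be ranked by Euclidean distance; all sizes below are invented:

```python
import numpy as np

def search_by_example(query_emb, all_embs, k=100):
    """Return indices of the k images whose embeddings are closest to the
    query embedding in Euclidean space (candidate seed-set images)."""
    dists = np.linalg.norm(all_embs - query_emb, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
embs = rng.random((100_000, 128))   # embeddings from the self-supervised model
candidates = search_by_example(embs[0], embs, k=100)
```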

Curator is released as an open-source package built on PyTorch Lightning. The pipeline uses GPU-based transforms from the NVIDIA DALI package for augmentation, leading to a 5-10x speed-up in self-supervised training, and is run from the command line.

By iteratively training a self-supervised model and a classifier in tandem with human manual annotation, this pipeline is able to unearth more positive examples from severely imbalanced datasets which were previously untrainable with self-supervision algorithms. In applications such as detecting wildfires, atmospheric dust, or turning outward with telescopic surveys, increasing the number of positive candidates presented to humans for manual inspection increases the efficacy of classifiers and multiplies the efficiency of researchers’ data curation efforts.

How to cite: Venguswamy, R., Levy, M., Koul, A., Praveen, S., Narayanan, T., Krishnan, A., Peterson, J., Ganju, S., and Kasam, M.: Curator: A No-Code Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6853, https://doi.org/10.5194/egusphere-egu21-6853, 2021.

EGU21-16326 | vPICO presentations | ESSI1.2

Programmatic Labeling of Dark Data for Artificial Intelligence in Spatial Informatics

J. Meil

The data preparation process generally consumes up to 80% of a data scientist's time, with 60% of that attributed to cleaning and labeling data [1]. Our solution is to use automated pipelines to prepare, annotate, and catalog data. The first step upon ingestion, especially in the case of real-world, unstructured and unlabeled datasets, is to leverage Snorkel, a tool specifically designed around a paradigm to rapidly create, manage, and model training data. Configured properly, Snorkel can be leveraged to temper this labeling bottleneck through a process called weak supervision. Weak supervision uses programmatic labeling functions (heuristics, distant supervision, subject-matter expertise, or knowledge bases) scripted in Python to generate "noisy labels". Each function traverses the entirety of the dataset and feeds the labeled data into a generative (conditionally probabilistic) model. The function of this model is to output the distribution of each response variable and predict the conditional probability based on a joint probability distribution. This is done by comparing the various labeling functions and the degree to which their outputs are congruent with each other. A single labeling function that has a high degree of congruence with other labeling functions will have a high learned accuracy, that is, the fraction of predictions that the model got right. Conversely, labeling functions that have a low degree of congruence with other functions will have low learned accuracy. The predictions are then combined, weighted by their estimated accuracy, whereby the predictions of the functions with higher learned accuracy are counted multiple times. The result is a transformation from a binary classification of 0 or 1 to a fuzzy label between 0 and 1: there is probability "x" that, based on heuristic "n", the response variable is "y". As data are added to this generative model, multi-class inference is made on the response variables (positive, negative, or abstain), assigning probabilistic labels to potentially millions of data points. Thus, we have generated a ground truth for all further labeling efforts and have improved the scalability of our models. These labeling functions can then be applied to new unlabeled data to further machine learning efforts.
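
A minimal sketch of this weak-supervision workflow using Snorkel's labeling-function API; the two labeling functions, field names, and toy records are invented for illustration and are not part of the described pipeline:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_keyword(x):
    # heuristic: the record's free-text field mentions the target phenomenon
    return POSITIVE if "flood" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_ndvi(x):
    # distant supervision: very low NDVI is unlikely to be vegetation
    return NEGATIVE if x.ndvi < 0.1 else ABSTAIN

df = pd.DataFrame({"text": ["flood extent map", "urban area"],
                   "ndvi": [0.05, 0.4]})
L = PandasLFApplier(lfs=[lf_keyword, lf_ndvi]).apply(df=df)

label_model = LabelModel(cardinality=2, verbose=False)  # generative model
label_model.fit(L_train=L, n_epochs=500, seed=42)       # weights LFs by agreement
probs = label_model.predict_proba(L=L)                  # fuzzy labels in [0, 1]
```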
 
Once our datasets are labeled and a ground truth is established, we persist the data into our Delta Lake, since it combines the most performant aspects of a data warehouse with the low-cost storage of data lakes. In addition, the lake can accept unstructured, semi-structured, or structured data sources, and those sources can be further aggregated into raw-ingestion, cleaned, and feature-engineered data layers. By sectioning off the data sources into these "layers", the data engineering portion is abstracted away from the data scientist, who can access model-ready data at any time. Data can be ingested via batch or stream.
 
The entire ecosystem is designed to eliminate as much technical debt in machine learning paradigms as possible, in terms of configuration, data collection, verification, governance, extraction, analytics, process management, resource management, infrastructure, monitoring, and post-verification.

How to cite: Meil, J.: Programmatic Labeling of Dark Data for Artificial Intelligence in Spatial Informatics, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16326, https://doi.org/10.5194/egusphere-egu21-16326, 2021.

EGU21-15297 | vPICO presentations | ESSI1.2

Collecting training data to map forest management at global scale

Myroslava Lesiv, Dmitry Schepaschenko, Martina Dürauer, Marcel Buchhorn, Ivelina Georgieva, and Steffen Fritz

Spatially explicit information on forest management at a global scale is critical for understanding the current status of forests, for sustainable forest management and for restoration. Whereas remote-sensing-based datasets, developed by applying ML and AI algorithms, can successfully depict tree cover and other land cover types, they have not yet been used to depict untouched forest and different degrees of forest management. We show for the first time that, with sufficient training data derived from very-high-resolution imagery, a differentiation of various levels of forest management within the tree cover class is possible.

In this session, we present our approach for labeling forest-related training data using the Geo-Wiki application (https://www.geo-wiki.org/). Moreover, we share a new open global training dataset on forest management that we collected through a series of Geo-Wiki campaigns. In February 2019, we organized an expert workshop to (1) discuss the variety of forest management practices that take place in different parts of the world; (2) generalize the definitions for application at a global scale; (3) finalize the Geo-Wiki interface for the crowdsourcing campaigns; and (4) build a dataset of control points (the expert dataset), which we used later to monitor the quality of the crowdsourced contributions by the volunteers. We involved forest experts from different regions around the world to explore what types of forest management information could be collected from visual interpretation of very-high-resolution images from Google Maps and Microsoft Bing, in combination with Sentinel time series and Normalized Difference Vegetation Index (NDVI) profiles derived from Google Earth Engine (GEE). Based on the results of this analysis, we expanded these campaigns by involving a broader group of participants, mainly people recruited from remote sensing, geography and forest research institutes and universities.

In total, we collected forest data for approximately 230 000 locations globally. These data are of sufficient density and quality and could therefore be used in many ML and AI applications for forests at regional and local scales. We also provide an example of an ML application: a remote-sensing-based global forest management map at 100 m resolution (PROBA-V) for the year 2015. It includes such classes as intact forests, forests with signs of human impact (including clear cuts and logging), replanted forest, woody plantations with a rotation period of up to 15 years, oil palms and agroforestry. The results of independent statistical validation show that the map’s overall accuracy is 81%.

How to cite: Lesiv, M., Schepaschenko, D., Dürauer, M., Buchhorn, M., Georgieva, I., and Fritz, S.: Collecting training data to map forest management at global scale, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15297, https://doi.org/10.5194/egusphere-egu21-15297, 2021.

EGU21-10347 | vPICO presentations | ESSI1.2

gprMax: An Open Source Electromagnetic Simulator for Generating Big Data for Ground Penetrating Radar Applications

Craig Warren, Iraklis Giannakis, and Antonios Giannopoulos

Lack of well-labelled and coherent training data is the main reason why machine learning (ML) and data-driven interpretations are not established in the field of Ground-Penetrating Radar (GPR). Non-representative and limited datasets lead to unreliable ML schemes that overfit and are unable to compete with traditional deterministic approaches. To that end, numerical data can potentially complement insufficient measured datasets and overcome this lack of data, even in the presence of large feature spaces.

Using synthetic data in ML is not new, and it has been extensively applied in computer vision. Applying numerical data in ML requires a numerical framework capable of generating synthetic but nonetheless realistic datasets. For GPR, such a framework is available in gprMax, an open source electromagnetic solver fine-tuned for GPR applications [1], [2], [3]. gprMax is fully parallelised and can be run using multiple CPUs and GPUs. In addition, it has a flexible scriptable format that makes it straightforward to generate big data. Stochastic geometries, realistic soils, vegetation, targets [3] and models of commercial antennas [4], [5] are some of the features that can easily be incorporated in the training data.
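
As an illustration of how this scriptable format supports big-data generation, a short Python script can emit thousands of gprMax input files with randomised target properties; the geometry, material values, and sampling ranges below are assumptions for illustration, not the setups used in the cited studies:

```python
import random
from pathlib import Path

random.seed(0)
out = Path("train_models")
out.mkdir(exist_ok=True)

# One 2D rebar-in-concrete model per file, randomising rebar depth and
# diameter so each simulation yields one labelled training sample.
for i in range(1000):
    depth = random.uniform(0.05, 0.30)      # m, assumed sampling range
    radius = random.uniform(0.004, 0.016)   # m, assumed sampling range
    yc = 0.35 - depth                       # rebar centre below concrete top
    (out / f"rebar_{i:04d}.in").write_text(
f"""#title: rebar d={radius * 2:.3f} z={depth:.3f}
#domain: 0.6 0.4 0.002
#dx_dy_dz: 0.002 0.002 0.002
#time_window: 8e-9
#material: 6 0.01 1 0 concrete
#box: 0 0 0 0.6 0.35 0.002 concrete
#cylinder: 0.3 {yc:.3f} 0 0.3 {yc:.3f} 0.002 {radius:.4f} pec
#waveform: ricker 1 1.5e9 src_pulse
#hertzian_dipole: z 0.28 0.37 0 src_pulse
#rx: 0.32 0.37 0
""")
```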

The capability of gprMax to generate realistic numerical datasets is demonstrated in [6], [7]. The investigated problem is assessing the depth and the diameter of rebars in reinforced concrete. Estimating the diameter of rebars using GPR is particularly challenging, with no conclusive solution to date. Using a synthetic training set generated with gprMax, we managed to train ML schemes capable of estimating the diameter of rebar in an accurate and efficient manner [6], [7]. These case studies support the premise that gprMax has the potential to provide realistic training data for applications where well-labelled data are not available, such as landmine detection, non-destructive testing and planetary sciences.

References

[1] Warren, C., Giannopoulos, A. & Giannakis, I., (2016). gprMax: Open Source software to simulate electromagnetic wave propagation for Ground Penetrating Radar, Computer Physics Communications, 209, 163-170.

[2] Warren, C., Giannopoulos, A., Gray, A., Giannakis, I., Patterson, A., Wetter, L. & Hamrah, A., (2018). A CUDA-based GPU engine for gprMax: Open source FDTD, electromagnetic simulation software. Computer Physics Communications, 237, 208-218.

[3] Giannakis, I., Giannopoulos, A. & Warren, C. (2016). A realistic FDTD numerical modeling framework of Ground Penetrating Radar for landmine detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 9(1), 37-51.

[4] Giannakis, I., Giannopoulos, A. & Warren, C., (2018). Realistic FDTD GPR antenna models optimized using a novel linear/non-linear full waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 57(3), 1768-1778.

[5] Warren, C. & Giannopoulos, A., (2011). Creating finite-difference time-domain models of commercial ground-penetrating radar antennas using Taguchi’s optimization method. Geophysics, 76(2), G37-G47.

[6] Giannakis, I., Giannopoulos, A. & Warren, C. (2021). A Machine Learning Scheme for Estimating the Diameter of Reinforcing Bars Using Ground Penetrating Radar. IEEE Geoscience and Remote Sensing Letters.

[7] Giannakis, I., Giannopoulos, A., & Warren, C. (2019). A machine learning-based fast-forward solver for ground penetrating radar with application to full-waveform inversion. IEEE Transactions on Geoscience and Remote Sensing. 57(7), 4417-4426.

How to cite: Warren, C., Giannakis, I., and Giannopoulos, A.: gprMax: An Open Source Electromagnetic Simulator for Generating Big Data for Ground Penetrating Radar Applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10347, https://doi.org/10.5194/egusphere-egu21-10347, 2021.

EGU21-6490 | vPICO presentations | ESSI1.2 | Highlight

It's a Bird it's a Plane it's a Meteor

Surya Ambardar, Siddha Ganju, and Peter Jenniskens

Meteor showers are some of the most dazzling and memorable events occurring in the night sky. Caused by bits of celestial debris from comets and asteroids entering Earth’s atmosphere at astronomical speeds, meteors are bright streaks of light in the night sky, sometimes called shooting stars. These meteors are recorded, tracked and triangulated by low-light surveillance cameras in a project called CAMS: Cameras for Allsky Meteor Surveillance. CAMS offers insights into a universe of otherwise invisible solar system bodies, but that task has proven difficult due to the lack of automated supervision; until recently, much of the data control was done by hand. Labeled training data, necessary to build supervised classification models, is essential because other man-made objects such as airplanes and satellites can be mistaken for meteors. To address this issue, we leverage one year's worth of meteor activity data from CAMS to provide weak supervision for over a decade of collected data, drastically reducing the amount of manual annotation necessary and expanding the available labelled meteor training data.

 

Founded in 2010, CAMS aims to automate video surveillance of the night sky to validate the International Astronomical Union’s Working List of Meteor Showers, discover new meteor showers, and predict future meteor showers. Since 2010, CAMS has collected a decade's worth of night sky activity data in the form of astrometric tracks and brightness profiles, one year of which has been manually annotated. We utilize this one year of labelled data to train a high-confidence LSTM meteor classifier that generates labels for the remaining decade’s worth of meteor data. Our classifier yields a confidence level for each prediction, and when the confidence lies above a statistically significant threshold, the predicted labels can be treated as weak supervision for future training runs. The remaining predictions below the threshold can be manually annotated. Using a high threshold minimizes label noise and ensures instances are correctly labeled while considerably reducing the amount of data that needs to be annotated. The weak supervision can be confirmed by checking date ranges and data distributions for known meteor showers to verify the predicted labels.
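
A minimal PyTorch sketch of this scheme: an LSTM track classifier whose confident predictions become weak labels, while low-confidence tracks are routed to human annotators. The architecture, input features, and the 0.95 threshold are illustrative assumptions, not the CAMS implementation:

```python
import torch
import torch.nn as nn

class TrackLSTM(nn.Module):
    """Toy stand-in for the baseline: classify a track as meteor or not."""
    def __init__(self, n_features=2, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # classify from the last time step

model = TrackLSTM()
tracks = torch.randn(32, 120, 2)          # fake (astrometry, brightness) sequences
probs = torch.softmax(model(tracks), dim=1)
conf, pred = probs.max(dim=1)
THRESHOLD = 0.95                          # assumed cut-off, not from the abstract
weak_labels = pred[conf >= THRESHOLD]     # auto-accepted weak labels
n_to_annotate = (conf < THRESHOLD).sum()  # tracks routed to human annotators
```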

 

To encourage discovery and distribution of training data and models, we additionally provide scripts to automate data ingestion and model training from raw camera data files. The data scripts handle processing of CAMS data, providing a pipeline that encourages open sharing and reproduction of our research. Additionally, we provide code for an LSTM classifier baseline model which can identify probable meteors. This baseline model script allows further exploration of CAMS data and an opportunity to experiment with other model types.

 

In conclusion, our contributions are (1) a weak supervision method utilizing a year’s worth of labelled CAMS data to generate labels for a decade’s worth of data, along with (2) baseline data processing and model scripts to encourage open discovery and distribution. These contributions expand access to labeled meteor training data and make the data globally and publicly accessible through daily generated maps of meteor shower activity posted at http://cams.seti.org/FDL/.

How to cite: Ambardar, S., Ganju, S., and Jenniskens, P.: It's a Bird it's a Plane it's a Meteor, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6490, https://doi.org/10.5194/egusphere-egu21-6490, 2021.

EGU21-11957 | vPICO presentations | ESSI1.2

Conditional spatio-temporal random crop for weak labeled SAR datasets

Francesco Asaro, Gianluca Murdaca, and Claudio Maria Prati

This work presents a methodology to improve supervised learning of segmentation tasks for convolutional architectures in the unbalanced and weakly labeled synthetic aperture radar (SAR) dataset scenarios that characterize the Earth Observation (EO) domain. The presented methodology exploits multitemporality and stochasticity to regularize training, reducing overfitting and thus improving validation and test performance.

Traditional precisely annotated datasets are made of patches extracted from a set of image-label pairs, often in a deterministic fashion. Through a set of experiments, we show that this approach is sub-optimal when using weak labels since it leads to early overfitting, mainly because weak labels only mark the simplest features of the target class.

The presented methodology builds up the dataset from a multitemporal stack of images aligned with the weakly labeled ground truth and samples the patches both in time and space. Patches are selected only if a given condition on the positive-class frequency is met. We show learning improvements over the traditional methodology by applying our strategy to a benchmark task, which consists of training a typical deep convolutional network, U-Net (Ronneberger et al., 2015), for the segmentation of water surfaces in SAR images.
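
A minimal NumPy sketch of this conditional spatio-temporal random crop, assuming a (time, height, width, channel) stack aligned with a single weak label map; the patch size and the 5% positive-class condition are placeholder values:

```python
import numpy as np

def conditional_crop(stack, weak_labels, size=256, min_water=0.05,
                     rng=np.random.default_rng(0)):
    """Sample a patch at a random time and place, accepting it only if the
    weak positive class (water) is frequent enough in the label crop."""
    T, H, W = stack.shape[:3]
    while True:
        t = rng.integers(T)                                    # stochastic in time...
        y, x = rng.integers(H - size), rng.integers(W - size)  # ...and space
        lbl = weak_labels[y:y + size, x:x + size]
        if lbl.mean() >= min_water:                            # conditional selection
            return stack[t, y:y + size, x:x + size], lbl

stack = np.random.rand(12, 1024, 1024, 2)   # 12 dates, VV-VH intensities (toy)
weak_labels = (np.random.rand(1024, 1024) > 0.9).astype(np.uint8)
patch, patch_lbl = conditional_crop(stack, weak_labels)
```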

The dataset sources are Sentinel-1 calibrated sigma-nought, VV-VH polarized, single-look intensity images for the inputs, and Copernicus’s “Water and Wetness High Resolution Layer” for the weak labels. To avoid spatial autocorrelation phenomena, the training set covers the Low Countries (Belgium, the Netherlands, and Luxembourg), while the validation and test sets span the Padana plain (Italy). The training dataset is built up according to the methodology, while the validation and test datasets are defined in the usual deterministic fashion.

We show the beneficial effects of multitemporality, stochasticity, and conditional selection in three different sets of experiments, as well as in a combined one. In particular, we observe performance improvements in terms of the F-1 score, which increases together with the degree of multitemporality (number of images in the stack), as well as when stochasticity and conditional rules that compensate for the under-representation of the positive class are added. Furthermore, we show that in the specific framework of SAR data, the introduction of multitemporality improves the learned representation of the speckle, thus implicitly optimizing the U-Net for both the filtering and segmentation tasks. We demonstrate this by comparing the number of looks of the input patch to that of the patch reconstructed before the classification layer.

Overall, in this framework, we show that solely using the presented training strategy, the classifier's performance improves up to 5% in terms of the F-1 score.

How to cite: Asaro, F., Murdaca, G., and Prati, C. M.: Conditional spatio-temporal random crop for weak labeled SAR datasets, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11957, https://doi.org/10.5194/egusphere-egu21-11957, 2021.

EGU21-1762 | vPICO presentations | ESSI1.2

RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale

Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Alfredo Kalaitzis, Matthew Chantry, Duncan Watson-Parris, and Piotr Bilinski

Climate change is expected to aggravate extreme precipitation events, directly impacting the livelihood of millions. Without a global precipitation forecasting system in place, many regions – especially those constrained in resources to collect expensive ground station data – are left behind. To mitigate such unequal reach of climate change, a solution is to alleviate the reliance on numerical models (and by extension ground station data) by enabling machine-learning-based global forecasts from satellite imagery. Though prior work exists on regional precipitation nowcasting, global, medium-term precipitation forecasting remains largely unaddressed. Importantly, a common, accessible baseline for meaningful comparison is absent. In this work, we present RainBench, a multi-modal benchmark dataset dedicated to advancing global precipitation forecasting. We establish baseline tasks and release PyRain, a data-handling pipeline that enables efficient processing of decades' worth of data by any modeling framework. While our work serves as a basis for a new chapter on global precipitation forecasting from satellite imagery, the greater promise lies in the community joining forces to use our released datasets and tools to develop machine learning approaches for this important challenge.

How to cite: Schroeder de Witt, C., Tong, C., Zantedeschi, V., De Martini, D., Kalaitzis, A., Chantry, M., Watson-Parris, D., and Bilinski, P.: RainBench: Enabling Data-Driven Precipitation Forecasting on a Global Scale, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1762, https://doi.org/10.5194/egusphere-egu21-1762, 2021.

EGU21-4683 | vPICO presentations | ESSI1.2

Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches.

Octavian Dumitru, Gottfried Schwarz, Mihai Datcu, Dongyang Ao, Zhongling Huang, and Mila Stillman

In recent years, much progress has been made with machine learning algorithms. Typical application fields of machine learning include many technical and commercial applications as well as Earth science analyses, where most often indirect and distorted detector data have to be converted to well-calibrated scientific data that are a prerequisite for a correct understanding of the desired physical quantities and their relationships.

However, the provision of sufficient calibrated data is not enough for the testing, training, and routine processing of most machine learning applications. In principle, one also needs a clear strategy for the selection of necessary and useful training data and an easily understandable quality control of the finally desired parameters.

At first glance, one could guess that this problem can be solved by a careful selection of representative test data covering many typical cases as well as some counterexamples. These test data can then be used for the training of the internal parameters of a machine learning application. At second glance, however, many researchers have found that simply stacking up plain examples is not the best choice for many scientific applications.

To get improved machine learning results, we concentrated on the analysis of satellite images depicting the Earth’s surface under various conditions such as the selected instrument type, spectral bands, and spatial resolution. In our case, such data are routinely provided by the freely accessible European Sentinel satellite products (e.g., Sentinel-1 and Sentinel-2). Our basic work then included investigations of how some additional processing steps – to be linked with the selected training data – can provide better machine learning results.

To this end, we analysed and compared three different approaches to identify machine learning strategies for the joint selection and processing of training data for our Earth observation images:

  • One can optimize the training data selection by adapting the data selection to the specific instrument, target, and application characteristics [1].
  • As an alternative, one can dynamically generate new training parameters by Generative Adversarial Networks. This is comparable to the role of a sparring partner in boxing [2].
  • One can also use a hybrid semi-supervised approach for Synthetic Aperture Radar images with limited labelled data. The method is split into: polarimetric scattering classification, topic modelling for scattering labels, unsupervised constraint learning, and supervised label prediction with constraints [3].

We applied these strategies in the ExtremeEarth sea-ice monitoring project (http://earthanalytics.eu/). As a result, we can demonstrate for which application cases these three strategies will provide a promising alternative to a simple conventional selection of available training data.

[1] C.O. Dumitru et al., “Understanding Satellite Images: A Data Mining Module for Sentinel Images”, Big Earth Data, 2020, 4(4), pp. 367-408.

[2] D. Ao et al., “Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X”, Remote Sensing, 2018, 10(10), pp. 1-23.

[3] Z. Huang et al., "HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Images", IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-18.

How to cite: Dumitru, O., Schwarz, G., Datcu, M., Ao, D., Huang, Z., and Stillman, M.: Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4683, https://doi.org/10.5194/egusphere-egu21-4683, 2021.

EGU21-12384 | vPICO presentations | ESSI1.2

AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data.

Alastair McKinstry, Oisin Boydell, Quan Le, Inder Preet, Jennifer Hanafin, Manuel Fernandez, Adam Warde, Venkatesh Kannan, and Patrick Griffiths

The ESA-funded AIREO project [1] sets out to produce AI-ready training dataset specifications and best practices to support the training and development of machine learning models on Earth Observation (EO) data. While the quality and quantity of EO data have increased drastically over the past decades, the availability of training data for machine learning applications is considered a major bottleneck. The goal is to move towards implementing FAIR data principles for training data in EO, enhancing especially the findability, interoperability and reusability aspects. To achieve this goal, AIREO sets out to provide a training data specification and to develop best practices for the use of training datasets in EO. An additional goal is to make training datasets self-explanatory (“AI-ready”) in order to expose challenging problems to a wider audience that does not have expert geospatial knowledge.

Key elements addressed in the AIREO specification are granular and interoperable metadata (based on STAC), innovative quality assurance metrics, data provenance and processing history, as well as integrated feature engineering recipes that optimize platform independence. Several initial pilot datasets are being developed following the AIREO data specifications. These pilot applications include, for example, forest biomass, sea ice detection and the estimation of atmospheric parameters. An API for the easy exploitation of these datasets will be provided, allowing the training datasets (TDS) to work against EO catalogs (based on OGC STAC catalogs and best practices from the ML community) and enabling dataset updates and repeated model training over time.
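
For illustration, STAC-based metadata for a single training patch might be assembled with the pystac library as below; the identifiers, label fields, and asset are hypothetical and do not reproduce the actual AIREO specification:

```python
import pystac
from datetime import datetime, timezone

# One training-sample item; IDs and extension fields are invented examples.
item = pystac.Item(
    id="aireo-biomass-patch-0001",
    geometry={"type": "Polygon",
              "coordinates": [[[8.0, 47.0], [8.1, 47.0], [8.1, 47.1],
                               [8.0, 47.1], [8.0, 47.0]]]},
    bbox=[8.0, 47.0, 8.1, 47.1],
    datetime=datetime(2018, 6, 1, tzinfo=timezone.utc),
    properties={"label:classes": [{"name": "biomass",
                                   "classes": ["low", "high"]}]},
)
item.add_asset("labels",
               pystac.Asset(href="./labels.tif",
                            media_type=pystac.MediaType.GEOTIFF))
print(item.to_dict()["properties"])
```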

 

This presentation will present the first version of the AIREO training dataset specification and will showcase some elements of the best practices that were developed. The AIREO-compliant pilot datasets, which are openly accessible, will be presented, and community feedback is explicitly encouraged.



[1] https://aireo.net/

How to cite: McKinstry, A., Boydell, O., Le, Q., Preet, I., Hanafin, J., Fernandez, M., Warde, A., Kannan, V., and Griffiths, P.: AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12384, https://doi.org/10.5194/egusphere-egu21-12384, 2021.

ESSI1.6 – Spatio-temporal data science: advances in computational geosciences and innovative evaluation tools for weather and climate science

EGU21-13357 | vPICO presentations | ESSI1.6

Overcoming challenges in spatio-temporal modelling of large-scale (global) data

Aoibheann Brady, Jonathan Rougier, Yann Ziegler, Bramha Dutt Vishwakarma, Sam Royston, Stephen Chuter, Richard Westaway, and Jonathan Bamber

Modelling spatio-temporal data on a large scale presents a number of obstacles for statisticians and environmental scientists. Issues such as computational complexity, combining point and areal data, separation of sources into their component processes, and the handling of both large volumes of data in some areas and sparse data in others must be considered. We discuss methods to overcome such challenges within a Bayesian hierarchical modelling framework using INLA.

In particular, we illustrate the approach using the example of source-separation of geophysical signals both on a continental and global scale. In such a setting, data tends to be available both at a local and areal level. We propose a novel approach for integrating such sources together using the INLA-SPDE method, which is normally reserved for point-level data. Additionally, the geophysical processes involved are both spatial (time-invariant) and spatio-temporal in nature. Separation of such processes into physically sensible components requires careful modelling and consideration of priors (such as physical model outputs where data is sparse), which will be discussed. We also consider methods to overcome the computational costs of modelling on such a large scale, from efficient mesh design, to thinning/aggregating of data, to considering alternative approaches for inference. This holistic approach to modelling of large-scale data ensures that spatial and spatio-temporal processes can be sensibly separated into their component parts, without being prohibitively expensive to model.

How to cite: Brady, A., Rougier, J., Ziegler, Y., Vishwakarma, B. D., Royston, S., Chuter, S., Westaway, R., and Bamber, J.: Overcoming challenges in spatio-temporal modelling of large-scale (global) data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13357, https://doi.org/10.5194/egusphere-egu21-13357, 2021.

EGU21-1255 | vPICO presentations | ESSI1.6

Scientific workflow scheduling based on data transformation graph for remote sensing application

Zhuojing Tian, Zhenchun Huang, Yinong Zhang, Yanwei Zhao, En Fu, and Shuying Wang

As the amount of data and computation in scientific workflow applications continues to grow, distributed and heterogeneous computing infrastructures such as inter-cloud environments provide this type of application with a great number of computing resources to meet the corresponding needs. In the inter-cloud environment, how to effectively map tasks to cloud service providers in order to meet QoS (quality of service) constraints based on user requirements has become an important research direction. Remote sensing applications need to process terabytes of data each time; however, frequent and huge data transfers across clouds create severe performance bottlenecks for execution and seriously affect QoS constraints such as makespan and cost. Using a data transformation graph (DTG) to study the data transfer process of a global drought detection application, a specific optimization strategy is obtained based on the characteristics of the application and environment. Building on this, we propose an inter-cloud workflow scheduling method based on a genetic algorithm that minimizes makespan and cost while satisfying the user’s QoS constraints. The experimental results show that, compared with the standard genetic algorithm, a random algorithm, and a round-robin algorithm, the optimized genetic algorithm can greatly improve the scheduling performance of data- and computation-intensive scientific workflows such as remote sensing applications and reduce the impact of performance bottlenecks.

Keywords: scientific workflow scheduling; inter-cloud environment; remote sensing application; data transformation graph
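
A toy genetic-algorithm sketch of the task-to-cloud mapping idea described above, optimizing makespan only (the proposed method additionally handles cost and QoS constraints and exploits the DTG analysis); the population sizes, costs, and rates below are invented:

```python
import random

random.seed(0)
N_TASKS, N_CLOUDS, POP, GENS = 20, 4, 50, 200
# execution cost of each task on each cloud provider (invented)
cost = [[random.uniform(1, 10) for _ in range(N_CLOUDS)] for _ in range(N_TASKS)]

def makespan(assign):
    """Fitness: the most heavily loaded cloud determines the makespan."""
    load = [0.0] * N_CLOUDS
    for task, cloud in enumerate(assign):
        load[cloud] += cost[task][cloud]
    return max(load)

# each chromosome maps task index -> cloud index
pop = [[random.randrange(N_CLOUDS) for _ in range(N_TASKS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=makespan)
    elite = pop[:POP // 2]                     # keep the better half
    children = []
    while len(children) < POP - len(elite):
        a, b = random.sample(elite, 2)
        cut = random.randrange(N_TASKS)        # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.1:              # mutation: reassign one task
            child[random.randrange(N_TASKS)] = random.randrange(N_CLOUDS)
        children.append(child)
    pop = elite + children

print(makespan(min(pop, key=makespan)))
```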

How to cite: Tian, Z., Huang, Z., Zhang, Y., Zhao, Y., Fu, E., and Wang, S.: Scientific workflow scheduling based on data transformation graph for remote sensing application, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1255, https://doi.org/10.5194/egusphere-egu21-1255, 2021.

EGU21-14423 | vPICO presentations | ESSI1.6

Biclustering for uncovering spatial-temporal patterns in telecom data

Nastasija Grujić, Sanja Brdar, Olivera Novović, Nikola Obrenović, Miro Govedarica, and Vladimir Crnojević

Understanding human dynamics is of crucial importance for managing human activities for sustainable development. According to the United Nations, 68% of people will live in cities by 2050. Therefore, it is important to understand human footprints in order to develop policies that will improve life in urban and suburban areas. Our study aims at detecting spatial-temporal activity patterns from mobile phone data provided by a telecom service provider. More precisely, we used the activity dataset, which contains the number of sent/received SMS messages and calls, as well as internet usage, per radio base station in defined time stamps. The case study focuses on the capital city of Serbia, Belgrade, which has nearly 2 million inhabitants; the analysis covers February 2020. We applied a biclustering (spectral co-clustering) algorithm to the telecom data to detect locations in the city that behave similarly in specific time windows. Biclustering is a data mining technique for finding homogeneous submatrices among rows and columns of a matrix, widely used in text mining and gene expression data analysis. Although there are no examples in the literature of the algorithm being used on location-based data for urban applications, we saw its potential due to its ability to detect clusters in a more refined way, during specific periods of time, which could not otherwise be detected with a global clustering approach. To prepare the data for the algorithm, we normalized each type of activity (SMS/call in/out and internet activity) and aggregated the total activity on each antenna per hour. We transformed the data into a matrix whose rows represent the antennas and whose columns represent the hours. The algorithm was applied to each day separately. On average, the number of discovered biclusters was five, usually corresponding to regular activities, such as work, home, commuting, and free time, but also to the city’s nightlife. Our results confirmed that urban spaces are a function of space and time. They revealed different functionalities of the urban and suburban parts of the city. We observed the presence of patterned behavior across the analyzed days. The type of day dictated the spatial-temporal activities that occurred. We distinguished different types of days, such as working days (Monday to Thursday), Fridays, weekends, and holidays. These findings show the promising potential of the biclustering algorithm and could be utilized by policymakers for precisely detecting activity clusters across space and time that correspond to specific functions of the city.
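
For illustration, scikit-learn's SpectralCoclustering can be applied to such an antenna-by-hour activity matrix; note that, unlike the workflow described above where roughly five biclusters emerged per day, sklearn requires the number of clusters up front, and the matrix below is random stand-in data:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
activity = rng.random((200, 24))      # rows = antennas, columns = hours of a day

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(activity)
antenna_groups = model.row_labels_    # which antennas behave alike...
hour_groups = model.column_labels_    # ...and during which hours of the day
print(antenna_groups[:10], hour_groups)
```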

How to cite: Grujić, N., Brdar, S., Novović, O., Obrenović, N., Govedarica, M., and Crnojević, V.: Biclustering for uncovering spatial-temporal patterns in telecom data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14423, https://doi.org/10.5194/egusphere-egu21-14423, 2021.

EGU21-9749 | vPICO presentations | ESSI1.6

Cluster analysis in the studying of stress relation in the Vrancea-zone

Lili Czirok, Lukács Kuslits, and Katalin Gribovszki

The SE Carpathians exhibit significant geodynamic activity due to the ongoing subduction process. The strong seismicity in the Vrancea zone is its most important indicator. The focal area of these seismic events is relatively small, around 80 × 100 km, and the distribution of their locations is quite dense.

The authors have carried out cluster analyses of the focal mechanism solutions estimated from local and teleseismic measurements, together with stress inversions, to support recent and previously published studies of this region. They have applied different pre-existing clustering methods – e.g. HDBSCAN (hierarchical density-based spatial clustering of applications with noise) and agglomerative hierarchical analysis – considering the geographical coordinates, focal depths and focal mechanism parameters of the seismic events used. Moreover, they have attempted to develop a fully automated algorithm for classifying the earthquakes used in the estimations. This algorithm does not call for the setting of hyper-parameters; thus, the effect of subjectivity can be reduced significantly and the running time decreased. In all cases, the resulting stress tensors are in close agreement with previously presented results.
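
A rough sketch of a density-based clustering run of the kind described, using the hdbscan package on a toy feature matrix of hypocentre coordinates and focal-mechanism angles; the features, their scaling, and min_cluster_size are illustrative assumptions:

```python
import numpy as np
import hdbscan

rng = np.random.default_rng(1)
# toy features per event: lon, lat, depth (km), strike, dip, rake
events = rng.random((500, 6)) * [1.0, 1.0, 180.0, 360.0, 90.0, 360.0]
events[:, :2] += [26.0, 45.5]              # roughly the Vrancea region

# in practice the columns should be rescaled/weighted to comparable units
clusterer = hdbscan.HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(events)     # label -1 marks noise events
print(np.unique(labels))
```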

How to cite: Czirok, L., Kuslits, L., and Gribovszki, K.: Cluster analysis in the studying of stress relation in the Vrancea-zone, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9749, https://doi.org/10.5194/egusphere-egu21-9749, 2021.

EGU21-12324 | vPICO presentations | ESSI1.6 | Highlight

AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series

Christina Pacher, Irene Schicker, Rosmarie DeWit, and Claudia Plant

Both clustering and outlier detection play an important role in meteorology. With clustering, large sets of data points, such as numerical weather prediction (NWP) model data or observation sites, are separated into groups based on the characteristics found in the data, grouping similar data points in a cluster. Clustering also enables one to detect outliers in the data. The resulting clusters are useful in many ways, such as atmospheric pattern recognition (e.g. clustering NWP ensemble predictions to estimate the likelihood of the predicted weather patterns), climate applications (grouping point observations for climate pattern recognition), forecasting (e.g. data pool enhancement using data of similar sites for forecasting applications), urban meteorology, air quality, renewable energy systems, and hydrological applications.
 
Typically, one does not know in advance how many clusters or groups are present in the data. However, for algorithms such as K-means one needs to define in advance how many clusters one wants as an outcome. With the proposed novel algorithm AWT, a modified combination of several well-known clustering algorithms, this is not needed: it chooses the number of clusters automatically based on a user-defined threshold parameter. Furthermore, the algorithm can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size.
Similar to the classical BIRCH algorithm, our method AWT works on a multi-resolution data structure, an Aggregated Wavelet Tree, that is suitable for representing multivariate time series. In contrast to BIRCH, the user does not need to specify the number of clusters K, as that is difficult in our application. Instead, AWT relies on a single threshold parameter for clustering and outlier detection. This threshold corresponds to the highest resolution of the tree. Points that are not in any cluster with respect to the threshold are naturally flagged as outliers.
 
With the recent increase in the usage of non-traditional data sources, such as private smart-home weather stations, in NWP models and other forecasting applications, outlier detection and clustering methods are useful for pre-processing and filtering these rather novel data sources. Especially in urban areas, changes in the surface energy balance caused by urbanization result in temperatures generally being higher in cities than in the surrounding areas. In order to capture the spatial features of this effect, data with high spatial resolution are necessary. Here, privately owned smart-home weather stations are useful, as often only a limited number of official observation sites exist. However, to be able to use these data, they need to be pre-processed.
  
In this work we apply our novel algorithm AWT to crowdsourced data from the city of Vienna. We demonstrate the skill of the algorithm in outlier detection and filtering as well as clustering the data and evaluate it against commonly used algorithms. Furthermore, we show how one could use the algorithm in renewable energy applications.

How to cite: Pacher, C., Schicker, I., DeWit, R., and Plant, C.: AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12324, https://doi.org/10.5194/egusphere-egu21-12324, 2021.

EGU21-12734 | vPICO presentations | ESSI1.6

Statistical downscaling of wind speed time series data based on topographic variables

Wenxuan Hu, Yvonne Scholz, Madhura Yeligeti, Lüder von Bremen, and Marion Schroedter-Homscheidt

Renewable energy sources such as wind energy play a crucial role in most climate change mitigation scenarios because of their ability to significantly reduce energy-related carbon emissions. In order to understand and design future energy systems, detailed modeling of renewable energy sources is important. To make energy system modelling possible at all variability scales of local weather conditions, renewable energy source information with high resolution in both space and time is required.

Nowadays, the renewable energy resource data widely used in the energy modeling community are reanalysis data such as ERA5, COSMO-REA6, and MERRA-2. Taking wind speed as an example, reanalysis data can provide long-term, spatially resolved wind information at any desired height in a physically consistent way. However, their spatial resolution is coarse. In order to obtain wind speed data at a fine spatial resolution, this paper proposes a statistical downscaling method based on reanalysis data, observation data, and the local topography.

While most statistical wind downscaling studies have focused on obtaining site-specific data or downscaling probability density functions, this paper focuses on downscaling one-year hourly wind speed time series for Europe to 0.00833° × 0.00833° (approximately 1 km × 1 km) resolution. Various studies have shown that the local topography influences wind speed. The topographic structure in this study is described by two metrics: TPI, a topographic position index that compares the elevation of each cell to the mean elevation of the neighborhood, and Sx, a slope-based, direction-dependent parameter that describes the topography in the upwind direction. The observation data used in this study are MeteoSwiss measurements, which provide hourly wind speed time series at the station heights. For each weather station with observation data, biases described by the local terrain features are introduced to minimize the root mean square (RMS) error and the Kolmogorov-Smirnov D (KSD) statistic between the corrected and the observed wind speed. These biases are then assigned to grid points with the same terrain types as the weather station, which enables downscaling of the wind speed for the whole of Europe.
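
As an illustration of the TPI metric (each cell's elevation minus the mean elevation of its neighbourhood), a gridded version takes only a few lines; the 9-cell window and toy elevation grid are arbitrary choices, not the study's parameters:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def tpi(dem, size=9):
    """Topographic Position Index: cell elevation minus the mean elevation of
    its (size x size) neighbourhood; positive on ridges, negative in valleys."""
    return dem - uniform_filter(dem.astype(float), size=size)

dem = np.random.rand(500, 500) * 1000.0    # toy elevation grid in metres
print(tpi(dem).min(), tpi(dem).max())
```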

The results show that this downscaling method can improve the RMS and KSD for both ERA5 and COSMO REA6, especially at mountain ridges, which indicates that it can not only decrease the bias, but also provide a better match to the observed wind speed distributions.

How to cite: Hu, W., Scholz, Y., Yeligeti, M., von Bremen, L., and Schroedter-Homscheidt, M.: Statistical downscaling of wind speed time series data based on topographic variables, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12734, https://doi.org/10.5194/egusphere-egu21-12734, 2021.

EGU21-13210 | vPICO presentations | ESSI1.6

Spatio-temporal clustering methodologies for point-event natural hazards

Uldis Zandovskis, Bruce D. Malamud, and Davide Pigoli

Natural hazards are inherently spatio-temporal processes. Spatio-temporal clustering methodologies applied to natural hazard data can help distinguish clustering patterns that not only identify point-event-dense regions and time periods, but also provide insight into the hazardous process. Here we review spatio-temporal clustering methodologies applicable to point-event datasets representative of natural hazards, and we evaluate their performance using both synthetic and real-life data. We first present a systematic overview of major spatio-temporal clustering methodologies used in the literature, which include clustering procedures that are (i) global (providing a single quantitative measure of the degree of clustering in the dataset) and (ii) local (assigning individual point events to a cluster). A total of seven methodologies from these two groups are applied to real-world (lightning) and synthetic datasets. For (i) global procedures, we explore the Knox, Mantel and Jacquez k-NN tests and spatio-temporal K-functions, and for (ii) local procedures we consider the spatio-temporal scan statistic, kernel density estimation and the density-based clustering method OPTICS.

The dataset of 7021 lightning strikes is from 1 and 2 July 2015 over the UK, when a severe three-storm system crossed the region, with different convective modes producing each of the storms. The synthetic datasets are representative of various topologies of point-event natural hazard data with a moving source. We introduce a two-source model with input parameters related to the physical properties of the source. Each source has a set number of point events, an initiation point in space and time, a movement speed, a direction, an inter-event time distribution and a spatial spread distribution. In addition to a base model of two identical moving sources with a set temporal separation, we produce four different topologies of the data by incrementally varying the speed parameter of the source, the spatial spread parameters, the direction and initiation points, and the angle of the two sources.

With these five synthetic datasets representative of various two-source models, we evaluate the performance of the methodologies. The performance is assessed based on the ability of each methodology to separate the point events produced by the two sources and on the sensitivity of these results to changes in the model input parameters. We further discuss the benefits of combining global and local clustering procedures in the analyses: using global clustering procedures, we gain an initial understanding of the spatial and temporal scales over which clustering is present in the data, and this information then helps to inform and limit the choice of input parameters for the local clustering procedures.
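
As an example of one of the global procedures, the Knox test counts event pairs that are close in both space and time and assesses significance by Monte Carlo permutation of the event times; the thresholds and synthetic events below are invented for illustration:

```python
import numpy as np

def knox_stat(xy, t, ds, dt):
    """Knox statistic: number of event pairs within both distance ds and time dt."""
    n, close = len(t), 0
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(xy[i] - xy[j]) <= ds and abs(t[i] - t[j]) <= dt:
                close += 1
    return close

rng = np.random.default_rng(0)
xy = rng.random((300, 2)) * 100.0    # event locations (km)
t = rng.random(300) * 48.0           # event times (hours)

obs = knox_stat(xy, t, ds=10.0, dt=2.0)
# Monte Carlo: permuting the times breaks any space-time interaction
perm = [knox_stat(xy, rng.permutation(t), 10.0, 2.0) for _ in range(99)]
p_value = (1 + sum(s >= obs for s in perm)) / 100.0
print(obs, p_value)
```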

How to cite: Zandovskis, U., Malamud, B. D., and Pigoli, D.: Spatio-temporal clustering methodologies for point-event natural hazards, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13210, https://doi.org/10.5194/egusphere-egu21-13210, 2021.

EGU21-7162 | vPICO presentations | ESSI1.6

Wildfire susceptibility assessment: evaluation of the performance of different machine learning algorithms

Andrea Trucchia, Sara Isnardi, Mirko D'Andrea, Guido Biondi, Paolo Fiorucci, and Marj Tonini

Wildfires constitute a complex environmental disaster triggered by several interacting natural and human factors that can affect biodiversity, species composition and ecosystems, but also human lives, regional economies and environmental health. Therefore, wildfires have become a focus of forestry and ecological research and are receiving considerable attention in forest management. Current advances in automated learning and simulation methods, such as machine learning (ML) algorithms, have recently aroused great interest in wildfire risk assessment and mapping. This quantitative evaluation is carried out by taking into account two factors: the location and spatial extension of past wildfire events, and the geo-environmental and anthropogenic predisposing factors that favored their ignition and spreading. When dealing with risk assessment and predictive mapping for natural phenomena, it is crucial to ascertain the reliability and validity of the collected data, as well as the predictive capability of the obtained results. In a previous study (Tonini et al., 2020), the authors applied Random Forest (RF) to elaborate wildfire susceptibility mapping for the Liguria region (Italy). In the present study, we address the following outstanding issues, which remained unsolved: (1) the vegetation map included a class labeled “burned area” that masked the true burned vegetation; (2) the implemented RF model gave good results, but it needs to be compared with other ML-based approaches; (3) to test the predictive capabilities of the model, the last three years of observations were used, but these are not fully representative of the different wildfire regimes characterizing non-consecutive years. By improving the analyses, the following results were achieved. (1) The class “burned areas” was reclassified based on expert knowledge, and the type of vegetation correctly assigned. This allowed us to correctly estimate the relative importance of each vegetation class belonging to this variable. (2) Two additional ML-based approaches, namely Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM), were tested besides RF, and the performance of each model was assessed, as well as the resulting variable ranking and the predicted outputs. This allowed us to compare the three ML-based approaches and to evaluate the pros and cons of each one. (3) The training and testing datasets were selected by extracting the yearly observations based on a clustering procedure, allowing us to account for the temporal variability of the burning seasons. As a result, our models can on average perform better predictions in different situations, by taking into consideration years experiencing more or fewer wildfires than usual. The three ML-based models (RF, SVM and MLP) were finally validated by means of two metrics: (i) the Area Under the ROC Curve, with the validation dataset selected using a 5-fold cross-validation procedure; and (ii) the RMS errors, computed by evaluating the difference between the predicted probability outputs and the presence/absence of an observed event in the testing dataset.
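
A minimal sketch of the first validation metric, the Area Under the ROC Curve under 5-fold cross-validation for an RF classifier; the synthetic predictors, class balance, and forest size are placeholders, not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((2000, 10))                  # geo-environmental predisposing factors
y = (rng.random(2000) < 0.3).astype(int)    # 1 = burned, 0 = unburned (toy)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")
print(auc.mean(), auc.std())
```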

Bibliography:

Tonini, M.; D’Andrea, M.; Biondi, G.; Degli Esposti, S.; Trucchia, A.; Fiorucci, P. A Machine Learning-Based Approach for Wildfire Susceptibility Mapping. The Case Study of the Liguria Region in Italy. Geosciences 2020, 10, 105. https://doi.org/10.3390/geosciences10030105

How to cite: Trucchia, A., Isnardi, S., D'Andrea, M., Biondi, G., Fiorucci, P., and Tonini, M.: Wildfire susceptibility assessment: evaluation of the performance of different machine learning algorithms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7162, https://doi.org/10.5194/egusphere-egu21-7162, 2021.

EGU21-13713 | vPICO presentations | ESSI1.6

RapidAI4EO: Advancing the State-of-the-Art in Continuous Land Monitoring

Giovanni Marchisio, Patrick Helber, Benjamin Bischke, Tim Davis, and Annett Wania

New catalogues of nearly daily or even intraday temporal data will soon dominate the global archives. However, there has been little exploration of artificial intelligence (AI) techniques to leverage the high cadence that is already possible to achieve through the fusion of multiscale, multimodal sensors. Under the sponsorship of the European Union’s Horizon 2020 programme, RapidAI4EO will establish the foundations for the next generation of Copernicus Land Monitoring Service (CLMS) products. Focus is on the CORINE Land Cover programme, which is the flagship of CLMS. 

Specific objectives of the project are to: 1) explore and stimulate the development of new spatiotemporal monitoring applications based on the latest advances in AI and Deep Learning (DL); 2) demonstrate the fusion of Copernicus high resolution satellite imagery and third party very high resolution imagery; 3) provide intensified monitoring of Land Use and Land Cover, and Land Use change at a much higher level of detail and temporal cadence than it is possible today. 

Our strategy is two-fold. The first aspect involves developing vastly improved DL architectures to model the phenomenology inherent in high-cadence observations, with a focus on disentangling phenology from structural change. The second involves providing critical training data to drive advancement in the Copernicus community and ecosystem well beyond the lifetime of this project. To this end we will create the most complete and dense spatiotemporal training sets ever, combining Sentinel-2 with daily, harmonized, cloud-free, gap-filled, multispectral 3 m time series resulting from the fusion of open satellite data with Planet imagery at as many as 500,000 patch locations over Europe. The daily time series will span the entire year 2018, to coincide with the latest release of CORINE. We plan to open-source these datasets for the benefit of the entire remote sensing community.

This talk focuses on the description of the datasets, whose inspiration comes from the recently released EuroSAT (Helber et al., 2019) and BigEarthNet corpora (Sumbul et al., 2019). The new corpora will look at the intersection of CORINE 2018 with all the countries in the EU, balancing relative country surface with relative LULC distribution and, most notably, adding the daily high-resolution time series at all locations for the year 2018. Annotations will be based on the CORINE ontology. The higher spatial resolution will support modeling of more LC classes, while the added temporal dimension should enable disambiguation of land covers across diverse climate zones, as well as an improved understanding of land use.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004356.

How to cite: Marchisio, G., Helber, P., Bischke, B., Davis, T., and Wania, A.: RapidAI4EO: Advancing the State-of-the-Art in Continuous Land Monitoring, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13713, https://doi.org/10.5194/egusphere-egu21-13713, 2021.

EGU21-16051 | vPICO presentations | ESSI1.6

Land-use change effects on biodiversity through mechanistic simulations: A case study with South-Asian mammals

Andre P. Silva, Filip Thorn, Damaris Zurell, and Juliano Cabral

Land-use change remains the main driver of biodiversity loss, and fragmentation and habitat loss are expected to lead to further population declines and species losses. We apply a recently developed R package for a spatially explicit mechanistic simulation model (RangeShiftR), which incorporates habitat suitability as well as demographic and dispersal processes, to understand the temporal effects of land-use change (land-use harmonization scenarios for the 1900-2100 period) on the abundance and richness of mammalian species in South Asia. We then compare land-use scenarios with and without protected areas to understand whether current spatial conservation strategies are able to sustain viable populations independently of the land-use scenario followed. Our approach is innovative in assessing how land-use scenarios can influence animal populations through the underlying ecological processes.

How to cite: P. Silva, A., Thorn, F., Zurell, D., and Cabral, J.: Land-use change effects on biodiversity through mechanistic simulations: A case study with South-Asian mammals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16051, https://doi.org/10.5194/egusphere-egu21-16051, 2021.

EGU21-13903 | vPICO presentations | ESSI1.6

Fostering International Collaboration Through a Unified Verification, Validation, and Diagnostics Framework - METplus 

Tara Jensen, Marion Mittermaier, Paul Kucera, and Barbara Brown

Verification and validation activities are critical for the success of modeling and prediction efforts at organizations around the world. Having reproducible results via a consistent framework is equally important for model developers and users alike. The Model Evaluation Tools (MET) was developed over a decade ago and expanded into the METplus framework with a view towards providing a consistent platform delivering reproducible results.

The METplus system is an umbrella verification, validation and diagnostic framework used by thousands of users from both US and international organizations. These tools are designed to be highly flexible to allow for quick adaptation to meet additional evaluation and diagnostic needs. A suite of Python wrappers has been implemented to facilitate quick set-up and implementation of the system, and to enhance the pre-existing plotting capabilities. Recently, several organizations within the National Oceanic and Atmospheric Administration (NOAA), the United States Department of Defense (DOD), and international partnerships such as the Unified Model (UM) Partnership led by the Met Office have adopted the tools both operationally and for research purposes. Many of these organizations are also now contributing to METplus development, leading to a more robust and dynamic framework for the entire earth system modeling community.

This presentation will provide an update on the current status of METplus and how it is being used across multiple scales and applications. It will highlight examples of METplus applied to verification and validation efforts throughout the international community, addressing a range of temporal scales (hourly forecasts to subseasonal-to-seasonal) and spatial scales (convection-allowing to mesoscale, regional to global, tropical to cryosphere to space).

How to cite: Jensen, T., Mittermaier, M., Kucera, P., and Brown, B.: Fostering International Collaboration Through a Unified Verification, Validation, and Diagnostics Framework - METplus , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13903, https://doi.org/10.5194/egusphere-egu21-13903, 2021.

EGU21-4468 | vPICO presentations | ESSI1.6

Free Evaluation System Framework (Freva) - New Features and Development

Etor E. Lucio-Eceiza, Christopher Kadow, Martin Bergemann, Mahesh Ramadoss, Sebastian Illing, Oliver Kunst, Thomas Schartner, Jens Grieger, Mareike Schuster, Andy Richling, Ingo Kirchner, Henning Rust, Philipp Sommer, Ulrich Cubasch, Uwe Ulbrich, Hannes Thiemann, and Thomas Ludwig

The Free Evaluation System Framework (Freva - freva.met.fu-berlin.de , xces.dkrz.de , www-regiklim.dkrz.de - https://github.com/FREVA-CLINT/Freva) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high-performance computers (HPC) to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, whose features are all accessible via shell and web environments. Freva indexes different data projects into one common search environment by storing the metadata of the self-describing model, reanalysis and observational data sets in a database. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). This metadata system, with its advanced but easy-to-handle search tool, supports users, developers and their plugins in retrieving the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Facilitating the provision and usage of tools and climate data increases the number of scientists working with the data sets and identifying discrepancies.

Plugins can also integrate their results, e.g. post-processed data, into the user's database. This allows, for instance, post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between the plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via the shell or the web system; plugged-in tools therefore benefit from transparency and reproducibility. Furthermore, the system suggests existing results already produced by other users, saving CPU hours, I/O, disk space and time. An integrated web shell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gateway to the research projects on an HPC. Freva efficiently frames the interaction between different technologies, thus improving Earth system modeling science. New features and aspects of further development and collaboration are discussed.

How to cite: Lucio-Eceiza, E. E., Kadow, C., Bergemann, M., Ramadoss, M., Illing, S., Kunst, O., Schartner, T., Grieger, J., Schuster, M., Richling, A., Kirchner, I., Rust, H., Sommer, P., Cubasch, U., Ulbrich, U., Thiemann, H., and Ludwig, T.: Free Evaluation System Framework (Freva) - New Features and Development, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4468, https://doi.org/10.5194/egusphere-egu21-4468, 2021.

EGU21-4918 | vPICO presentations | ESSI1.6

Novel assessment of model relative humidity with satellite probabilistic estimates

Chloé Radice, Hélène Brogniez, Pierre-Emmanuel Kirstetter, and Philippe Chambon

Remote sensing data are often used to assess model forecasts on multiple scales, generally by confronting past simulations with observations. This paper introduces a novel probabilistic method that evaluates tropical atmospheric relative humidity (RH) profiles simulated by the global numerical weather forecasting model ARPEGE with respect to probability distributions of finer-scale satellite observations.

The reference RH is taken from the SAPHIR microwave sounder onboard the Megha-Tropiques satellite, in operation since 2011. ARPEGE simulates the RH field every 6 hours on a 0.25° grid over 18 vertical levels ranging between 950 hPa and 100 hPa. The reference probabilistic RH field is retrieved from brightness temperatures measured by SAPHIR, with a footprint resolution ranging from 10 km (nadir) to 23 km (edge of swath), on 6 vertical layers ranging from 950 hPa to 100 hPa. Footprint-scale RH values are aggregated (convolved) over the spatial and temporal scale of comparison to match the model resolution and summarize the patterns over a significant period. Comparison results will be shown for the April-May-June 2018 period for two configurations of the ARPEGE model (two parametrization schemes for convection). The probabilistic comparison is discussed with respect to a classical deterministic comparison of RH values.

This probabilistic approach keeps all the sub-grid information and, by looking at the distribution as a whole, avoids the classical deterministic simplification of working with a single "best" estimate. It allows a finer assessment by working on a case-by-case basis and by characterising specific situations. It provides added value by accounting for additional information in the evaluation of the simulated field, especially for model simulations that are close to the traditional mean.
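The contrast between the deterministic and the probabilistic comparison can be sketched as follows; this is our own simplified illustration on synthetic numbers, not the processing chain of the study:

```python
import numpy as np

rng = np.random.default_rng(1)
obs_samples = rng.normal(55.0, 8.0, size=200)  # SAPHIR-like footprint RH retrievals (%) in one grid cell
rh_model = 48.0                                # ARPEGE RH (%) for the same cell and layer

# Deterministic view: distance to a single "best" estimate (the mean)
det_error = rh_model - obs_samples.mean()

# Probabilistic view: where the model value sits within the observed distribution
quantile = (obs_samples < rh_model).mean()

print(f"deterministic error: {det_error:+.1f} %RH")
print(f"model value falls at the {100 * quantile:.0f}th percentile of the observed RH distribution")
```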

How to cite: Radice, C., Brogniez, H., Kirstetter, P.-E., and Chambon, P.: Novel assessment of model relative humidity with satellite probabilistic estimates, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4918, https://doi.org/10.5194/egusphere-egu21-4918, 2021.

EGU21-3476 | vPICO presentations | ESSI1.6

Recent developments on the Earth System Model Evaluation Tool

Bouwe Andela, Fakhereh Alidoost, Lukas Brunner, Jaro Camphuijsen, Bas Crezee, Niels Drost, Bettina Gier, Birgit Hassler, Peter Kalverla, Axel Lauer, Saskia Loosveldt-Tomas, Ruth Lorenz, Valeriu Predoi, Mattia Righi, Manuel Schlund, Stef Smeets, Javier Vegas-Regidor, Jost Von Hardenberg, Katja Weigel, and Klaus Zimmermann

The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models such as those participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2020, www.esmvaltool.org) features a brand-new design composed of a core that finds and processes data according to a 'recipe', and an extensive collection of ready-to-use recipes and associated diagnostic codes for reproducing results from published papers. Development and discussion of the tool (mostly) take place in public on https://github.com/esmvalgroup, and anyone with an interest in climate model evaluation is welcome to join there.

Since the initial release of version 2 in the summer of 2020, many improvements have been made to the tool. It is now more user-friendly, with extensive documentation available on docs.esmvaltool.org and a step-by-step online tutorial. Regular releases, currently planned three times a year, ensure that recent contributions become available quickly while maintaining a high level of quality control. The tool can be installed from conda, and portable Docker and Singularity containers are also available.

Recent new features include a more user-friendly command-line interface, citation information per figure including CMIP6 data citation using ES-DOC, more and faster preprocessor functions that require less memory, automatic corrections for a larger number of CMIP6 datasets, support for more observational and reanalysis datasets, and more recipes and diagnostics.

The tool is now also more reliable, with improved automated testing through more unit tests for the core, as well as a recipe testing service running at DKRZ that tests the scientific recipes and diagnostics bundled with the tool. The community maintaining and developing the tool is growing, making the project less dependent on individual contributors. There are now technical and scientific review teams that review new contributions for technical quality and for scientific correctness and relevance, respectively; two new principal investigators to build a larger support base in the community; and a newly created user engagement team that takes care of improving the overall user experience.

How to cite: Andela, B., Alidoost, F., Brunner, L., Camphuijsen, J., Crezee, B., Drost, N., Gier, B., Hassler, B., Kalverla, P., Lauer, A., Loosveldt-Tomas, S., Lorenz, R., Predoi, V., Righi, M., Schlund, M., Smeets, S., Vegas-Regidor, J., Von Hardenberg, J., Weigel, K., and Zimmermann, K.: Recent developments on the Earth System Model Evaluation Tool, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3476, https://doi.org/10.5194/egusphere-egu21-3476, 2021.

EGU21-4805 | vPICO presentations | ESSI1.6

Bringing ESMValTool to the Jupyter Lab

Peter C. Kalverla, Stef Smeets, Niels Drost, Bouwe Andela, Fakhereh Alidoost, and Jaro Camphuijsen

Ease of use can easily become a limiting factor to scientific quality and progress. In order to verify and build upon previous results, the ability to effortlessly access and process increasing data volumes is crucial.

To level the playing field for all researchers, a shared infrastructure had to be developed. In Europe, this effort is coordinated mainly through the IS-ENES projects. The current infrastructure provides access to the data as well as compute resources. This leaves the tools to easily work with the data as the main obstacle for a smooth scientific process. Interestingly, not the scarcity of tools, but rather their abundance can lead to diverging workflows that hamper reproducibility.

The Earth System Model Evaluation Tool (ESMValTool) was originally developed as a command-line tool for routine evaluation of important analytics workflows. The tool encourages some degree of standardization by factoring out common operations, while allowing for custom analytics of the pre-processed data. All scripts are bundled with the tool, and over time this has grown into a library of so-called 'recipes'.

In the EUCP project, we are now developing a Python API for the ESMValTool. This allows for interactive exploration, modification and execution of existing recipes, as well as the creation of new analytics. Concomitantly, partners in IS-ENES3 are making their infrastructure accessible through JupyterLab. Through the combination of these technologies, researchers can easily access the data and compute resources, but also the workflows and methods used by their colleagues, all through the web browser. During the vEGU, we will show how this extended infrastructure can be used to easily reproduce, and build upon, previous results.
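A sketch of the kind of interactive recipe use described above, based on the experimental Python API that was being added to ESMValCore around the time of this abstract; the module and function names are our assumption and should be checked against the current ESMValTool documentation:

```python
# Assumed interface of the experimental notebook API (esmvalcore >= 2.2);
# names may have changed in later releases.
from esmvalcore.experimental import get_recipe

recipe = get_recipe("examples/recipe_python.yml")  # load a bundled example recipe
print(recipe)          # inspect the recipe documentation, datasets and diagnostics
output = recipe.run()  # find the data, preprocess it, and run the diagnostic scripts
print(output)          # handles to the resulting figures and data files
```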

How to cite: Kalverla, P. C., Smeets, S., Drost, N., Andela, B., Alidoost, F., and Camphuijsen, J.: Bringing ESMValTool to the Jupyter Lab, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4805, https://doi.org/10.5194/egusphere-egu21-4805, 2021.

EGU21-7724 | vPICO presentations | ESSI1.6

New scientific diagnostics in the ESMValTool – an overview

Lisa Bock, Birgit Hassler, and Axel Lauer and the ESMValTool Development Team

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and the scientific background of the implemented diagnostics and metrics, and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community that continuously improves the tool, supported by multiple national and European projects. The latest major release (v2.0) of the ESMValTool was officially introduced in August 2020 as a large community effort, and several additional smaller releases have followed since.

The diagnostic part of the ESMValTool includes a large collection of standard "recipes" for reproducing peer-reviewed analyses of many variables across ESM compartments, including the atmosphere, ocean and land domains, with diagnostics and performance metrics focusing on the mean state, trends, variability, important processes and phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation, some are also based on model-to-model comparisons. This presentation gives an overview of the latest scientific diagnostics and metrics added during the last year, including examples of the application of these diagnostics to CMIP6 model data.

How to cite: Bock, L., Hassler, B., and Lauer, A. and the ESMValTool Development Team: New scientific diagnostics in the ESMValTool – an overview, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7724, https://doi.org/10.5194/egusphere-egu21-7724, 2021.

EGU21-15681 | vPICO presentations | ESSI1.6

Model evaluation expectations of European ESM communities: results from a survey

Jerome Servonnat, Eric Guilyardi, Zofia Stott, Kim Serradell, Axel Lauer, Klaus Zimmerman, Fanny Adloff, Marie-Claire Greening, Remi Kazeroni, and Javier Vegas

Developing an Earth system model evaluation tool for a broad user community is a real challenge, as the potential users do not necessarily have the same needs or expectations. While many evaluation tasks across user communities include common steps, significant differences are also apparent, not least the investment by institutions and individuals in bespoke tools. A key question is whether there is sufficient common ground to pursue a community tool with broad appeal and application.

We present the main results of a survey carried out by Assimila for the H2020 IS-ENES3 project to review the model evaluation needs of European Earth System Modelling communities. Based on interviews with approximately 30 participants from several European institutions, the survey targeted a broad range of users, including model developers, model users, evaluation data providers, and infrastructure providers. The output of the study provides an analysis of requirements focusing on key technical, standards, and governance aspects.

The study used ESMValTool as the current benchmark for European evaluation tools. It is a community diagnostics and performance metrics tool for the evaluation of Earth System Models that allows for comparison of single or multiple models, either against predecessor versions or against observations. The tool is being developed in such a way that additional analyses can be added. As a community effort open to both users and developers, it encourages open exchange of diagnostic source code and evaluation results. It is currently used in Coupled Model Intercomparison Projects as well as for the development and testing of "new" models.

A key result of the survey is the widespread support for ESMValTool amongst users, developers, and even those who have taken or promote other approaches. The results of the survey identify priorities and opportunities in the further development of the ESMValTool to ensure long-term adoption of the tool by a broad community.

How to cite: Servonnat, J., Guilyardi, E., Stott, Z., Serradell, K., Lauer, A., Zimmerman, K., Adloff, F., Greening, M.-C., Kazeroni, R., and Vegas, J.: Model evaluation expectations of European ESM communities: results from a survey, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15681, https://doi.org/10.5194/egusphere-egu21-15681, 2021.

ESSI1.12 – Novel methods and applications of satellite and aerial imagery

EGU21-15661 | vPICO presentations | ESSI1.12

Simulating Marine Litter observations from space to support Operations Research

Stephen Emsley, Manuel Arias, Théodora Papadopoulou, and François-Régis Martin-Lauzer

A breadboard for end-to-end (E2E) Marine Litter Optical Performance Simulations (ML-OPSI) is being designed in the frame of the ESA Open Space Innovation Platform (OSIP) Campaign to support Earth Observation (EO) scientists with the design of computational experiments for Operations Research. The ML-OPSI breadboard will estimate the marine litter signal at Top-Of-Atmosphere (TOA) from a set of Bottom-Of-Atmosphere (BOA) scenarios representing the various case studies of interest to the community (e.g., windrows, frontal areas, river mouths, sub-tropical gyres), coming from synthetic (computer-simulated) data or from real observations. It is a modular, pluggable and extensible framework, promoting re-use, and can be adapted for different missions, sensors and scenarios.

The breadboard consists of (a) the OPSI components for the simulation, i.e. the process of using a model to study the characteristics of the system by manipulating variables, so that the properties of the model can be evaluated to optimise performance and make predictions about the real system; and (b) the Marine Litter model components for the detection of marine litter. It shall consider the changes caused in water reflectance and properties by marine litter, exploiting gathered information on plastic polymers, different viewing geometries, and naturally occurring atmospheric conditions. The modules of the breadboard include a Scenario Builder Module (SB) with the maximum spatial resolution and best possible modelling of the relevant physical properties, which for spectral sensors could include high-spatial-resolution and high-spectral-density BOA radiance simulations in the optical to SWIR bands; a Radiative Transfer Module (RTM) transforming water-leaving to TOA reflectance for varying atmospheric conditions and observation geometries; a Scene Generator Module (SGM), which could use Sentinel-2, Landsat, or PRISMA data as reference, or any other pertinent instrument; and a Performance Assessment Module (PAM) for ML detection that takes into account the variability of the atmosphere, the sunlight and skylight at BOA, the sea-surface roughness with trains of wind waves and swells, sea spray (whitecaps), air bubbles in the mixed layer, and marine litter dynamics, as well as instrumental noise, to assess the feasibility of marine litter detection.
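The modular structure can be pictured as a chain of pluggable stages. The sketch below is entirely illustrative: the class names, method names and trivial stand-in computations are ours, not the ML-OPSI code:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Scene:
    boa: np.ndarray                    # bottom-of-atmosphere reflectance cube (y, x, band)
    toa: Optional[np.ndarray] = None   # top-of-atmosphere reflectance, filled by the RTM

class ScenarioBuilder:                 # SB: assembles a BOA litter scenario
    def build(self) -> Scene:
        return Scene(boa=np.random.rand(64, 64, 10))

class RadiativeTransfer:               # RTM: BOA -> TOA under a given atmosphere
    def propagate(self, s: Scene) -> Scene:
        s.toa = 0.8 * s.boa + 0.05     # trivial stand-in for a real RT code
        return s

class SceneGenerator:                  # SGM: resamples to the target sensor grid
    def resample(self, s: Scene) -> Scene:
        return s                       # identity stand-in

class PerformanceAssessment:           # PAM: detection metrics vs the known litter mask
    def assess(self, s: Scene) -> float:
        return float(s.toa.mean())     # placeholder "detectability" metric

score = PerformanceAssessment().assess(
    SceneGenerator().resample(RadiativeTransfer().propagate(ScenarioBuilder().build()))
)
print("placeholder detectability score:", score)
```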

Marine litter scenarios of reference shall be built based on in-situ campaigns, to reflect the true littering conditions in each case, both in spatial distribution and composition. The breadboard shall be validated over artificial targets at sea in field campaigns as relevant. This might include spectral measurements from ASD spectroradiometers, field radiometers, and cameras on UAVs, concomitant with Copernicus Sentinel-2 acquisitions. Combined, they can be used to estimate the atmospheric contribution and assess the performance of the tested processing chain.

This activity contributes to the "Remote Sensing of Marine Litter and Debris" IOCCG task force.

How to cite: Emsley, S., Arias, M., Papadopoulou, T., and Martin-Lauzer, F.-R.: Simulating Marine Litter observations from space to support Operations Research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15661, https://doi.org/10.5194/egusphere-egu21-15661, 2021.

EGU21-16478 | vPICO presentations | ESSI1.12

Synergistic use of SMOS and Sentinel-3 for retrieving spatiotemporally estimates of surface soil moisture and evaporative fraction

Maria Piles, Miriam Pablos Hernandez, Mercè Vall-llossera, Gerard Portal, Ionut Sandric, George P. Petropoulos, and Dionisis Hristopulos

Earth Observation (EO) makes it possible to obtain information on key parameters characterizing interactions among Earth's system components, such as evaporative fraction (EF) and surface soil moisture (SSM). Notably, techniques utilizing EO data of land surface temperature (Ts) and vegetation index (VI) have shown promise in this regard. The present study presents an implementation of a downscaling method that combines the soil moisture product from SMOS with the fractional vegetation cover provided by ESA's Sentinel-3 platform.

The applicability of the investigated technique is demonstrated for a period of two years (2017-2018) using in-situ data acquired from five CarboEurope sites and from all the sites available in the REMEDHUS soil moisture monitoring network, representing a variety of climatic, topographic and environmental conditions. Predicted parameters were compared against co-orbital ground measurements acquired from several European sites belonging to the CarboEurope ground observational network.

Results indicated a close agreement between all the inverted parameters and the corresponding in-situ data. SSM maps predicted with the "triangle" method showed a small bias but a large scatter. The results of this study provide strong supportive evidence of the potential value of the methodology investigated herein in accurately deriving estimates of key parameters characterising land surface interactions that can meet the needs of fine-scale hydrological applications. Moreover, the applicability of the presented approach demonstrates the added value of the synergy between ESA's operational products acquired from different satellite sensors, in this case SMOS and Sentinel-3. As it is not tied to any particular sensor, it can also be implemented with technologically advanced EO sensors launched recently or planned for launch.
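A schematic of the Ts/VI "triangle" idea underlying such downscaling, in a deliberately simplified form that is ours rather than the authors' implementation: land surface temperature is normalised between dry and wet edges per fractional-vegetation-cover bin, and the resulting proxy redistributes the coarse SMOS soil moisture within the cell:

```python
import numpy as np

rng = np.random.default_rng(2)
lst = rng.uniform(290, 320, size=(100, 100))   # Sentinel-3-like LST (K), placeholder
fvc = rng.uniform(0, 1, size=(100, 100))       # fractional vegetation cover, placeholder

bins = np.clip((fvc * 10).astype(int), 0, 9)   # 10 FVC bins
sm_proxy = np.empty_like(lst)
for b in range(10):
    sel = bins == b
    t_dry, t_wet = lst[sel].max(), lst[sel].min()           # dry/wet edges in this bin
    sm_proxy[sel] = (t_dry - lst[sel]) / (t_dry - t_wet)    # 0 = dry edge, 1 = wet edge

coarse_ssm = 0.25  # SMOS SSM (m3/m3) of the enclosing coarse cell, placeholder
fine_ssm = coarse_ssm * sm_proxy / sm_proxy.mean()          # preserve the coarse-cell mean
print("downscaled SSM range:", fine_ssm.min(), "-", fine_ssm.max())
```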

In the present work Dr Petropoulos participation has received funding from the European Union’s Horizon 2020 research and innovation programme ENViSIoN under the Marie Skłodowska-Curie grant agreement No 752094.

How to cite: Piles, M., Pablos Hernandez, M., Vall-llossera, M., Portal, G., Sandric, I., Petropoulos, G. P., and Hristopulos, D.: Synergistic use of SMOS and Sentinel-3 for retrieving spatiotemporally estimates of surface soil moisture and evaporative fraction, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16478, https://doi.org/10.5194/egusphere-egu21-16478, 2021.

EGU21-10783 | vPICO presentations | ESSI1.12

Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image

M. Chen, J. Wu, and F. Tian

Automatically extracting buildings from remote sensing images (RSI) plays an important role in urban planning, population estimation, disaster emergency response, and related applications. With the development of deep learning technology, convolutional neural networks (CNNs), which perform better than traditional methods, have been widely used for extracting buildings from RSI. However, two problems remain. First, the low-level features extracted by shallow layers and the abstract features extracted by deep layers of the network are not fully fused, which often makes building extraction inaccurate, especially for buildings with complex structures, irregular shapes and small sizes. Second, a network contains so many trainable parameters that it occupies substantial computing resources and consumes a lot of time during training. By analysing the structure of CNNs, we found that the abstract features extracted by deep layers at low geospatial resolution contain more semantic information; these features help determine the category of pixels but are insensitive to the boundaries of buildings. Since the stride of the convolution kernels and the pooling operations reduces the geospatial resolution of the feature maps, this paper proposes a simple and effective strategy to alleviate the two bottlenecks above: reduce the stride of the convolution kernel in one of the layers and reduce the number of convolution kernels. This strategy was applied to DeepLabv3+ and evaluated on both the WHU Building Dataset and the Massachusetts Buildings Dataset. Compared with the original DeepLabv3+, the modified network performed better: on the WHU Building Dataset, the Intersection over Union (IoU) increased by 1.4% and the F1 score by 0.9%; on the Massachusetts Buildings Dataset, the IoU increased by 3.31% and the F1 score by 2.3%.
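In a framework such as PyTorch, the proposed modification amounts to replacing a strided convolution with a stride-1 version that has fewer kernels; the minimal sketch below is our illustration of the idea, not the authors' network definition:

```python
import torch
import torch.nn as nn

# A stride-2 convolution as found in a typical backbone, and a modified
# version with stride 1 and fewer kernels, as proposed in the abstract.
original = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)
modified = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 256, 64, 64)
print(original(x).shape)  # torch.Size([1, 512, 32, 32]) -- spatial resolution halved
print(modified(x).shape)  # torch.Size([1, 256, 64, 64]) -- spatial resolution preserved
```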

How to cite: Chen, M., Wu, J., and Tian, F.: Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10783, https://doi.org/10.5194/egusphere-egu21-10783, 2021.

EGU21-3740 | vPICO presentations | ESSI1.12

Deep learning for extracting water body from Sentinel-2 MSI imagery

Shuren Chou

Deep learning has a good capacity for hierarchical feature learning from unlabeled remote sensing images. In this study, the simple linear iterative clustering (SLIC) method was improved to segment the image into good-quality superpixels. A convolutional neural network (CNN) was then used to extract water bodies from Sentinel-2 MSI data. In the proposed framework, the improved SLIC method obtains correct water body boundaries by optimizing the initial clustering centres, designing a dynamic distance measure, and expanding the search space. In addition, unlike traditional water body extraction methods, it achieves multi-level water body detection. Experimental results showed that this method had higher detection accuracy and robustness than other methods. This study was able to extract water bodies from remotely sensed images with deep learning and to conduct an accuracy assessment.
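A brief sketch of the superpixel step using the standard SLIC implementation in scikit-image; note the study uses its own improved SLIC variant, so this is only the baseline workflow:

```python
import numpy as np
from skimage.segmentation import slic
from skimage.data import astronaut  # any RGB test image stands in for a Sentinel-2 chip

image = astronaut()
segments = slic(image, n_segments=500, compactness=10.0, start_label=1)
print("number of superpixels:", segments.max())

# Per-superpixel mean spectra, e.g. as inputs to a downstream CNN classifier
means = np.array([image[segments == s].mean(axis=0)
                  for s in range(1, segments.max() + 1)])
```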

How to cite: Chou, S.: Deep learning for extracting water body from Sentinel-2 MSI imagery, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3740, https://doi.org/10.5194/egusphere-egu21-3740, 2021.

EGU21-14869 | vPICO presentations | ESSI1.12

A new MODIS-Landsat fusion method to reconstruct Landsat NDVI time-series data

Xiaofang Ling and Ruyin Cao

The Normalized Difference Vegetation Index (NDVI) data provided by the Landsat satellites form a rich historical archive with a spatial resolution of 30 m. However, Landsat NDVI time-series data are quite discontinuous due to the 16-day revisit cycle, cloud contamination and other factors. Spatiotemporal data fusion has been proposed to reconstruct continuous Landsat NDVI time-series data by blending MODIS data with Landsat data. Although a number of spatiotemporal fusion algorithms have been developed during the past decade, most existing algorithms ignore the effective use of partially cloud-contaminated images. In this study, we present a new spatiotemporal fusion method that employs the cloud-free pixels of partially cloud-contaminated images to improve the performance of MODIS-Landsat data fusion by Correcting the inconsistency between MODIS and Landsat data in Spatiotemporal DAta Fusion (CSDAF). We tested the new method at three sites covered by different vegetation types: deciduous forests in the Shennongjia Forestry District of China (SNJ), evergreen forests in Southeast Asia (SEA), and irrigated farmland in the Coleambally irrigated area of Australia (CIA). Two experiments were designed. In experiment I, we first simulated different cloud coverages in cloud-free Landsat images and then used both CSDAF and the recently developed IFSDAF method to restore these "missing" pixels for quantitative assessment. CSDAF performed better than IFSDAF, achieving smaller average Root Mean Square Error (RMSE) values (0.0767 vs. 0.1116) and larger average Structural SIMilarity index (SSIM) values (0.8169 vs. 0.7180). In experiment II, we simulated the scenario of "inconsistency" between MODIS and Landsat by adding different levels of noise to the MODIS and Landsat data. Results showed that CSDAF was able to reduce, to some extent, the influence of this inconsistency on MODIS-Landsat data fusion. Moreover, CSDAF is simple and can be implemented on the Google Earth Engine. We expect that CSDAF can be used to reconstruct Landsat NDVI time-series data at regional and continental scales.
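The quantitative assessment used above (RMSE and SSIM between fused and reference images) can be reproduced in a few lines; the arrays below are placeholders for Landsat NDVI scenes:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(3)
reference = rng.uniform(0.0, 1.0, size=(256, 256))                      # "true" NDVI scene
fused = np.clip(reference + rng.normal(0, 0.05, reference.shape), 0, 1)  # fusion result

rmse = np.sqrt(np.mean((fused - reference) ** 2))
ssim = structural_similarity(reference, fused, data_range=1.0)
print(f"RMSE = {rmse:.4f}, SSIM = {ssim:.4f}")
```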

How to cite: Ling, X. and Cao, R.: A new MODIS-Landsat fusion method to reconstruct Landsat NDVI time-series data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14869, https://doi.org/10.5194/egusphere-egu21-14869, 2021.

EGU21-10374 | vPICO presentations | ESSI1.12

Dynamic evaluation of small urban green areas at local level using GEOBIA

Ana-Maria Popa, Diana Andreea Onose, Ionut Cosmin Sandric, Simona Raluca Gradinaru, and Athanasios Alexandru Gavrilidis

Urban green infrastructure provides various benefits known as ecosystem services: regulating, cultural, provisioning and supporting services. These include decreasing air temperature, increasing humidity and mitigating the urban heat island (regulating services); human-nature relations (cultural services); improving air quality and carbon sequestration (provisioning services); and photosynthesis, nutrient and water cycling (supporting services). The high intensity of the urbanization process across the last decades, coupled with weak legislative frameworks, resulted both in large areas affected by urban sprawl and in densification of the existing urban fabric. Both phenomena generated losses in open spaces, especially green areas. In the context of the sustainable urbanization promoted by the HABITAT Agenda, knowledge of the distribution, size and quality of urban green areas represents a priority. The aim of this study is to identify small urban green areas at the local level at different moments in time for a dynamic evaluation. We focused on small urban green areas since they are scarcely analysed, even though their importance for the urban quality of life is continuously increasing given the urbanization process. We used satellite imagery acquired by Planet satellite constellations, with a spatial resolution of 3.7 m and daily coverage, to extract green areas. The images were processed using Geographic Object-Based Image Analysis (GEOBIA) techniques implemented in Esri ArcGIS Pro. The spatial analysis we performed generated information about the distribution, surface, quality (based on NDVI) and dynamics of small urban green areas. The results are connected with the local-level development of the urban areas we analysed, but also with population consumption patterns for leisure services, housing, transport and other public utilities. The analysis can represent a complementary method for extracting green areas at the urban level and can support data collection for calculating urban sustainability indicators.
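The NDVI-based quality grading mentioned above reduces to a simple band computation; the sketch below uses placeholder arrays for the PlanetScope red and near-infrared bands, and the vegetation threshold is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(4)
red = rng.uniform(0.02, 0.3, size=(512, 512))   # placeholder red reflectance band
nir = rng.uniform(0.1, 0.6, size=(512, 512))    # placeholder near-infrared band

ndvi = (nir - red) / (nir + red + 1e-9)          # small epsilon avoids division by zero
green_mask = ndvi > 0.3                          # assumed vegetation threshold
print("green-area fraction:", green_mask.mean())
```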

How to cite: Popa, A.-M., Onose, D. A., Sandric, I. C., Gradinaru, S. R., and Gavrilidis, A. A.: Dynamic evaluation of small urban green areas at local level using GEOBIA, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10374, https://doi.org/10.5194/egusphere-egu21-10374, 2021.

EGU21-12722 | vPICO presentations | ESSI1.12

SemantiX: a cross-sensor semantic EO data cube to open and leverage AVHRR time-series and essential climate variables with scientists and the public 

Hannah Augustin, Martin Sudmanns, Helga Weber, Andrea Baraldi, Stefan Wunderle, Christoph Neuhaus, Steffen Reichel, Lucas van der Meer, Philipp Hummer, and Dirk Tiede

Long time series of essential climate variables (ECVs) derived from satellite data are key to climate research. SemantiX is a research project to establish, complement and expand Advanced Very High Resolution Radiometer (AVHRR) time series using Copernicus Sentinel-3 A/B imagery, making them and derived ECVs accessible using a semantic Earth observation (EO) data cube. The Remote Sensing Research Group at the University of Bern maintains one of the longest European time series of AVHRR imagery (1981-present). Data cube technologies are a game changer for how EO imagery is stored, accessed, and processed. They also establish reproducible analytical environments for queries and information production, and are better able to represent multi-dimensional systems. A semantic EO data cube is a concept recently coined by researchers at the University of Salzburg, referring to a spatio-temporal data cube containing EO data in which at least one nominal (i.e., categorical) interpretation is available for each observation and can be queried in the same instance (Augustin et al. 2019). Offering analysis-ready data (i.e., calibrated and orthorectified AVHRR Level 1c data) in a data cube along with semantic enrichment lowers the barriers to conducting spatial analysis through time based on user-defined AOIs.

This contribution presents a semantic EO data cube containing selected ECV time series (i.e., snow cover extent, lake surface water temperature, vegetation dynamics) derived from AVHRR imagery (1981-2019); a temporal and spatial subset of AVHRR Level 1c imagery (updated after Hüsler et al. 2011) from 2016 until 2019; and, for the latter, semantic enrichment derived using the Satellite Image Automatic Mapper (SIAM). SIAM applies a fully automated, spectral rule-based routine based on a physical model to assign spectral profiles to colour names with known semantic associations; no user parameters are required, and the result is application-independent (Baraldi et al. 2010). Existing probabilistic cloud masks (Musial et al. 2014) generated by the Remote Sensing Research Group at the University of Bern are also included as additional data-derived information to support spatio-temporal semantic queries. This implementation is a foundational step towards the overall objective of combining climate-relevant AVHRR time series with Sentinel-3 imagery for the Austrian-Swiss alpine region, a European region that is currently experiencing serious changes due to climate change that will continue to create challenges well into the future.
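Conceptually, a semantic EO data cube allows queries to be phrased over categorical layers rather than raw reflectances. The sketch below is our own toy illustration with xarray, with the integer colour-name codes invented for the example:

```python
import numpy as np
import xarray as xr

t, ny, nx = 12, 100, 100
cube = xr.Dataset(
    {
        # SIAM-like categorical "colour name" layer (integer codes, invented here)
        "semantic": (("time", "y", "x"), np.random.randint(0, 5, (t, ny, nx))),
        # probabilistic cloud mask layer
        "cloud_prob": (("time", "y", "x"), np.random.rand(t, ny, nx)),
    },
    coords={"time": np.arange(t)},
)

SNOW = 3  # assumed integer code of a snow/ice-like colour name
# "How often is a pixel snow-like and cloud-free over the time series?"
snow_frequency = ((cube["semantic"] == SNOW) & (cube["cloud_prob"] < 0.2)).mean("time")
print(snow_frequency)
```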

Going forward, this semantic EO data cube will be linked to a mobile citizen science smartphone application. For the first time, scientists in disciplines unrelated to remote sensing, students, as well as interested members of the public will have direct and location-based access to these long EO data time series and derived information. SemantiX runs from August 2020-2022 funded by the Austrian Research Promotion Agency (FFG) under the Austrian Space Applications Programme (ASAP 16) (project #878939) in collaboration with the Swiss Space Office (SSO).

How to cite: Augustin, H., Sudmanns, M., Weber, H., Baraldi, A., Wunderle, S., Neuhaus, C., Reichel, S., van der Meer, L., Hummer, P., and Tiede, D.: SemantiX: a cross-sensor semantic EO data cube to open and leverage AVHRR time-series and essential climate variables with scientists and the public, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12722, https://doi.org/10.5194/egusphere-egu21-12722, 2021.

EGU21-10882 | vPICO presentations | ESSI1.12

Using Sentinel-5P time-series products for Nitrogen Dioxide (NO2) Spatio-Temporal Analysis over Europe During the Coronavirus Pandemic Lockdown

Marina Vîrghileanu, Ionuț Săvulescu, Bogdan-Andrei Mihai, Constantin Nistor, and Robert Dobre

Nitrogen dioxide (NO2) is one of the main air quality pollutants of concern in many urban and industrial areas worldwide. Emitted by fossil fuel burning activities, mainly road traffic, NO2 pollution is responsible for the degradation of population health and for the formation of secondary pollutants such as nitric acid and ozone. In the European region, almost 20 countries exceeded in 2017 the NO2 annual limit values imposed by European Commission Directive 2008/50/EC (EEA, 2019). Therefore, monitoring and regulating NO2 pollution is a necessary task to help decision makers search for sustainable solutions to improve environmental quality and population health. In this study, we propose a comparative analysis of the tropospheric NO2 column density spatial configuration over Europe between similar periods of 2019 and 2020, based on ESA Copernicus Sentinel-5P products. Our results highlight the NO2 pollution dynamics over the abrupt transition from normal conditions to the COVID-19 outbreak context, characterized by a short-term decrease in traffic intensity and industrial activity; this situation is also reflected in national-level statistics on COVID-19 cases and economic indicators. The validation approach shows high correlation between TROPOMI-derived data and independent data from ground-based observations, with encouraging R² values ranging between 0.5 and 0.75 at different locations.

How to cite: Vîrghileanu, M., Săvulescu, I., Mihai, B.-A., Nistor, C., and Dobre, R.: Using Sentinel-5P time-series products for Nitrogen Dioxide (NO2) Spatio-Temporal Analysis over Europe During the Coronavirus Pandemic Lockdown, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10882, https://doi.org/10.5194/egusphere-egu21-10882, 2021.

EGU21-14320 | vPICO presentations | ESSI1.12

Improving the Quality Assessment band in Landsat cloud images for the application of cloud removal  

Boli Yang, Yan Feng, and Ruyin Cao

Cloud contamination is a serious obstacle to the application of Landsat data. Thick clouds completely block land surface information and lead to missing values. The reconstruction of missing values in a Landsat cloud image requires a cloud and cloud shadow mask. In this study, we raise the issue that the quality of the quality assessment (QA) band in current Landsat products cannot meet the requirements of thick-cloud removal. To address this issue, we developed a new method (called Auto-PCP) to preprocess the original QA band, with the ultimate objective of improving the performance of cloud removal in Landsat cloud images. We tested the new method at four test sites and compared cloud-removed images generated using three different QA bands: the original QA band; the QA band modified by a dilation of two pixels around cloud and cloud shadow edges; and the QA band processed by Auto-PCP ("QA_Auto-PCP"). Experimental results, from both actual and simulated Landsat cloud images, show that QA_Auto-PCP achieved the best visual assessment of the cloud-removed images, the smallest RMSE values, and the largest Structural SIMilarity index (SSIM) values. QA_Auto-PCP improves the performance of cloud removal because the new method substantially decreases omission errors of clouds and shadows in the original QA band without increasing commission errors. Moreover, Auto-PCP is easy to implement and uses the same data as cloud removal, without additional image collections. We expect that Auto-PCP can further popularize cloud removal and advance the application of Landsat data.
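The simple dilation baseline compared against above, growing the QA cloud/shadow mask by two pixels before cloud removal, can be written as follows; the input mask is a random placeholder:

```python
import numpy as np
from scipy.ndimage import binary_dilation

rng = np.random.default_rng(5)
qa_mask = rng.random((100, 100)) > 0.97          # placeholder mask (True = cloud/shadow)

# Two iterations of binary dilation grow the mask edges by two pixels
dilated = binary_dilation(qa_mask, iterations=2)
print(qa_mask.sum(), "->", dilated.sum(), "flagged pixels")
```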

Keywords: Cloud detection, Cloud shadows, Cloud simulation, Cloud removal, MODTRAN

How to cite: Yang, B., Feng, Y., and Cao, R.: Improving the Quality Assessment band in Landsat cloud images for the application of cloud removal  , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14320, https://doi.org/10.5194/egusphere-egu21-14320, 2021.

EGU21-10892 | vPICO presentations | ESSI1.12

Spatiotemporal tomography based on scattered multiangular signals and its use for resolving evolving clouds using moving platforms

R. Ronen, Y. Y. Schechner, and E. Eytan

The climate is strongly affected by its interaction with clouds. To reduce major errors in climate predictions, this interaction requires a much finer understanding of cloud physics than current knowledge provides. Current knowledge is based on empirical remote sensing data that are analyzed under the assumption that the atmosphere and clouds are made of very broad and uniform layers. To help overcome this problem, 3D scattering computed tomography (CT) has been suggested as a way to study clouds.

CT is a powerful way to recover the inner structure of three-dimensional (3D) volumetric heterogeneous objects, and it has extensive use in many research and operational domains. Aside from its common usage in medicine, CT is used for sensing geophysical terrestrial structures, atmospheric pollution and fluid dynamics. CT requires imaging from multiple directions, and in nearly all CT approaches the object is considered static during image acquisition. However, in many cases the object changes while the multi-view images are acquired sequentially. Thus, effort has been invested in expanding 3D CT to four-dimensional (4D) spatiotemporal CT. This effort has been directed at linear CT modalities, which are computationally easier to handle and have therefore been popular in medical imaging. However, linear CT modalities do not apply to clouds: clouds constitute a scattering medium, and radiative transfer is therefore non-linear in the clouds' content.

This work focuses on the challenge of 4D scattering CT of clouds. Scattering CT of clouds requires high-resolution multi-view images from space. There are spaceborne and high-altitude systems that may provide such data, for example AirMSPI, MAIA, HARP and AirHARP. An additional planned system is the CloudCT formation, funded by the ERC. However, these systems are costly. Deploying them in large numbers to simultaneously acquire images of the same clouds from many angles can be impractical. Therefore, the platforms are planned to move above the clouds: a sequence of images is taken, in order to span and sample a wide angular breadth. However, the clouds evolve while the angular span is sampled.

We pose conditions under which this task can be performed. These regard temporal sampling and angular breadth in relation to the correlation time of the evolving cloud. We then generalize scattering CT. The generalization seeks spatiotemporal recovery of the cloud extinction field at high resolution (10 m), using data taken by a small number of moving cameras. We present an optimization-based method to achieve this, and then demonstrate the method both in rigorous simulations and on real data.

How to cite: Ronen, R., Schechner, Y. Y., and Eytan, E.: Spatiotemporal tomography based on scattered multiangular signals and its use for resolving evolving clouds using moving platforms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10892, https://doi.org/10.5194/egusphere-egu21-10892, 2021.

EGU21-12831 | vPICO presentations | ESSI1.12

The potential of monitoring traffic conditions up to 15 times a day using sub-meter resolution EO images

Refiz Duro, Georg Neubauer, and Alexandra-Ioana Bojor

Urbanization and the trend of people moving to cities often lead to problematic traffic conditions, which can be very challenging for traffic management. Congestion hampers the flow of people and goods, negatively affecting businesses, through delays and the inability to estimate travel times and plan accordingly, as well as the environment and the health of the population, due to increased fuel consumption and the resulting air pollution. Many cities have policies and rules to manage traffic, ranging from standard traffic lights to more dynamic and adaptable solutions involving in-road sensors or cameras to actively modify the duration of traffic lights, or even more sophisticated IoT solutions to monitor and manage conditions on a city-wide scale. At the core of these technologies, and of the decision-making processes built on them, is the availability of reliable, ideally real-time, data on traffic conditions. Many cities are still coping with poor spatial and temporal data coverage, as many of these solutions require not only changes to the infrastructure but also large investments.

One approach is to exploit the current and forthcoming advancements in Earth Observation (EO) satellite technologies. The biggest advantage is EO's great spatial coverage, ranging from a few km² to 100 km² per image at spatial resolutions down to 0.3 m, which allows for quick, city-spanning data collection. Furthermore, the availability of imaging sensors covering specific bands allows the constituent information within an image to be separated and leveraged.

In this respect, we present the findings of our work on multispectral image sets collected on three occasions in 2019 by the very-high-resolution WorldView-3 satellite. We apply a combination of machine learning and PCA methods to detect vehicles and derive their kinematic properties (e.g., movement, direction, speed), which is only possible with satellites whose design allows for short time lags between imaging in different spectral bands. As these data essentially constitute a time series, we will discuss how the presented results apply to the forthcoming WorldView-Legion constellation of satellites, which will provide up to 15 revisits per day and thus near-real-time monitoring of traffic and its impact on the environment.
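The band time-lag idea can be illustrated with a generic PCA decorrelation step, a heavily simplified stand-in of ours for the study's processing: static land cover is highly correlated across the time-lagged bands, so objects that moved between band acquisitions concentrate in the later principal components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
bands = rng.normal(size=(8, 200, 200))     # placeholder stack of time-lagged bands
pixels = bands.reshape(8, -1).T            # (n_pixels, n_bands)

pca = PCA()                                # full decomposition across bands
pcs = pca.fit_transform(pixels).T.reshape(8, 200, 200)

# PC1 captures the band-to-band correlated static background; residual signal
# from objects displaced between band acquisitions shows up in later PCs.
candidate_layer = np.abs(pcs[1:]).sum(axis=0)
print("candidate moving-object layer shape:", candidate_layer.shape)
```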

How to cite: Duro, R., Neubauer, G., and Bojor, A.-I.: The potential of monitoring traffic conditions up to 15 times a day using sub-meter resolution EO images, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12831, https://doi.org/10.5194/egusphere-egu21-12831, 2021.

EGU21-13727 | vPICO presentations | ESSI1.12

ICEComb − A New Software Tool for Satellite Laser Altimetry Data Processing and Visualisation

Bruno Silva, Luiz Guerreiro Lopes, and Pedro Campos

Processing, handling and visualising the large data volumes produced by satellite altimetry missions is a challenging task. A reference tool for the visualisation of satellite laser altimetry data is the OpenAltimetry platform, which provides altimetry-specific data from the Ice, Cloud, and land Elevation Satellite (ICESat) and ICESat-2 missions through a web-based interactive interface. However, by focusing only on altimetry data, that tool leaves out access to many other equally important data available in the products of both missions.

The main objective of the work reported here was the development of a new web-based tool, called ICEComb, that offers end users the ability to access all the available data from both satellite missions, visualise and interact with them on a geographic map, store the data records locally, and process and explore data in an efficient, detailed and meaningful way, thus providing an easy-to-use software environment for satellite laser altimetry data analysis and interpretation.

The proposed tool is intended mainly for researchers and scientists working with ICESat and ICESat-2 data, offering a ready-to-use system to rapidly access the raw collected data in a visually engaging way, without requiring prior understanding of the format, structure and parameters of the data products. In addition, the architecture of ICEComb was developed with possible future expansion in mind, using well-documented, standard languages in its implementation. This makes it possible, for example, to extend its applicability to data from other satellite laser altimetry missions and to integrate models that can be coupled with ICESat and ICESat-2 data, thus expanding and enriching the context of studies carried out with such data.

The use of the ICEComb tool is illustrated and demonstrated by its application to ICESat/GLAS measurements over Lake Mai-Ndombe, a large and shallow freshwater lake located within the Ngiri-Tumba-Maindombe area, one of the largest Ramsar wetlands of international importance, situated in the Cuvette Centrale region of the Congo Basin.

Keywords: Laser altimetry, ICESat/GLAS, software tool design, data visualization, Congo Basin.

Acknowledgement. This work was partially supported by the Portuguese Foundation for Science and Technology (FCT) through LARSyS − FCT Pluriannual funding 2020−2023.

How to cite: Silva, B., Guerreiro Lopes, L., and Campos, P.: ICEComb − A New Software Tool for Satellite Laser Altimetry Data Processing and Visualisation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13727, https://doi.org/10.5194/egusphere-egu21-13727, 2021.

EGU21-13762 | vPICO presentations | ESSI1.12

Landsat thermal infrared to detect sub-canopy riparian flooding

Emanuel Storey, Witold Krajewski, and Efthymios Nikolopoulos

Satellite-based flood detection can enhance understanding of risk to humans and infrastructure, geomorphic processes, and ecological effects.  Such application of optical satellite imagery has mostly been limited to the detection of water exposed to the sky, as plant canopies tend to obstruct the visibility of water at short electromagnetic wavelengths.  This case study evaluates the utility of multi-temporal thermal infrared observations from Landsat 8 as a basis for detecting sub-canopy fluvial inundation that results in ambient temperature change.

We selected three flood events of 2016 and 2019 along sections of the Mississippi, Cedar, and Wapsipinicon Rivers located in Iowa, Minnesota, and Wisconsin, United States.  Classification of sub-canopy water involved logical, threshold-exceedance criteria to capture thermal decline within channel-adjacent vegetated zones.  Open water extent in the floods was mapped based on short-wave infrared thresholds determined parametrically from baseline (non-flooded) observations.  Map accuracy was evaluated using higher-resolution (0.5–5.0 m) synchronic optical imagery.
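The threshold-exceedance logic can be condensed into a few array operations; the sketch below is our simplification with placeholder scenes and an assumed thermal-decline threshold, not the study's exact criteria:

```python
import numpy as np

rng = np.random.default_rng(7)
bt_baseline = rng.uniform(290, 300, (200, 200))          # non-flooded thermal scene (K)
bt_flood = bt_baseline - rng.uniform(0, 6, (200, 200))   # flood-date thermal scene (K)
vegetated = rng.random((200, 200)) > 0.5                 # channel-adjacent vegetation mask

DELTA_K = 3.0  # assumed thermal-decline threshold (K)
subcanopy_flood = vegetated & ((bt_baseline - bt_flood) > DELTA_K)
print("sub-canopy flood fraction:", subcanopy_flood.mean())
```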

Results demonstrate improved ability to detect sub-canopy inundation when thermal infrared change is incorporated: sub-canopy flood class accuracy was comparable to that of open water in previous studies.  The multi-temporal open-water mapping technique yielded high accuracy as compared to similar studies.  This research highlights the utility of Landsat thermal infrared data for monitoring riparian inundation and for validating other remotely sensed and simulated flood maps.

How to cite: Storey, E., Krajewski, W., and Nikolopoulos, E.: Landsat thermal infrared to detect sub-canopy riparian flooding, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13762, https://doi.org/10.5194/egusphere-egu21-13762, 2021.

EGU21-7932 | vPICO presentations | ESSI1.12

Use of multispectral remote sensing data to map magnetite bodies in the Bushveld Complex, South Africa: a case study of Roossenekal, Limpopo

M. Twala, J. Roberts, and C. Munghemezulu

The use of remote sensing in mineral detection and lithological mapping has become a generally accepted augmentative tool in exploration. With the advent of multispectral sensors (e.g. ASTER, Landsat, Sentinel and PlanetScope) having suitable wavelength coverage and bands in the Shortwave Infrared (SWIR) and Thermal Infrared (TIR) regions, multispectral sensors have become increasingly efficient at routine lithological discrimination and mineral potential mapping. With this paradigm in mind, this project sought to evaluate and discuss the detection and mapping of vanadium-bearing magnetite, found in discordant bodies and magnetite layers, on the Eastern Limb of the Bushveld Complex. The Bushveld Complex hosts the world's largest resource of high-grade primary vanadium in magnetitite layers, so the wide distribution of magnetite, its economic importance, and its potential as an indicator of many important geological processes warranted the delineation of magnetite.

The detection and mapping of the vanadium bearing magnetite was evaluated using specialized traditional, and advanced machine learning algorithms. Prior to this study, few studies had looked at the detection and exploration of magnetite using remote sensing, despite remote sensing tools having been regularly applied to diverse aspects of geosciences. Maximum Likelihood, Minimum Distance to Means, Artificial Neural Networks, Support Vector Machine classification algorithms were assessed for their respective ability to detect and map magnetite using the PlanetScope data in ENVI, QGIS, and Python. For each classification algorithm, a thematic landcover map was attained and the accuracy assessed using an error matrix, depicting the user's and producer's accuracies, as well as kappa statistics.

The Maximum Likelihood Classifier significantly outperformed the other techniques, achieving an overall classification accuracy of 84.58% and an overall kappa value of 0.79. Magnetite was accurately discriminated from the other thematic landcover classes with a user’s accuracy of 76.41% and a producer’s accuracy of 88.66%. The erroneous classification of some mining-activity pixels as magnetite seen in the Maximum Likelihood output was common to all classification algorithms. The overall results of this study illustrate that remote sensing techniques are effective instruments for geological mapping and mineral investigation, especially for iron oxide mineralization in the Eastern Limb of the Bushveld Complex.

How to cite: Twala, M., Roberts, J., and Munghemezulu, C.: Use of multispectral remote sensing data to map magnetite bodies in the Bushveld Complex, South Africa: a case study of Roossenekal, Limpopo., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7932, https://doi.org/10.5194/egusphere-egu21-7932, 2021.

EGU21-7895 | vPICO presentations | ESSI1.12

Application of optical and radar data for lineaments mapping in Kerdous inlier of the Anti Atlas belt, Morocco

Amine Jellouli, Abderrazak El Harti, Zakaria Adiri, Mohcine Chakouri, Jaouad El Hachimi, and El Mostafa Bachaoui

Lineament mapping is an important step in lithological and hydrothermal alteration mapping. It is an efficient research task that can form part of structural investigations and the identification of mineral ore deposits. The availability of optical as well as radar remote sensing data, such as Landsat 8 OLI, Terra ASTER and ALOS PALSAR data, allows lineament mapping at regional and national scales. The accuracy of the obtained results depends strongly on the spatial and spectral resolution of the data. The aim of this study was to compare Landsat 8 OLI, Terra ASTER and radar ALOS PALSAR satellite data for automatic and manual lineament extraction. The LINE module of the PCI Geomatica software was applied to the PC1 OLI, PC3 ASTER, and HH and HV polarization images to automatically extract geological lineaments. Manual extraction was achieved using an RGB colour composite of the directionally filtered images N–S (0°), NE–SW (45°) and E–W (90°) of the OLI panchromatic band 8. The lineaments obtained from automatic and manual extraction were compared against the faults and photo-geological lineaments digitized from the existing geological map of the study area. The lineaments extracted from the PC1 OLI and ALOS PALSAR polarization images showed the best correlation with the faults and photo-geological lineaments. The results indicate that the HH and HV polarizations of the ALOS PALSAR radar data, with 1499 and 1507 extracted lineaments respectively, were the most efficient for structural lineament mapping, followed by the PC1 OLI image with 1057 lineaments.
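
The directional-filtering step used for the manual extraction can be sketched in Python as follows; the 3 x 3 kernels are generic directional gradients, assumed here for illustration rather than taken from the study.

    # N-S (0 deg), NE-SW (45 deg) and E-W (90 deg) directional filters
    # applied to the panchromatic band and stacked as an RGB composite.
    import numpy as np
    from scipy.ndimage import convolve

    KERNELS = {
        "NS":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float),
        "NESW": np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], float),
        "EW":   np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], float),
    }

    def directional_composite(pan):
        """Return an (H, W, 3) composite of directionally filtered images."""
        bands = []
        for key in ("NS", "NESW", "EW"):
            filtered = np.abs(convolve(pan.astype(float), KERNELS[key]))
            bands.append(filtered / filtered.max())  # scale to [0, 1]
        return np.dstack(bands)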

Keywords: Remote Sensing, OLI, ALOS PALSAR, ASTER, Kerdous Inlier, Anti Atlas

How to cite: Jellouli, A., El Harti, A., Adiri, Z., Chakouri, M., El Hachimi, J., and Bachaoui, E. M.: Application of optical and radar data for lineaments mapping in Kerdous inlier of the Anti Atlas belt, Morocco, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7895, https://doi.org/10.5194/egusphere-egu21-7895, 2021.

EGU21-6438 | vPICO presentations | ESSI1.12

Improving the Classification Accuracy of Fragmented Cropland by using an Advanced Classification Algorithm

Shreedevi Moharana, BVNP Kambhammettu, Syam Chintala, Arjangi Sandhya Rani, and Ram Avtar

Fragmented cropland and marginal landholdings complicate land-use classification and the identification of different cropping and management practices; crop classification algorithms are difficult to implement in such settings and produce results with lower accuracy. Optical imagery is often contaminated by cloud cover and fails to detect the phenological as well as structural changes occurring during crop growth. This is very common and most typical of Indian climatic conditions, where capturing temporal satellite images over the crop period during the monsoon is a challenging task. Therefore, the present study applies a novel crop classification algorithm that utilizes the temporal patterns of synthetic aperture radar (SAR) datasets from Sentinel-1 to map land use in an agricultural system that is fragmented, small and heterogeneous in nature. We used different polarizations of the Sentinel-1 datasets and developed temporal patterns for the different crops grown in a semi-arid region of India. An advanced classification algorithm, time-weighted dynamic time warping (TWDTW), was then employed to classify the cropland with higher accuracy. Pixel-based image analysis was carried out and tested for its applicability to cropland mapping. In-situ datasets collected from the study site were used to validate the classification outputs. The pixel-based TWDTW method performed well, with an overall accuracy of 63% and a Kappa coefficient of 0.58. The findings confirm that the pixel-based TWDTW algorithm has the potential to delineate croplands, which were subjected to varying irrigation treatments and management practices, using Sentinel-1 datasets.
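
For readers unfamiliar with TWDTW, the core idea can be sketched in a few lines of Python: the local distance between a pixel's backscatter series and a reference crop pattern is penalised by a logistic weight on the time difference, so matches far apart in the season cost more. The logistic parameters below are illustrative, not those used in the study; each pixel is finally assigned the class of the pattern with the smallest TWDTW distance.

    # Minimal time-weighted dynamic time warping (TWDTW) distance.
    import numpy as np

    def twdtw_distance(x, tx, y, ty, alpha=0.1, beta=50.0):
        """x, y: backscatter series; tx, ty: acquisition days of year."""
        n, m = len(x), len(y)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                dt = abs(tx[i - 1] - ty[j - 1])
                w = 1.0 / (1.0 + np.exp(-alpha * (dt - beta)))  # time weight
                cost = abs(x[i - 1] - y[j - 1]) + w
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]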

Keywords: crop classification, landuse, image analysis, Sentinel-1, TWDTW

How to cite: Moharana, Dr. S., Kambhammettu, Dr. B., Chintala, Mr. S., Sandhya Rani, Ms. A., and Avtar, Dr. R.: Improving the Classification Accuracy of Fragmented Cropland by using an Advanced Classification Algorithm, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6438, https://doi.org/10.5194/egusphere-egu21-6438, 2021.

ESSI1.19 – Stability and Accuracy of Earth satellite measurements through calibration and validation

EGU21-1327 | vPICO presentations | ESSI1.19

Community geometric standards for remote sensing products

Guoqing (Gary) Lin, Robert Wolfe, Bin Tan, and Jaime Nickeson

We have developed a set of geometric standards for assessing Earth-observing data products derived from space-borne remote sensors.  We have worked with the European Space Agency (ESA) Earthnet Data Assessment Pilot (EDAP) project to provide a set of guidelines for assessing geometric performance in data products from commercial electro-optical remote sensors aboard satellites such as those from Planet Labs. The guidelines, or standards, are based on the performance of a few NASA-procured sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors, the Visible Infrared Imaging Radiometer Suite (VIIRS) sensors and the Advanced Baseline Imager (ABI) sensors. The standards include sensor spatial response, absolute positional accuracy, and band-to-band co-registration. They are tiered into “basic”, “intermediate” and “goal” criteria. These are important geometric factors affecting the scientific use of remote sensing data products. We also discuss possible approaches to achieving the highest goal in the geometric performance standards.

How to cite: Lin, G. (., Wolfe, R., Tan, B., and Nickeson, J.: Community geometric standards for remote sensing products, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1327, https://doi.org/10.5194/egusphere-egu21-1327, 2021.

EGU21-7919 | vPICO presentations | ESSI1.19

Calibration and Validation of Infrared Sounders with Moon and Mercury

Martin Burgdorf, Stefan A. Buehler, Viju John, Thomas Müller, and Marc Prange

Serendipitous observations of airless bodies of the inner solar system provide a unique means of calibrating instruments on meteorological research satellites, because the physical properties of their surfaces change very little, even on large time scales. We investigated how certain instrumental effects can be characterised with observations of the Moon and Mercury. For this we identified and analysed intrusions of the Moon into the deep space views of HIRS/2, /3, and /4 (High-resolution Infrared Sounder) on various satellites in polar orbits, as well as some images obtained with SEVIRI (Spinning Enhanced Visible Infra-Red Imager) on MSG-3 and -4 (Meteosat Second Generation) in which Mercury stood close to the Earth in the rectangular field of view.

A full-disk, infrared Moon model was developed that describes how the lunar flux density depends on phase angle and wavelength. It is particularly helpful for inter-calibration, checks of the photometric consistency of the sounding channels, and the calculation of an upper limit on the non-linearity of the shortwave channels of HIRS. In addition, we used the Moon to determine the co-registration of the different spectral channels.

Studies of the channel alignment are also presented for SEVIRI, an infrared sounder with an angular resolution about a hundred times better than that of HIRS. As we wanted to check the image quality of this instrument with a quasi-point source as well, we here replaced the Moon with Mercury. We found the typical smearing of the point spread function in the scan direction and occasionally a nearby ghost image, three to four times fainter than the main image of the planet. Both effects cause additional uncertainties in the photometric calibration.

How to cite: Burgdorf, M., Buehler, S. A., John, V., Müller, T., and Prange, M.: Calibration and Validation of Infrared Sounders with Moon and Mercury, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7919, https://doi.org/10.5194/egusphere-egu21-7919, 2021.

EGU21-8481 | vPICO presentations | ESSI1.19

Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) Newly Released V5.2 Validation of Ozone and Water Vapor Data

Susan Kizer, David Flittner, Marilee Roell, Robert Damadeo, Carrie Roller, Dale Hurst, Emrys Hall, Allen Jordan, Patrick Cullis, Bryan Johnson, and Richard Querel

The Stratospheric Aerosol and Gas Experiment III (SAGE III) instrument installed on the International Space Station (ISS) has completed over three and a half years of data collection and production of science data products. SAGE III/ISS is a solar and lunar occultation instrument that scans the light from the Sun and Moon through the limb of the Earth’s atmosphere to produce vertical profiles of aerosol, ozone, water vapor, and other trace gases. It continues the legacy of previous SAGE instruments dating back to the 1970s, providing data continuity for stratospheric constituents critical to assessing trends in the ozone layer. This presentation shows the validation results of comparing SAGE III/ISS ozone and water vapor vertical profiles from the newly released v5.2 science product with in situ and satellite data.

How to cite: Kizer, S., Flittner, D., Roell, M., Damadeo, R., Roller, C., Hurst, D., Hall, E., Jordan, A., Cullis, P., Johnson, B., and Querel, R.: Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) Newly Released V5.2 Validation of Ozone and Water Vapor Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8481, https://doi.org/10.5194/egusphere-egu21-8481, 2021.

EGU21-8702 | vPICO presentations | ESSI1.19

Atmospheric Correction Inter-comparison eXercise: the second implementation 

Georgia Doxani, Eric F. Vermote, Sergii Skakun, Ferran Gascon, and Jean-Claude Roger

The Atmospheric Correction Inter-comparison eXercise (ACIX) is an international initiative to benchmark various state-of-the-art atmospheric correction (AC) processors. The first inter-comparison exercise was initiated in 2016 as a collaboration between the European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) in the frame of the CEOS WGCV (Committee on Earth Observation Satellites, Working Group on Calibration & Validation). The evolution of the participating processors and the increasing interest of the AC community in repeating and improving such an experiment stimulated the continuation of ACIX and its second implementation (ACIX-II). In particular, 12 AC developer teams from Europe and the USA participated in ACIX-II over land sites. In this presentation the benchmarking protocol, i.e. test sites, input data, inter-comparison metrics, etc., will be briefly described and some representative results of ACIX-II will be presented. The inter-comparison outputs varied depending on the sensors, products and sites, demonstrating the strengths and weaknesses of the corresponding processors. In continuation of the ACIX-I achievements, the outcomes of the second exercise are expected to provide an enhanced standardised approach for inter-comparing AC processing products, i.e. Aerosol Optical Thickness (AOT), Water Vapour (WV) and Surface Reflectance (SR), and to quantitatively assess their quality where in situ measurements are available.

How to cite: Doxani, G., Vermote, E. F., Skakun, S., Gascon, F., and Roger, J.-C.: Atmospheric Correction Inter-comparison eXercise: the second implementation , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8702, https://doi.org/10.5194/egusphere-egu21-8702, 2021.

EGU21-8731 | vPICO presentations | ESSI1.19

Satellite Calibration/Validation and Related Activities Carried out through NASA/ESA Joint Program Planning Group Subgroup

Jack Kaye and Malcolm Davidson

The NASA/ESA Joint Program Planning Group (JPPG) subgroup on satellite calibration/validation was created to facilitate coordinated efforts between ESA, NASA, and their respective investigator communities to enhance calibration and/or validation activities for current and/or future satellite missions. The cooperation enabled through this activity includes airborne campaigns, use of surface-based measurements, and satellite-to-satellite intercomparisons. Numerous examples of such activities exist over the ten years of the JPPG. In this talk, examples of calibration/validation-focused activities, accomplishments, and future plans will be presented. A particular focus will be on how the COVID-19 pandemic has affected field work planned for 2020 and 2021. The JPPG subgroup also includes joint European-US studies of satellite results that integrate both parties’ observational capabilities, and the status of those activities will be presented as well.

How to cite: Kaye, J. and Davidson, M.: Satellite Calibration/Validation and Related Activities Carried out through NASA/ESA Joint Program Planning Group Subgroup, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8731, https://doi.org/10.5194/egusphere-egu21-8731, 2021.

EGU21-9030 | vPICO presentations | ESSI1.19

Characterization of the in-flight spectral response function of Geostationary Environment Monitoring Spectrometer (GEMS) retrieved using observed solar irradiance

Mina Kang, Myoung-Hwan Ahn, Dai Ho Ko, Jhoon Kim, Dennis Nicks, Mijin Eo, Yeeun Lee, Kyung-Jung Moon, and Dong-Won Lee

The successful launch of the Geostationary Environment Monitoring Spectrometer (GEMS) onboard the Geostationary Korea Multipurpose Satellite 2B (GK-2B) opens up a new possibility of providing daily air quality information on trace gases and aerosols over East Asia with high spatiotemporal resolution. As part of the major efforts to calibrate and validate the performance of GEMS, accurate characterization of the spectral response functions (SRFs) is critical. The characteristics of the preflight SRFs, examined in terms of shape, width, skewness, and kurtosis, vary smoothly along both the spectral and spatial directions thanks to the highly symmetrical optical system of GEMS. While the preflight SRFs are determined with high accuracy, the in-flight SRFs may have changed during the harsh launch process and may change further during operations over the mission lifetime. Thus, it is important to verify the in-flight SRFs after launch and to continue monitoring their variability over time to assure reliable trace gas retrievals. Here, we retrieve the in-flight SRFs for the full spectral and spatial domain of GEMS using spectral fitting of the observed daily solar measurement against a high-resolution solar reference spectrum. A variety of analytic model functions, including a hybrid of Gaussian and flat-topped functions, an asymmetric super-Gaussian, and a Voigt function, are tested to determine the best representative function for the GEMS SRF. The SRFs retrieved from early solar irradiances measured during the in-orbit tests agree well with the preflight SRFs, indicating that no significant change occurred during the launch process. Continuous monitoring of the in-flight SRF is planned, using daily solar irradiances to investigate the temporal variation along the spectral and spatial directions. The detailed results of the in-flight SRF retrieval will be presented.
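
As an illustration of one candidate model named above, the sketch below fits an asymmetric super-Gaussian to tabulated SRF samples with non-linear least squares; in the actual retrieval the SRF parameters are instead adjusted so that the convolved high-resolution solar reference reproduces the measured solar spectrum. All numbers here are invented for illustration.

    # Fit an asymmetric super-Gaussian SRF model to sampled SRF points.
    import numpy as np
    from scipy.optimize import curve_fit

    def super_gaussian(wl, wl0, w_left, w_right, k):
        w = np.where(wl < wl0, w_left, w_right)  # side-dependent width
        return np.exp(-np.abs((wl - wl0) / w) ** k)

    wl = np.linspace(399.0, 401.0, 41)           # hypothetical 400 nm channel
    srf = super_gaussian(wl, 400.02, 0.21, 0.24, 2.6)
    srf += np.random.default_rng(0).normal(0.0, 0.005, wl.size)  # noise

    popt, _ = curve_fit(super_gaussian, wl, srf, p0=(400.0, 0.2, 0.2, 2.0))
    print("centre, widths, exponent:", np.round(popt, 3))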

How to cite: Kang, M., Ahn, M.-H., Ko, D. H., Kim, J., Nicks, D., Eo, M., Lee, Y., Moon, K.-J., and Lee, D.-W.: Characterization of the in-flight spectral response function of Geostationary Environment Monitoring Spectrometer (GEMS) retrieved using observed solar irradiance, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9030, https://doi.org/10.5194/egusphere-egu21-9030, 2021.

EGU21-9360 | vPICO presentations | ESSI1.19

Characterization of GEMS level 1B based on inter-comparison using the visible channel of AMI

Yeeun Lee, Myoung-Hwan Ahn, Mijin Eo, Mina Kang, Kyung-jung Moon, Dai-Ho Ko, Jhoon Kim, and Dong-won Lee

The Geostationary Korean Multi-Purpose Satellite (GK-2) program, consisting of GK-2A and GK-2B, provides consistent monitoring information over the Asia-Pacific region, including the Korean peninsula. The Geostationary Environment Monitoring Spectrometer (GEMS) onboard GK-2B in particular provides information on atmospheric composition and aerosol properties, retrieved from the calibrated radiance (Level 1B) at high spectral resolution in 300–500 nm. GEMS started its extended validation measurements after the in-orbit test (IOT) in October, following the launch of the satellite in February 2020. One of the issues found during the IOT is that GEMS shows a spatial dependence of the measured solar irradiance along the north–south direction, although the solar irradiance itself has no such dependency. The dependence must therefore originate from the optical system or from the solar diffuser placed in front of the scan mirror. To clarify its root cause, we utilize an inter-comparison of Earth measurements between GEMS and the Advanced Meteorological Imager (AMI), a multi-channel imager onboard GK-2A for meteorological monitoring. As the spectral range of GEMS fully covers the spectral response function (SRF) of the AMI visible channel, which has a central wavelength of 470 nm, spectral matching is properly done by convolving the SRF with the hyperspectral data of GEMS. Taking advantage of the fact that the positions of GK-2A and GK-2B are maintained within a 0.5-degree square box centered at 128.2°E, a match-up dataset for the inter-comparison is prepared by temporal and spatial collocation. To reduce spatio-temporal mismatch and increase the signal to noise, a zonal mean is applied to the collocated data. Results show that the north–south dependence occurs in the comparison of reflectance, the ratio between the Earth radiance and the solar irradiance, but not in the comparison of radiance. This indicates that the dependence is due to the characteristics of the solar diffuser, not the optical system. It is further deduced that the dependence of the diffuser transmittance on the solar azimuth angle, which was not characterized during the pre-flight ground test, is the main cause of the north–south dependency.
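
The spectral-matching step can be summarised in a short Python sketch (array names are illustrative): the GEMS hyperspectral radiance is weighted by the AMI 470 nm SRF and integrated to simulate the broadband AMI measurement.

    # Simulate a broadband channel from hyperspectral radiance and an SRF.
    import numpy as np

    def simulate_band(wl_gems, rad_gems, wl_srf, srf):
        """SRF-weighted band-average radiance on the GEMS wavelength grid."""
        srf_i = np.interp(wl_gems, wl_srf, srf, left=0.0, right=0.0)
        return np.trapz(srf_i * rad_gems, wl_gems) / np.trapz(srf_i, wl_gems)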

How to cite: Lee, Y., Ahn, M.-H., Eo, M., Kang, M., Moon, K., Ko, D.-H., Kim, J., and Lee, D.: Characterization of GEMS level 1B based on inter-comparison using the visible channel of AMI, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9360, https://doi.org/10.5194/egusphere-egu21-9360, 2021.

EGU21-10066 | vPICO presentations | ESSI1.19

LIME: the Lunar Irradiance Model of the European Space Agency

Sarah Taylor, Stefan Adriaensen, Carlos Toledano, África Barreto, Emma Woolliams, and Marc Bouvet

Absolute calibration of Earth observation sensors is key to ensuring long-term stability and interoperability, essential for long-term global climate records and forecasts. The Moon provides a photometrically stable calibration source, within the range of Earth radiometric levels, and is free from atmospheric interference. However, to use this ideal calibration source, one must model the variation of its disk-integrated irradiance resulting from changes in Sun-Earth-Moon geometries.

LIME, the Lunar Irradiance Model of the European Space Agency, is a new lunar irradiance model developed from ground-based observations acquired using a lunar photometer operating from the Izaña Atmospheric Observatory and Teide Peak, Tenerife. Approximately 300 lunar observations acquired between March 2018 and October 2020 currently contribute to the model, which builds on the widely-used ROLO (Robotic Lunar Observatory) model.

This presentation will outline the strategy used to derive LIME. First, the instrument was calibrated traceably to SI and characterised to determine its thermal sensitivity and its linearity over the wide dynamic range required. Second, the instrument was installed at the observatory, and nightly observations over a two-hour time window were extrapolated to provide top-of-atmosphere lunar irradiance using the Langley plot method. Third, these observations were combined to derive the model. Each of these steps includes a metrologically rigorous uncertainty analysis.
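
The Langley step rests on the Beer-Lambert relation ln V = ln V0 - tau * m: regressing the logarithm of the photometer signal against lunar airmass within the window and extrapolating to zero airmass yields the top-of-atmosphere signal. A minimal sketch, with assumed variable names:

    # Langley-plot extrapolation to top-of-atmosphere irradiance.
    import numpy as np

    def langley_toa(signal, airmass):
        """Return (TOA signal V0, total optical depth tau) for one channel."""
        slope, intercept = np.polyfit(airmass, np.log(signal), 1)
        return np.exp(intercept), -slope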

Comparisons to several EO sensors will be presented, including Proba-V, Pleiades and Sentinel-3A and -3B, as well as a comparison to GIRO, the GSICS implementation of the ROLO model. Initial results indicate LIME predicts 3–5% higher disk-integrated lunar irradiance than the GIRO/ROLO model for the visible and near-infrared channels. The model has an expanded (k = 2) absolute radiometric uncertainty of ~2%, and it is expected that planned observations until at least 2024 will further constrain the model in subsequent updates.

How to cite: Taylor, S., Adriaensen, S., Toledano, C., Barreto, Á., Woolliams, E., and Bouvet, M.: LIME: the Lunar Irradiance Model of the European Space Agency, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10066, https://doi.org/10.5194/egusphere-egu21-10066, 2021.

EGU21-13302 | vPICO presentations | ESSI1.19

New and improved data from the Pandonia Global Network for satellite validation

Alexander Cede, Martin Tiefengraber, Manuel Gebetsberger, Michel Van Roozendael, Henk Eskes, Christophe Lerot, Diego Loyola, Nicolas Theys, Isabelle De Smedt, Nader Abuhassan, Thomas Hanisco, Angelika Dehn, Jonas Von Bismarck, Stefano Casadio, Luke Valin, and Barry Lefer

The worldwide-operating Pandonia Global Network (PGN) measures atmospheric trace gases at high temporal resolution for the purposes of air quality monitoring and satellite validation. It is an activity carried out jointly by NASA and ESA as part of their “Joint Program Planning Group Subgroup” on calibration/validation and field activities, with additional collaboration from other institutions, most notably a strongly growing participation of the US Environmental Protection Agency (EPA). The more than 50 official PGN instruments are homogeneously calibrated and their data are centrally processed in real time. Since 2019, total NO2 column amounts from the PGN have been uploaded daily to the ESA Atmospheric Validation Data Centre (EVDC), where they are used for operational validation of Sentinel-5P (S5P) retrievals. During 2020, a new processor version 1.8 was developed, which produces improved total NO2 column amounts as well as the following new PGN products: total columns of O3, SO2 and HCHO based on direct-sun observations, and tropospheric columns, surface concentrations and tropospheric profiles of NO2 and HCHO based on sky observations. In this presentation we show some first examples of comparisons of the new PGN products with S5P data. Compared to the total NO2 columns from the previous processor version 1.7, the 1.8 data use better estimations of the effective NO2 temperature and the air mass factor. The effect of this improvement on the comparison with S5P retrievals is shown for some remote and high-altitude PGN sites. The new PGN total O3 column algorithm also retrieves the effective O3 temperature, which is a rather unique feature for ground-based direct-sun retrievals. This allows us to analyze whether potential differences from satellite O3 columns might be influenced by the O3 temperature. Including the O3 temperature in the spectral fitting has also allowed the retrieval of accurate total SO2 columns. This PGN data product is of particular interest for satellite validation, as ground-based total SO2 column amounts are hardly measured by any other instrumentation. An initial comparison of the PGN SO2 columns with S5P retrievals at selected PGN sites around the world is shown. PGN total HCHO columns from direct-sun measurements are now possible for those PGN instruments in which the hardware parts made of Delrin, which outgasses HCHO, have been replaced by nylon pieces. An initial comparison to HCHO retrievals from S5P is shown for locations with these upgraded instruments. Another new feature of the 1.8 PGN data is that they come with comprehensive uncertainty estimations, separated in the output files into independent, structured, common and total uncertainty.
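
For context, the standard direct-sun conversion underlying such column products divides the slant column density (SCD) from the spectral fit by an air mass factor (AMF), which for direct-sun geometry is essentially geometric. A minimal sketch, not the PGN processor's implementation:

    # Convert a slant column to a vertical column with a geometric AMF.
    import numpy as np

    def vertical_column(scd, sza_deg):
        amf = 1.0 / np.cos(np.radians(sza_deg))  # direct-sun air mass factor
        return scd / amf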

How to cite: Cede, A., Tiefengraber, M., Gebetsberger, M., Van Roozendael, M., Eskes, H., Lerot, C., Loyola, D., Theys, N., De Smedt, I., Abuhassan, N., Hanisco, T., Dehn, A., Von Bismarck, J., Casadio, S., Valin, L., and Lefer, B.: New and improved data from the Pandonia Global Network for satellite validation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13302, https://doi.org/10.5194/egusphere-egu21-13302, 2021.

EGU21-14222 | vPICO presentations | ESSI1.19

Air-LUSI: Supporting Advancement of the Moon as a Reference for Earth Observations from Space

Kevin Turpie, Steven Brown, John Woodward, Thomas Stone, Andrew Gadsden, Steven Grantham, Thomas Larason, Stephen Maxwell, Andrew Cataford, and Andrew Newton

To monitor global environments from space, satellites must be calibrated accurately and consistently across time, missions and instruments.  This requires the use of a stable, common reference that is continuously accessible to Earth-observing satellites, whether they make up series of missions spanning long periods of time or comprise constellations acquiring many simultaneous observations across the planet.  The Moon can serve well as such a common reference.  Its surface reflectance is stable to within one part in 10⁸, and its radiant output is theorized to vary repeatably and very predictably with viewing and illumination geometry.  In addition, it has a radiant flux more comparable to the Earth’s surface than the Sun and can be viewed directly by the instrument.  To predict the lunar irradiance for a given illumination and viewing geometry, the United States Geological Survey (USGS) developed the Robotic Lunar Observatory (ROLO) model of exo-atmospheric lunar spectral irradiance. The USGS ROLO model represents the current most precise knowledge of lunar spectral irradiance and is used frequently as a relative calibration standard by space-borne Earth-observing sensors.  Current knowledge of the Moon's spectral irradiance is thought to be limited to 5-10% uncertainty.  However, monitoring changing Earth environments calls for an absolute lunar reference with higher accuracy.

The development of the ROLO model and subsequent attempts to better characterize the lunar spectral irradiance cycle were based on observations made from the Earth's surface.  These require corrections to remove the effects of the atmosphere, which limits the accuracy.  The Airborne LUnar Spectral Irradiance (Air-LUSI) system was developed to make highly accurate, SI-traceable measurements of lunar spectral irradiance from NASA’s ER-2 aircraft flying at 21 km, above 95% of the atmosphere.  To that end, the Air-LUSI system employs an autonomous, robotic telescope system that tracks the Moon in flight and a stable spectrometer housed in an enclosure providing a robustly controlled environment.  During November 2019, the Air-LUSI system was demonstrated with flights on five consecutive nights, acquiring observations of the Moon at lunar phases of 10°, 21°, 34°, 46°, and 59°.  Air-LUSI is now ready for operational use.  This paper provides an overview of this new capability and how it, along with other efforts underway, can help transform how we monitor the Earth from space.

How to cite: Turpie, K., Brown, S., Woodward, J., Stone, T., Gadsden, A., Grantham, S., Larason, T., Maxwell, S., Cataford, A., and Newton, A.: Air-LUSI: Supporting Advancement of the Moon as a Reference for Earth Observations from Space, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14222, https://doi.org/10.5194/egusphere-egu21-14222, 2021.

EGU21-14656 | vPICO presentations | ESSI1.19

Toward a Climate and Calibration Observatory in space: NASA CLARREO Pathfinder and ESA TRUTHS

Nigel Fox, Yolanda Shea, Thorsten Fehr, Fleming Gary, Constantine Lukashin, Peter Pilewskie, John Remedios, and Paul Smith

The number, range and criticality of applications of Earth-viewing optical sensors are increasing rapidly, driven not only by national and international space agencies but also by the launch of commercial constellations such as those of Planet, with the concept of Analysis Ready Data (ARD) reducing the skill needed to utilise the data.  However, no one organisation can provide all the tools necessary, and the need for a coordinated, holistic Earth observing system has never been greater. Achieving this vision has led to international initiatives coordinated by bodies such as the Committee on Earth Observation Satellites (CEOS) and the Global Space-based Inter-Calibration System (GSICS) of WMO to establish strategies to facilitate interoperability and the understanding and removal of bias through post-launch calibration and validation.

In parallel, the societal challenge resulting from climate change has been a major stimulus for significantly improved accuracy and trust of satellite data. Instrumental biases and uncertainty must be sufficiently small to minimise the multi-decadal timescales needed to detect small trends and attribute their cause, enabling them to become unequivocally accepted as evidence. 

Although there have been many advances in the pre-flight SI-traceable calibration of optical sensors in the last decade, unpredictable degradation in performance caused by both the launch and the operational environment remains a major difficulty.  Even with on-board calibration systems, uncertainties of less than a few percent are rarely achieved and maintained, and the evidential link to SI traceability is weak. For many climate observations the target uncertainty needs to be improved ten-fold.

However, this decade will hopefully see the launch of two missions providing spectrally resolved observations of the Earth at optical wavelengths that are designed to change this paradigm: CLARREO Pathfinder on the International Space Station from NASA [1] and TRUTHS from ESA [2].  Both payloads are explicitly designed to achieve uncertainties close to those of the ideal observing system, commensurate with the needs of climate, with robust SI traceability evidenced in space.  Not only can they make high-accuracy, climate-quality observations of the Earth (and, in the case of TRUTHS, also the Sun), but they will also transfer their SI-traceable uncertainty to other sensors.  In this way they create the concept of a ‘metrology laboratory in space’, providing a ‘gold standard’ reference to anchor and improve the calibration of other sensors. The two missions achieve their traceability in orbit through differing methods but will use synergistic approaches for establishing in-flight cross-calibrations.  This paper will describe these strategies and illustrate the benefits through examples where improved accuracy has the most impact on the Earth observing system.

The complementarity and international value of these missions have ensured a strong partnership during the early development phases of the full CLARREO mission and of the NPL-conceived TRUTHS. Following a proposal by the UK Space Agency and the subsequent adoption of TRUTHS into the ESA EarthWatch programme, this partnership has been further strengthened with the ESA team, with a vision that together the two missions can lay the foundation of a framework for a future sustainable international climate and calibration observatory, to the benefit of the global Earth observing community.

References

[1]  https://clarreo-pathfinder.larc.nasa.gov/

[2] https://www.npl.co.uk/earth-observation/truths

How to cite: Fox, N., Shea, Y., Fehr, T., Gary, F., Lukashin, C., Pilewskie, P., Remedios, J., and Smith, P.: Toward a Climate and Calibration Observatory in space: NASA CLARREO Pathfinder and ESA TRUTHS, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14656, https://doi.org/10.5194/egusphere-egu21-14656, 2021.

EGU21-15144 | vPICO presentations | ESSI1.19

The Joint ESA-NASA Tropical Campaign Activity – Aeolus Calibration/Validation and Science in the Tropics

Thorsten Fehr, Gail Skofronick-Jackson, Vassilis Amiridis, Jonas von Bismarck, Shuyi Chen, Cyrille Flamant, Rob Koopman, Christian Lemmerz, Griša Močnik, Tommaso Parrinello, Aaron Piña, and Anne Grete Straume

The Tropics cover around 40% of the globe and are home to approximately 40% of the world's population. However, numerical weather prediction (NWP) for this region remains challenging due to the lack of basic observations and an incomplete understanding of atmospheric processes, which also affects extratropical storm development. As a result, the largest impact of ESA's Aeolus satellite observations on NWP is expected in the Tropics, where only a very limited number of wind profile observations can be performed from the ground.

An especially important case relating to the predictability of tropical weather systems is the outflow of Saharan dust, its interaction with cloud microphysics and its overall impact on the development of tropical storms over the Atlantic Ocean. The region off the coast of West Africa uniquely allows the study of the Saharan Aerosol Layer, the African Easterly Waves and Jet, the Tropical Easterly Jet, as well as the deep convection in the ITCZ, and their relation to the formation of convective systems and the transport of dust.

Together with international partners, ESA and NASA are currently implementing a joint Tropical campaign from July to August 2021, based in Cape Verde. The campaign objective is to provide information for the validation and preparation of the ESA missions Aeolus and EarthCARE, respectively, as well as to support a range of related science objectives: the investigation of the interactions of African Easterly and other tropical waves with the mean flow and dust, and their impact on the development of convective systems; the structure and variability of the marine boundary layer in relation to the initiation and lifecycle of convective cloud systems within and across the ITCZ; and the impact of wind, aerosol, cloud, and precipitation effects on long-range dust transport and air quality over the western Atlantic.

The campaign comprises a unique combination of strong airborne and ground-based elements collocated on Cape Verde. The airborne component, with wind and aerosol lidars, cloud radars, in-situ instrumentation and additional observations, includes the NASA DC-8 with science activities coordinated by the U. of Washington, the German DLR Falcon-20, the French Safire Falcon-20 with activities led by LATMOS, and the Slovenian Aerovizija Advantic WT-10 light aircraft in cooperation with the U. of Nova Gorica. The ground-based component, led by the National Observatory of Athens, is a collaboration of more than 25 European teams providing in-situ and remote sensing aerosol and cloud measurements with a wide range of lidar, radar and radiometer systems, as well as drone observations by the Cyprus Institute.

In preparation for the field campaign, the NASA and ESA management and science teams are collaborating closely through regular coordination meetings, in particular to coordinate the one-year shift of the activity due to the COVID-19 pandemic. The time gained has been used to further consolidate the planning, in particular through a dry-run campaign organized by NASA with European participation, in which six virtual flights were conducted in July 2020.

This paper will present a summary of the campaign preparation activities and the consolidated plan for the 2021 Tropical campaign.

How to cite: Fehr, T., Skofronick-Jackson, G., Amiridis, V., von Bismarck, J., Chen, S., Flamant, C., Koopman, R., Lemmerz, C., Močnik, G., Parrinello, T., Piña, A., and Straume, A. G.: The Joint ESA-NASA Tropical Campaign Activity – Aeolus Calibration/Validation and Science in the Tropics, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15144, https://doi.org/10.5194/egusphere-egu21-15144, 2021.

EGU21-15161 | vPICO presentations | ESSI1.19

Pi-MEP Salinity – an ESA-NASA Platform for sustained satellite surface salinity validation 

Roberto Sabia, Sebastien Guimbard, Nicolas Reul, Tony Lee, Julian Schanze, Nadya Vinogradova, David Le Vine, Fred Bingham, Fabrice Collard, Klaus Scipal, and Henri Laur

The Pilot Mission Exploitation Platform (Pi-MEP) for Salinity (www.salinity-pimep.org) was released operationally in 2019 to the broad oceanographic community, in order to foster satellite sea surface salinity validation and exploitation activities.

Specifically, the Platform aims at enhancing salinity validation, by allowing systematic inter-comparison of various EO datasets with a broad suite of in-situ data, and at enabling oceanographic process studies by capitalizing on salinity data in synergy with additional spaceborne estimates.

Although Pi-MEP was originally conceived as an ESA initiative to widen the uptake of Soil Moisture and Ocean Salinity (SMOS) mission data over the ocean, a project partnership with NASA was devised soon after the operational deployment, and an official collaboration was endorsed within the ESA-NASA Joint Program Planning Group (JPPG).

The Salinity Pi-MEP has therefore become a reference hub for the SMOS, SMAP and Aquarius satellite salinity missions, which are assessed in synergy with additional thematic datasets (e.g., precipitation, evaporation, currents, sea level anomalies, ocean color, sea surface temperature).

Match-up databases of satellite and in-situ data (such as Argo, TSG, moorings and drifters) and corresponding validation reports at different spatiotemporal scales are systematically generated; furthermore, recently developed dedicated tools allow data visualization, metrics computation and user-driven feature extraction.

The Platform is also meant to monitor salinity in selected oceanographic “case studies”, ranging from river plume monitoring to SSS characterization in challenging regions, such as high latitudes or semi-enclosed basins.

The two Agencies are currently collaborating to widen the Platform's features on several technical aspects, ranging from a triple-collocation software implementation to a sustained exploitation of data from the SPURS-1/2 campaigns. In this context, an upgrade of the satellite/in-situ match-up methodology has recently been agreed, resulting in a redefinition of the validation criteria that will subsequently be implemented in the Platform.
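
For reference, the triple-collocation technique mentioned above estimates the error variance of each of three collocated, bias-corrected salinity estimates with mutually independent errors (e.g. satellite, in-situ, model) from covariances of their differences. A minimal sketch follows, not the Platform's implementation:

    # Classic triple-collocation error estimator.
    import numpy as np

    def triple_collocation_errors(x, y, z):
        """Return error standard deviations of x, y and z."""
        exx = np.mean((x - y) * (x - z)) - np.mean(x - y) * np.mean(x - z)
        eyy = np.mean((y - x) * (y - z)) - np.mean(y - x) * np.mean(y - z)
        ezz = np.mean((z - x) * (z - y)) - np.mean(z - x) * np.mean(z - y)
        return tuple(np.sqrt(max(v, 0.0)) for v in (exx, eyy, ezz))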

A further synthesis of the three satellites' salinity algorithms, models and auxiliary data handling is at the core of the ESA Climate Change Initiative (CCI) on Salinity and of further ESA-NASA collaboration.

How to cite: Sabia, R., Guimbard, S., Reul, N., Lee, T., Schanze, J., Vinogradova, N., Le Vine, D., Bingham, F., Collard, F., Scipal, K., and Laur, H.: Pi-MEP Salinity – an ESA-NASA Platform for sustained satellite surface salinity validation , EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15161, https://doi.org/10.5194/egusphere-egu21-15161, 2021.

EGU21-15166 | vPICO presentations | ESSI1.19

NASA-ESA Cooperation on the SBG and CHIME Hyperspectral Satellite Missions: a roadmap for the joint Working Group on Cal/Val activities

Valentina Boccia, Jennifer Adams, Kurtis J. Thome, Kevin R. Turpie, Raymond Kokaly, Marc Bouvet, Robert O. Green, and Michael Rast

Imaging spectroscopy has been identified by ESA, NASA and other international space agencies as key to addressing a number of the most important scientific and environmental management objectives. To implement critical EU and related policies for the management of natural resources, assets and benefits, and to achieve the objectives outlined by NASA’s Decadal Survey in ecosystem science, hydrology and geology, high-fidelity imaging spectroscopy data with global coverage and high spatial resolution are required. As such, ESA’s CHIME (Copernicus Hyperspectral Imaging Mission for the Environment) and NASA’s SBG (Surface Biology and Geology) satellite missions aim to provide imaging spectroscopy data with global coverage at regular intervals of time and at high spatial resolution.

However, the scientific and applied objectives motivate more spatial coverage and more rapid revisit than any one agency’s observing system can provide. With the development of SBG and CHIME, the mid-to-late 2020s will see more global-coverage spectroscopic observing systems, whereby these challenging needs can be met more fully by a multi-mission, multi-agency synergistic approach than by any single observing system.

Therefore, ESA-NASA cooperation on imaging spectroscopy space missions was seen as a priority, specifically given the complementarity of the mission objectives and measurement targets of SBG and CHIME. Such cooperation is now being formalized as part of the ESA-NASA Joint Program Planning Group activities.

Among these activities, calibration and validation (Cal/Val) are fundamental for imaging spectroscopy while the satellites are in orbit and operating. They determine the quality and integrity of the data provided by the spectrometers and become even more crucial when data from different satellites, carrying different imaging sensors, are used worldwide in a complementary and synergistic manner, as will be the case for CHIME and SBG data. Indeed, Cal/Val activities not only have enormous downstream impacts on the accuracy and reliability of the products, but also facilitate cross-calibration and interoperability among several imaging spectrometers, supporting their synergistic use. Accordingly, within the context of this cooperation, a Working Group (WG) on Calibration/Validation has been set up, aiming to establish a roadmap for future SBG-CHIME coordination activities and collaborative studies.

This contribution aims to outline the key areas of cooperation between SBG and CHIME in terms of Calibration and Validation, and to present the roadmap established between the two missions, focusing on the following topics:

  • Establishing an end-to-end cal/val strategy for seamless data products across missions, including transfer standards;
  • Measurement Networks and commonly recognised Cal/Val reference sites;
  • Status of atmospheric radiative transfer and atmospheric-correction procedures;
  • Standardisation and Quality Control of reference data sets;
  • Definition and implementation of joint airborne spectroscopy campaigns, such as the executed 2018 and planned 2021 campaigns, to simulate both missions and exercise the capabilities needed for eventual interoperability (incl. data collection, calibration, data product production);
  • Continuous validation throughout the lifetime of products;
  • Identifying other opportunities for efficiency and success through cooperation on calibration and validation, downlink capabilities and shared algorithms (e.g. compression and on-board data reduction).

How to cite: Boccia, V., Adams, J., Thome, K. J., Turpie, K. R., Kokaly, R., Bouvet, M., Green, R. O., and Rast, M.: NASA-ESA Cooperation on the SBG and CHIME Hyperspectral Satellite Missions: a roadmap for the joint Working Group on Cal/Val activities, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15166, https://doi.org/10.5194/egusphere-egu21-15166, 2021.

EGU21-15501 | vPICO presentations | ESSI1.19

Reference Data and Methods for Validation of Very High Resolution Optical Data Within ESA / EDAP Project

Sébastien Saunier

In this paper, the authors describe the methodologies developed for the validation of Very High-Resolution (VHR) optical missions within the Earthnet Data Assessment Pilot (EDAP) framework.  The use of surface-based, drone, airborne, and/or space-based observations to build calibration references plays a fundamental role in the validation process. A rigorous validation process must compare mission data products with independent reference data suited to the satellite measurements. As a consequence, one background activity within EDAP is the collection and consolidation of reference data of various kinds, depending on the validation methodology.

The validation methodologies are conventionally divided into three categories, i.e. validation of the measurement, the geometry and the image quality. Validation of the measurement requires an absolute calibration reference. The latter is built up using either in-situ measurements collected with RadCalNet [1] stations or space-based observations performed with “gold” missions (Sentinel-2, Landsat-8) over Pseudo-Invariant Calibration Sites (PICS). For the geometric validation, several test sites have been set up. A test site is equipped with data from different reference sources. The full usability of a test site is not systematic: it depends on the validation metrics and the specifications of the sensor, particularly the spatial resolution and image acquisition geometry. Some existing geometric sites are equipped with Ground Control Point (GCP) sets surveyed using Global Navigation Satellite System (GNSS) devices in the field.  In some cases, the GCP set supports the refinement of an image observed with drones in order to produce a raster reference, subsequently used to validate the internal geometry of images under assessment. Furthermore, a limiting factor in the usage of VHR optical ortho-rectified data is the accuracy of the Digital Surface Model (DSM) / Digital Terrain Model (DTM). In order to separate errors due to terrain elevation from errors due to the sensor itself, some test sites are also equipped with very accurate Light Detection and Ranging (LIDAR) data.

The validation of image quality addresses all aspects related to spatial resolution and is strongly linked to both the measurement and the geometry. The image quality assessments are performed with both qualitative and quantitative approaches. The quantitative approach relies on the analysis of artificial ground target images and leads to estimates of the Modulation Transfer Function (MTF) together with additional image quality parameters such as the Signal-to-Noise Ratio (SNR). The qualitative approach, on the other hand, assesses the interpretability of input images and leads to a rating scale [2] that is strongly related to the sensor Ground Resolution Distance (GRD). This visual inspection task requires a database of very detailed images of man-made objects. This database is considered a reference within EDAP.

[1] https://www.radcalnet.org

[2] https://fas.org/irp/imint/niirs.htm

How to cite: Saunier, S.: Reference Data and Methods for Validation of Very High Resolution Optical Data Within ESA / EDAP Project, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15501, https://doi.org/10.5194/egusphere-egu21-15501, 2021.

ESSI2.2 – Find, access, share and use data across the globe: Infrastructure solutions for Earth System Sciences

EGU21-8458 | vPICO presentations | ESSI2.2

The ICOS Carbon Portal as example of a  FAIR community data repository supporting scientific workflows

Alex Vermeulen, Margareta Hellström, Oleg Mirzov, Ute Karstens, Claudio D'Onofrio, and Harry Lankreijer

The Integrated Carbon Observation System (ICOS) provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on atmospheric composition for greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance, and CO2 partial pressure at water surfaces. The ICOS observational data feed into a wide area of science that covers, for example, plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, Earth, climate and soil science, and combinations of these in multi-disciplinary projects.
As ICOS is committed to providing all data and methods openly and transparently as free data, a dedicated system is needed to secure the long-term archiving and availability of the data, together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use it, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible to enable full trust in the observations and the derived data products.

In this presentation we will introduce the ICOS operational data repository, named ICOS Carbon Portal, which is based on the linked open data approach. All metadata are modelled in an ontology coded in OWL and held in an RDF triple store that is available through an open SPARQL endpoint. The repository supports versioning and collections, and models provenance through a simplified PROV-O ontology. All data objects are ingested under strict control for the identified data types, on provision of correct and sufficient (provenance) metadata, data format and data integrity. All data, including raw data, are stored in the long-term trusted repository B2SAFE with two replicas. On top of the triple store and SPARQL endpoint we have built a series of services, APIs and graphical interfaces that allow machine-to-machine and user interaction with the data and metadata. Examples are a full faceted search with a connected data cart and download facility, previews of higher-level data products (time series of point observations and spatial data), and cloud computing services like eddy covariance data processing and on-demand atmospheric footprint calculations, all connected to the observational data from ICOS.  Another interesting development is the community support for scientific workflows using Jupyter notebook services that connect to our repository through a dedicated Python library for direct metadata and data access.
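
As an illustration of machine access to such a linked-data store, the sketch below posts a SPARQL query to the portal's public endpoint; the endpoint URL and the cpmeta property name are assumptions for illustration, not guaranteed parts of the portal's API.

    # Query a SPARQL endpoint for a few data objects and their file names.
    import requests

    ENDPOINT = "https://meta.icos-cp.eu/sparql"  # assumed public endpoint
    QUERY = """
    SELECT ?dobj ?fileName WHERE {
      ?dobj <http://meta.icos-cp.eu/ontologies/cpmeta/hasName> ?fileName .
    } LIMIT 5
    """

    r = requests.post(ENDPOINT, data={"query": QUERY},
                      headers={"Accept": "application/sparql-results+json"})
    r.raise_for_status()
    for row in r.json()["results"]["bindings"]:
        print(row["dobj"]["value"], row["fileName"]["value"])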

How to cite: Vermeulen, A., Hellström, M., Mirzov, O., Karstens, U., D'Onofrio, C., and Lankreijer, H.: The ICOS Carbon Portal as example of a  FAIR community data repository supporting scientific workflows, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8458, https://doi.org/10.5194/egusphere-egu21-8458, 2021.

EGU21-15394 | vPICO presentations | ESSI2.2

EPOS-Norway Portal

Jan Michalek, Kuvvet Atakan, Christian Rønnevik, Helga Indrøy, Lars Ottemøller, Øyvind Natvik, Tor Langeland, Ove Daae Lampe, Gro Fonnes, Jeremy Cook, Jon Magnus Christensen, Ulf Baadshaug, Halfdan Pascal Kierulf, Bjørn-Ove Grøtan, Odleiv Olesen, John Dehls, and Valerie Maupin

The European Plate Observing System (EPOS) is a European project for building a pan-European infrastructure for accessing solid Earth science data, governed now by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN Infrastructure Programme, project no. 245763) is a Norwegian project funded by the Research Council of Norway. The aim of the Norwegian EPOS e‑infrastructure is to integrate data from the seismological and geodetic networks, as well as data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions provide data: the University of Bergen (UIB), the Norwegian Mapping Authority (NMA), the Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open-access, interactive tool allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Access is currently provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway, and new datasets are planned for future integration. The EPOS-N Portal can access remote datasets via web services like FDSNWS for seismological data and OGC services (e.g. WMS) for geological and geophysical data. Standalone datasets are available through preloaded data files. Users can also simply add another WMS server or upload their own dataset for visualization and comparison with other datasets. The portal provides a unique way (the first of its kind in Norway) of exploring various geoscientific datasets in one common interface. One of the key aspects is the quick, simultaneous visual inspection of data from various disciplines and the testing of scientific or geohazard-related hypotheses, for example the spatio-temporal correlation of earthquakes (1980 until now) with existing critical infrastructure (e.g. pipelines), geological structures, submarine landslides or unstable slopes.

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets, and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element in the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.

How to cite: Michalek, J., Atakan, K., Rønnevik, C., Indrøy, H., Ottemøller, L., Natvik, Ø., Langeland, T., Lampe, O. D., Fonnes, G., Cook, J., Christensen, J. M., Baadshaug, U., Kierulf, H. P., Grøtan, B.-O., Olesen, O., Dehls, J., and Maupin, V.: EPOS-Norway Portal, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15394, https://doi.org/10.5194/egusphere-egu21-15394, 2021.

EGU21-15205 | vPICO presentations | ESSI2.2

SIOS Data Management System: distributed data system for Earth System Science

Dariusz Ignatiuk, Øystein Godøy, Lara Ferrighi, Inger Jennings, Christiane Hübner, Shridhar Jawak, and Heikki Lihavainen

Svalbard Integrated Arctic Earth Observing System (SIOS) is an international consortium that develops and maintains a regional observing system in Svalbard and the associated waters. SIOS brings together the existing infrastructure and data of its members into a multidisciplinary network dedicated to answering Earth System Science (ESS) questions related to global change. The Observing System is built around “SIOS core data” – long-term data series collected by SIOS partners. The SIOS Data Management System (SDMS) is dedicated to harvesting information on historical and current datasets from collaborating thematic and institutional data centres and making them available to users. A central data access portal is linked to the data repositories maintained by SIOS partners, which manage and distribute the datasets and their associated metadata. The integrity of the information and the harmonisation of the data are based on internationally accepted protocols: interoperability of data, standardised documentation of data through the use of metadata, and standardised data-system interfaces for metadata discovery. By these means, SDMS is working towards FAIR data compliance (making data findable, accessible, interoperable and reusable), among other initiatives through the H2020-funded ENVRI-FAIR project (http://envri.eu/envri-fair/).

How to cite: Ignatiuk, D., Godøy, Ø., Ferrighi, L., Jennings, I., Hübner, C., Jawak, S., and Lihavainen, H.: SIOS Data Management System: distributed data system for Earth System Science, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15205, https://doi.org/10.5194/egusphere-egu21-15205, 2021.

EGU21-9400 | vPICO presentations | ESSI2.2

The brokering framework empowering WMO Hydrological Observing System (WHOS)

Enrico Boldrini, Paolo Mazzetti, Fabrizio Papeschi, Roberto Roncella, Mattia Santoro, Massimiliano Olivieri, Stefano Nativi, Silvano Pecora, Igor Chernov, and Claudio Caponi

The WMO Commission for Hydrology (CHy) is realizing the WMO Hydrological Observing System (WHOS), a software (and human) framework with the aim of improving the sharing of hydrological data and knowledge worldwide.

National Hydrological Services (NHSs) already share on the web the data collected in each country (both archived and near-real-time), using disparate publication services. WHOS leverages the Discovery and Access Broker (DAB) technology, developed and operated in its cloud infrastructure by CNR-IIA, to realize WHOS-broker, a key component of the WHOS architecture. WHOS-broker is in charge of harmonizing the available, heterogeneous metadata, data and services, making the already published information more accessible to scientists (e.g. modelers), decision makers and the general public worldwide.

WHOS-broker supports many service interfaces and APIs that hydrological application builders can already leverage, for example OGC SOS, OGC CSW, OGC WMS, ESRI Feature Service, CUAHSI WaterOneFlow, DAB REST API, USGS RDB, OAI-PMH/WIGOS and THREDDS. New APIs and service protocols are continuously added to support new applications, since WHOS-broker is a modular and flexible framework aimed at enabling interoperability and assuring it as standards change and evolve over time.
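As a hedged illustration of client-side access through one of these interfaces (the CSW endpoint URL and search term are placeholders, not official WHOS addresses):

    # Minimal sketch: discovering hydrological datasets via an OGC CSW interface
    # using OWSLib. The endpoint URL and search term are placeholders.
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    csw = CatalogueServiceWeb("https://example.org/whos/csw")   # placeholder URL
    query = PropertyIsLike("csw:AnyText", "%discharge%")        # free-text constraint
    csw.getrecords2(constraints=[query], maxrecords=10)
    for rec_id, rec in csw.records.items():
        print(rec_id, rec.title)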

Three target programmes have already benefited from WHOS:

  • La Plata river basin: hydro and meteo data from Argentina, Bolivia, Brazil, Paraguay and Uruguay are harmonized and shared by WHOS-broker to the benefit of different applications, among them the Plata Basin Hydrometeorological Forecasting and Early Warning System (the PROHMSAT-Plata model, developed by HRC with experts from the five countries), based on CUAHSI WaterOneFlow.
  • Arctic-HYCOS: hydro data from Canada, Finland, Greenland, Iceland, Norway, Russia and the United States are harmonized and shared by WHOS-broker to the benefit of different applications, among them the WMO HydroHub Arctic portal, based on ESRI technologies.
  • Dominican Republic: hydro and meteo data of the Dominican Republic published by different originators are harmonized by WHOS-broker to the benefit of different applications, among them the Met Data Explorer application developed by BYU, based on the THREDDS catalog service.

The three programmes should act as a driving force for more to follow, by demonstrating possible applications that can be built on top of WHOS.

The public launch of the official WHOS homepage at WMO is expected by mid-2021; it will include:

  • A dedicated web portal based on Water Data Explorer application developed by BYU
  • Results from the three programmes
  • Detailed information on how to access WHOS data by using one of the many WHOS-broker service interfaces
  • An online training course for data providers interested in WHOS
  • The WHOS Hydro Ontology, leveraged by WHOS-broker both to semantically augment user queries and to harmonize results (e.g. in the case of synonyms of the same concept in different languages); a sketch of this idea follows the list.
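To make the semantic augmentation concrete, here is a minimal sketch of synonym-based query expansion (the ontology entries below are invented placeholders, not WHOS Hydro Ontology content):

    # Minimal sketch of semantic query augmentation against a synonym ontology.
    # Concepts and synonyms are invented for illustration.
    ONTOLOGY = {
        "discharge": ["streamflow", "caudal", "portata"],       # EN/ES/IT variants
        "precipitation": ["rainfall", "precipitacion"],
    }

    def expand_query(term):
        """Return the canonical concept plus all known synonyms for a search term."""
        t = term.lower()
        for concept, synonyms in ONTOLOGY.items():
            if t == concept or t in synonyms:
                return [concept] + synonyms
        return [t]

    print(expand_query("caudal"))   # -> ['discharge', 'streamflow', 'caudal', 'portata']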

How to cite: Boldrini, E., Mazzetti, P., Papeschi, F., Roncella, R., Santoro, M., Olivieri, M., Nativi, S., Pecora, S., Chernov, I., and Caponi, C.: The brokering framework empowering WMO Hydrological Observing System (WHOS), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9400, https://doi.org/10.5194/egusphere-egu21-9400, 2021.

EGU21-15148 | vPICO presentations | ESSI2.2

Teams Win: The European Datacube Federation

Peter Baumann

Collaboration requires some minimum of common understanding – in the case of Earth data, in particular, common principles making data interchangeable, comparable, and combinable. Open standards help here; in the case of Big Earth Data, specifically the OGC/ISO Coverages standard. This standard establishes a unifying framework, in particular for regular and irregular spatio-temporal datacubes. Services grounded in such common understanding have proven more uniform to access and handle, implementing a principle of "minimal surprise" for users visiting different portals while using their favourite clients. Data combination and fusion benefit from canonical metadata allowing automatic alignment, e.g. between 2D DEMs, 3D satellite image time series, 4D atmospheric data, etc.

The EarthServer datacube federation is showing the way towards unleashing the full potential of pixels for supporting the UN Sustainable Development Goals, local governance, and also businesses. EarthServer is an open, free, transparent, and democratic network of data centers offering dozens of Petabytes of a critical variety of data, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on the open OGC standards, an ecosystem of data has been established that is available to users as a single pool, without the need for any coding skills (such as Python). A specific unique capability is location transparency: clients can fire their query against any of the members, and the federation nodes will figure out the optimal work distribution irrespective of data location.

The underlying datacube engine, rasdaman, enables all datacube access, analytics, and federation. Query evaluation is optimized automatically, applying highly efficient, intelligent, rule-based methods in homogeneous and heterogeneous mashups, up to satellite on-board deployments as done in the ORBiDANSe project. Users perceive one single, common information space accessible through a wide spectrum of open-source and proprietary clients.
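As a hedged illustration of standards-based datacube access (the endpoint URL and coverage name are placeholders, not federation datasets; the KVP request form is assumed from common WCS/WCPS deployments):

    # Minimal sketch: sending an OGC WCPS query to a rasdaman-backed WCS endpoint.
    # Server URL and coverage name are placeholders for illustration.
    import requests

    WCS_ENDPOINT = "https://example.org/rasdaman/ows"
    wcps_query = """
    for $c in (mean_summer_airtemp)
    return encode($c[Lat(-30:-20), Long(130:140)], "image/png")
    """
    resp = requests.get(WCS_ENDPOINT,
                        params={"service": "WCS", "version": "2.0.1",
                                "request": "ProcessCoverages", "query": wcps_query})
    resp.raise_for_status()
    with open("subset.png", "wb") as f:
        f.write(resp.content)   # a PNG rendering of the requested spatial subset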

In our talk we present technology, services, and governance of this unique line-up of data centers. A demo will show distributed datacube fusion live.

 

How to cite: Baumann, P.: Teams Win: The European Datacube Federation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15148, https://doi.org/10.5194/egusphere-egu21-15148, 2021.

EGU21-23 | vPICO presentations | ESSI2.2

Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets

Carlo Lacagnina, Ge Peng, Robert R. Downs, Hampapuram Ramapriyan, Ivana Ivanova, David F. Moroni, Yaxing Wei, Lucy Bastin, Nancy A. Ritchey, Gilles Larnicol, Lesley A. Wyborn, Chung-Lin Shie, Ted Habermann, Anette Ganske, Sarah M. Champion, Mingfang Wu, Irina Bastrakova, Dave Jones, and Gary Berg-Cross

The knowledge of data quality and the quality of the associated information, including metadata, is critical for data use and reuse. Assessment of data and metadata quality is key to ensuring that credible information is available, establishing a foundation of trust between the data provider and various downstream users, and demonstrating compliance with requirements established by funders and federal policies.

Data quality information should be consistently curated, traceable, and adequately documented, providing sufficient evidence to guide users in addressing their specific needs. Quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR).

Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily comparable datasets and quality metrics. To enable interoperability across systems and tools, the data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help to improve the practices of various stakeholders who contribute to the collection, curation, and dissemination of data.
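As a hedged sketch of what machine-actionable quality information might look like (the schema and field names below are invented for illustration and do not represent the community guidelines' format):

    # Minimal sketch of a machine-actionable dataset quality record.
    # The schema, identifiers, and values are invented placeholders.
    import json

    quality_record = {
        "dataset_id": "doi:10.0000/example-dataset",     # placeholder identifier
        "assessment": {
            "protocol": "completeness-check-v1",         # hypothetical protocol name
            "date": "2021-04-19",
            "metrics": {"completeness": 0.97, "known_issues": 2},
        },
        "provenance": {"assessed_by": "Example Data Centre", "traceable": True},
    }
    print(json.dumps(quality_record, indent=2))          # exchangeable across tools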

This presentation outlines a global community effort to develop international guidelines to curate data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product.

How to cite: Lacagnina, C., Peng, G., Downs, R. R., Ramapriyan, H., Ivanova, I., Moroni, D. F., Wei, Y., Bastin, L., Ritchey, N. A., Larnicol, G., Wyborn, L. A., Shie, C.-L., Habermann, T., Ganske, A., Champion, S. M., Wu, M., Bastrakova, I., Jones, D., and Berg-Cross, G.: Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-23, https://doi.org/10.5194/egusphere-egu21-23, 2021.

EGU21-10547 | vPICO presentations | ESSI2.2

Data flow, harmonization, and quality control

Brenner Silva, Philipp Fischer, Sebastian Immoor, Rudolf Denkmann, Marion Maturilli, Philipp Weidinger, Steven Rehmcke, Tobias Düde, Norbert Anselm, Peter Gerchow, Antonie Haas, Christian Schäfer-Neth, Angela Schäfer, Stephan Frickenhaus, and Roland Koppe and the Computing and Data Centre of the Alfred-Wegener-Institute

Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services arrange data by their purpose, level of integration, and governance. For instance, registered data of uniform measurements fulfil the goal of publication but do not necessarily flow into an integrated data system. The data repository provides the first, high level of integration, which strongly depends on the standardization of incoming data. One example is the Observation to Archive and Analysis (O2A) framework, which is operational and continuously developed at the Alfred-Wegener-Institute, Bremerhaven. A data repository is one of the components of the O2A framework, and much of its functionality depends on the standardization of the incoming data. In this context, we focus on the development of a modular approach providing standardization and quality control for monitoring near-real-time data. Two modules are under development: first, a driver module that transforms different tabular data into a common format; second, a quality control module that runs quality tests on the ingested data. Both modules rely on the sensor operator and the data scientist, two actors who interact with the two ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc). We demonstrate the driver and quality control modules in the data flow within Digital Earth showcases that also connect repositories and federated databases to the end-user. The end-user is the scientist, who is closely involved in the development to ensure applicability. The result is the proven benefit of harmonizing data and metadata from multiple sources, easy integration, and rapid assessment of the ingested data. Further, we discuss concepts and current developments aiming at enhanced monitoring and scientific workflows.
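A hedged sketch of the two modules' roles (column names, units, thresholds, and function names are invented for illustration; this is not the O2A implementation):

    # Minimal sketch: a driver harmonizes provider-specific tables into a common
    # format, and a quality control module flags implausible values.
    # Column names and thresholds are invented placeholders.
    import pandas as pd

    def driver(path, column_map):
        """Read a provider-specific CSV and rename columns to a common vocabulary."""
        return pd.read_csv(path).rename(columns=column_map)

    def quality_control(df, valid_range):
        """Apply a simple range test and write a 0/1 quality flag column."""
        lo, hi = valid_range
        bad = (df["temperature"] < lo) | (df["temperature"] > hi)
        df["temperature_qc"] = bad.astype(int)
        return df

    harmonized = driver("station_raw.csv", {"TEMP_degC": "temperature", "TS": "time"})
    checked = quality_control(harmonized, valid_range=(-40.0, 40.0))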

How to cite: Silva, B., Fischer, P., Immoor, S., Denkmann, R., Maturilli, M., Weidinger, P., Rehmcke, S., Düde, T., Anselm, N., Gerchow, P., Haas, A., Schäfer-Neth, C., Schäfer, A., Frickenhaus, S., and Koppe, R. and the Computing and Data Centre of the Alfred-Wegener-Institute: Data flow, harmonization, and quality control, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10547, https://doi.org/10.5194/egusphere-egu21-10547, 2021.